# eugeneyan/applied-ml

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/eugeneyan-applied-ml).**

28,691 stars · 3,842 forks · MIT license

## Links

- GitHub: https://github.com/eugeneyan/applied-ml
- awesome-repositories: https://awesome-repositories.com/repository/eugeneyan-applied-ml.md

## Topics

`applied-data-science` `applied-machine-learning` `computer-vision` `data-discovery` `data-engineering` `data-quality` `data-science` `deep-learning` `machine-learning` `natural-language-processing` `production` `recsys` `reinforcement-learning` `search`

## Description

This project is a curated knowledge base for building and maintaining production-grade machine learning systems. It collects technical literature, engineering case studies, and research papers in one repository, giving practitioners a structured reference for data science and machine learning engineering.

The resource takes a cross-domain approach that connects academic research to practical implementation. It gathers proven industry architectures and operational strategies across the machine learning lifecycle, from data infrastructure and pipeline development to model deployment, versioning, and continuous monitoring.

The collection spans data quality management, feature engineering, and machine learning tasks such as natural language processing, computer vision, and reinforcement learning. It also covers operational concerns such as system efficiency, privacy-preserving techniques, and the ethics of automated decision-making systems.

The repository is community-maintained, which helps keep the documentation aligned with evolving industry practice. All content is delivered as static markdown files, an accessible, version-controlled format for long-form technical material.

## Tags

### Artificial Intelligence & ML

- [Lifecycle Management](https://awesome-repositories.com/f/artificial-intelligence-ml/lifecycle-management.md) — Standardizes processes for model training, versioning, validation, and monitoring to ensure consistent performance.
- [Machine Learning Operations Platforms](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-operations-platforms.md) — Coordinate the end-to-end lifecycle of machine learning models, including development, deployment, monitoring, and continuous improvement within production systems. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [MLOps Best Practices](https://awesome-repositories.com/f/artificial-intelligence-ml/mlops-best-practices.md) — Documents proven strategies for managing data quality, model validation, and team workflows.
- [Production Engineering](https://awesome-repositories.com/f/artificial-intelligence-ml/production-engineering.md) — Provides industry-proven architectures and operational strategies for reliable machine learning deployments.
- [Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/embeddings.md) — Convert complex data types like text or images into dense numerical vectors to capture semantic relationships for similarity searches and clustering. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Feature Stores](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-stores.md) — Organize and serve curated features to models, ensuring consistent data definitions across both training and real-time inference environments. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Generative Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-models.md) — Create new data, text, or media outputs by training models to learn the underlying structure and distribution of existing information. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Machine Learning Knowledge Bases](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-knowledge-bases.md) — Compiles industry papers and technical articles detailing real-world machine learning implementations.
- [Model Versioning Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/model-versioning-systems.md) — Track and organize different iterations of machine learning models to ensure reproducibility, auditability, and easy rollback during the deployment process. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Natural Language Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing.md) — Analyze and interpret human language to enable machines to understand, generate, and respond to text or speech inputs. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Production Machine Learning Guides](https://awesome-repositories.com/f/artificial-intelligence-ml/production-machine-learning-guides.md) — Synthesizes proven industry architectures and operational strategies for reliable machine learning systems.
- [Research Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/research-synthesis.md) — Bridges the gap between academic research and practical implementation by studying how leading technology companies solve complex problems with machine learning.
- [Forecasting](https://awesome-repositories.com/f/artificial-intelligence-ml/forecasting.md) — Predict future values or events by analyzing historical data patterns and identifying trends within time-series information. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [MLOps Indexes](https://awesome-repositories.com/f/artificial-intelligence-ml/mlops-indexes.md) — Provides a structured directory of methodologies for managing model lifecycles and deployment workflows.
- [Reinforcement Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning.md) — Train agents to make sequences of decisions by rewarding actions that lead to desired outcomes in dynamic environments. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Classification](https://awesome-repositories.com/f/artificial-intelligence-ml/classification.md) — Assign categorical labels to data based on learned patterns to support automated decision-making and organizational tasks. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Computer Vision](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision.md) — Extract meaningful information from digital images and video streams to enable tasks like object detection, image segmentation, and scene understanding. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Graph Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/graph-learning.md) — Represent and analyze relationships between entities as nodes and edges to uncover complex connections and network structures within datasets. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Recommendation Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/recommendation-systems.md) — Suggest relevant items or content to users by analyzing their past behavior, preferences, and interactions with similar entities. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Anomaly Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/anomaly-detection.md) — Identify unusual patterns or outliers in datasets that may indicate errors, fraud, or significant changes in underlying system behavior. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Audio Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-processing.md) — Analyze and transform sound signals into digital representations for tasks like speech recognition, classification, or generative audio synthesis. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Information Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/information-extraction.md) — Identify and pull structured data points from unstructured text sources to populate databases or support automated knowledge discovery. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Machine Learning Resource Indexes](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning-resource-indexes.md) — Provides a structured index of technical resources covering the entire lifecycle of data science projects.
- [Optimization Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms.md) — Improve the efficiency and effectiveness of algorithms or processes by fine-tuning parameters to achieve better results with fewer resources. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Search and Ranking Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/search-and-ranking-algorithms.md) — Order results based on relevance and importance to help users quickly find the most useful information within large datasets. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Sequence Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-modeling.md) — Analyze ordered data points to predict future elements or classify patterns based on the historical context of the sequence. ([source](https://github.com/eugeneyan/applied-ml#readme))
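
To make the Embeddings entry above concrete, here is a minimal sketch of similarity search over dense vectors using cosine similarity. It is an illustration only, not code from the repository: the function names, the `ITEMS` catalog, and the hand-written 3-dimensional vectors are all hypothetical, and a real system would obtain its vectors from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, items, top_k=2):
    """Return the top_k (name, score) pairs, highest cosine similarity first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written vectors standing in for model-produced embeddings.
ITEMS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in most_similar([1.0, 0.0, 0.0], ITEMS):
        print(f"{name}: {score:.3f}")
```

The brute-force scan is chosen for clarity. At larger scale, an approximate nearest-neighbor index would typically replace the loop in `most_similar`.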

### Data & Databases

- [Data Pipelines](https://awesome-repositories.com/f/data-databases/data-pipelines.md) — Design robust pipelines for data discovery, quality management, and feature engineering to support scalable machine learning workflows in production environments. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Data Quality Frameworks](https://awesome-repositories.com/f/data-databases/data-quality-frameworks.md) — Monitor and enforce standards for data accuracy, completeness, and consistency to ensure reliable inputs for downstream analytical and machine learning processes. ([source](https://github.com/eugeneyan/applied-ml#readme))
- [Data Discovery Tools](https://awesome-repositories.com/f/data-databases/data-discovery-tools.md) — Locate and explore available datasets using specialized tools designed to catalog, index, and search for information across distributed storage systems. ([source](https://github.com/eugeneyan/applied-ml#readme))
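
As one illustration of the data quality idea above, the sketch below measures completeness, meaning the share of rows with a non-empty value for each required field, and flags fields that fall below a threshold. It is an assumption-laden example rather than anything taken from the repository: `completeness`, `failing_fields`, the 0.95 threshold, and the `ROWS` sample are all hypothetical.

```python
def completeness(rows, required_fields):
    """Map each required field to the fraction of rows where it is present and non-empty."""
    if not rows:
        return {field: 0.0 for field in required_fields}
    report = {}
    for field in required_fields:
        present = sum(1 for row in rows if row.get(field) not in (None, ""))
        report[field] = present / len(rows)
    return report


def failing_fields(report, threshold=0.95):
    """Return, in sorted order, the fields whose completeness is below the threshold."""
    return sorted(field for field, ratio in report.items() if ratio < threshold)


# Hypothetical sample records; one row is missing its country.
ROWS = [
    {"user_id": 1, "country": "SG"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": ""},
    {"user_id": 4, "country": "DE"},
]

if __name__ == "__main__":
    report = completeness(ROWS, ["user_id", "country"])
    print(report)
    print(failing_fields(report))
```

A check like this would usually run as a pipeline step ahead of training or inference, so that incomplete inputs are caught before they reach a model.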

### Repository Format

- [Awesome List](https://awesome-repositories.com/f/repository-format/awesome-list.md) — A community-curated directory that catalogs and links out to other open-source projects, rather than a standalone tool you run yourself.

### Testing & Quality Assurance

- [Model Validation Frameworks](https://awesome-repositories.com/f/testing-quality-assurance/model-validation-frameworks.md) — Test and compare different model versions using controlled experiments to ensure they meet performance requirements before full-scale deployment. ([source](https://github.com/eugeneyan/applied-ml#readme))
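
One common way to compare two model versions in a controlled experiment is a two-proportion z-test on a success metric such as conversions. The sketch below is a hedged illustration under that assumption and is not a method prescribed by the repository; the function name and the traffic numbers are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference in success rates, B minus A."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both versions perform the same.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Standard normal CDF via the error function, then the two-sided tail.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    # Hypothetical experiment: model A converts 100/1000, model B converts 150/1000.
    z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but the decision to ship would normally also weigh effect size, guardrail metrics, and the sample size planned before the experiment.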

### Education & Learning Resources

- [Applied Data Science Guides](https://awesome-repositories.com/f/education-learning-resources/applied-data-science-guides.md) — Offers a categorized directory of expert-led documentation for practical machine learning and data engineering.
- [Collaborative Knowledge Bases](https://awesome-repositories.com/f/education-learning-resources/collaborative-knowledge-bases.md) — Relies on external contributions to keep technical documentation aligned with evolving industry practices.

### Security & Cryptography

- [Privacy-Preserving Machine Learning](https://awesome-repositories.com/f/security-cryptography/privacy-preserving-machine-learning.md) — Implement techniques that allow for the analysis and training of models on sensitive information without exposing individual user data. ([source](https://github.com/eugeneyan/applied-ml#readme))

### DevOps & Infrastructure

- [Infrastructure Provisioning](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-provisioning.md) — Provision and maintain the underlying hardware and software environments required to support scalable data processing and model deployment. ([source](https://github.com/eugeneyan/applied-ml#readme))

### Software Engineering & Architecture

- [Development Methodologies](https://awesome-repositories.com/f/software-engineering-architecture/development-methodologies.md) — Follow established industry standards and proven methodologies for developing, deploying, and maintaining high-quality machine learning solutions. ([source](https://github.com/eugeneyan/applied-ml#readme))
