Machine Learning
Comprehensive machine learning skill covering the full ML lifecycle from experimentation to production deployment.
When to Use This Skill
- Building machine learning pipelines
- Feature engineering and data preprocessing
- Model training, evaluation, and selection
- Hyperparameter tuning and optimization
- Model deployment and serving
- ML experiment tracking and versioning
- Production ML monitoring and maintenance
ML Development Lifecycle
- Problem Definition
Problem Types:
- Binary classification (spam/not spam)
- Multi-class classification (image categories)
- Multi-label classification (document tags)
- Regression (price prediction)
- Clustering (customer segmentation)
- Ranking (search results)
- Anomaly detection (fraud detection)
Success Metrics by Problem Type:
| Problem Type | Primary Metrics | Secondary Metrics |
|---|---|---|
| Binary Classification | AUC-ROC, F1 | Precision, Recall, PR-AUC |
| Multi-class | Macro F1, Accuracy | Per-class metrics |
| Regression | RMSE, MAE | R², MAPE |
| Ranking | NDCG, MAP | MRR |
| Clustering | Silhouette, Calinski-Harabasz | Davies-Bouldin |
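As a minimal sketch of how a few of the metrics in the table are computed, the following uses only the Python standard library. The function names and the toy labels are illustrative assumptions; in practice a library such as scikit-learn would normally supply these metrics.

```python
import math

def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 for 0/1 labels, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data, made up for illustration only.
clf = binary_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0, 1, 0])
reg_rmse = rmse([3.0, 5.0], [1.0, 5.0])
reg_mae = mae([3.0, 5.0], [1.0, 5.0])
print(clf, reg_rmse, reg_mae)
```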
- Data Preparation
Data Quality Checks:
- Missing value analysis and imputation strategies
- Outlier detection and handling
- Data type validation
- Distribution analysis
- Target leakage detection
Feature Engineering Patterns:
- Numerical: scaling, binning, log transforms, polynomial features
- Categorical: one-hot, target encoding, frequency encoding, embeddings
- Temporal: lag features, rolling statistics, cyclical encoding
- Text: TF-IDF, word embeddings, transformer embeddings
- Geospatial: distance features, clustering, grid encoding
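The temporal patterns above can be sketched in plain Python. This is an illustrative example, not a reference implementation: `cyclical_encode`, `lag_features`, and `rolling_mean` are hypothetical helper names, and the rolling mean deliberately excludes the current value so that the feature cannot leak the target at the same time step.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour of day) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def lag_features(series, lags):
    """For each position, collect earlier values; None where no history exists."""
    return [
        {f"lag_{k}": series[i - k] if i - k >= 0 else None for k in lags}
        for i in range(len(series))
    ]

def rolling_mean(series, window):
    """Mean of the previous `window` values, excluding the current one."""
    out = []
    for i in range(len(series)):
        past = series[max(0, i - window):i]
        out.append(sum(past) / window if len(past) == window else None)
    return out

# Hour 23 and hour 0 end up close together, unlike with a raw integer encoding.
late, midnight, noon = (cyclical_encode(h, 24) for h in (23, 0, 12))
print(math.dist(late, midnight), math.dist(midnight, noon))
print(lag_features([10, 20, 30], lags=[1]))
print(rolling_mean([1, 2, 3, 4], window=2))
```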
Train/Test Split Strategies:
- Random split (standard)
- Stratified split (imbalanced classes)
- Time-based split (temporal data)
- Group split (keeps related rows in the same partition to prevent data leakage)
- K-fold cross-validation
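Two of the strategies above, time-based and group splits, are easy to get subtly wrong, so here is a small sketch under stated assumptions: rows are dictionaries, and the keys `"t"` and `"user"` are hypothetical column names chosen for the example. Libraries such as scikit-learn provide production-grade splitters; this only illustrates the idea.

```python
import random

def time_based_split(rows, time_key, test_fraction=0.2):
    """Train on the earliest rows, test on the most recent ones."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_test = max(1, int(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def group_split(rows, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups to train or test so no group appears in both."""
    groups = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test

# Toy rows: 5 users with 4 events each, made up for illustration.
rows = [{"user": f"u{i % 5}", "t": i} for i in range(20)]

time_train, time_test = time_based_split(rows, "t")
group_train, group_test = group_split(rows, "user")
print(len(time_train), len(time_test), len(group_train), len(group_test))
```

With a random split on the same rows, events from one user would typically land on both sides, which lets the model memorize user-specific patterns and inflates the test score.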
- Model Selection
Algorithm Selection Guide:
| Data Size or Type | Problem | Recommended Models |
|---|---|---|
| Small (<10K) | Classification | Logistic Regression, SVM, Random Forest |
| Small (<10K) | Regression | Linear Regression, Ridge, SVR |
| Medium (10K-1M) | Classification | XGBoost, LightGBM, Neural Networks |
| Medium (10K-1M) | Regression | XGBoost, LightGBM, Neural Networks |
| Large (>1M) | Any | Deep Learning, Distributed training |
| Tabular | Any | Gradient Boosting (XGBoost, LightGBM, CatBoost) |
| Images | Classification | CNN, ResNet, EfficientNet, Vision Transformers |
| Text | NLP | Transformers (BERT, RoBERTa, GPT) |
| Sequential | Time Series | LSTM, Transformer, Prophet |
- Model Training
Hyperparameter Tuning:
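- Grid Search: exhaustive, good for small spaces
- Random Search: efficient, good for large spaces
- Bayesian Optimization: uses the results of earlier trials to choose the next candidates (Optuna, Hyperopt)
- Early stopping: halt training once the validation metric stops improving, to prevent overfitting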
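The sketch below illustrates random search and patience-based early stopping with the standard library only. Everything named here is an assumption made for the example: `toy_validation_loss` stands in for a real train-and-validate run, and the search space and patience value are arbitrary.

```python
import math
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample parameters independently per trial and keep the lowest-loss set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch, stopping after `patience` epochs without improvement."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # later epochs are never evaluated
    return best_epoch

def toy_validation_loss(params):
    """Hypothetical stand-in for a real validation loss (lower is better)."""
    lr_term = (math.log10(params["learning_rate"]) + 2) ** 2
    depth_term = 0.01 * (params["max_depth"] - 6) ** 2
    return lr_term + depth_term

# Learning rate is sampled log-uniformly, a common choice for scale parameters.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),
    "max_depth": lambda rng: rng.randint(2, 12),
}

best_params, best_score = random_search(toy_validation_loss, space)
stop_epoch = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6])
print(best_params, best_score, stop_epoch)
```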
Common Hyperparameters:
| Model | Key Parameters |
|---|---|
| XGBoost | learning_rate, max_depth, n_estimators, subsample |
| LightGBM | num_leaves, learning_rate, n_estimators, feature_fraction |
| Random Forest | n_estimators, max_depth, min_samples_split |
| Neural Networks | learning_rate, batch_size, layers, dropout |
- Model Evaluation
Evaluation Best Practices:
- Always use a held-out test set for the final evaluation
- Use cross-validation during development
- Check for overfitting (train vs validation gap)
- Evaluate on multiple metrics
- Analyze errors qualitatively
Handling Imbalanced Data:
- Resampling: SMOTE, undersampling (apply to training data only)
- Class weights: weighted loss functions
- Threshold tuning: optimize the decision threshold
- Evaluation: prefer PR-AUC over ROC-AUC when the positive class is rare
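Threshold tuning can be sketched as a search over candidate cut-offs that maximizes F1. The helper names and the toy scores below are assumptions for illustration; the threshold should be chosen on a validation set, not on the final test set.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, scores):
    """Try each distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))

# Imbalanced toy validation set (3 positives out of 10), made up for illustration.
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
val_scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.8]

tuned = best_threshold(y_val, val_scores)
print(tuned, f1_at_threshold(y_val, val_scores, tuned), f1_at_threshold(y_val, val_scores, 0.5))
```

On imbalanced data the default 0.5 cut-off is often too conservative for the minority class, which is why the tuned threshold here ends up lower.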
- Production Deployment
Model Serving Patterns:
- REST API (Flask, FastAPI, TF Serving)
- Batch inference (scheduled jobs)
- Streaming (real-time predictions)
- Edge deployment (mobile, IoT)
Production Considerations:
- Latency requirements (p50, p95, p99)
- Throughput (requests per second)
- Model size and memory footprint
- Fallback strategies
- A/B testing framework
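As a rough illustration of the latency requirements above, the following computes p50, p95, and p99 with the nearest-rank method and checks them against a budget. The `percentile` and `meets_latency_slo` helpers, the sample latencies, and the budget values are all assumptions made for this sketch.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_latency_slo(latencies_ms, budgets_ms):
    """Check each percentile against its budget, e.g. {95: 100.0}."""
    return all(percentile(latencies_ms, pct) <= limit for pct, limit in budgets_ms.items())

# Synthetic latencies of 1..100 ms, used only to make the arithmetic easy to follow.
latencies_ms = list(range(1, 101))
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
print(meets_latency_slo(latencies_ms, {50: 60, 99: 100}))
```

Tail percentiles matter more than the mean for serving: a handful of slow requests barely move the average but dominate p99.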
- Monitoring & Maintenance
What to Monitor:
- Prediction latency
- Input feature distributions (data drift)
- Prediction distributions (prediction drift, which can signal concept drift)
- Model performance metrics
- Error rates and types
Retraining Triggers:
- Performance degradation below threshold
- Significant data drift detected
- Scheduled retraining (daily, weekly)
- New training data available
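One common way to quantify data drift is the Population Stability Index (PSI), sketched below and combined with a performance check into a simple retraining trigger. This is one possible approach, not the only one. The limit of 0.2 is a frequently quoted rule of thumb rather than a universal constant, and `should_retrain` with its default thresholds is a hypothetical helper.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def should_retrain(psi_value, current_metric, baseline_metric,
                   psi_limit=0.2, max_metric_drop=0.05):
    """Trigger on significant drift or on a metric drop beyond the tolerance."""
    return psi_value > psi_limit or (baseline_metric - current_metric) > max_metric_drop

# Synthetic feature values: the live sample is shifted toward the upper half.
baseline = [i / 100 for i in range(100)]
live = [0.5 + i / 200 for i in range(100)]

drift = psi(baseline, live)
print(drift, should_retrain(drift, current_metric=0.90, baseline_metric=0.91))
```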
MLOps Best Practices
Experiment Tracking
Track for every experiment:
- Code version (git commit)
- Data version (hash or version ID)
- Hyperparameters
- Metrics (train, validation, test)
- Model artifacts
- Environment (packages, versions)
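A minimal sketch of an experiment record covering most of the items above (model artifacts are omitted for brevity), using only the standard library. The field names, the `build_experiment_record` helper, and the example values are assumptions; dedicated tracking tools store the same information with more structure.

```python
import hashlib
import json
import platform
import subprocess
import sys

def data_fingerprint(rows):
    """Stable SHA-256 hash of the dataset contents, used as a data version."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def current_git_commit():
    """Return the current commit hash, or None outside a git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def build_experiment_record(params, metrics, rows):
    """Collect everything needed to reproduce and compare a run."""
    return {
        "code_version": current_git_commit(),
        "data_version": data_fingerprint(rows),
        "hyperparameters": params,
        "metrics": metrics,
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }

# Example values are made up for illustration.
record = build_experiment_record(
    params={"learning_rate": 0.1, "max_depth": 6},
    metrics={"train": {"auc": 0.95}, "validation": {"auc": 0.91}},
    rows=[{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}],
)
print(json.dumps(record, indent=2))
```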
Model Versioning
models/
├── model_v1.0.0/
│   ├── model.pkl
│   ├── metadata.json
│   ├── requirements.txt
│   └── metrics.json
├── model_v1.1.0/
└── model_v2.0.0/
CI/CD for ML
Continuous Integration:
- Data validation tests
- Model training tests
- Performance regression tests
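A performance regression test can be as simple as comparing a candidate model's metrics against the current baseline with a tolerance. The sketch below assumes higher-is-better metrics; the `regression_failures` helper, the tolerance, and the metric values are hypothetical.

```python
def regression_failures(candidate, baseline, max_drop=0.01):
    """List metrics that are missing or fell more than `max_drop` below baseline.

    Assumes every metric is higher-is-better (e.g. AUC, F1).
    """
    failures = []
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or new_value < base_value - max_drop:
            failures.append(name)
    return failures

# Metric values are made up for illustration.
baseline_metrics = {"auc": 0.91, "f1": 0.80}
candidate_metrics = {"auc": 0.915, "f1": 0.77}

failed = regression_failures(candidate_metrics, baseline_metrics)
print(failed)
```

In a CI job, a non-empty result would fail the build so that a weaker model cannot reach staging unnoticed.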
Continuous Deployment:
- Staging environment validation
- Shadow mode testing
- Gradual rollout (canary)
- Automatic rollback
Reference Files
For detailed patterns and code examples, load reference files as needed:
- references/preprocessing.md - Data preprocessing patterns and feature engineering techniques
- references/model_patterns.md - Model architecture patterns and implementation examples
- references/evaluation.md - Comprehensive evaluation strategies and metrics
Integration with Other Skills
- performance - For optimizing inference latency
- testing - For ML-specific testing patterns
- database-optimization - For feature store queries
- debugging - For model debugging and error analysis