
Senior Data Scientist

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "senior-data-scientist" with this command: npx skills add borghei/claude-skills/borghei-claude-skills-senior-data-scientist


Expert data science for statistical modeling, experimentation, ML deployment, and data-driven decision making.

Keywords

data-science, machine-learning, statistics, a-b-testing, causal-inference, feature-engineering, mlops, experiment-design, model-deployment, python, scikit-learn, pytorch, tensorflow, spark, airflow

Quick Start

Design an experiment with power analysis

python scripts/experiment_designer.py --input data/ --output results/

Run feature engineering pipeline

python scripts/feature_engineering_pipeline.py --target project/ --analyze

Evaluate model performance

python scripts/model_evaluation_suite.py --config config.yaml --deploy

Statistical analysis

python scripts/statistical_analyzer.py --data input.csv --test ttest --output report.json

Tools

Script -- Purpose

scripts/experiment_designer.py -- A/B test design, power analysis, sample size calculation

scripts/feature_engineering_pipeline.py -- Automated feature generation, correlation analysis, feature selection

scripts/statistical_analyzer.py -- Hypothesis testing, causal inference, regression analysis

scripts/model_evaluation_suite.py -- Model comparison, cross-validation, deployment readiness checks

Tech Stack

Category -- Tools

Languages -- Python, SQL, R, Scala

ML Frameworks -- PyTorch, TensorFlow, Scikit-learn, XGBoost

Data Processing -- Spark, Airflow, dbt, Kafka, Databricks

Deployment -- Docker, Kubernetes, AWS SageMaker, GCP Vertex AI

Experiment Tracking -- MLflow, Weights & Biases

Databases -- PostgreSQL, BigQuery, Snowflake, Pinecone

Workflow 1: Design and Analyze an A/B Test

  • Define hypothesis -- State the null and alternative hypotheses. Identify the primary metric (e.g., conversion rate, revenue per user).

  • Calculate sample size -- python scripts/experiment_designer.py --input data/ --output results/

  • Specify minimum detectable effect (MDE), significance level (alpha=0.05), and power (0.80).

  • Example: with a 5% baseline conversion rate and a 10% relative-lift MDE, you need roughly 31,000 users per variant.

  • Randomize assignment -- Use hash-based assignment on user ID for deterministic, reproducible splits.

  • Run experiment -- Monitor for sample ratio mismatch (SRM) daily. Flag if observed ratio deviates >1% from expected.

  • Analyze results:

from statsmodels.stats.proportion import proportions_ztest

# Two-proportion z-test for conversion rates
control_conv = control_successes / control_total
treatment_conv = treatment_successes / treatment_total
z_stat, p_value = proportions_ztest(
    [treatment_successes, control_successes],
    [treatment_total, control_total],
    alternative='two-sided',
)

# Reject H0 if p_value < 0.05

  • Validate -- Check for novelty effects, Simpson's paradox across segments, and pre-experiment balance on covariates.
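The ~31,000-per-variant figure in the sample-size step can be reproduced with the standard normal-approximation formula for a two-proportion z-test. A minimal sketch; the helper name is illustrative, not part of the skill's scripts:

```python
from scipy.stats import norm

def per_variant_sample_size(p_base, relative_mde, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_base * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = norm.ppf(power)            # quantile for the target power
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    delta = p_treat - p_base            # absolute minimum detectable effect
    return (z_alpha + z_beta) ** 2 * variance / delta ** 2

print(round(per_variant_sample_size(0.05, 0.10)))  # ~31,231 users per variant
```

In practice you would round up and add margin for expected traffic loss before the experiment starts.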

Workflow 2: Build a Feature Engineering Pipeline

  • Profile raw data -- python scripts/feature_engineering_pipeline.py --target project/ --analyze

  • Identify null rates, cardinality, distributions, and data types.

  • Generate candidate features:

  • Temporal: day-of-week, hour, recency, frequency, monetary (RFM)

  • Aggregation: rolling means/sums over 7d/30d/90d windows

  • Interaction: ratio features, polynomial combinations

  • Text: TF-IDF, embedding vectors

  • Select features -- Remove features with >95% null rate, near-zero variance, or >0.95 pairwise correlation. Use recursive feature elimination or SHAP importance.

  • Validate -- Confirm no target leakage (no features derived from post-outcome data). Check train/test distribution alignment.

  • Register -- Store features in feature store with versioning and lineage metadata.
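The pairwise-correlation filter in the selection step can be sketched with pandas. This is a minimal sketch, assuming a numeric feature DataFrame; the toy columns are illustrative:

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from each pair whose |correlation| exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy example: y nearly duplicates x, z is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.01, size=200),
    "z": rng.normal(size=200),
})
print(list(drop_highly_correlated(df).columns))  # ['x', 'z']
```

Note this keeps an arbitrary member of each correlated pair; when domain knowledge favors one feature, apply that preference before the automatic filter.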

Workflow 3: Train and Evaluate a Model

  • Split data -- Stratified train/validation/test split (70/15/15). For time series, use temporal split (no future leakage).

  • Train baseline -- Start with a simple model (logistic regression, gradient boosted trees) to establish a benchmark.

  • Tune hyperparameters -- Use Optuna or cross-validated grid search. Log all runs to MLflow.

  • Evaluate on held-out test set:

from sklearn.metrics import classification_report, roc_auc_score

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")

  • Validate -- Check calibration (predicted probabilities match observed rates). Evaluate fairness metrics across protected groups. Confirm no overfitting (train vs test gap <5%).
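The calibration check in the validation step (predicted probabilities matching observed rates) can be sketched with scikit-learn's calibration_curve. The synthetic data below is illustrative only:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data whose true model is logistic,
# so a fitted logistic regression should be well calibrated.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 3))
p_true = 1 / (1 + np.exp(-X @ np.array([1.0, -2.0, 0.5])))
y = rng.binomial(1, p_true)

model = LogisticRegression().fit(X, y)
y_prob = model.predict_proba(X)[:, 1]

# Bin predictions and compare mean predicted vs observed positive rate;
# the two columns should track each other for a calibrated model.
observed, predicted = calibration_curve(y, y_prob, n_bins=10)
for obs, pred in zip(observed, predicted):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

For real models, run this on the held-out test set, and consider Platt scaling or isotonic regression if the curves diverge.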

Workflow 4: Deploy a Model to Production

  • Containerize -- Package model with inference dependencies in Docker: docker build -t model-service:v1 .

  • Set up serving -- Deploy behind a REST API with health check, input validation, and structured error responses.

  • Configure monitoring:

  • Input drift: compare incoming feature distributions to training baseline (KS test, PSI)

  • Output drift: monitor prediction distribution shifts

  • Performance: track latency P50/P95/P99 targets (<50ms / <100ms / <200ms)

  • Enable canary deployment -- Route 5% traffic to new model, compare metrics against baseline for 24-48 hours.

  • Validate -- Run python scripts/model_evaluation_suite.py --config config.yaml --deploy to confirm serving latency targets, an error rate <0.1%, and that model outputs match offline evaluation.
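The input-drift check above uses the Population Stability Index (PSI); a minimal sketch of the computation (the bin count and quantile-based binning are common conventions, not mandated by this skill):

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a training baseline and live data."""
    # Bin edges from baseline quantiles so each baseline bin holds ~equal mass.
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Clip away zeros so the log term is always defined.
    base_frac = np.clip(base_frac, 1e-6, None)
    curr_frac = np.clip(curr_frac, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)
print(psi(train, rng.normal(0, 1, 10_000)))    # near 0: same distribution
print(psi(train, rng.normal(0.5, 1, 10_000)))  # much larger: shifted inputs
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.2 as significant drift worth investigating.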

Workflow 5: Perform Causal Inference

  • Assess assignment mechanism -- Determine if treatment was randomized (use experiment analysis) or observational (use causal methods below).

  • Select method based on data structure:

  • Propensity Score Matching: when treatment is binary, many covariates available

  • Difference-in-Differences: when pre/post data available for treatment and control groups

  • Regression Discontinuity: when treatment assigned by threshold on running variable

  • Instrumental Variables: when unobserved confounding present but valid instrument exists

  • Check assumptions -- Parallel trends (DiD), overlap/positivity (PSM), continuity (RDD).

  • Estimate treatment effect and compute confidence intervals.

  • Validate -- Run placebo tests (apply method to pre-treatment period, expect null effect). Sensitivity analysis for unobserved confounding.
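Of the methods listed, difference-in-differences has the simplest estimator: the change in the treated group minus the change in the control group. A minimal sketch on synthetic panel data, where the true effect and all names are illustrative and parallel trends hold by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_effect = 2.0

# Both groups share the same +1.0 time trend (parallel trends);
# only the treated group gets the additional true_effect after treatment.
control_pre = rng.normal(10.0, 1.0, n)
control_post = rng.normal(11.0, 1.0, n)
treat_pre = rng.normal(12.0, 1.0, n)
treat_post = rng.normal(12.0 + 1.0 + true_effect, 1.0, n)

did = (treat_post.mean() - treat_pre.mean()) - (control_post.mean() - control_pre.mean())
print(f"DiD estimate: {did:.2f}")  # close to the true effect of 2.0

# Rough standard error, treating the four group means as independent.
se = np.sqrt(sum(g.var(ddof=1) / len(g)
                 for g in [treat_post, treat_pre, control_post, control_pre]))
print(f"95% CI: [{did - 1.96 * se:.2f}, {did + 1.96 * se:.2f}]")
```

With real panel data you would typically fit the equivalent two-way regression (group, period, and interaction terms) and cluster standard errors appropriately.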

Performance Targets

Metric -- Target

P50 latency -- < 50ms

P95 latency -- < 100ms

P99 latency -- < 200ms

Throughput -- 1,000 req/s

Availability -- 99.9%

Error rate -- < 0.1%

Common Commands

Development

python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

Training

python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

Deployment

docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

Monitoring

kubectl logs -f deployment/service
python scripts/health_check.py

Reference Documentation

Document -- Path

Statistical Methods -- references/statistical_methods_advanced.md

Experiment Design Frameworks -- references/experiment_design_frameworks.md

Feature Engineering Patterns -- references/feature_engineering_patterns.md

Automation Scripts -- scripts/ directory

