# Senior Data Scientist

Expert data science for statistical modeling, experimentation, ML deployment, and data-driven decision making.

## Keywords

data-science, machine-learning, statistics, a-b-testing, causal-inference, feature-engineering, mlops, experiment-design, model-deployment, python, scikit-learn, pytorch, tensorflow, spark, airflow
## Quick Start

```bash
# Design an experiment with power analysis
python scripts/experiment_designer.py --input data/ --output results/

# Run feature engineering pipeline
python scripts/feature_engineering_pipeline.py --target project/ --analyze

# Evaluate model performance
python scripts/model_evaluation_suite.py --config config.yaml --deploy

# Statistical analysis
python scripts/statistical_analyzer.py --data input.csv --test ttest --output report.json
```
## Tools

| Script | Purpose |
| --- | --- |
| `scripts/experiment_designer.py` | A/B test design, power analysis, sample size calculation |
| `scripts/feature_engineering_pipeline.py` | Automated feature generation, correlation analysis, feature selection |
| `scripts/statistical_analyzer.py` | Hypothesis testing, causal inference, regression analysis |
| `scripts/model_evaluation_suite.py` | Model comparison, cross-validation, deployment readiness checks |
## Tech Stack

| Category | Tools |
| --- | --- |
| Languages | Python, SQL, R, Scala |
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| Data Processing | Spark, Airflow, dbt, Kafka, Databricks |
| Deployment | Docker, Kubernetes, AWS SageMaker, GCP Vertex AI |
| Experiment Tracking | MLflow, Weights & Biases |
| Databases | PostgreSQL, BigQuery, Snowflake, Pinecone |
## Workflow 1: Design and Analyze an A/B Test

1. **Define hypothesis** -- State the null and alternative hypotheses. Identify the primary metric (e.g., conversion rate, revenue per user).

2. **Calculate sample size** -- `python scripts/experiment_designer.py --input data/ --output results/`

   - Specify the minimum detectable effect (MDE), significance level (alpha = 0.05), and power (0.80).
   - Example: for a 5% baseline conversion rate and a 10% relative-lift MDE, you need ~31,000 users per variant.

3. **Randomize assignment** -- Use hash-based assignment on user ID for deterministic, reproducible splits.

4. **Run experiment** -- Monitor for sample ratio mismatch (SRM) daily. Flag if the observed ratio deviates more than 1% from expected.

5. **Analyze results:**

   ```python
   # Two-proportion z-test for conversion rates.
   # Note: proportions_ztest lives in statsmodels, not scipy.
   from statsmodels.stats.proportion import proportions_ztest

   control_conv = control_successes / control_total
   treatment_conv = treatment_successes / treatment_total

   z_stat, p_value = proportions_ztest(
       [treatment_successes, control_successes],
       [treatment_total, control_total],
       alternative='two-sided',
   )
   # Reject H0 if p_value < 0.05
   ```

6. **Validate** -- Check for novelty effects, Simpson's paradox across segments, and pre-experiment balance on covariates.
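The hash-based randomization described above can be sketched with the standard library alone (a minimal sketch; the experiment-name salt and 50/50 split are illustrative assumptions, not part of `experiment_designer.py`):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, n_variants: int = 2) -> int:
    """Deterministically map a user to a variant bucket.

    Hashing the user ID together with an experiment-specific salt gives a
    stable, reproducible split that is independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_variants

# The same user always lands in the same bucket for a given experiment
assert assign_variant("user-123", "checkout-test") == assign_variant("user-123", "checkout-test")
```

Because assignment depends only on the user ID and salt, the split can be recomputed at analysis time without storing an assignment table.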
## Workflow 2: Build a Feature Engineering Pipeline

1. **Profile raw data** -- `python scripts/feature_engineering_pipeline.py --target project/ --analyze`

   Identify null rates, cardinality, distributions, and data types.

2. **Generate candidate features:**

   - Temporal: day-of-week, hour, recency, frequency, monetary (RFM)
   - Aggregation: rolling means/sums over 7d/30d/90d windows
   - Interaction: ratio features, polynomial combinations
   - Text: TF-IDF, embedding vectors

3. **Select features** -- Remove features with >95% null rate, near-zero variance, or >0.95 pairwise correlation. Use recursive feature elimination or SHAP importance.

4. **Validate** -- Confirm no target leakage (no features derived from post-outcome data). Check train/test distribution alignment.

5. **Register** -- Store features in a feature store with versioning and lineage metadata.
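The pairwise-correlation filter in the selection step can be sketched with plain NumPy (a sketch; the greedy keep-first strategy and function name are illustrative assumptions):

```python
import numpy as np

def drop_correlated(X: np.ndarray, threshold: float = 0.95) -> list:
    """Return indices of columns to keep, greedily dropping any column whose
    absolute correlation with an already-kept column exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a + rng.normal(scale=0.01, size=1000)   # near-duplicate of column 0
c = rng.normal(size=1000)                   # independent column
X = np.column_stack([a, b, c])
print(drop_correlated(X))  # column 1 is dropped as redundant with column 0
```

Greedy keep-first is order-dependent; ordering columns by importance (e.g., SHAP) before filtering keeps the stronger member of each correlated pair.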
## Workflow 3: Train and Evaluate a Model

1. **Split data** -- Stratified train/validation/test split (70/15/15). For time series, use a temporal split (no future leakage).

2. **Train baseline** -- Start with a simple model (logistic regression, gradient-boosted trees) to establish a benchmark.

3. **Tune hyperparameters** -- Use Optuna or cross-validated grid search. Log all runs to MLflow.

4. **Evaluate on the held-out test set:**

   ```python
   from sklearn.metrics import classification_report, roc_auc_score

   y_pred = model.predict(X_test)
   y_prob = model.predict_proba(X_test)[:, 1]

   print(classification_report(y_test, y_pred))
   print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
   ```

5. **Validate** -- Check calibration (predicted probabilities match observed rates). Evaluate fairness metrics across protected groups. Confirm no overfitting (train vs. test gap <5%).
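The calibration check in the validation step can be sketched as a bin-wise comparison of mean predicted probability against the observed positive rate (a sketch; the 10-bin layout is an assumption, and scikit-learn's `calibration_curve` offers the same check off the shelf):

```python
import numpy as np

def calibration_table(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10):
    """For each probability bin, compare the mean predicted probability with
    the observed positive rate; large gaps indicate miscalibration."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((b, y_prob[mask].mean(), y_true[mask].mean(), int(mask.sum())))
    return rows

# Perfectly calibrated synthetic data: P(y=1 | p) equals p by construction
rng = np.random.default_rng(42)
p = rng.uniform(size=50_000)
y = (rng.uniform(size=50_000) < p).astype(int)
for b, pred, obs, n in calibration_table(y, p):
    print(f"bin {b}: predicted={pred:.3f} observed={obs:.3f} n={n}")
```

On a well-calibrated model the predicted and observed columns track each other within sampling noise; a systematic gap suggests recalibration (Platt scaling or isotonic regression).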
## Workflow 4: Deploy a Model to Production

1. **Containerize** -- Package the model with its inference dependencies in Docker: `docker build -t model-service:v1 .`

2. **Set up serving** -- Deploy behind a REST API with health checks, input validation, and structured error responses.

3. **Configure monitoring:**

   - Input drift: compare incoming feature distributions to the training baseline (KS test, PSI)
   - Output drift: monitor prediction distribution shifts
   - Performance: track latency against P50/P95/P99 targets (<50ms / <100ms / <200ms)

4. **Enable canary deployment** -- Route 5% of traffic to the new model and compare metrics against the baseline for 24-48 hours.

5. **Validate** -- `python scripts/model_evaluation_suite.py --config config.yaml --deploy` confirms serving latency, an error rate <0.1%, and model outputs that match offline evaluation.
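The PSI drift check in the monitoring step can be sketched as follows (a sketch; the 10-bucket quantile binning and the common rule of thumb that PSI > 0.2 signals drift are assumptions):

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a training-baseline sample and a
    live sample, using bins derived from the baseline's quantiles."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover the full real line
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e = np.clip(e, 1e-6, None)                     # avoid log(0)
    a = np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 100_000)
print(psi(baseline, rng.normal(0.0, 1, 100_000)))  # near 0: no drift
print(psi(baseline, rng.normal(0.5, 1, 100_000)))  # large: shifted mean
```

In production the baseline histogram would be frozen at training time and compared against a rolling window of incoming feature values.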
## Workflow 5: Perform Causal Inference

1. **Assess the assignment mechanism** -- Determine whether treatment was randomized (use experiment analysis) or observational (use the causal methods below).

2. **Select a method based on data structure:**

   - Propensity Score Matching: treatment is binary and many covariates are available
   - Difference-in-Differences: pre/post data are available for treatment and control groups
   - Regression Discontinuity: treatment is assigned by a threshold on a running variable
   - Instrumental Variables: unobserved confounding is present but a valid instrument exists

3. **Check assumptions** -- Parallel trends (DiD), overlap/positivity (PSM), continuity (RDD).

4. **Estimate the treatment effect** and compute confidence intervals.

5. **Validate** -- Run placebo tests (apply the method to the pre-treatment period and expect a null effect). Run sensitivity analysis for unobserved confounding.
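In the canonical 2x2 case, the difference-in-differences estimator reduces to a difference of group means (a minimal sketch on synthetic data; the trend and effect sizes are illustrative assumptions):

```python
import numpy as np

def did_estimate(t_pre, t_post, c_pre, c_post) -> float:
    """2x2 difference-in-differences: the treatment group's pre/post change
    minus the control group's, which nets out the shared time trend
    (valid under the parallel-trends assumption)."""
    return (np.mean(t_post) - np.mean(t_pre)) - (np.mean(c_post) - np.mean(c_pre))

rng = np.random.default_rng(7)
trend, effect = 1.0, 2.0
c_pre  = rng.normal(10.0,                 1, 5_000)
c_post = rng.normal(10.0 + trend,         1, 5_000)
t_pre  = rng.normal(12.0,                 1, 5_000)
t_post = rng.normal(12.0 + trend + effect, 1, 5_000)
print(did_estimate(t_pre, t_post, c_pre, c_post))  # recovers roughly 2.0
```

Note that the raw post-period gap between groups (roughly 3.0 here) would overstate the effect; differencing removes both the fixed group gap and the shared trend. A placebo run on two pre-treatment periods should return an estimate near zero.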
## Performance Targets

| Metric | Target |
| --- | --- |
| P50 latency | < 50ms |
| P95 latency | < 100ms |
| P99 latency | < 200ms |
| Throughput | > 1,000 req/s |
| Availability | 99.9% |
| Error rate | < 0.1% |
## Common Commands

```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
```
## Reference Documentation

| Document | Path |
| --- | --- |
| Statistical Methods | `references/statistical_methods_advanced.md` |
| Experiment Design Frameworks | `references/experiment_design_frameworks.md` |
| Feature Engineering Patterns | `references/feature_engineering_patterns.md` |
| Automation Scripts | `scripts/` directory |