data-scientist

Expert in statistical analysis, experimentation, and business insights.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "data-scientist" with this command: npx skills add anton-abyzov/specweave/anton-abyzov-specweave-data-scientist

⚠️ Chunking Rule

Large analyses (EDA + modeling + visualization) can run to 800+ lines. Generate ONE phase per response, in this order: EDA → Feature Engineering → Modeling → Evaluation → Recommendations

Core Capabilities

Statistical Modeling

  • Hypothesis testing (t-test, chi-square, ANOVA)

  • Regression analysis (linear, logistic, GLMs)

  • Bayesian inference

  • Causal inference (propensity score matching, DiD)
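
As a minimal sketch of the difference-in-differences (DiD) item above: the point estimate is the change in the treated group minus the change in the control group, which is only meaningful under the parallel-trends assumption. The function name `did_estimate` and the sample figures are hypothetical, chosen for illustration.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences point estimate from four groups of outcomes.

    Assumes parallel trends: without treatment, both groups would have
    changed by the same amount over the period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly revenue per store, before and after a promotion
effect = did_estimate(
    treated_pre=[10.0, 12.0, 11.0],
    treated_post=[15.0, 17.0, 16.0],
    control_pre=[9.0, 10.0, 11.0],
    control_post=[11.0, 12.0, 13.0],
)
print(effect)
```

This gives a point estimate only. A regression with a group-by-period interaction term is the usual route when standard errors or covariates are needed.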

Experimentation

  • A/B test design and analysis

  • Sample size calculation

  • Statistical power analysis

  • Multi-armed bandits
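
Sample size calculation and power analysis can be sketched with the standard normal-approximation formula for comparing two proportions. This is one common approach, not necessarily the one the skill applies. The function name `sample_size_two_proportions` and the example rates are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)

# Hypothetical baseline conversion of 10% and a target of 12%
n_per_group = sample_size_two_proportions(0.10, 0.12)
print(n_per_group)
```

Halving the detectable difference roughly quadruples the required sample, which is why the minimum effect of interest should be fixed before the test starts.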

Customer Analytics

  • Customer Lifetime Value (CLV) prediction

  • Churn prediction and prevention

  • Cohort analysis

  • RFM segmentation
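
RFM segmentation scores each customer on recency, frequency, and monetary value. The sketch below uses only the standard library and a simple rank-based 1 to 3 score. The helper names (`rfm_table`, `score`), the row layout, and the transactions are hypothetical. A real pipeline would more likely use pandas with quantile bins.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, as_of):
    """Aggregate (customer_id, order_date, amount) rows into RFM values."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in transactions:
        last_order[customer_id] = max(order_date, last_order.get(customer_id, order_date))
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (as_of - last_order[cid]).days,
            "frequency": frequency[cid],
            "monetary": monetary[cid],
        }
        for cid in last_order
    }

def score(values, higher_is_better=True, bins=3):
    """Map each customer's value to a 1..bins score by rank (ties keep input order)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cid: 1 + (rank * bins) // len(ordered) for rank, cid in enumerate(ordered)}

# Hypothetical transactions
transactions = [
    ("a", date(2024, 1, 5), 50.0),
    ("a", date(2024, 3, 1), 70.0),
    ("b", date(2023, 11, 20), 20.0),
    ("c", date(2024, 2, 10), 200.0),
    ("c", date(2024, 2, 25), 150.0),
    ("c", date(2024, 3, 10), 100.0),
]
rfm = rfm_table(transactions, as_of=date(2024, 3, 31))
r = score({c: v["recency_days"] for c, v in rfm.items()}, higher_is_better=False)
f = score({c: v["frequency"] for c, v in rfm.items()})
m = score({c: v["monetary"] for c, v in rfm.items()})
segments = {c: f"{r[c]}{f[c]}{m[c]}" for c in rfm}
print(segments)
```

Recency is scored with `higher_is_better=False` because a smaller number of days since the last order indicates a more engaged customer.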

Anomaly Detection

  • Isolation Forest for outliers

  • DBSCAN clustering

  • Statistical process control

  • Time series anomaly detection
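
Of the techniques listed, statistical process control is the simplest to illustrate: points outside control limits derived from a stable baseline are flagged. The sketch below estimates sigma with the sample standard deviation of the baseline, which is a simplification (individuals charts often estimate sigma from the moving range instead). The function names and data are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Lower and upper control limits from an in-control baseline window."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread

def out_of_control(observations, baseline, sigmas=3.0):
    """Indices of observations that fall outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [i for i, x in enumerate(observations) if x < lower or x > upper]

# Hypothetical daily metric: a stable baseline, then new points to monitor
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
new_points = [101, 99, 120, 100, 85]
flagged = out_of_control(new_points, baseline)
print(flagged)
```

Computing the limits from a separate baseline keeps the anomalies themselves from inflating the spread estimate.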

Experiment Tracking

  • MLflow integration for experiment logging

  • Weights & Biases (W&B) support

  • Experiment comparison and visualization

  • Model versioning and registry

Data Visualization

  • Exploratory data analysis (EDA)

  • Distribution plots and correlations

  • Time series visualization

  • Interactive dashboards (Plotly, Streamlit)

Best Practices

A/B Test Analysis

import numpy as np
from scipy import stats

def analyze_ab_test(control, treatment, metric='conversion'):
    # Sample size per arm (reported so the reader can judge power)
    n_control, n_treatment = len(control), len(treatment)

    # Statistical test (two-sample t-test on the metric column)
    t_stat, p_value = stats.ttest_ind(control[metric], treatment[metric])

    # Effect size (Cohen's d, using the average of the two sample variances)
    pooled_std = np.sqrt((control[metric].var() + treatment[metric].var()) / 2)
    effect_size = (treatment[metric].mean() - control[metric].mean()) / pooled_std

    return {
        'n_control': n_control,
        'n_treatment': n_treatment,
        'p_value': p_value,
        'significant': p_value < 0.05,
        'effect_size': effect_size,
        # Relative lift of treatment over control, in percent
        'lift': (treatment[metric].mean() / control[metric].mean() - 1) * 100
    }
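
The function above applies a t-test to a per-user metric column. For a binary conversion outcome summarized as counts, a two-proportion z-test is a common alternative. The sketch below is a swapped-in technique rather than part of the upstream skill, and it uses only the standard library. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_control, n_control, conv_treatment, n_treatment):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    p_pool = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"z": z, "p_value": p_value, "lift": (p_t / p_c - 1) * 100}

# Hypothetical counts: 500/5000 conversions in control, 600/5000 in treatment
result = two_proportion_ztest(500, 5000, 600, 5000)
print(result)
```

The normal approximation is reasonable when each arm has a healthy number of both conversions and non-conversions. With very small counts an exact test is the safer choice.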

Experiment Tracking with MLflow

import mlflow
import mlflow.sklearn
from sklearn.metrics import accuracy_score, f1_score

# Assumes `model` is a scikit-learn-compatible estimator and that
# X_train, y_train, X_test, y_test are already defined
with mlflow.start_run(run_name="experiment-001"):
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_params(model.get_params())

    # Train and evaluate
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, predictions))
    mlflow.log_metric("f1", f1_score(y_test, predictions))

    # Log model
    mlflow.sklearn.log_model(model, "model")

When to Use

  • Business analytics and insights

  • A/B test design and analysis

  • Customer segmentation and CLV

  • Anomaly and fraud detection

  • Experiment tracking and comparison

  • Data visualization and EDA

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
