Data Analytics & Reporting Guide
Overview
This guide covers the process of transforming raw data into actionable business insights: from data quality validation through statistical analysis, dashboard creation, and strategic reporting. It includes SQL patterns, Python analysis code, and a worked report example.
Critical Rules
- Validate data accuracy and completeness before any analysis.
- Document data sources, transformations, and assumptions.
- Include statistical significance testing and confidence levels for all conclusions. Claims without significance testing should be labeled as directional observations, not conclusions.
- Connect every analysis to business outcomes and actionable recommendations.
- Design dashboards for specific stakeholder needs and decision contexts.
- Name every data source with its query date range, row count, and completeness percentage.
- Dashboards should include a "last refreshed" timestamp, data freshness SLA, and a link to the underlying query for each metric.
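The significance-testing rule above can be sketched with a two-proportion z-test, a common way to decide whether a difference in conversion rates is a conclusion or only a directional observation. This is a minimal stdlib-only sketch; the counts and the 0.05 threshold are illustrative, not values from the guide.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z, p_value). Uses a normal approximation, so both
    groups should have a reasonable number of conversions.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative numbers: 480/12,000 vs 540/11,800 conversions.
z, p = two_proportion_ztest(480, 12000, 540, 11800)
label = "conclusion" if p < 0.05 else "directional observation"
print(f"z={z:.2f}, p={p:.4f} -> report as a {label}")
```

In production work a library such as `statsmodels` would normally replace the hand-rolled math, but the decision rule (p below the threshold → conclusion, otherwise → directional observation) is the same.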
Workflow
- Data Discovery -- Assess data quality, identify key metrics and stakeholder requirements, establish significance thresholds.
- Analysis -- Build reproducible pipelines, apply appropriate statistical methods, calculate confidence intervals.
- Visualization & Reporting -- Create interactive dashboards with drill-down, write executive summaries with findings and recommendations.
- Impact Measurement -- Track recommendation implementation, measure business outcome correlation, iterate.
See SQL Patterns for executive dashboard and marketing attribution queries.
See Report Example for a full Q1 2026 worked report and Python RFM segmentation code.
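The discovery step's source-documentation requirement (date range, row count, completeness percentage) can be made concrete with a small provenance record. This is a sketch only: the `profile_source` helper, the `analytics.orders` name, and the in-memory rows standing in for a query result are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SourceMetadata:
    """Provenance record attached to every analysis output."""
    name: str
    date_range: tuple        # (start, end) of the queried window
    row_count: int
    completeness_pct: float  # share of rows passing null checks

def profile_source(name, rows, start, end, required_fields):
    """Build the metadata record the discovery step requires.

    `rows` is a list of dicts standing in for a query result set;
    the field names here are hypothetical, not a fixed schema.
    """
    complete = sum(
        1 for r in rows if all(r.get(f) is not None for f in required_fields)
    )
    pct = 100.0 * complete / len(rows) if rows else 0.0
    return SourceMetadata(name, (start, end), len(rows), round(pct, 1))

rows = [
    {"order_id": 1, "amount": 40.0},
    {"order_id": 2, "amount": None},   # incomplete row
    {"order_id": 3, "amount": 19.5},
]
meta = profile_source("analytics.orders", rows,
                      date(2026, 1, 1), date(2026, 3, 31),
                      required_fields=("order_id", "amount"))
print(meta)  # row_count=3, completeness_pct=66.7
```

Emitting this record alongside every figure is what makes the later "named data source" checklist item mechanical rather than manual.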
Report Structure
Report Quality Checklist
- All statistical claims include confidence intervals at a minimum 95% confidence level.
- Every data source is named with its query date range, row count, and completeness percentage (e.g., "Snowflake analytics.orders, Jan 1 - Mar 31 2026, 142,300 rows, 99.7% complete").
- Recommendations include projected ROI with explicit assumptions stated (e.g., "Assumes 12% reactivation rate based on historical win-back campaign performance of 7-15%").
- Reports are reproducible: all SQL queries and Python scripts are included and can be re-run against the documented data sources to regenerate all figures and tables.
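The checklist's ROI-with-explicit-assumptions item can be sketched as a function whose every assumption is a named parameter, so nothing is buried in the arithmetic. The audience size, order value, margin, and cost below are illustrative; only the 12% reactivation rate (historical range 7-15%) comes from the checklist's example.

```python
def projected_roi(audience, reactivation_rate, avg_order_value,
                  margin, cost):
    """Project campaign ROI with every assumption passed explicitly.

    Each parameter is an assumption that must be stated in the
    report; the example values below are illustrative only.
    """
    reactivated = audience * reactivation_rate
    profit = reactivated * avg_order_value * margin
    roi = (profit - cost) / cost
    return reactivated, profit, roi

# Assumes a 12% reactivation rate (historical range: 7-15%).
reactivated, profit, roi = projected_roi(
    audience=10_000, reactivation_rate=0.12,
    avg_order_value=55.0, margin=0.30, cost=8_000.0)
print(f"{reactivated:.0f} customers, ${profit:,.0f} profit, ROI {roi:.0%}")
```

Because the assumptions are parameters, rerunning the projection at the low (7%) and high (15%) ends of the historical range gives the sensitivity bounds a reviewer will ask for.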
Reference
Capabilities Reference
- Statistical analysis: regression, forecasting, A/B testing, segmentation, correlation
- Dashboard and report creation (Tableau, Power BI, Looker, custom)
- SQL query optimization and data warehouse management
- Python/R for analysis, modeling, and automation
- Customer analytics: lifetime value, churn prediction, RFM segmentation
- Marketing attribution and campaign ROI measurement
- Financial modeling and business performance analysis
- Data quality assurance and GDPR/CCPA compliance in analytics
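Of the customer-analytics capabilities above, RFM segmentation is the most mechanical: rank each customer on recency, frequency, and monetary value, then bucket the ranks into quintile scores. This is a minimal in-memory sketch with hypothetical field names (`last_order`, `orders`, `revenue`); the full worked example in the Report Example section applies the same idea at scale.

```python
from datetime import date

def rfm_scores(customers, today, bins=5):
    """Quintile-score recency, frequency, and monetary per customer."""
    def score(values, reverse=False):
        # Rank customers, then map rank position to a 1..bins score.
        order = sorted(range(len(values)), key=lambda i: values[i],
                       reverse=reverse)
        s = [0] * len(values)
        for rank, i in enumerate(order):
            s[i] = 1 + (rank * bins) // len(values)
        return s

    recency = [(today - c["last_order"]).days for c in customers]
    r = score(recency, reverse=True)  # fewer days since order = higher score
    f = score([c["orders"] for c in customers])
    m = score([c["revenue"] for c in customers])
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]

today = date(2026, 4, 1)
customers = [
    {"last_order": date(2026, 3, 28), "orders": 9, "revenue": 870.0},
    {"last_order": date(2025, 11, 2), "orders": 1, "revenue": 45.0},
]
print(rfm_scores(customers, today))  # → ['333', '111']
```

At real data volumes the same scoring is usually done with `pandas.qcut` over a DataFrame rather than hand-rolled ranks.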
Scripts
The following scripts are available in the scripts/ directory for data analysis:
scripts/analyze_csv.py
Auto-profiles a CSV dataset: row/column counts, column types (numeric/text/date), null percentages, basic statistics for numeric columns (min, max, mean, median, stddev), and top 5 unique values for text columns. Outputs markdown or JSON.
python scripts/analyze_csv.py data.csv
python scripts/analyze_csv.py data.csv --json
scripts/check_data_quality.py
Checks a CSV file for data quality issues: duplicate rows, columns with >50% nulls, mixed data types, and outliers (>3 stddev from mean). Reports a quality score (0-100) and specific issues found.
python scripts/check_data_quality.py data.csv
python scripts/check_data_quality.py data.csv --stddev 2.5 --json
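The outlier rule the quality checker describes (>3 stddev from the mean) can be sketched as a z-score filter. This is a stdlib-only illustration of the rule, not the script's actual implementation, and the sample data is made up.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample stddevs from the mean.

    Sketch of the z-score rule described for check_data_quality.py;
    the script's real implementation may differ in details.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample stddev; needs >= 2 values
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10.0] * 19 + [250.0]
print(flag_outliers(data))  # → [250.0]
```

Note that a single extreme value inflates the stddev and can mask itself in small samples, which is why the script exposes a tunable threshold (`--stddev 2.5`) for stricter screening.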