data-analysis

When to use this skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "data-analysis" with this command: npx skills add supercent-io/skills-template/supercent-io-skills-template-data-analysis

Data Analysis

When to use this skill

  • Data exploration: Understand a new dataset

  • Report generation: Derive data-driven insights

  • Quality validation: Check data consistency

  • Decision support: Make data-driven recommendations

Instructions

Step 1: Load and explore data

Python (Pandas):

import pandas as pd import numpy as np

Load CSV

df = pd.read_csv('data.csv')

Basic info

print(df.info()) print(df.describe()) print(df.head(10))

Check missing values

print(df.isnull().sum())

Data types

print(df.dtypes)

SQL:

-- Inspect table schema DESCRIBE table_name;

-- Sample data SELECT * FROM table_name LIMIT 10;

-- Basic stats SELECT COUNT(*) as total_rows, COUNT(DISTINCT column_name) as unique_values, MIN(numeric_column) as min_val, MAX(numeric_column) as max_val, AVG(numeric_column) as avg_val FROM table_name;

Step 2: Data cleaning

Handle missing values

df['column'].fillna(df['column'].mean(), inplace=True) df.dropna(subset=['required_column'], inplace=True)

Remove duplicates

df.drop_duplicates(inplace=True)

Type conversions

df['date'] = pd.to_datetime(df['date']) df['category'] = df['category'].astype('category')

Remove outliers (IQR method)

Q1 = df['value'].quantile(0.25) Q3 = df['value'].quantile(0.75) IQR = Q3 - Q1 df = df[(df['value'] >= Q1 - 1.5IQR) & (df['value'] <= Q3 + 1.5IQR)]

Step 3: Statistical analysis

Descriptive statistics

print(df['numeric_column'].describe())

Grouped analysis

grouped = df.groupby('category').agg({ 'value': ['mean', 'sum', 'count'], 'other': 'nunique' }) print(grouped)

Correlation

correlation = df[['col1', 'col2', 'col3']].corr() print(correlation)

Pivot table

pivot = pd.pivot_table(df, values='sales', index='region', columns='month', aggfunc='sum' )

Step 4: Visualization

import matplotlib.pyplot as plt import seaborn as sns

Histogram

plt.figure(figsize=(10, 6)) df['value'].hist(bins=30) plt.title('Distribution of Values') plt.savefig('histogram.png')

Boxplot

plt.figure(figsize=(10, 6)) sns.boxplot(x='category', y='value', data=df) plt.title('Value by Category') plt.savefig('boxplot.png')

Heatmap (correlation)

plt.figure(figsize=(10, 8)) sns.heatmap(correlation, annot=True, cmap='coolwarm') plt.title('Correlation Matrix') plt.savefig('heatmap.png')

Time series

plt.figure(figsize=(12, 6)) df.groupby('date')['value'].sum().plot() plt.title('Time Series of Values') plt.savefig('timeseries.png')

Step 5: Derive insights

Top/bottom analysis

top_10 = df.nlargest(10, 'value') bottom_10 = df.nsmallest(10, 'value')

Trend analysis

df['month'] = df['date'].dt.to_period('M') monthly_trend = df.groupby('month')['value'].sum() growth = monthly_trend.pct_change() * 100

Segment analysis

segments = df.groupby('segment').agg({ 'revenue': 'sum', 'customers': 'nunique', 'orders': 'count' }) segments['avg_order_value'] = segments['revenue'] / segments['orders']

Output format

Analysis report structure

Data Analysis Report

1. Dataset overview

  • Dataset: [name]
  • Records: X,XXX
  • Columns: XX
  • Date range: YYYY-MM-DD ~ YYYY-MM-DD

2. Key findings

  • Insight 1
  • Insight 2
  • Insight 3

3. Statistical summary

MetricValue
MeanX.XX
MedianX.XX
Std devX.XX

4. Recommendations

  1. [Recommendation 1]
  2. [Recommendation 2]

Best practices

  • Understand the data first: Learn structure and meaning before analysis

  • Incremental analysis: Move from simple to complex analyses

  • Use visualization: Use a variety of charts to spot patterns

  • Validate assumptions: Always verify assumptions about the data

  • Reproducibility: Document analysis code and results

Constraints

Required rules (MUST)

  • Preserve raw data (work on a copy)

  • Document the analysis process

  • Validate results

Prohibited (MUST NOT)

  • Do not expose sensitive personal data

  • Do not draw unsupported conclusions

References

  • Pandas Documentation

  • Matplotlib Gallery

  • Seaborn Tutorial

Examples

Example 1: Basic usage

Example 2: Advanced usage

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

log-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

autoresearch

No summary provided by upstream source.

Repository SourceNeeds Review
Research

data-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
data-analysis | V50.AI