data-analysis

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "data-analysis" with this command: npx skills add vstorm-co/pydantic-deepagents/vstorm-co-pydantic-deepagents-data-analysis

Data Analysis Skill

You are a data analysis expert. When this skill is loaded, follow these guidelines for analyzing data.

Workflow

  • Load the data: Use pandas to read CSV files

  • Explore the data: Check shape, dtypes, missing values, and basic statistics

  • Clean if needed: Handle missing values, duplicates, and outliers

  • Analyze: Perform requested analysis (aggregations, correlations, trends)

  • Visualize: Create charts using matplotlib when appropriate

  • Report: Summarize findings clearly

Code Templates

Loading Data

import pandas as pd
import matplotlib.pyplot as plt

# Load CSV
df = pd.read_csv('/uploads/filename.csv')

# Basic info
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(df.dtypes)
print(df.describe())
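When column types matter, it can help to state them at load time rather than fixing them afterwards. The sketch below reads a small in-memory CSV and parses a date column while loading. The sample data, the column names `date`, `category`, and `value`, and the helper `load_csv` are illustrative assumptions, not part of the skill itself.

```python
import io

import pandas as pd

# Hypothetical sample standing in for an uploaded file such as /uploads/filename.csv
SAMPLE_CSV = """date,category,value
2024-01-01,a,10.5
2024-01-02,b,
2024-01-03,a,7.0
"""


def load_csv(source):
    """Read a CSV, parsing the assumed 'date' column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


df_sample = load_csv(io.StringIO(SAMPLE_CSV))
print(df_sample.head())
print(df_sample.dtypes)
```

The same helper accepts a file path in place of the `StringIO` object, since `pd.read_csv` takes either.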

Handling Missing Values

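# Check missing values
print(df.isnull().sum())

# Option 1: drop rows that contain missing values
df = df.dropna()

# Option 2: fill missing numeric values with each column's mean
# (numeric_only=True avoids errors when text columns are present)
# df = df.fillna(df.mean(numeric_only=True))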
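The workflow also mentions duplicates and outliers, which the template above does not cover. One possible approach, sketched here under assumptions, is to drop exact duplicate rows and flag values outside the interquartile-range fences. The helper name `drop_duplicates_and_flag_outliers`, the `is_outlier` column, and the multiplier `k=1.5` are illustrative choices, not something the skill prescribes.

```python
import pandas as pd


def drop_duplicates_and_flag_outliers(df, column, k=1.5):
    """Drop exact duplicate rows, then flag values outside the IQR fences."""
    cleaned = df.drop_duplicates().copy()
    values = cleaned[column]
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Missing values are left unflagged; only present values outside the fences count
    cleaned["is_outlier"] = values.notna() & ~values.between(lower, upper)
    return cleaned


# Hypothetical sample: one duplicate row and one extreme value
sample = pd.DataFrame({"value": [10, 10, 11, 12, 13, 500]})
result = drop_duplicates_and_flag_outliers(sample, "value")
print(result)
```

Flagging rather than deleting keeps the decision visible, so the report can state how many rows were treated as outliers and why.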

Basic Analysis

# Group by and aggregate
summary = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other_col': 'first'
})

# Correlation
correlation = df.select_dtypes(include='number').corr()
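Passing a dict with lists of functions to `agg`, as in the template above, produces MultiIndex column labels, which can be awkward to print or export. A minimal alternative, assuming flat column names are preferred, is pandas named aggregation. The `sales` frame and the output names `value_mean`, `value_sum`, and `n_rows` are made up for illustration.

```python
import pandas as pd

# Hypothetical data with the same 'category' and 'value' columns as the templates
sales = pd.DataFrame(
    {
        "category": ["a", "a", "b", "b"],
        "value": [1.0, 3.0, 10.0, 20.0],
    }
)

# Named aggregation: each keyword becomes a flat output column name
summary_flat = (
    sales.groupby("category")
    .agg(
        value_mean=("value", "mean"),
        value_sum=("value", "sum"),
        n_rows=("value", "count"),
    )
    .reset_index()
)
print(summary_flat)
```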

Visualization with Matplotlib

Always save charts to the /workspace/ directory so they can be viewed in the app.

import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better looking charts
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

Bar Chart

plt.figure(figsize=(10, 6))
df.groupby('category')['value'].sum().plot(kind='bar', color='steelblue', edgecolor='black')
plt.title('Value by Category', fontsize=14, fontweight='bold')
plt.xlabel('Category')
plt.ylabel('Total Value')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('/workspace/bar_chart.png', dpi=150, bbox_inches='tight')
plt.close()

Line Chart (Time Series)

plt.figure(figsize=(12, 6))
plt.plot(df['date'], df['value'], marker='o', linewidth=2, markersize=4)
plt.title('Value Over Time', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('/workspace/line_chart.png', dpi=150, bbox_inches='tight')
plt.close()

Pie Chart

plt.figure(figsize=(8, 8))
data = df.groupby('category')['value'].sum()
plt.pie(data, labels=data.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
plt.title('Distribution by Category', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('/workspace/pie_chart.png', dpi=150, bbox_inches='tight')
plt.close()

Histogram

plt.figure(figsize=(10, 6))
plt.hist(df['value'], bins=20, color='steelblue', edgecolor='black', alpha=0.7)
plt.title('Value Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.axvline(df['value'].mean(), color='red', linestyle='--', label=f'Mean: {df["value"].mean():.2f}')
plt.legend()
plt.tight_layout()
plt.savefig('/workspace/histogram.png', dpi=150, bbox_inches='tight')
plt.close()

Scatter Plot

plt.figure(figsize=(10, 6))
plt.scatter(df['x'], df['y'], alpha=0.6, c=df['category'].astype('category').cat.codes, cmap='viridis')
plt.title('X vs Y Relationship', fontsize=14, fontweight='bold')
plt.xlabel('X')
plt.ylabel('Y')
plt.colorbar(label='Category')
plt.tight_layout()
plt.savefig('/workspace/scatter.png', dpi=150, bbox_inches='tight')
plt.close()

Heatmap (Correlation Matrix)

plt.figure(figsize=(10, 8))
correlation = df.select_dtypes(include='number').corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0, fmt='.2f', square=True, linewidths=0.5)
plt.title('Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('/workspace/heatmap.png', dpi=150, bbox_inches='tight')
plt.close()

Multiple Subplots

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Bar chart
df.groupby('category')['value'].sum().plot(kind='bar', ax=axes[0, 0], color='steelblue')
axes[0, 0].set_title('Total by Category')
axes[0, 0].tick_params(axis='x', rotation=45)

# Plot 2: Line chart
df.groupby('date')['value'].mean().plot(ax=axes[0, 1], marker='o')
axes[0, 1].set_title('Average Over Time')

# Plot 3: Histogram
axes[1, 0].hist(df['value'], bins=15, color='green', alpha=0.7)
axes[1, 0].set_title('Value Distribution')

# Plot 4: Box plot
df.boxplot(column='value', by='category', ax=axes[1, 1])
axes[1, 1].set_title('Value by Category')
plt.suptitle('')  # Remove auto-generated title

plt.tight_layout()
plt.savefig('/workspace/dashboard.png', dpi=150, bbox_inches='tight')
plt.close()
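Every template above ends with the same save-and-close steps, so a small wrapper may reduce repetition. This is a sketch under assumptions: `save_chart` is a hypothetical helper, the `Agg` backend is chosen on the guess that the code runs without a display, and the usage example writes to a temporary directory standing in for /workspace/.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumes no display is available
import matplotlib.pyplot as plt


def save_chart(fig, filename, out_dir="/workspace"):
    """Save a figure under out_dir (created if missing) and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return path


# Usage with a temporary directory standing in for /workspace/
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["a", "b"], [3, 5], color="steelblue")
ax.set_title("Value by Category")
demo_path = save_chart(fig, "demo_bar.png", out_dir=tempfile.mkdtemp())
print(demo_path)
```

Returning the path makes it easy to mention the saved file by name in the final report.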

Interactive HTML Charts (Plotly)

For interactive charts that can be viewed in the browser:

import plotly.express as px
import plotly.graph_objects as go

# Interactive bar chart
fig = px.bar(df, x='category', y='value', color='category', title='Value by Category')
fig.write_html('/workspace/interactive_bar.html')

# Interactive line chart
fig = px.line(df, x='date', y='value', title='Value Over Time', markers=True)
fig.write_html('/workspace/interactive_line.html')

# Interactive scatter with hover
fig = px.scatter(df, x='x', y='y', color='category', size='value', hover_data=['name'], title='Interactive Scatter')
fig.write_html('/workspace/interactive_scatter.html')

# Interactive pie chart
fig = px.pie(df, values='value', names='category', title='Distribution')
fig.write_html('/workspace/interactive_pie.html')

Best Practices

  • Always show the first few rows with df.head() to verify that the data loaded correctly

  • Check data types before operating on columns, and convert them if necessary

  • Handle edge cases such as empty data or columns with a single value

  • Use descriptive variable names in analysis code

  • Save visualizations to the /workspace/ directory

  • Print intermediate results so the user can follow along
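The points about data types and edge cases can be combined in one defensive helper. The sketch below is an illustration only: `safe_mean` is a hypothetical function, and returning `None` for unusable input is one convention among several.

```python
import pandas as pd


def safe_mean(df, column):
    """Return the mean of a column coerced to numeric, or None if nothing usable."""
    if df.empty or column not in df.columns:
        return None
    # Values that cannot be parsed as numbers become NaN instead of raising
    numeric = pd.to_numeric(df[column], errors="coerce")
    if numeric.notna().sum() == 0:
        return None
    return float(numeric.mean())


# Hypothetical column that was read as text and contains one bad entry
raw = pd.DataFrame({"value": ["1", "2", "n/a", "3"]})
print(safe_mean(raw, "value"))
print(safe_mean(pd.DataFrame(), "value"))
```

Coercing with `errors="coerce"` silently drops unparseable entries from the mean, so it is worth printing how many values became NaN when that count matters to the analysis.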

Output Format

When presenting results:

  • Use clear section headers

  • Include relevant statistics

  • Explain what the numbers mean

  • Provide actionable insights when possible
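As one way to apply the format above, the statistics portion of a report can be assembled programmatically, leaving the interpretation and insights to be written by hand. The helper `format_report` and its section titles are assumptions made for illustration.

```python
import pandas as pd


def format_report(df, column):
    """Build a short plain-text summary for one numeric column."""
    stats = df[column].describe()
    lines = [
        "Summary",
        f"Rows analyzed: {len(df)}",
        "",
        "Key statistics",
        f"Mean {column}: {stats['mean']:.2f}",
        f"Min / max {column}: {stats['min']:.2f} / {stats['max']:.2f}",
    ]
    return "\n".join(lines)


# Hypothetical three-row frame used only to show the output shape
report = format_report(pd.DataFrame({"value": [1.0, 2.0, 6.0]}), "value")
print(report)
```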
