data-science-notebooks

Interactive Notebooks

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "data-science-notebooks" with this command: npx skills add legout/data-platform-agent-skills/legout-data-platform-agent-skills-data-science-notebooks


Use this skill for creating reproducible, well-structured notebooks for data exploration, analysis, and communication.

When to use this skill

  • Exploratory analysis — interactively investigate data

  • Reproducible research — document methodology with code and results

  • Teaching/demos — explain concepts with executable examples

  • Stakeholder communication — share insights with narrative + visuals

  • Prototyping — quickly iterate on data transformations or models

Tool selection

| Tool | Best for | Key feature |
|---|---|---|
| JupyterLab | Traditional data science, extensions ecosystem | Full IDE experience |
| marimo | Reproducible notebooks, reactive execution | Python-native, version-control friendly |
| VS Code + Jupyter | IDE-native notebook experience | IntelliSense, debugging, Git integration |
| Google Colab | Cloud GPUs, sharing, collaboration | Free GPU/TPU access, easy sharing |

Core principles

  1. Structure for readability

Title: Clear project/question description

Setup: Imports and configuration

Data Loading: Load and validate data

Analysis:

  • Subsection per question/hypothesis
  • Clear markdown explanations
  • Visualizations with interpretations

Conclusions: Key findings and next steps
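The "Load and validate data" step can be made concrete with a small check that fails fast before any analysis runs. This is a minimal sketch assuming pandas; `validate_frame`, the file path, and the column names in the usage comment are hypothetical, not part of any library.

```python
import pandas as pd


def validate_frame(df: pd.DataFrame, required_columns: list) -> pd.DataFrame:
    """Fail fast if the loaded data lacks required columns, is empty, or has nulls in them."""
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if df.empty:
        raise ValueError("loaded DataFrame is empty")
    null_counts = df[required_columns].isna().sum()
    columns_with_nulls = null_counts[null_counts > 0]
    if not columns_with_nulls.empty:
        raise ValueError(f"nulls in required columns: {columns_with_nulls.to_dict()}")
    return df  # returned unchanged so the call can wrap the loader directly


# Usage in the Data Loading section (hypothetical file and column names):
# df = validate_frame(pd.read_csv("data/sales.csv"), ["date", "value"])
```

Raising early keeps a bad input from silently flowing into later cells, where the failure would be much harder to trace.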

  2. Ensure reproducibility

Set random seeds:

import numpy as np
import random

np.random.seed(42)
random.seed(42)

Pin versions in requirements.txt or environment.yml

requirements.txt example:

pandas==2.1.0
scikit-learn==1.3.0
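Pinning only helps if the environment running the notebook actually matches the pins. A minimal sketch of a check you might place in the Setup section, using only the standard library's `importlib.metadata`; `parse_pins` and `check_pins` are hypothetical helpers, and this version only understands exact `name==version` lines.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_pins(requirements_text: str) -> dict:
    """Collect exact `name==version` pins; comments and other specifiers are skipped."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if "==" not in line:
            continue  # blank lines and ranges such as numpy>=1.24 are ignored in this sketch
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins: dict) -> dict:
    """Return {package: (pinned, installed)} for every package whose installed version differs."""
    mismatches = {}
    for name, pinned in pins.items():
        installed: Optional[str]
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Usage in the Setup section:
# with open("requirements.txt", encoding="utf-8") as f:
#     print(check_pins(parse_pins(f.read())) or "all pins satisfied")
```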

  3. Keep cells focused

  • One concept per cell

  • Avoid cells longer than about 50 lines

  • Move helper functions into .py modules and import them into the notebook
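As an example of moving a helper out of the notebook, here is a hypothetical `cleaning.py` module kept next to the notebook. The module and function names are illustrative assumptions; combined with `%autoreload 2`, edits to the file are picked up without restarting the kernel.

```python
# cleaning.py: a hypothetical helper module that lives next to the notebook
import re


def normalize_columns(columns):
    """Convert column names to lowercase snake_case, e.g. ' Total Sales ' -> 'total_sales'."""
    normalized = []
    for name in columns:
        name = name.strip().lower()
        name = re.sub(r"[^0-9a-z]+", "_", name)  # collapse runs of non-alphanumerics into "_"
        normalized.append(name.strip("_"))  # drop leading/trailing underscores
    return normalized
```

In the notebook the cell then shrinks to `from cleaning import normalize_columns` followed by `df.columns = normalize_columns(df.columns)`, and the function can be unit-tested outside the notebook.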

  4. Never hardcode secrets

✅ Use environment variables

import os

api_key = os.environ.get("OPENAI_API_KEY")

❌ Never do this

api_key = "sk-abc123..."
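`os.environ.get` returns `None` when the variable is unset, which tends to surface much later as a confusing authentication error. A minimal sketch of failing early instead; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable's value, or stop with a clear message if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return value


# api_key = require_env("OPENAI_API_KEY")
```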

Jupyter best practices

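Magic commands (Jupyter/IPython)

These are IPython magics: they work in a Jupyter cell but are not standard Python.

Auto-reload modules during development

%load_ext autoreload
%autoreload 2

Timing (function_call() stands for the code you want to time)

%timeit function_call()

Debugging (opens the post-mortem debugger after a cell raises an exception)

%debug

Environment info (requires the watermark package; load its extension first)

%load_ext watermark
%watermark -v -m -p numpy,pandas,sklearn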
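Outside IPython (for example in a plain .py module or a test), the standard-library `timeit` module gives a comparable measurement. In this sketch `function_call` is a hypothetical placeholder; note that `%timeit` reports a mean and standard deviation, whereas the `timeit` docs suggest looking at the minimum of `repeat()`.

```python
import timeit


def function_call():
    # Hypothetical stand-in for the code you want to time
    return sum(i * i for i in range(1_000))


# 5 trials of 1,000 calls each; the standard-library docs recommend the minimum,
# since slower trials mostly reflect interference from other processes.
trials = timeit.repeat(function_call, number=1_000, repeat=5)
per_call_seconds = min(trials) / 1_000
print(f"best of 5: {per_call_seconds * 1e6:.1f} microseconds per call")
```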

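Clean outputs before git

Using nbstripout (installs a git filter for the current repository):

pip install nbstripout
nbstripout --install

Or as a pre-commit hook (this also requires a .pre-commit-config.yaml that lists the nbstripout hook):

pip install pre-commit
pre-commit install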
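To show what output stripping means at the file level: an .ipynb file is JSON whose code cells carry `outputs` and `execution_count` fields. The sketch below, using only the `json` module, clears those two fields; the function names are hypothetical, and it is an illustration rather than a substitute for nbstripout, which also cleans some metadata and offers configuration options.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from the code cells of a parsed .ipynb (nbformat 4) document."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook  # markdown cells and notebook metadata are left untouched


def strip_file(path: str) -> None:
    """Rewrite a notebook file in place without outputs (Jupyter itself writes with indent=1)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1, ensure_ascii=False)
        f.write("\n")
```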

marimo advantages

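Reactive execution

In a marimo notebook, cells re-run automatically when the values they depend on change.

# Cell 1: create the slider; the last expression in a cell is displayed
import marimo as mo

slider = mo.ui.slider(1, 100, value=50)
slider

# Cell 2: re-runs automatically whenever the slider moves
# (assumes a DataFrame df with a "value" column is defined in another cell)
df_filtered = df[df["value"] > slider.value]

marimo only lets you read slider.value in a different cell from the one that defines the slider, which is why the filter lives in its own cell.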
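To make "cells re-run when their dependencies change" concrete, here is a toy dependency-tracking model in plain Python. It is not how marimo is implemented: marimo works out each cell's references and definitions by statically analyzing the cell's code, while this sketch (the hypothetical class `ToyReactiveNotebook`) asks for them explicitly and assumes cells are added in dependency order.

```python
class ToyReactiveNotebook:
    """Toy model: each cell declares the names it reads; re-running follows those dependencies."""

    def __init__(self):
        self.cells = []    # (name, reads, func) in the order cells were added
        self.values = {}   # shared namespace of defined variables
        self.run_log = []  # cell names in the order they executed

    def add_cell(self, name, reads, func):
        """Register and run a cell; func takes its inputs as keyword args and returns new definitions."""
        self.cells.append((name, set(reads), func))
        self._run(name, reads, func)

    def _run(self, name, reads, func):
        outputs = func(**{var: self.values[var] for var in reads})
        self.values.update(outputs)
        self.run_log.append(name)
        return set(outputs)

    def set_value(self, var, value):
        """Simulate a UI change: update var, then re-run every cell that depends on it, transitively."""
        self.values[var] = value
        changed = {var}
        # Cells were added in dependency order, so one pass is a valid topological order
        for name, reads, func in self.cells:
            if reads & changed:
                changed |= self._run(name, reads, func)


nb = ToyReactiveNotebook()
nb.add_cell("data", [], lambda: {"values": [5, 20, 60, 90]})
nb.add_cell("slider", [], lambda: {"threshold": 50})
nb.add_cell("filter", ["values", "threshold"],
            lambda values, threshold: {"filtered": [v for v in values if v > threshold]})
nb.add_cell("summary", ["filtered"], lambda filtered: {"count": len(filtered)})

nb.set_value("threshold", 10)  # like dragging the slider down
print(nb.values["filtered"], nb.values["count"])  # [20, 60, 90] 3
```

Only the filter and summary cells re-run after the threshold changes; the data and slider cells are untouched. Because every dependent cell is refreshed automatically, there is no stale downstream state of the kind that out-of-order execution produces in Jupyter.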

Version control friendly

  • Pure Python (.py files)

  • No output blobs in git

  • Readable diffs

Convert Jupyter to marimo

marimo convert notebook.ipynb -o notebook.py

Common anti-patterns

  • ❌ Running cells out of order (Jupyter)

  • ❌ Giant cells with mixed concerns

  • ❌ Hardcoded file paths

  • ❌ No markdown explanations

  • ❌ Committing large output files

  • ❌ Inline data (keep datasets in a data/ folder)
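One way to avoid hardcoded file paths while keeping data in a data/ folder is to resolve every path from a single configurable base directory. A minimal sketch with pathlib; the `PROJECT_DATA_DIR` environment variable and the `data_path` helper are hypothetical conventions. The default uses the current working directory, which in a Jupyter kernel is usually the notebook's folder.

```python
import os
from pathlib import Path

# Hypothetical convention: PROJECT_DATA_DIR overrides the default,
# a data/ folder under the current working directory.
DATA_DIR = Path(os.environ.get("PROJECT_DATA_DIR", Path.cwd() / "data"))


def data_path(*parts: str) -> Path:
    """Return a path inside DATA_DIR, e.g. data_path("raw", "sales.csv")."""
    return DATA_DIR.joinpath(*parts)


# df = pd.read_csv(data_path("raw", "sales.csv"))  # instead of "/Users/me/Downloads/sales.csv"
```

Collaborators and CI machines can then point the same notebook at their own data location without editing any cell.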

Progressive disclosure

  • ../references/jupyter-advanced.md — Widgets, extensions, debugging

  • ../references/marimo-guide.md — Reactive patterns, UI components

  • ../references/notebook-testing.md — Unit tests for notebook code

  • ../references/sharing-publishing.md — nbconvert, Quarto, Voilà

Related skills

  • @data-science-eda — Exploration patterns for notebooks

  • @data-science-interactive-apps — Convert notebooks to apps

  • @data-engineering-core — Production-ready code patterns

References

  • Jupyter Documentation

  • marimo Documentation

  • nbstripout

  • Quarto (publishing)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
