databricks-repl-consolidate

Consolidate a Databricks REPL session into a single, clean Python file. Use this skill when the user wants to finalize, export, or consolidate a REPL session into a committable script. Triggers on requests to consolidate session output, produce a final script from REPL commands, export session to Python, clean up REPL artifacts into production code, or finalize a Databricks workflow.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "databricks-repl-consolidate" with this command: npx skills add wedneyyuri/databricks-repl/wedneyyuri-databricks-repl-databricks-repl-consolidate

Session Consolidation

Produce a single, clean .py file from a Databricks REPL session by reading session.json and the .cmd.py files.

Workflow

  1. Read session.json — the steps array contains the ordered list of steps with status and command file paths.
  2. Read each .cmd.py file — in step order, skipping failed steps (only successful steps survive).
  3. Strip REPL boilerplate — remove or convert REPL-specific calls (see Boilerplate Rules).
  4. Deduplicate — if a step was retried after an error, only keep the final successful version.
  5. Resolve imports — collect all imports from across cells and deduplicate them at the top of the file.
  6. Write the output — a single .py file with a clear structure.
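
The first two workflow steps could be sketched roughly as follows. This is a minimal sketch, not the skill's actual implementation: the steps array and the "Finished" status value come from this document, but the key names "status" and "cmd_file" and the helper name load_successful_steps are assumptions about how session.json is laid out.

```python
import json
from pathlib import Path


def load_successful_steps(session_path):
    """Return (step, source) pairs for steps whose status is "Finished".

    Assumes each step is a dict with a "status" key and a "cmd_file" key
    holding a path relative to session.json (both key names are hypothetical).
    """
    session_path = Path(session_path)
    session = json.loads(session_path.read_text(encoding="utf-8"))
    results = []
    for step in session.get("steps", []):  # array order is taken as step order
        if step.get("status") != "Finished":
            continue  # skip failed or unfinished steps
        cmd_path = session_path.parent / step["cmd_file"]
        results.append((step, cmd_path.read_text(encoding="utf-8")))
    return results
```

Returning the step metadata alongside the source keeps the tag and status available for the later deduplication pass.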

Output Structure

"""
Consolidated from session: <session_name>
Source: <session_file_path>
Steps: <N> (of <total> attempted)
"""

# --- Dependencies ---
# Requires: scikit-learn, xgboost

# --- Imports ---
import os
import json
import joblib
from sklearn.ensemble import RandomForestClassifier
# ...

# --- Step 1: load_data ---
df = spark.read.table("catalog.schema.table")
# ...

# --- Step 2: feature_engineering ---
# ...

# --- Step 3: train ---
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")
# ...

# --- Step 4: evaluate ---
# ...
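
One way to assemble this layout is sketched below, assuming the requirements, import lines, and step bodies have already been collected. The function name render_script and all of its parameters are hypothetical, not part of the skill.

```python
def render_script(session_name, session_file, kept, total, requires, imports, steps):
    """Assemble the consolidated file text.

    `steps` is a list of (tag, body) pairs for the surviving steps;
    `kept` and `total` are the counts shown in the header docstring.
    """
    parts = [
        '"""',
        f"Consolidated from session: {session_name}",
        f"Source: {session_file}",
        f"Steps: {kept} (of {total} attempted)",
        '"""',
    ]
    if requires:  # omit the section when no %pip installs were found
        parts += ["", "# --- Dependencies ---", f"# Requires: {', '.join(requires)}"]
    if imports:
        parts += ["", "# --- Imports ---", *imports]
    for number, (tag, body) in enumerate(steps, start=1):
        parts += ["", f"# --- Step {number}: {tag} ---", body]
    return "\n".join(parts) + "\n"
```

Building a list of lines and joining once keeps the blank-line spacing between sections in one place.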

Boilerplate Rules

Transform REPL-specific code into clean Python:

REPL Code                      Consolidated Form
%pip install xgboost           Move to # Requires: xgboost in header
sub_llm(prompt, ...)           Keep as-is (it's business logic)
sub_llm_batch(prompts, ...)    Keep as-is (it's business logic)

Key distinctions:

  • %pip install → collect into a # Requires: header comment
  • sub_llm() / sub_llm_batch() → keep unchanged, these are meaningful business logic
  • print() statements used only for REPL feedback → remove
  • print() statements that display meaningful results → keep
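
The %pip rule is mechanical enough to sketch in code. The helper below is a hypothetical illustration: it only handles plain `%pip install pkg1 pkg2` lines and leaves everything else, including sub_llm() calls, untouched. Deciding which print() calls are REPL feedback is a judgment call and is not attempted here.

```python
import re

# Matches a whole line such as "%pip install scikit-learn pandas".
PIP_INSTALL = re.compile(r"^\s*%pip\s+install\s+(.+)$")


def strip_pip_magics(source):
    """Split step source into (clean_source, packages).

    Option flags such as -U are dropped; version specifiers are kept
    exactly as written.
    """
    kept_lines, packages = [], []
    for line in source.splitlines():
        match = PIP_INSTALL.match(line)
        if match:
            packages.extend(
                token for token in match.group(1).split() if not token.startswith("-")
            )
        else:
            kept_lines.append(line)
    return "\n".join(kept_lines), packages
```

The collected package names can then be joined into the # Requires: header comment.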

Deduplication Rules

Sessions often contain retries after errors. When multiple steps share the same tag:

  1. Find all steps with the same tag in session.json
  2. Keep only the last one with status: "Finished"
  3. Discard earlier failed attempts

When adjacent steps do the same thing (e.g., loading the same table with slight variations), keep only the final version.
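
The tag-based rule can be sketched as below, assuming every step is a dict with "tag" and "status" keys (the key names are inferred from this document, not confirmed against the real session.json schema). The second rule, collapsing adjacent steps that do the same thing, needs human or model judgment and is not covered.

```python
def dedupe_by_tag(steps):
    """Keep, for each tag, only the last step whose status is "Finished"."""
    last_finished = {}
    for index, step in enumerate(steps):
        if step.get("status") == "Finished":
            last_finished[step["tag"]] = index  # later attempts overwrite earlier ones
    keep = set(last_finished.values())
    # Preserve the original step order among the survivors.
    return [step for index, step in enumerate(steps) if index in keep]
```

A tag whose attempts all failed disappears from the result entirely, which matches the workflow rule that only successful steps survive.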

Import Resolution

  1. Scan all surviving steps for import and from ... import statements
  2. Deduplicate — same import appearing in multiple steps becomes one line
  3. Place all imports at the top of the file, after the docstring and dependencies comment
  4. Remove imports that are no longer used after boilerplate stripping
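
These four steps could be approximated with the standard ast module, as in the sketch below. It assumes each step's source is already valid Python (that is, %pip lines have been stripped), only hoists top-level imports, and needs Python 3.9+ for ast.unparse. The function name resolve_imports is hypothetical.

```python
import ast


def resolve_imports(sources):
    """Return (import_lines, bodies) for a list of step sources."""
    import_names = {}  # normalized import statement -> names it binds
    used = set()       # every bare name referenced anywhere in the steps
    bodies = []
    for source in sources:
        tree = ast.parse(source)
        drop = set()   # 1-based line numbers occupied by top-level imports
        for node in tree.body:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                bound = {a.asname or a.name.split(".")[0] for a in node.names}
                import_names.setdefault(ast.unparse(node), bound)
                drop.update(range(node.lineno, node.end_lineno + 1))
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                used.add(node.id)
        lines = source.splitlines()
        body = "\n".join(l for i, l in enumerate(lines, start=1) if i not in drop)
        bodies.append(body.strip("\n"))
    # Keep an import only if a name it binds is used; star imports are kept as-is.
    kept = [s for s, bound in import_names.items() if bound & used or "*" in bound]
    # Plain "import x" lines first, then "from x import y", each group sorted.
    kept.sort(key=lambda stmt: (stmt.startswith("from "), stmt))
    return kept, bodies
```

Removing import lines by line number rather than re-generating each body from the syntax tree leaves comments in the step bodies intact.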

Before / After Example

Before (3 separate .cmd.py files)

001_install.cmd.py:

%pip install scikit-learn pandas

002_load.cmd.py:

import pandas as pd
df = spark.read.table("catalog.schema.customers").toPandas()
print(f"Loaded {len(df)} rows")

003_train.cmd.py:

from sklearn.ensemble import RandomForestClassifier
import joblib

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(df[features], df["label"])
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")
print("Training complete")

After (consolidated .py)

"""
Consolidated from session: customer-classifier
Source: ./session.json
Steps: 3 (of 3 attempted)
"""

# --- Dependencies ---
# Requires: scikit-learn, pandas

# --- Imports ---
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# --- Step 1: load ---
df = spark.read.table("catalog.schema.customers").toPandas()

# --- Step 2: train ---
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(df[features], df["label"])
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")

Usage

  1. Ensure session.json has a steps array with at least one successful step
  2. Read session.json to understand the session structure
  3. Read each .cmd.py file referenced in the steps
  4. Apply the boilerplate rules, deduplication, and import resolution
  5. Write the consolidated file (default: <session_name>.py in the repo root)
  6. Review the output for correctness — automated consolidation may miss nuances in variable dependencies across steps

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
