transcript-pipeline

This skill should be used when the user asks to "process this transcript", "convert lecture to notes", "run transcript pipeline", "generate class tutorial from Zoom captions", "validate transcript coverage", or "enrich class resources" (Notion/Canva/Drive links) for bootcamp notes.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "transcript-pipeline" with this command: npx skills add prakharmnnit/skills-and-personas/prakharmnnit-skills-and-personas-transcript-pipeline

Transcript Pipeline Skill

Run a deterministic, auditable transcript-to-tutorial workflow with optional resource enrichment.

Purpose

Use this skill to convert raw class captions into high-quality study notes while preserving accountability through ledger + validation artifacts.

Use scripts for deterministic work. Use chat/stage prompts for language-heavy transformation.

Core Contract

  1. Keep stage order: ingest -> refine -> synthesize -> enhance -> validate -> publish.
  2. Run deterministic gates with scripts, never with LLM self-certification.
  3. Preserve traceability in .pipeline/* artifacts.
  4. Keep learner-facing notes readable and sanitized.
  5. Treat validation status as PASS/FAIL source of truth.

Scripts

Use these scripts from scripts/:

  • ingest_zoom_captions.py - deterministic ingestion and segment ledger creation
  • run_chat_pipeline.py - guided orchestration for stage handoffs and validation
  • validate_coverage.py - hard-gate coverage validation
  • publish_tutorial_notes.py - learner-facing file naming and sanitization
  • merge_chunks.py - merge chunk outputs for large transcripts
  • run_colab_notebook_pipeline.py - AI/ML Colab appendix and code explainer pipeline
  • update_ai_notes_with_resources_and_colab.py - AI/ML notes enrichment utility
  • resource_enrichment.py - authenticated enrichment for Notion/Canva/Drive resources

Stage Workflow

Stage 0: Ingest (Deterministic)

Run:

python scripts/ingest_zoom_captions.py "<transcript_or_session_path>"

Required outputs:

  • .pipeline/segment_ledger.jsonl
  • .pipeline/segment_manifest.jsonl

Stage 1: Refine (Chat Stage)

Load references/stage1-refine.md.

Produce:

  • .pipeline/refined_transcript.md
  • .pipeline/topic_inventory.json
  • .pipeline/corrections_log.csv
  • .pipeline/uncertainty_report.json

Stage 2: Synthesize (Chat Stage)

Load references/stage2-synthesize.md.

Produce:

  • .pipeline/structured_notes.md
  • .pipeline/coverage_matrix.json

Stage 3: Enhance (Chat Stage)

Load:

  • references/stage3-enhance.md
  • references/tutorial-tech-bar-raiser.md

Produce:

  • .pipeline/enhanced_notes.md
  • final_notes.md
  • bootcamp_index.md

Stage 4: Validate (Deterministic)

Run:

python scripts/validate_coverage.py --pipeline-dir .pipeline

Validation guidance: references/stage4-validate.md.

Hard gates:

  1. Segment coverage accountability
  2. Uncertainty retention
  3. No orphan claims

Stage 5: Publish

Run:

python scripts/publish_tutorial_notes.py --root "<sessions_root>" --session-dir "<session_dir>"

Result:

  • Published tutorial filename in canonical format
  • Learner-safe note without noisy source tags
  • Updated course index links

One-Command Guided Mode

Use guided runner for chat-window workflows:

python scripts/run_chat_pipeline.py run "<transcript_or_session_path>" --deep-pass

This enforces required handoffs and deep quality gates.

Optional Resource Enrichment Stage

Run when class notes include external links (Notion/Canva/Drive):

python scripts/resource_enrichment.py --all-sessions

Single session:

python scripts/resource_enrichment.py --session-dir "<session_dir>"

Auth options:

  • Notion: NOTION_TOKEN_V2, NOTION_ACTIVE_USER
  • Canva: RESOURCE_PLAYWRIGHT_STORAGE_STATE

Reference: references/resource-enrichment-authenticated-flow.md.

Optional AI/ML Colab Enrichment

Run for Colab-backed AI/ML classes:

python scripts/run_colab_notebook_pipeline.py

Reference: references/colab-notebook-explainer-pipeline.md.

Large Transcript Handling

If input exceeds context comfort:

  1. Run Stage 1 by chunks.
  2. Merge chunk artifacts:
python scripts/merge_chunks.py --chunk-dirs "<chunkA/.pipeline>" "<chunkB/.pipeline>" --output-dir "<session/.pipeline>"
  1. Continue Stage 2 onward on merged artifacts.

Required Outputs Checklist

Learner-facing:

  • final_notes.md
  • <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md
  • bootcamp_index.md

Pipeline/audit:

  • .pipeline/segment_ledger.jsonl
  • .pipeline/segment_manifest.jsonl
  • .pipeline/refined_transcript.md
  • .pipeline/topic_inventory.json
  • .pipeline/corrections_log.csv
  • .pipeline/uncertainty_report.json
  • .pipeline/structured_notes.md
  • .pipeline/coverage_matrix.json
  • .pipeline/enhanced_notes.md
  • .pipeline/validation_report.md
  • .pipeline/exceptions.json (if fail)

Quality gates:

  • .pipeline/deep_pass_report.md (when --deep-pass)
  • .pipeline/deep_pass_exceptions.json (when --deep-pass)

Resource enrichment (optional):

  • .resources/resource_enrichment_report.json

Execution Rules

  • Fail fast on missing required artifacts.
  • Report missing outputs explicitly by file path.
  • Retry only from earliest failing stage.
  • Keep resource extraction status explicit (success/fallback/blocked).

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

lecture-alchemist

No summary provided by upstream source.

Repository SourceNeeds Review
General

transcribe-refiner

No summary provided by upstream source.

Repository SourceNeeds Review
General

constellation-team

No summary provided by upstream source.

Repository SourceNeeds Review
General

backend-pe

No summary provided by upstream source.

Repository SourceNeeds Review