Edit Video
Conversational video editing in three phases: transcribe, plan the edit, render.
Prerequisites
- bun — TypeScript runtime
- ffmpeg / ffprobe — video processing (must be on PATH)
Environment variables (optional, have defaults):
| Variable | Purpose | Default |
|---|---|---|
AUDETIC_API_URL | Audetic transcription service URL | https://audio.audetic.link |
Commands
All paths are relative to this skill's directory. Run with bun run.
| Command | Purpose |
|---|---|
scripts/transcribe.ts <video> | Transcribe a single video file — produces JSON + markdown transcript + analysis alongside the file |
scripts/transcribe.ts <directory> | Transcribe all video files in a directory — produces merged JSON + markdown transcript + analysis in the directory |
scripts/preview.ts <edl.json> | Validate and preview an EDL before rendering |
scripts/render.ts <edl.json> | Render final video from EDL using ffmpeg stream copy |
scripts/caption.ts <video> <transcript.json> | Burn Shorts-style captions into video (optional --edl, --output) |
Workflow — Single File
1. Transcribe
bun run scripts/transcribe.ts <video-file>
Compresses audio to MP3, uploads to the audetic transcription service, and produces three files alongside the video:
.json— transcription result (used by tools)-transcript.md— readable transcript table-analysis.md— transcript with signal flags (gaps, speech rate)
2. Plan the Edit
Read both the transcript and the analysis. The analysis provides mechanical signals:
| Flag | Meaning |
|---|---|
gap:Xs | Silence of X seconds before this segment |
slow | Fewer than 0.5 words/second (typing, long pauses, dead air) |
silence | Zero words detected in segment |
| WPS column | Words per second (higher = denser speech) |
See references/signal-interpretation.md for detailed signal guidance.
Use these signals combined with your understanding of the content to decide what to keep/cut. The tool detects silence and pacing; you judge what's filler vs. substance.
When the edit plan is decided, write an EDL (Edit Decision List) JSON file:
{
"output": "/absolute/path/to/output.mp4",
"segments": [
{ "source": "/absolute/path/to/video.mp4", "start": "00:00:00.000", "end": "00:02:15.500", "label": "Introduction" },
{ "source": "/absolute/path/to/video.mp4", "start": "00:05:30.000", "end": "00:12:45.200", "label": "Main discussion" }
]
}
- Each segment requires a
sourcefield with the absolute path to its source video - Timestamps must be
HH:MM:SS.mmmformat (ffmpeg-native) - Segments are in playback order — rearranging segments reorders the output
labelis optional, helps communicate what each segment is- Only included segments appear in the output; everything else is cut
See references/edl-schema.md for the full schema reference.
3. Preview
bun run scripts/preview.ts <edl.json>
Always preview before rendering. Shows segment breakdown, kept/cut percentages, and validates the EDL.
4. Render
bun run scripts/render.ts <edl.json>
Uses ffmpeg stream copy (fast, cuts at nearest keyframe). Produces the final video.
5. Caption (Optional)
bun run scripts/caption.ts <edited-video.mp4> <transcript.json> --edl <edl.json>
Burns bold, centered captions (Hormozi style) into the video. Use --edl to remap transcript times to the edited video's timeline. Requires a full re-encode.
See references/caption-style.md for style defaults and customization.
Workflow — Multiple Clips (Directory)
Use this when the user has multiple short clips that should be edited into a single video.
1. Transcribe All Clips
bun run scripts/transcribe.ts <directory>
Finds all video files (*.mp4, *.mkv, *.mov, *.webm, *.ts), transcribes each one, and produces merged output in the directory:
transcript.json— merged transcript withsourcefield on each segmenttranscript.md— merged readable table with Source columnanalysis.md— merged analysis with Source column (gap detection resets at clip boundaries)
2. Plan the Edit
Same process as single-file, but the transcript and analysis include a Source column showing which clip each segment came from. Write an EDL with per-segment source paths:
{
"output": "/absolute/path/to/combined.mp4",
"segments": [
{ "source": "/absolute/path/to/clip001.mp4", "start": "00:00:02.000", "end": "00:00:12.000", "label": "Opening" },
{ "source": "/absolute/path/to/clip003.mp4", "start": "00:00:00.000", "end": "00:00:08.500", "label": "Key moment" },
{ "source": "/absolute/path/to/clip007.mp4", "start": "00:00:01.000", "end": "00:00:14.000", "label": "Closing" }
]
}
3. Preview + Render
Same as single-file workflow. Preview lists all sources with durations and shows source filename per segment.
4. Caption (Optional)
Same as single-file — run caption.ts with --edl on the rendered output.
Narrative Editing
Use narrative editing when the user provides a goal beyond "trim the filler" — a theme, tone, target duration, or audience.
See references/narrative-patterns.md for the full pattern catalog.
Thinking process:
- Read transcript + analysis. Identify distinct moments / content beats.
- Decide which moments serve the stated narrative goal.
- Determine the best order — chronological is one option, but also consider: Hook-first, Escalation, Question-answer, Bookend.
- Label each segment with its narrative role.
- Sum segment durations to verify against target.
Labels as narrative roles: Use the label field to document function (e.g., "HOOK: the punchline", "SETUP: context", "PAYOFF: resolution").
Narrative notes: Use the optional narrative_notes field in the EDL to document editorial reasoning.
Example:
{
"output": "/path/to/output.mp4",
"narrative_notes": "Goal: 60s punchy clip. Led with the reaction for hook, then backed into the setup.",
"segments": [
{ "source": "/path/to/video.mp4", "start": "00:05:30.000", "end": "00:05:55.000", "label": "HOOK: surprised reaction" },
{ "source": "/path/to/video.mp4", "start": "00:01:00.000", "end": "00:02:15.500", "label": "SETUP: reading the tweet" },
{ "source": "/path/to/video.mp4", "start": "00:06:00.000", "end": "00:06:30.000", "label": "PAYOFF: final take" }
]
}
Output File Naming
Single-file mode
The outputPaths function in scripts/lib/config.ts generates standard paths relative to the video:
| Output | Pattern |
|---|---|
| Transcript JSON | <name>.json |
| Transcript MD | <name>-transcript.md |
| Analysis MD | <name>-analysis.md |
| EDL | <name>-edl.json |
| Edited video | <name>-edited.mp4 |
| Captioned video | <name>-captioned.mp4 |
Directory mode
The directoryOutputPaths function generates paths inside the directory:
| Output | Pattern |
|---|---|
| Transcript JSON | transcript.json |
| Transcript MD | transcript.md |
| Analysis MD | analysis.md |
Session Flow
Single file
- User provides a video file path
- Run
transcribe.tson it - Read the
-analysis.mdand-transcript.mdfiles - Discuss with user what to keep/cut (or accept a narrative goal)
- Write the EDL JSON file (each segment has
sourcepointing to the video) - Run
preview.tsto validate — review with user - Run
render.tsto produce the final video - (Optional) Run
caption.tswith--edlif user wants Shorts-style captions - Report output path and final duration
Multiple clips
- User provides a directory of clips
- Run
transcribe.tson the directory - Read the merged
analysis.mdandtranscript.mdin the directory - Discuss with user which clips/segments to include
- Write the EDL JSON file (each segment has
sourcepointing to its clip) - Run
preview.tsto validate — review with user - Run
render.tsto produce the combined video - (Optional) Run
caption.tswith--edlif user wants Shorts-style captions - Report output path and final duration
Tips
- Keyframe imprecision: Stream copy cuts at the nearest keyframe, so cuts may be off by up to ~0.5s. This is the tradeoff for fast rendering without re-encoding.
- Large videos: Transcription time scales with video length. For videos over 30 minutes, warn the user it may take a while.
- Always preview first: Never render without previewing. The preview catches validation errors and lets the user confirm before committing.
- Duration targeting: When given a target duration, sum the
Durcolumn values from the analysis for selected segments. Iterate until the EDL fits. - Reinterpret signals: Gaps aren't just cut candidates — they mark topic boundaries. Slow segments aren't always boring — a pause before a realization can be dramatic.
- Mixed codecs: When combining clips from different sources, the preview tool warns about mixed file extensions. Clips from the same device/app are usually safe.
- Caption re-encoding: Burning captions requires a full video encode (not stream copy), so it takes longer than rendering. Mention this to the user before starting.
- Offer captions: When the user mentions Shorts, Reels, TikTok, or short-form content, offer to add Shorts-style captions after rendering.