Audio to SRT Converter

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "audio to srt converter" with this command: npx skills add dean9703111/ai-agent-skill-for-video-workflow/dean9703111-ai-agent-skill-for-video-workflow-audio-to-srt-converter

Audio to SRT Converter

This skill provides a Python-based workflow for converting audio files (MP3, WAV, M4A, FLAC, etc.) into SRT subtitle files with automatic speech recognition, customizable text formatting, and timeline optimization.

Purpose

Convert audio files (MP3, WAV, M4A, FLAC, etc.) into properly formatted SRT subtitle files with:

  • Automatic speech recognition and transcription

  • Support for multiple audio formats (MP3, WAV, M4A, FLAC, and more)

  • Customizable character limits per subtitle line (default: 22 characters, minimum: 4 characters)

  • Automatic timeline gap filling (gaps < 0.3s are merged)

  • Environment and dependency validation

  • Output naming convention: origin.srt

When to Use This Skill

Use this skill when:

  • Converting audio files to subtitle format

  • Generating transcriptions with timeline information

  • Creating SRT files for video editing or accessibility

  • Processing Chinese or multilingual audio content

Core Workflow

  1. Environment Validation

Before processing, validate:

  • Python 3.7+ is installed

  • Required packages are available (see Dependencies section)

  • Input audio file exists and is readable

  • Output directory is writable
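
The checks above could be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/check_environment.py: the function name `check_environment` and its return shape are hypothetical, and the ffmpeg lookup via `shutil.which` is an assumption about how the system dependency might be detected.

```python
import importlib.util
import os
import shutil
import sys


def check_environment(audio_path, output_dir, packages=("whisper", "pydub")):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: "whisper" is the import name of openai-whisper.
    """
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    for name in packages:
        # find_spec locates a top-level package without importing it
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing Python package: {name}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not (os.path.isfile(audio_path) and os.access(audio_path, os.R_OK)):
        problems.append(f"Input file missing or unreadable: {audio_path}")
    if not (os.path.isdir(output_dir) and os.access(output_dir, os.W_OK)):
        problems.append(f"Output directory not writable: {output_dir}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every missing prerequisite in one pass.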

  2. Audio Transcription

Process the audio file using speech recognition:

  • Load audio file (supports MP3, WAV, M4A, FLAC, etc.)

  • Perform speech-to-text conversion

  • Extract timestamps for each segment

  • Handle silence detection and word boundaries
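
As a rough sketch of this step, the openai-whisper package exposes `whisper.load_model` and `model.transcribe`, whose result dictionary contains a `segments` list with `start`, `end`, and `text` fields. The helper names below and the `"base"` model choice are assumptions for illustration; the source does not say which model or options the actual script uses.

```python
def extract_segments(result):
    """Pull timed, non-empty segments out of a Whisper-style result dict."""
    segments = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        if text:  # skip segments that contain only whitespace
            segments.append((float(seg["start"]), float(seg["end"]), text))
    return segments


def transcribe_segments(audio_path, model_name="base", language=None):
    """Transcribe an audio file and return (start, end, text) tuples.

    Hypothetical wrapper; model_name="base" is an assumed default.
    """
    import whisper  # from the openai-whisper package, imported lazily

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return extract_segments(result)
```

Keeping the extraction separate from the model call makes the segment handling easy to exercise without loading a model.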

  3. Text Formatting

Format transcribed text according to parameters:

  • Split text into lines based on character limit

  • Ensure minimum 4 characters per line

  • Respect word boundaries when possible

  • Handle Chinese character counting correctly
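
One possible way to implement the splitting is shown below. It is a sketch under stated assumptions: the function name `split_text` is hypothetical, the "minimum 4" rule is read as a lower bound on the character limit itself (matching the `--max-chars` description), and characters are counted with Python's `len`, which counts each Chinese character as one.

```python
MIN_MAX_CHARS = 4  # smallest allowed value for the per-line limit (assumed reading)


def split_text(text, max_chars=22):
    """Split text into lines of at most max_chars characters.

    Breaks at the last space within the limit when there is one;
    otherwise (for example, unspaced Chinese text) cuts at the limit.
    """
    if max_chars < MIN_MAX_CHARS:
        raise ValueError(f"max_chars must be at least {MIN_MAX_CHARS}")
    rest = " ".join(text.split())  # collapse runs of whitespace
    lines = []
    while len(rest) > max_chars:
        cut = rest.rfind(" ", 0, max_chars + 1)
        if cut == -1:
            cut = max_chars  # no space available: hard cut
        lines.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        lines.append(rest)
    return lines
```

For instance, `split_text("一二三四五六七八九十", max_chars=4)` yields three lines of four, four, and two characters. Note that this sketch does not rebalance a short final line.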

  4. Timeline Optimization

Adjust subtitle timing:

  • Identify gaps between subtitle segments

  • Close gaps shorter than 0.3 seconds by extending the previous subtitle's end time to the next subtitle's start time

  • Maintain synchronization with audio
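
The gap handling might look something like the sketch below. It assumes segments are `(start, end, text)` tuples sorted by start time and interprets "merge" as closing the gap by moving the earlier subtitle's end time; the function name `fill_gaps` is hypothetical.

```python
def fill_gaps(segments, max_gap=0.3):
    """Extend each subtitle's end to the next start when the gap is small.

    segments: list of (start, end, text) tuples sorted by start time.
    Gaps of max_gap seconds or more are left untouched.
    """
    filled = []
    for i, (start, end, text) in enumerate(segments):
        if i + 1 < len(segments):
            next_start = segments[i + 1][0]
            gap = next_start - end
            if 0 < gap < max_gap:
                end = next_start  # close the short gap
        filled.append((start, end, text))
    return filled
```

Start times are never changed in this version, so each subtitle still appears exactly when its speech begins.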

  5. SRT Generation

Create final SRT file:

  • Format according to SRT specification

  • Number subtitles sequentially

  • Use proper timestamp format (HH:MM:SS,mmm)

  • Save as origin.srt
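
A minimal sketch of the SRT serialization follows, assuming `(start, end, text)` tuples with times in seconds. The function names are hypothetical; the block layout (index, timestamp line, text, blank line) and the `HH:MM:SS,mmm` format follow the common SRT convention.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments):
    """Render (start, end, text) tuples as SRT text, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)  # blank line between consecutive blocks


def write_srt(segments, path="origin.srt"):
    """Write the subtitles as UTF-8 so Chinese text is preserved."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_srt(segments))
```

Working in integer milliseconds avoids floating-point drift when splitting a time into hours, minutes, and seconds.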

Using the Conversion Script

The main conversion script is located at scripts/audio_to_srt.py.

Basic Usage

python scripts/audio_to_srt.py <audio_file> [--max-chars MAX_CHARS]

Parameters

  • audio_file (required): Path to the input audio file (MP3, WAV, M4A, FLAC, etc.)

  • --max-chars (optional): Maximum characters per subtitle line (default: 22, minimum: 4)
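
For illustration, a command line with these two parameters could be parsed with `argparse` as sketched below. This is a hypothetical reconstruction, not the actual code of scripts/audio_to_srt.py; only the argument names, the default of 22, and the minimum of 4 come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the documented command-line interface (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Convert an audio file to an SRT subtitle file."
    )
    parser.add_argument("audio_file", help="path to the input audio file")
    parser.add_argument(
        "--max-chars",
        type=int,
        default=22,
        help="maximum characters per subtitle line (default: 22, minimum: 4)",
    )
    args = parser.parse_args(argv)
    if args.max_chars < 4:
        # parser.error prints a usage message and exits with status 2
        parser.error("--max-chars must be at least 4")
    return args
```

Passing `argv` explicitly, as in `parse_args(["talk.mp3", "--max-chars", "16"])`, makes the parser easy to exercise without touching `sys.argv`.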

Examples

See examples/usage_example.sh for complete usage examples.

Dependencies

The script requires the following Python packages and one system dependency:

  • openai-whisper: Python package for speech recognition

  • pydub: Python package for audio processing

  • ffmpeg: System dependency for audio handling

Install with:

pip install openai-whisper pydub
brew install ffmpeg  # macOS

Output Format

The generated SRT file follows this format:

1
00:00:00,000 --> 00:00:03,500
這是第一行字幕

2
00:00:03,500 --> 00:00:07,200
這是第二行字幕

Additional Resources

Scripts

  • scripts/audio_to_srt.py: Main conversion script with environment validation

  • scripts/check_environment.py: Standalone environment checker

Examples

  • examples/usage_example.sh: Complete usage examples with different parameters

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
