Audio to SRT Converter

This skill provides a Python-based workflow for converting audio files (MP3, WAV, M4A, FLAC, etc.) into SRT subtitle files with automatic speech recognition, customizable text formatting, and timeline optimization.

Purpose

Convert audio files (MP3, WAV, M4A, FLAC, etc.) into properly formatted SRT subtitle files with:

Automatic speech recognition and transcription
Support for multiple audio formats (MP3, WAV, M4A, FLAC, and more)
Customizable character limits per subtitle line (default: 22 characters, minimum: 4 characters)
Automatic timeline gap filling (gaps < 0.3s are merged)
Environment and dependency validation
Output naming convention: origin.srt

When to Use This Skill

Use this skill when:

Converting audio files to subtitle format
Generating transcriptions with timeline information
Creating SRT files for video editing or accessibility
Processing Chinese or multilingual audio content

Core Workflow

Environment Validation

Before processing, validate:

Python 3.7+ is installed
Required packages are available (see Dependencies section)
Input audio file exists and is readable
Output directory is writable
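
The checks above could be sketched roughly as follows. This is an illustration, not the contents of scripts/check_environment.py: the function name check_environment and its return shape are hypothetical, and the import names whisper and pydub are the modules installed by the packages listed under Dependencies.

```python
import importlib.util
import os
import shutil
import sys


def check_environment(audio_path, output_dir, packages=("whisper", "pydub")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ is required")
    for name in packages:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not (os.path.isfile(audio_path) and os.access(audio_path, os.R_OK)):
        problems.append(f"input file missing or unreadable: {audio_path}")
    if not (os.path.isdir(output_dir) and os.access(output_dir, os.W_OK)):
        problems.append(f"output directory not writable: {output_dir}")
    return problems
```

Collecting every problem before reporting, rather than stopping at the first one, lets a user fix the whole setup in a single pass.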

Audio Transcription

Process the audio file using speech recognition:

Load audio file (supports MP3, WAV, M4A, FLAC, etc.)
Perform speech-to-text conversion
Extract timestamps for each segment
Handle silence detection and word boundaries
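
A minimal sketch of the transcription step, assuming the openai-whisper package listed under Dependencies. The model name "base" and the helper names transcribe and extract_segments are illustrative choices, not taken from the actual script, and silence handling is left out.

```python
def transcribe(audio_path, model_name="base"):
    """Run Whisper on an audio file and return its raw result dictionary."""
    import whisper  # from openai-whisper; imported lazily because loading it is slow

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def extract_segments(result):
    """Convert a Whisper result into (start, end, text) tuples, skipping empty text."""
    segments = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        if text:
            segments.append((float(seg["start"]), float(seg["end"]), text))
    return segments
```

Keeping extraction separate from the model call means the timestamp handling can be exercised with a hand-written result dictionary, without loading a model.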

Text Formatting

Format transcribed text according to parameters:

Split text into lines based on character limit
Ensure minimum 4 characters per line
Respect word boundaries when possible
Handle Chinese character counting correctly
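
One possible way to implement the splitting rules is sketched below. The function split_text is hypothetical. It relies on Python's len() counting Unicode code points, so each Chinese character counts as one. Note that in this sketch the final remainder can be shorter than the minimum; a fuller implementation might merge it into the previous line.

```python
def split_text(text, max_chars=22, min_chars=4):
    """Split text into lines of at most max_chars characters."""
    if max_chars < min_chars:
        raise ValueError(f"max_chars must be at least {min_chars}")
    lines = []
    rest = text.strip()
    while len(rest) > max_chars:
        # Prefer the last space at an index between min_chars and max_chars,
        # so that words are not cut in half.
        cut = rest.rfind(" ", min_chars, max_chars + 1)
        if cut == -1:
            cut = max_chars  # no usable space (e.g. Chinese text): hard cut
        lines.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        lines.append(rest)
    return lines
```

For example, split_text("the quick brown fox jumps", max_chars=10) returns ["the quick", "brown fox", "jumps"], while a 14-character Chinese string with max_chars=7 is cut into two 7-character lines.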

Timeline Optimization

Adjust subtitle timing:

Identify gaps between subtitle segments
Close any gap shorter than 0.3 seconds
Do so by extending the previous subtitle's end time to the next subtitle's start time
Maintain synchronization with audio
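
The gap-filling rule can be sketched as below, assuming subtitles are held as (start, end, text) tuples sorted by start time. The function name fill_gaps and the tuple layout are illustrative assumptions.

```python
def fill_gaps(cues, max_gap=0.3):
    """Extend a cue's end time to the next cue's start when the gap is under max_gap seconds."""
    filled = []
    for i, (start, end, text) in enumerate(cues):
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            gap = next_start - end
            if 0 < gap < max_gap:
                end = next_start  # close the short gap
        filled.append((start, end, text))
    return filled
```

Only end times change, so every subtitle still appears at the moment its speech begins. Gaps of 0.3 seconds or more are left alone, so longer pauses show no subtitle.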

SRT Generation

Create final SRT file:

Format according to SRT specification
Number subtitles sequentially
Use proper timestamp format (HH:MM:SS,mmm)
Save as origin.srt
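
A short sketch of the generation step, again assuming (start, end, text) tuples with times in seconds. The helper names are hypothetical. UTF-8 is chosen here as the output encoding so that Chinese text is written correctly.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build SRT text: sequential index, time range, subtitle text, blank line between cues."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


def write_srt(cues, path="origin.srt"):
    """Write the cues to an SRT file encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_srt(cues))
```

Converting to whole milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 seconds up to an invalid ",1000" field.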

Using the Conversion Script

The main conversion script is located at scripts/audio_to_srt.py.

Basic Usage

python scripts/audio_to_srt.py <audio_file> [--max-chars MAX_CHARS]

Parameters

audio_file (required): Path to the input audio file (MP3, WAV, M4A, FLAC, etc.)
--max-chars (optional): Maximum characters per subtitle line (default: 22, minimum: 4)
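
For illustration, the two parameters could be parsed and validated with argparse along these lines. This is a sketch under assumptions; how scripts/audio_to_srt.py actually parses its arguments is not shown here, and the helper names are hypothetical.

```python
import argparse


def max_chars_type(value):
    """Parse --max-chars and reject values below the documented minimum of 4."""
    number = int(value)
    if number < 4:
        raise argparse.ArgumentTypeError("--max-chars must be at least 4")
    return number


def build_parser():
    parser = argparse.ArgumentParser(
        description="Convert an audio file to an SRT subtitle file."
    )
    parser.add_argument("audio_file", help="path to the input audio file")
    parser.add_argument(
        "--max-chars",
        type=max_chars_type,
        default=22,
        help="maximum characters per subtitle line (default: 22, minimum: 4)",
    )
    return parser
```

Validating the minimum inside the type function makes argparse print a usage error and exit before any audio is processed.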

Examples

See examples/usage_example.sh for complete usage examples.

Dependencies

The script requires the following Python packages and system tool:

openai-whisper: speech recognition (Python package)
pydub: audio processing (Python package)
ffmpeg: audio handling (system dependency, not installed by pip)

Install with:

pip install openai-whisper pydub
brew install ffmpeg # macOS

Output Format

The generated SRT file follows this format:

1
00:00:00,000 --> 00:00:03,500
這是第一行字幕

2
00:00:03,500 --> 00:00:07,200
這是第二行字幕

Additional Resources

Scripts

scripts/audio_to_srt.py: Main conversion script with environment validation
scripts/check_environment.py: Standalone environment checker

Examples

examples/usage_example.sh: Complete usage examples with different parameters
