assembly-large-audio-transcriber

Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC, OGG, WEBM. Zero SDK dependency.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "assembly-large-audio-transcriber" with this command: npx skills add jiadong0723/jiadong-assembly-large-audio

AssemblyAI Large Audio Transcriber

Transcribe超大音频文件(100MB~1GB)专用方案,零SDK依赖,直接调HTTP API。

功能

  • 支持超大文件:最高 1GB / 12小时音频
  • 说话人分离(Speaker A/B/C…)
  • 词级时间戳
  • 100+语言,自动检测
  • MP3 / WAV / M4A / FLAC / OGG / WEBM 支持

安装依赖

服务器执行(只需一次):

pip install requests

设置 API Key

在环境变量中设置:

export ASSEMBLYAI_API_KEY="your-key"

或告知许霸天你的 AssemblyAI API Key,我来配置。

免费额度:每月100分钟;付费约 $0.01/分钟。

使用方式

告诉许霸天:

用 AssemblyAI 转录 [文件路径]

支持本地文件和 URL。

技术方案

第一步:上传文件(针对大文件)

AssemblyAI 要求先上传获取 upload_url,再提交转录任务:

import requests, os, time

API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
HEADERS = {"authorization": API_KEY}

# 1. 上传文件获取 upload_url
def upload_file(file_path):
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://api.assemblyai.com/v2/upload",
            headers=HEADERS,
            data=f,
            timeout=300
        )
    response.raise_for_status()
    return response.json()["upload_url"]

# 2. 提交转录任务
def transcribe(upload_url, language="zh"):
    payload = {
        "audio_url": upload_url,
        "speaker_labels": True,
        "format_text": True,
        "language_code": language if language != "auto" else None,
    }
    if language == "auto":
        payload["language_detection"] = True
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS,
        json=payload,
        timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

# 3. 轮询结果
def wait_for_result(transcript_id, poll_interval=5, max_wait=3600):
    start = time.time()
    while True:
        result = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
            headers=HEADERS,
            timeout=30
        )
        result.raise_for_status()
        data = result.json()
        status = data["status"]
        elapsed = time.time() - start
        if status == "completed":
            return data
        elif status == "error":
            raise Exception(f"Transcription error: {data.get('error')}")
        elif elapsed > max_wait:
            raise TimeoutError(f"Timeout after {max_wait}s")
        else:
            print(f"[{elapsed:.0f}s] Status: {status}...")
            time.sleep(poll_interval)

# 4. 完整流程
def transcribe_large_audio(file_path, language="auto"):
    print(f"上传中: {file_path}")
    upload_url = upload_file(file_path)
    print(f"提交转录任务...")
    tid = transcribe(upload_url, language)
    print(f"任务ID: {tid}")
    print("等待转录完成(可能需要数分钟)...")
    result = wait_for_result(tid)
    return result

处理结果

result = transcribe_large_audio("/path/to/meeting.mp3", language="zh")

# 打印带说话人的转录
for utt in result.get("utterances", []):
    speaker = utt.get("speaker", "?")
    text = utt.get("text", "")
    start = utt.get("start", 0) / 1000  # 毫秒→秒
    print(f"[{speaker}] {start:.1f}s: {text}")

# 或打印纯文本
print(result.get("text", ""))

通过 URL 转录(如果文件已在网上)

如果文件可通过公网访问,直接提交 URL 更简单:

def transcribe_url(audio_url, language="zh"):
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,
        "language_detection": True,
    }
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS, json=payload, timeout=30
    )
    response.raise_for_status()
    tid = response.json()["id"]
    result = wait_for_result(tid)
    return result

完整使用示例

import json, sys

file_path = sys.argv[1] if len(sys.argv) > 1 else "meeting.mp3"
language = sys.argv[2] if len(sys.argv) > 2 else "zh"

result = transcribe_large_audio(file_path, language)

output = {
    "file": file_path,
    "language": result.get("language_code"),
    "duration_s": result.get("audio_duration"),
    "transcript": result.get("text"),
    "utterances": [
        {
            "speaker": u.get("speaker"),
            "start_s": round(u.get("start", 0) / 1000, 2),
            "end_s": round(u.get("end", 0) / 1000, 2),
            "text": u.get("text"),
        }
        for u in result.get("utterances", [])
    ]
}

print(json.dumps(output, ensure_ascii=False, indent=2))

大文件处理流程(许霸天专用)

当用户提交超大音频文件时,按以下步骤执行:

  1. 确认文件路径和大小
  2. 确认 ASSEMBLYAI_API_KEY 已配置
  3. 执行上面的 transcribe_large_audio() 流程
  4. 轮询直到完成
  5. 整理输出:按时间顺序输出每句话,带说话人和时间戳
  6. 写文件存档:/workspace/memory/meetings/{日期}-{会议名}_原始转录.md

错误处理

错误原因解决
401 UnauthorizedAPI Key 无效或未设置检查 ASSEMBLYAI_API_KEY
413 Payload Too Large文件超 1GB需分割文件
422 Unprocessable Entity音频格式不支持用 ffmpeg 转换格式
429 Rate Limit超出并发限制等待后重试,降低轮询频率

文件分割(如果单文件超过1GB)

如遇 1GB 限制,用以下方式分割:

ffmpeg -i large.mp3 -ss 00:00:00 -to 01:00:00 -c copy part1.mp3
ffmpeg -i large.mp3 -ss 01:00:00 -c copy part2.mp3

再分别转录,最后拼接结果。

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

Speechmatics

Transcribe audio files (voice notes, recordings, podcasts) to text via the Speechmatics batch transcription API. Use when the user asks to transcribe audio,...

Registry SourceRecently Updated
780Profile unavailable
Automation

Whisper Voice Transcription (whisper.cpp)

Build and use whisper.cpp for local speech-to-text workflows, with optional cloud fallback when local transcription is not practical.

Registry SourceRecently Updated
930Profile unavailable
Automation

Cult Of Carcinization

Give your agent a voice — and ears. The Cult of Carcinization is the bot-first gateway to ScrappyLabs TTS and STT. Speak with 20+ voices, design your own from a text description, transcribe audio to text, and evolve into a permanent bot identity. No human signup required.

Registry Source
1.9K3Profile unavailable
General

Chinese Bedtime Story Generator

生成多角色中文睡前故事并合成语音。适用于用户想为孩子定制个性化睡前故事的场景:根据孩子姓名、年龄和兴趣,由 LLM 创建完整世界观和角色,生成分段故事文本(每段标注说话人),再由 TTS 以不同音色合成旁白、主角、小伙伴和长者的语音,最终拼接为完整 MP3 音频文件。支持连载模式(`--continue`)在多次...

Registry SourceRecently Updated
2811Profile unavailable