Category: provider
Model Studio Qwen ASR (Non-Realtime)
Validation
mkdir -p output/alicloud-ai-audio-asr python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.
Output And Evidence
-
Store transcripts and API responses under output/alicloud-ai-audio-asr/ .
-
Keep one command log or sample response per run.
Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.
Critical model names
Use one of these exact model strings:
-
qwen3-asr-flash
-
qwen-audio-asr
-
qwen3-asr-flash-filetrans
Selection guidance:
-
Use qwen3-asr-flash or qwen-audio-asr for short/normal recordings (sync).
-
Use qwen3-asr-flash-filetrans for long-file transcription (async task workflow).
Prerequisites
- Install SDK dependencies (script uses Python stdlib only):
python3 -m venv .venv . .venv/bin/activate
- Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials .
Normalized interface (asr.transcribe)
Request
-
audio (string, required): public URL or local file path.
-
model (string, optional): default qwen3-asr-flash .
-
language_hints (array, optional): e.g. zh , en .
-
sample_rate (number, optional)
-
vocabulary_id (string, optional)
-
disfluency_removal_enabled (bool, optional)
-
timestamp_granularities (array, optional): e.g. sentence .
-
async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans .
Response
-
text (string): normalized transcript text.
-
task_id (string, optional): present for async submission.
-
status (string): SUCCEEDED or submission status.
-
raw (object): original API response.
Quick start (official HTTP API)
Sync transcription (OpenAI-compatible protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions'
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
--header 'Content-Type: application/json'
--data '{
"model": "qwen3-asr-flash",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
]
}
],
"stream": false,
"asr_options": {
"enable_itn": false
}
}'
Async long-file transcription (DashScope protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription'
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
--header 'X-DashScope-Async: enable'
--header 'Content-Type: application/json'
--data '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}'
Poll task result:
curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>"
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Local helper script
Use the bundled script for URL/local-file input and optional async polling:
python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
--model qwen3-asr-flash
--language-hints zh,en
--print-response
Long-file mode:
python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
--model qwen3-asr-flash-filetrans
--async
--wait
Operational guidance
-
For local files, use input_audio.data (data URI) when direct URL is unavailable.
-
Keep language_hints minimal to reduce recognition ambiguity.
-
For async tasks, use 5-20s polling interval with max retry guard.
-
Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/ .
Output location
-
Default output: output/alicloud-ai-audio-asr/transcripts/
-
Override base dir with OUTPUT_DIR .
Workflow
-
Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
-
Run one minimal read-only query first to verify connectivity and permissions.
-
Execute the target operation with explicit parameters and bounded scope.
-
Verify results and save output/evidence files.
References
-
references/api_reference.md
-
references/sources.md
-
Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/ .