EchoForge Moss Voice
Use this skill to run voice interactions in the user's preferred timbre.
Required runtime config
- `MOSI_API_KEY` (required)
- `MOSI_BASE_URL` (optional, default `https://studio.mosi.cn`)
Always send:
Authorization: Bearer <MOSI_API_KEY>
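A minimal sketch of loading this config and building the header, assuming the values come from environment variables. The helper names `load_config` and `auth_headers` are hypothetical and not part of any MOSI SDK.

```python
import os

DEFAULT_BASE_URL = "https://studio.mosi.cn"


def load_config(env=None):
    """Read MOSI_API_KEY / MOSI_BASE_URL from an environment-style mapping."""
    env = os.environ if env is None else env
    api_key = env.get("MOSI_API_KEY", "").strip()
    if not api_key:
        # Fail early; the message names the variable but never echoes a key.
        raise RuntimeError("MOSI_API_KEY is required but not set")
    base_url = env.get("MOSI_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}


def auth_headers(config):
    """Build the Authorization header that every request must carry."""
    return {"Authorization": f"Bearer {config['api_key']}"}
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.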
Inputs
Collect:
- `text` (required, what to speak)
- Voice source (one of):
  - `voice_id` (preferred when available), or
  - `reference_audio` (public URL), or
  - local audio path (upload first, then clone voice)
Optional:
- `expected_duration_sec`
- `sampling_params`:
  - `max_new_tokens` (default 512)
  - `temperature` (default 1.7)
  - `top_p` (default 0.8)
  - `top_k` (default 25)
- `meta_info` (default false)
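The inputs above could be assembled into a request body roughly as follows. This is a sketch under assumptions: the function name `build_tts_payload` is hypothetical, and the exact placement of the optional fields in the JSON body (in particular `sampling_params` as a nested object) is inferred from the list above, not confirmed by the API reference. The `model` value is the one given in the Workflow section.

```python
DEFAULT_SAMPLING = {
    "max_new_tokens": 512,
    "temperature": 1.7,
    "top_p": 0.8,
    "top_k": 25,
}


def build_tts_payload(text, voice_id=None, reference_audio=None,
                      expected_duration_sec=None, sampling_params=None,
                      meta_info=False):
    """Validate the collected inputs and return a TTS request body."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if not voice_id and not reference_audio:
        raise ValueError("provide voice_id or reference_audio")

    overrides = sampling_params or {}
    unknown = set(overrides) - set(DEFAULT_SAMPLING)
    if unknown:
        raise ValueError(f"unknown sampling params: {sorted(unknown)}")

    payload = {"model": "moss-tts", "text": text}
    if voice_id:
        payload["voice_id"] = voice_id  # preferred when available
    else:
        payload["reference_audio"] = reference_audio

    # Assumed layout: defaults merged with overrides under "sampling_params".
    payload["sampling_params"] = {**DEFAULT_SAMPLING, **overrides}
    payload["meta_info"] = bool(meta_info)
    if expected_duration_sec is not None:
        payload["expected_duration_sec"] = expected_duration_sec
    return payload
```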
Workflow
- Resolve voice source.
  - If `voice_id` is available, use it directly.
  - If only local audio path is available:
    - Upload file: `POST /api/v1/files/upload` with multipart field `file`.
    - Clone voice: `POST /api/v1/voice/clone` with `file_id` (or `url`).
    - If returned voice status is not active, poll `GET /api/v1/voices/{voice_id}` until `ACTIVE` or timeout.
  - If `reference_audio` URL is available, use it directly in TTS.
- Run TTS: `POST /v1/audio/tts`.
  - Required payload:
    - `model: "moss-tts"`
    - `text`
    - one of `voice_id` or `reference_audio`
- Parse response:
  - Decode `audio_data` (base64) to WAV.
  - Read `duration_s` and `usage` when present.
- Return a concise result:
  - `voice_id` used
  - output file path
  - duration
  - brief status message
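Two of the steps above, polling a cloned voice and decoding the TTS response, can be sketched without committing to an HTTP client. The names `wait_until_active` and `save_tts_audio` are hypothetical. `fetch_status` stands in for a caller-supplied function that performs `GET /api/v1/voices/{voice_id}` and returns the status string, and the response is assumed to be a JSON object with `audio_data`, `duration_s`, and `usage` at the top level.

```python
import base64
import time


def wait_until_active(fetch_status, timeout_s=60.0, interval_s=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports ACTIVE; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if str(fetch_status()).upper() == "ACTIVE":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def save_tts_audio(response, out_path, voice_id=None):
    """Decode base64 audio_data to a WAV file and return a concise result."""
    audio_bytes = base64.b64decode(response["audio_data"])
    with open(out_path, "wb") as fh:
        fh.write(audio_bytes)
    return {
        "voice_id": voice_id,
        "output_path": out_path,
        "duration_s": response.get("duration_s"),  # may be absent
        "usage": response.get("usage"),            # may be absent
        "status": f"wrote {len(audio_bytes)} bytes",
    }
```

The timeout and polling interval are illustrative defaults; the source does not specify either value.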
Error handling
- If `4010` or `4011`: API key missing/invalid, ask user to fix `MOSI_API_KEY`.
- If `4020`: insufficient credits, ask user to recharge.
- If `4029`: rate limited, retry with exponential backoff.
- If `5002`: invalid audio URL or decode failed, ask user for another clip.
- If `5004`: timeout, shorten text and retry.
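One way to encode these rules is a lookup table plus a retry loop for the rate-limit case. This is a sketch, not the service's prescribed client behavior: `MosiError`, `ERROR_ACTIONS`, and `call_with_backoff` are hypothetical names, and `send` is assumed to be a caller-supplied function returning `(error_code, body)` with `error_code` set to `None` on success. How the API actually delivers the code (HTTP status or JSON field) is not stated in the source.

```python
import random
import time

ERROR_ACTIONS = {
    4010: "ask_user_to_fix_api_key",
    4011: "ask_user_to_fix_api_key",
    4020: "ask_user_to_recharge",
    4029: "retry_with_backoff",
    5002: "ask_user_for_another_clip",
    5004: "shorten_text_and_retry",
}


class MosiError(Exception):
    """Carries the error code and the suggested follow-up action."""

    def __init__(self, code):
        self.code = code
        self.action = ERROR_ACTIONS.get(code, "unknown")
        super().__init__(f"MOSI error {code}: {self.action}")


def call_with_backoff(send, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call send(); on 4029 wait 1s, 2s, 4s, ... (plus jitter) and retry."""
    for attempt in range(max_attempts):
        code, body = send()
        if code is None:
            return body
        retryable = ERROR_ACTIONS.get(code) == "retry_with_backoff"
        if not retryable or attempt == max_attempts - 1:
            raise MosiError(code)
        delay = base_delay_s * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter
```

Only `4029` is retried automatically here. A `5004` timeout is surfaced to the caller because the retry needs shorter text, which the loop cannot produce on its own.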
Operational constraints
- Keep request rate <= 5 RPM.
- Keep the text of each request short enough to avoid timeouts.
- Never print or log raw API keys.
- Prefer reusing a stable `voice_id` for multi-turn voice chat to reduce latency.
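The 5 RPM ceiling can be enforced on the client side with a small sliding-window limiter. This is a minimal sketch: `RpmLimiter` is a hypothetical helper, and treating the limit as a rolling 60-second window is an assumption about how the service counts requests.

```python
import time
from collections import deque


class RpmLimiter:
    """Block until a request slot is free within a rolling 60-second window."""

    def __init__(self, max_per_minute=5, clock=time.monotonic, sleep=time.sleep):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests in the current window

    def acquire(self):
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.max_per_minute:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self.sleep(60.0 - (now - self.sent[0]))
```

Call `acquire()` immediately before each TTS, upload, or clone request so that all calls share the same budget.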