fal-generate Skill
Overview
This skill enables AI content generation through fal.ai's latest models using a queue-based system. It supports:
- Text-to-Image - Generate images from text prompts
- Image-to-Image - Edit and transform existing images
- Text-to-Video - Create videos from text descriptions
- Image-to-Video - Animate images into videos
- Text-to-Speech - Generate natural speech from text
- Speech-to-Text - Transcribe audio to text
- Text-to-3D - Create 3D models from text
- Image-to-3D - Convert images to 3D models
- LLM / VLM / ALM - Run any LLM, vision, audio or video model via OpenRouter
Scripts
| Script | Purpose |
|---|
scripts/generate.sh | Main generation tool with queue management |
scripts/upload.sh | Upload files to fal CDN (returns URL) |
scripts/poll.sh | Poll queue status until completion |
scripts/models.sh | Search and discover models |
Prerequisites
export FAL_KEY="your-api-key"
Output Format
All generation scripts output JSON to stdout when using --wait. The JSON contains URLs to the generated content:
- Images:
{"images": [{"url": "https://fal.media/files/...", "width": 1024, "height": 1024}]}
- Videos:
{"video": {"url": "https://fal.media/files/...mp4"}}
- Audio/TTS:
{"audio": {"url": "https://fal.media/files/...mp3"}} or {"audio_url": "https://..."}
- 3D Models:
{"model_mesh": {"url": "https://fal.media/files/...glb"}}
- Transcription:
{"text": "transcribed content..."}
- OpenRouter:
{"output": "LLM response text..."}
Without --wait, prints the request ID. With --async, prints only the request ID for later polling.
Examples by Category
Text-to-Image
# Basic image generation
./scripts/generate.sh -m fal-ai/kling-image/v3/text-to-image \
-p "A majestic mountain at sunrise, cinematic lighting" -w
# With aspect ratio and seed
./scripts/generate.sh -m fal-ai/flux-2/klein/9b \
-p "Professional headshot, studio lighting" \
--aspect-ratio "1:1" --seed 42 -w
# Ultra-fast generation
./scripts/generate.sh -m fal-ai/z-image/turbo \
-p "Quick concept sketch of a robot" -w
# With custom parameters (inference steps, guidance scale)
./scripts/generate.sh -m fal-ai/flux-2/klein/9b \
-p "Detailed portrait of a scientist" \
--param num_inference_steps=28 --param guidance_scale=3.5 -w
Image-to-Image (Edit/Transform)
# Upload local image first
IMAGE_URL=$(./scripts/upload.sh ~/photos/portrait.jpg)
# Edit with instructions
./scripts/generate.sh -m fal-ai/qwen-image-max/edit \
--image-url "$IMAGE_URL" \
-p "Make the background a sunset beach" -w
# Style transfer
./scripts/generate.sh -m fal-ai/glm-image/image-to-image \
--image-url "$IMAGE_URL" \
-p "Convert to oil painting style" -w
Text-to-Video
# Cinematic video with audio (Kling V3 Pro)
./scripts/generate.sh -m fal-ai/kling-video/v3/pro/text-to-video \
-p "A butterfly emerging from a cocoon in slow motion, macro lens" \
--duration 5 -w
# Fast video generation
./scripts/generate.sh -m fal-ai/ltx-2-19b/distilled/text-to-video \
-p "Drone shot flying over a city at golden hour" -w
# Google Veo 3.1 with sound
./scripts/generate.sh -m fal-ai/veo3.1 \
-p "A cat playing piano, realistic" -w
Image-to-Video (Animate Images)
IMAGE_URL=$(./scripts/upload.sh ~/photos/landscape.jpg)
# Animate a still photo
./scripts/generate.sh -m fal-ai/kling-video/o3/pro/image-to-video \
--image-url "$IMAGE_URL" \
-p "Gentle wind moving through the trees, clouds drifting" -w
# Lip-sync avatar from image + audio
AUDIO_URL=$(./scripts/upload.sh ~/audio/speech.mp3)
./scripts/generate.sh -m fal-ai/longcat-multi-avatar/image-audio-to-video \
--image-url "$IMAGE_URL" --audio-url "$AUDIO_URL" -w
Text-to-Speech
# High-quality TTS (MiniMax)
./scripts/generate.sh -m fal-ai/minimax/speech-2.8-hd \
-t "Hello! Welcome to the future of AI-generated content." -w
# Fast TTS
./scripts/generate.sh -m fal-ai/minimax/speech-2.8-turbo \
-t "This is a quick test of fast speech generation." -w
# Custom voice with Qwen-3 TTS
./scripts/generate.sh -m fal-ai/qwen-3-tts/text-to-speech/1.7b \
-t "Custom voice synthesis with natural intonation." -w
Voice Cloning
# Upload a voice sample (10+ seconds recommended)
VOICE_URL=$(./scripts/upload.sh ~/audio/voice-sample.wav)
# Clone and generate speech
./scripts/generate.sh -m fal-ai/qwen-3-tts/clone-voice/1.7b \
--audio-url "$VOICE_URL" \
-t "This sentence will be spoken in the cloned voice." -w
Speech-to-Text (Transcription)
AUDIO_URL=$(./scripts/upload.sh ~/recordings/meeting.mp3)
# Fast transcription
./scripts/generate.sh -m fal-ai/nemotron/asr \
--audio-url "$AUDIO_URL" -w
# Accurate transcription with timestamps
./scripts/generate.sh -m fal-ai/elevenlabs/speech-to-text/scribe-v2 \
--audio-url "$AUDIO_URL" -w
Text-to-3D
# Detailed 3D model from text
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/pro/text-to-3d \
-p "A detailed medieval sword with ornate handle" -w
# Fast 3D generation
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/rapid/text-to-3d \
-p "Simple wooden chair" -w
Image-to-3D
IMAGE_URL=$(./scripts/upload.sh ~/photos/object.jpg)
# Convert image to 3D model
./scripts/generate.sh -m fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d \
--image-url "$IMAGE_URL" -w
# High-fidelity geometry
./scripts/generate.sh -m fal-ai/ultrashape \
--image-url "$IMAGE_URL" -w
OpenRouter — Run Any LLM
# Text chat with any LLM (GPT-5, Claude, Gemini, Llama 4, etc.)
./scripts/generate.sh -m openrouter/router \
-p "Explain quantum computing in simple terms" \
--param model=google/gemini-2.5-flash -w
# Vision — analyze an image
IMAGE_URL=$(./scripts/upload.sh ~/photos/chart.png)
./scripts/generate.sh -m openrouter/router/vision \
--image-url "$IMAGE_URL" \
-p "Describe what you see in this image" \
--param model=google/gemini-2.5-flash -w
# Audio — process audio with an ALM
AUDIO_URL=$(./scripts/upload.sh ~/audio/podcast.mp3)
./scripts/generate.sh -m openrouter/router/audio \
--audio-url "$AUDIO_URL" \
-p "Summarize the key points discussed" \
--param model=google/gemini-2.5-flash -w
# Video — analyze a video
VIDEO_URL=$(./scripts/upload.sh ~/videos/demo.mp4)
./scripts/generate.sh -m openrouter/router/video \
--video-url "$VIDEO_URL" \
-p "Describe what happens in this video" \
--param model=google/gemini-2.5-flash -w
Usage Patterns
Queue Mode (Default) — submit and poll
./scripts/generate.sh -m fal-ai/flux-2/klein/9b -p "Portrait" --wait
Async Mode — get ID, poll later
REQUEST_ID=$(./scripts/generate.sh -m fal-ai/kling-video/v3/pro/text-to-video \
-p "Drone flying over a city" --async)
./scripts/poll.sh fal-ai/kling-video/v3/pro/text-to-video $REQUEST_ID
File Upload — local files to fal CDN
IMAGE_URL=$(./scripts/upload.sh ~/photos/portrait.jpg)
./scripts/generate.sh -m fal-ai/kling-video/o3/pro/image-to-video \
--image-url "$IMAGE_URL" -p "Gentle wind blowing through hair" -w
Common Parameters
| Parameter | Description | Example |
|---|
-m, --model | Model endpoint (required) | fal-ai/kling-image/v3/text-to-image |
-p, --prompt | Text description | "A sunset over mountains" |
-t, --text | Text for TTS models | "Hello world" |
--image-url | Input image URL | "https://..." |
--video-url | Input video URL | "https://..." |
--audio-url | Input audio URL | "https://..." |
--aspect-ratio | Output ratio | "16:9", "9:16", "1:1" |
--duration | Video length (sec) | 5, 10 |
--seed | Reproducibility | 12345 |
-w, --wait | Poll until done | (flag) |
-a, --async | Return ID only | (flag) |
--param | Extra param (repeatable) | num_inference_steps=28 |
Environment Variables
| Variable | Required | Description |
|---|
FAL_KEY | Yes | API authentication key |
FAL_WEBHOOK | No | Webhook URL for callbacks |
Tips
- Always use
--wait or --async — Without either, you get the request ID + a manual curl command
- Use
--param for advanced control — Pass any model-specific parameter: --param guidance_scale=7.5
- Check model schema —
./scripts/models.sh --schema <endpoint> to see all available params
- Upload files first — Use
./scripts/upload.sh for local images/audio/video before generation
- Use seeds — Same seed = same output for reproducible results
- Pro vs Standard — Pro = better quality + longer generation; Standard = cost-effective
- Flash/Turbo/Distilled — Best for previews and fast iterations
Model Catalog
Image Generation — February 2026
| Endpoint | Description |
|---|
fal-ai/kling-image/v3/text-to-image | Kling V3: Latest Kling image model |
fal-ai/kling-image/v3/image-to-image | Kling V3 image transformation |
fal-ai/kling-image/o3/text-to-image | Kling Omni 3: Top-tier consistency |
fal-ai/kling-image/o3/image-to-image | Kling Omni 3 image editing |
xai/grok-imagine-image | xAI Grok Imagine: Highly aesthetic |
xai/grok-imagine-image/edit | Grok Imagine editing |
fal-ai/hunyuan-image/v3/instruct/text-to-image | Hunyuan 3.0 Instruct |
fal-ai/hunyuan-image/v3/instruct/edit | Hunyuan 3.0 editing |
fal-ai/qwen-image-max/text-to-image | Qwen Image Max: Enhanced realism |
fal-ai/qwen-image-max/edit | Qwen Image Max editing |
fal-ai/z-image/base | Z-Image Base: 6B fast model |
Image Generation — January 2026
| Endpoint | Description |
|---|
fal-ai/flux-2/klein/9b | FLUX.2 Klein 9B: Photorealism & text |
fal-ai/flux-2/klein/9b/edit | FLUX.2 Klein 9B editing |
fal-ai/flux-2/klein/9b/base/lora | FLUX.2 Klein 9B with LoRA |
fal-ai/flux-2/klein/4b | FLUX.2 Klein 4B: Lightweight |
fal-ai/glm-image | GLM Image: Accurate text rendering |
bria/fibo-edit/edit | Bria Fibo Edit: Multi-tool editing |
bria/fibo-edit/blend | Bria Fibo composition |
bria/fibo-edit/relight | Bria Fibo relighting |
bria/fibo-edit/restyle | Bria Fibo artistic styles |
bria/fibo-lite/generate | Bria Fibo Lite: Fast generation |
imagineart/imagineart-1.5-pro-preview/text-to-image | ImagineArt 1.5 Pro: 4K |
Image Generation — December 2025
| Endpoint | Description |
|---|
fal-ai/flux-2-max | FLUX.2 Max: State-of-the-art |
fal-ai/flux-2/turbo | FLUX.2 Turbo: Fast generation |
fal-ai/flux-2/flash | FLUX.2 Flash: Ultra-fast |
fal-ai/gpt-image-1.5 | GPT Image 1.5: Strong prompt adherence |
fal-ai/bytedance/seedream/v4.5/text-to-image | Seedream 4.5: ByteDance |
fal-ai/z-image/turbo | Z-Image Turbo: 6B super fast |
fal-ai/qwen-image-2512 | Qwen Image 2512 |
Video Generation — February 2026
| Endpoint | Description |
|---|
fal-ai/kling-video/v3/pro/text-to-video | Kling 3.0 Pro: Cinematic + audio |
fal-ai/kling-video/v3/standard/text-to-video | Kling 3.0 Standard |
fal-ai/kling-video/v3/pro/image-to-video | Kling 3.0 Pro I2V |
fal-ai/kling-video/v3/standard/image-to-video | Kling 3.0 Standard I2V |
fal-ai/kling-video/o3/pro/text-to-video | Kling O3 Pro: Realistic |
fal-ai/kling-video/o3/pro/image-to-video | Kling O3 Pro I2V |
fal-ai/kling-video/o3/pro/reference-to-video | Kling O3 character consistency |
xai/grok-imagine-video/text-to-video | Grok Video with audio |
xai/grok-imagine-video/image-to-video | Grok Video I2V |
Video Generation — January 2026
| Endpoint | Description |
|---|
fal-ai/vidu/q3/text-to-video | Vidu Q3 T2V |
fal-ai/vidu/q3/image-to-video | Vidu Q3 I2V |
fal-ai/pixverse/v5.6/text-to-video | Pixverse V5.6 T2V |
fal-ai/pixverse/v5.6/image-to-video | Pixverse V5.6 I2V |
fal-ai/ltx-2-19b/text-to-video | LTX-2 19B: Video + audio |
fal-ai/ltx-2-19b/image-to-video | LTX-2 19B I2V |
fal-ai/ltx-2-19b/distilled/text-to-video | LTX-2 Distilled: Fast |
fal-ai/longcat-multi-avatar/image-audio-to-video | LongCat: Lip-sync avatar |
Video Generation — December 2025
| Endpoint | Description |
|---|
fal-ai/veo3.1 | Veo 3.1: Google's best + sound |
fal-ai/veo3.1/fast | Veo 3.1 Fast |
fal-ai/veo3.1/image-to-video | Veo 3.1 I2V |
fal-ai/veo3.1/extend-video | Veo 3.1 Extend: Up to 30s |
fal-ai/hunyuan-video-v1.5/text-to-video | Hunyuan Video 1.5 T2V |
fal-ai/bytedance/seedance/v1.5/pro/text-to-video | Seedance 1.5 Pro |
fal-ai/kandinsky5-pro/text-to-video | Kandinsky 5 Pro |
fal-ai/live-avatar | Live Avatar: Real-time |
clarityai/crystal-video-upscaler | Crystal Video Upscaler |
fal-ai/creatify/aurora | Creatify Aurora: Studio avatars |
Audio — February 2026
| Endpoint | Description |
|---|
fal-ai/minimax/speech-2.8-hd | MiniMax 2.8 HD: Best TTS |
fal-ai/minimax/speech-2.8-turbo | MiniMax 2.8 Turbo: Fast TTS |
Audio — January 2026
| Endpoint | Description |
|---|
fal-ai/qwen-3-tts/text-to-speech/1.7b | Qwen-3 TTS 1.7B: Custom voices |
fal-ai/qwen-3-tts/text-to-speech/0.6b | Qwen-3 TTS 0.6B: Lightweight |
fal-ai/qwen-3-tts/clone-voice/1.7b | Qwen-3 Voice Clone: Zero-shot |
fal-ai/qwen-3-tts/clone-voice/0.6b | Qwen-3 Voice Clone Light |
fal-ai/qwen-3-tts/voice-design/1.7b | Qwen-3 Voice Design |
fal-ai/nemotron/asr | Nemotron ASR: Fast STT |
fal-ai/nemotron/asr/stream | Nemotron ASR Streaming |
fal-ai/elevenlabs/voice-changer | ElevenLabs Voice Changer |
fal-ai/elevenlabs/speech-to-text/scribe-v2 | ElevenLabs Scribe V2 |
fal-ai/deepfilternet3 | DeepFilterNet3: Noise removal |
Audio — December 2025
| Endpoint | Description |
|---|
fal-ai/sam-audio/separate | SAM Audio: Text-guided separation |
fal-ai/elevenlabs/music | ElevenLabs Music |
fal-ai/maya/batch | Maya: Expressive voice |
fal-ai/demucs | Demucs: SOTA stemming |
fal-ai/index-tts-2/text-to-speech | Index TTS 2.0 |
3D Generation — February 2026
| Endpoint | Description |
|---|
fal-ai/hunyuan-3d/v3.1/pro/text-to-3d | Hunyuan 3D V3.1 Pro: Text to 3D |
fal-ai/hunyuan-3d/v3.1/pro/image-to-3d | Hunyuan 3D V3.1 Pro: Image to 3D |
fal-ai/hunyuan-3d/v3.1/rapid/text-to-3d | Hunyuan 3D Rapid: Fast |
fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d | Hunyuan 3D Rapid I2-3D |
fal-ai/ultrashape | UltraShape: High-fidelity geometry |
3D Generation — December 2025
| Endpoint | Description |
|---|
fal-ai/trellis-2 | Trellis 2: Versatile 3D |
fal-ai/hunyuan3d-v3/text-to-3d | Hunyuan 3D V3 |
fal-ai/hunyuan-motion | Hunyuan Motion: 3D animation |
fal-ai/meshy/v6-preview/text-to-3d | Meshy V6 Preview |
OpenRouter Endpoints
Access 100+ LLMs via OpenRouter. Use --param model=<provider/model> to select the model.
Text (LLM)
| Endpoint | Description |
|---|
openrouter/router | Any LLM: GPT-5, Claude, Gemini, Llama 4, Mistral |
openrouter/router/stream | LLM with streaming |
openrouter/router/enterprise | Enterprise LLM (enhanced SLA) |
openrouter/router/enterprise/stream | Enterprise LLM streaming |
Vision (VLM)
| Endpoint | Description |
|---|
openrouter/router/vision | Any VLM: Image analysis with GPT-5, Gemini, Claude |
openrouter/router/vision/stream | Vision streaming |
openrouter/router/vision/enterprise | Enterprise vision |
openrouter/router/vision/enterprise/stream | Enterprise vision streaming |
Audio (ALM)
| Endpoint | Description |
|---|
openrouter/router/audio | Any ALM: Audio analysis with Gemini |
openrouter/router/audio/stream | Audio streaming |
openrouter/router/audio/enterprise | Enterprise audio |
openrouter/router/audio/enterprise/stream | Enterprise audio streaming |
Video (VLM)
| Endpoint | Description |
|---|
openrouter/router/video | Any Video LM: Video analysis with Gemini |
openrouter/router/video/stream | Video streaming |
openrouter/router/video/enterprise | Enterprise video |
openrouter/router/video/enterprise/stream | Enterprise video streaming |
OpenAI-Compatible
| Endpoint | Description |
|---|
openrouter/router/openai/v1/chat/completions | OpenAI Chat Completions API |
openrouter/router/openai/v1/responses | OpenAI Responses API |
openrouter/router/openai/v1/embeddings | OpenAI Embeddings API |
Model Selection Guide
| Use Case | Recommended |
|---|
| Best image | fal-ai/kling-image/o3/text-to-image |
| Fastest image | fal-ai/z-image/turbo |
| Photorealistic | fal-ai/flux-2/klein/9b |
| Image editing | fal-ai/qwen-image-max/edit |
| Best video | fal-ai/kling-video/v3/pro/text-to-video |
| Fastest video | fal-ai/ltx-2-19b/distilled/text-to-video |
| Video + audio | xai/grok-imagine-video/text-to-video |
| Animate image | fal-ai/kling-video/o3/pro/image-to-video |
| Best TTS | fal-ai/minimax/speech-2.8-hd |
| Voice clone | fal-ai/qwen-3-tts/clone-voice/1.7b |
| Transcription | fal-ai/nemotron/asr |
| 3D from text | fal-ai/hunyuan-3d/v3.1/pro/text-to-3d |
| 3D from image | fal-ai/hunyuan-3d/v3.1/rapid/image-to-3d |
| Any LLM | openrouter/router |
| Vision/Audio | openrouter/router/vision or /audio |