Veo 3.1 Video Prompter
Transform ideas into professional Veo 3.1 prompts using cinematic structure, audio direction, and multi-shot choreography.
When to Use
Invoke when user:
-
Says "create a video prompt" or "generate a Veo prompt"
-
Wants to "make a video of..." or "animate this..."
-
Asks for help with "video generation" or "AI video"
-
Needs "Veo 3" or "Veo 3.1" prompt assistance
-
Wants to create "multi-shot" or "cinematic" video sequences
Core Prompt Formula
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Audio]
Every prompt should address these five elements for maximum control.
Prompt Density: Finding the Sweet Spot
Prompts fail in two directions:
-
Too sparse: Model fills gaps unpredictably, you lose creative control
-
Too dense: Model can't execute all instructions, produces confused output
The Priority Framework
Tier 1 - MUST INCLUDE (model needs these):
-
Shot size (wide/medium/close-up)
-
Subject identity (who/what is in frame)
-
Primary action (what happens)
-
One dominant mood/style word
Tier 2 - SHOULD INCLUDE (significant impact):
-
Camera movement OR angle (pick one, not both)
-
Lighting quality (natural/dramatic/soft)
-
One audio layer (dialogue OR SFX OR ambient)
-
Setting/environment
Tier 3 - NICE TO HAVE (diminishing returns):
-
Secondary audio layers
-
Specific lens type
-
Color palette details
-
Film stock/grain texture
-
Background action
Rule of thumb: Include all Tier 1, most of Tier 2, and 1-2 from Tier 3.
Density Comparison
TOO SPARSE (model guesses too much):
"A professor talking about philosophy"
TOO DENSE (model overloaded):
"Medium close-up shot at eye level with a 50mm lens at f/1.8 creating shallow depth of field with bokeh highlights, of a 52-year-old female professor with silver-streaked auburn hair pulled back in a loose bun, wearing an olive tweed jacket with leather elbow patches over a cream silk blouse with a small pearl brooch, standing in a contemporary lecture hall with tiered mahogany seating and brass fixtures visible in the soft background, natural diffused daylight streaming through floor-to-ceiling windows on the left side creating soft rembrandt lighting on her face with a gentle fill from reflected light on the right..."
OPTIMAL (directed but breathable):
"Medium close-up of a professor in her 50s, tweed jacket, standing in a university lecture hall. She gestures while speaking: 'Kant asked one question: could everyone do this?' Warm natural window light from left, soft academic atmosphere. SFX: marker on whiteboard."
Calibration Signals
Signs your prompt is too sparse:
-
Results vary wildly between generations
-
Key elements missing or wrong
-
Mood/tone inconsistent with intent
Signs your prompt is too dense:
-
Model ignores some instructions entirely
-
Unnatural or frozen-looking motion
-
Conflicting elements appear (e.g., both day and night)
-
Audio doesn't match visual action
Iteration Strategy
-
Start with Tier 1 only - generate test
-
Add Tier 2 elements that matter most to your vision
-
Add ONE Tier 3 detail if something specific is missing
-
Remove any element the model consistently ignores
See references/prompt-calibration.md for detailed examples and troubleshooting.
Cinematography Elements
Shot Composition
-
Wide shot, medium shot, close-up, extreme close-up
-
Single shot, two shot, over-the-shoulder shot
-
High angle, low angle, eye level, worm's eye, bird's eye
Camera Movement
-
Dolly (in/out), tracking shot, crane shot
-
Pan (left/right), tilt (up/down), zoom
-
Steadicam, handheld, aerial, POV
Lens & Focus
-
Shallow depth of field, deep focus
-
Wide-angle lens, telephoto, macro lens
-
Soft focus, rack focus, bokeh
Audio Direction
Veo 3.1 generates synchronized sound. Direct it explicitly:
Dialogue (use quotes):
"A man says, 'The storm is coming.'"
Sound Effects (label with SFX):
"SFX: Thunder rumbles in the distance, rain patters on glass"
Ambient Noise:
"Ambient noise: busy café chatter, clinking cups, soft jazz"
Music:
"A swelling orchestral score begins to play"
Timestamp Prompting
For multi-shot sequences within one generation (max 8 seconds):
[00:00-00:02] Medium shot of a detective at his desk, lighting a cigarette. SFX: Match strike, paper rustling.
[00:02-00:04] Close-up of his eyes narrowing as he reads a letter. Ambient: Rain against the window.
[00:04-00:06] Reverse shot of a shadowy figure in the doorway. A woman's voice: "You shouldn't have looked."
[00:06-00:08] Wide shot as the detective stands, reaching for his gun. SFX: Chair scraping, thunder crack.
Style Keywords
Visual Aesthetic:
-
Photorealistic, cinematic, documentary, animation
-
Retro (sepia, grainy film, 1980s vaporwave)
-
Noir, epic fantasy, sci-fi, romantic, horror
Mood & Lighting:
-
Warm golden hour, cool blue tones, moody shadows
-
Harsh fluorescent, soft morning light, dramatic chiaroscuro
-
Neon-lit, candlelit, overcast diffused
Film Grain Tip:
Add "slightly grainy, film-like" to avoid overly clean AI look
Output Formats
Quick Prompt: Single sentence for simple shots Structured Prompt: Multi-line with all five elements Timestamp Sequence: Choreographed multi-shot within 8s Storyboard Mode: Multiple prompts for full narrative
Example Prompts
Action Shot:
"Tracking shot following a parkour athlete sprinting across rooftops at sunset, warm orange light, urban cityscape background, cinematic, shallow depth of field. SFX: footsteps on concrete, wind rushing past."
Dialogue Scene:
"Medium two-shot in a dimly lit bar, a woman in red leans toward a man in a suit. She says quietly, 'I know what you did.' Ambient: jazz music, glasses clinking. Moody noir aesthetic, warm tungsten lighting."
Nature Documentary:
"Slow-motion close-up of a hummingbird drinking from a flower, macro lens with shallow focus, lush green garden background, soft morning light. SFX: gentle buzzing, birdsong."
Technical Specs
-
Duration: 4, 6, or 8 seconds
-
Resolution: 720p or 1080p
-
Aspect Ratio: 16:9 (landscape) or 9:16 (portrait)
-
Frame Rate: Configurable (default: 24 FPS)
Advanced API Options
When using Veo through API (not Flow), these additional parameters are available:
Parameter Description Default
negativePrompt
Elements to exclude from the video
seed
RNG seed for reproducible results (same prompt + seed = same video) Random
enhancePrompt
Let the model rewrite your prompt for better results false
generateAudio
Generate synchronized audio true
personGeneration
Control person generation: dont_allow or allow_adult
referenceImages
Up to 3 asset images OR 1 style image for consistency
Negative Prompts
Explicitly exclude unwanted elements:
"A forest at sunset" + negativePrompt: "people, animals, buildings"
Seed for Consistency
Use the same seed to reproduce similar results:
First generation: seed=12345 → video A Same prompt + seed=12345 → nearly identical video
Useful for:
-
Iterating on a specific "look"
-
Creating variations with controlled changes
-
A/B testing different prompts
Reference Images
Maintain visual consistency across shots using reference images:
Asset References (up to 3):
-
Character appearances
-
Locations/settings
-
Props or products
Style References (1):
-
Overall aesthetic
-
Color palette
-
Visual treatment
References
-
references/prompt-calibration.md
-
Finding the right detail level
-
references/cinematography-glossary.md
-
Full camera terms
-
references/prompt-examples.md
-
20+ categorized examples
-
references/advanced-workflows.md
-
Image-to-video, first/last frame