media-understand

AI-powered media understanding and analysis for images, videos, and audio. Use when users ask to describe, analyze, summarize, or extract text (OCR) from media files.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install the skill with:

npx skills add maxgent-ai/maxgent-plugin/maxgent-ai-maxgent-plugin-media-understand

Media Understanding

Analyze multimedia content through the Maxgent FAL API proxy, using the default route.

Supported Formats

Type  | Formats                             | Max size
Image | jpg, jpeg, png, gif, webp           | 20 MB
Video | mp4, mpeg, mov, webm, YouTube URL   | 100 MB
Audio | wav, mp3, aiff, aac, ogg, flac, m4a | 100 MB
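
The table above works as a lookup from file extension to media type and size limit. The following is a minimal sketch of that check, assuming the type is inferred from the file extension and that "MB" means 1024 * 1024 bytes. `MEDIA_RULES`, `classifyMedia`, and `withinSizeLimit` are hypothetical names, not part of media-understand.js.

```javascript
// Hypothetical helper: classify a --media value and check its size.
// Extensions and limits come from the Supported Formats table;
// treating "MB" as 1024 * 1024 bytes is an assumption.
const MB = 1024 * 1024;

const MEDIA_RULES = {
  image: { exts: ["jpg", "jpeg", "png", "gif", "webp"], maxBytes: 20 * MB },
  video: { exts: ["mp4", "mpeg", "mov", "webm"], maxBytes: 100 * MB },
  audio: { exts: ["wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a"], maxBytes: 100 * MB },
};

// Returns { type, isUrl } or throws for unsupported input.
function classifyMedia(pathOrUrl) {
  // YouTube URLs count as video; isUrl marks them as remote (no local file).
  if (/^https?:\/\/(www\.)?(youtube\.com|youtu\.be)\//i.test(pathOrUrl)) {
    return { type: "video", isUrl: true };
  }
  // Otherwise infer the type from the lowercase file extension.
  const dot = pathOrUrl.lastIndexOf(".");
  const ext = dot === -1 ? "" : pathOrUrl.slice(dot + 1).toLowerCase();
  for (const [type, rule] of Object.entries(MEDIA_RULES)) {
    if (rule.exts.includes(ext)) return { type, isUrl: false };
  }
  throw new Error(`Unsupported media: ${pathOrUrl}`);
}

// True when a local file of `bytes` size is within the limit for `type`.
function withinSizeLimit(type, bytes) {
  return bytes <= MEDIA_RULES[type].maxBytes;
}
```

For a local file, the byte count could come from `fs.statSync(path).size` before calling `withinSizeLimit`.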

Prerequisites

  1. MAX_API_KEY environment variable (auto-injected by Max)
  2. Bun 1.0+ (built into Max)

Routing

  1. default
    • Endpoint: openrouter/router/openai/v1/chat/completions
    • Model: DEFAULT_MM_MODEL, defaults to google/gemini-2.5-pro (override with --model)
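
A sketch of how the default route might be resolved and a request body assembled. It assumes DEFAULT_MM_MODEL is read from the environment and that the endpoint accepts an OpenAI-style chat-completions body (the path suggests so, but this listing does not confirm it). `resolveRoute` and `buildImageRequest` are hypothetical helpers, and only the image case is shown because the listing does not say how video and audio parts are encoded.

```javascript
// Endpoint path and fallback model are from this document.
// Reading DEFAULT_MM_MODEL from env and the body shape are assumptions.
const ENDPOINT_PATH = "openrouter/router/openai/v1/chat/completions";
const FALLBACK_MODEL = "google/gemini-2.5-pro";

// Precedence: --model flag, then DEFAULT_MM_MODEL, then the fallback.
function resolveRoute(cliModel, env) {
  return {
    path: ENDPOINT_PATH,
    model: cliModel || env.DEFAULT_MM_MODEL || FALLBACK_MODEL,
  };
}

// OpenAI-style chat-completions body for one image given by URL.
function buildImageRequest({ model, prompt, imageUrl, maxTokens, temperature }) {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: maxTokens,
    temperature,
  };
}
```

Under these assumptions, an explicit --model always wins, so retrying with a different model only requires passing a new value.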

Usage

bun skills/media-understand/media-understand.js \
  --media PATH_OR_URL --prompt "PROMPT" \
  [--language chinese|english] [--model MODEL_ID] \
  [--max-tokens N] [--temperature X]

Parameters:

  • --media: local file path or YouTube URL
  • --prompt: analysis question
  • --language: chinese (default) or english
  • --model: override the default model
  • --max-tokens: max output tokens (default 4096)
  • --temperature: sampling temperature (default 0.2)
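
The flags and defaults above map onto a small option parser. This is a hypothetical sketch, not the script's actual argument handling: `parseOptions` and `CLI_DEFAULTS` are illustrative names, each flag is expected to be followed by a value, and unknown flags, an unsupported --language, or a missing --media or --prompt are rejected.

```javascript
// Defaults taken from the Parameters list; model stays unset unless given.
const CLI_DEFAULTS = { language: "chinese", model: undefined, maxTokens: 4096, temperature: 0.2 };

// argv excludes the "bun script.js" prefix, e.g. process.argv.slice(2).
function parseOptions(argv) {
  const opts = { ...CLI_DEFAULTS };
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    switch (flag) {
      case "--media": opts.media = value; break;
      case "--prompt": opts.prompt = value; break;
      case "--language": opts.language = value; break;
      case "--model": opts.model = value; break;
      case "--max-tokens": opts.maxTokens = Number.parseInt(value, 10); break;
      case "--temperature": opts.temperature = Number.parseFloat(value); break;
      default: throw new Error(`Unknown flag: ${flag}`);
    }
  }
  // Validate required flags and value ranges.
  if (!opts.media || !opts.prompt) throw new Error("--media and --prompt are required");
  if (!["chinese", "english"].includes(opts.language)) {
    throw new Error(`--language must be chinese or english, got ${opts.language}`);
  }
  if (!Number.isInteger(opts.maxTokens) || opts.maxTokens <= 0) {
    throw new Error("--max-tokens must be a positive integer");
  }
  if (Number.isNaN(opts.temperature)) throw new Error("--temperature must be a number");
  return opts;
}
```

In a Bun script this would be called as `parseOptions(process.argv.slice(2))`.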

Examples

# Image OCR
bun skills/media-understand/media-understand.js --media ./screenshot.png --prompt "extract all text from this image" --language english

# Video summary (YouTube)
bun skills/media-understand/media-understand.js --media "https://youtube.com/watch?v=xxx" --prompt "summarize this video" --language english

# Local audio analysis
bun skills/media-understand/media-understand.js --media ./meeting.m4a --prompt "summarize key points and list action items" --language english

Instructions

  1. Check that MAX_API_KEY is set.
  2. Identify the media type and validate it against the size limits in Supported Formats.
  3. Analyze using the default route; override the model with --model if needed.
  4. Local images, videos, and audio files are uploaded automatically through the FAL upload proxy before analysis.
  5. On success, return the result as readable text.
  6. On failure:
    • HTTP 402 (insufficient credits): stop immediately. Do NOT retry. Tell the user their API credits are exhausted.
    • Other errors: retry once with a different model. If the retry also fails, stop and state clearly whether the problem lies with the upload, the proxy, or the model parameters.
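
Steps 1 and 6 reduce to a key check and a small decision table. The sketch below assumes the caller tracks the attempt number and has already classified the failure as an upload, proxy, or model-parameter issue; `requireApiKey` and `onFailure` are hypothetical names, and choosing the replacement model is left to the caller.

```javascript
// Step 1: fail fast when the key is missing.
function requireApiKey(env) {
  if (!env.MAX_API_KEY) throw new Error("MAX_API_KEY is not set");
  return env.MAX_API_KEY;
}

// Step 6: decide what to do after a failed call.
// attempt is 1 for the first call and 2 for the single retry;
// category ("upload" | "proxy" | "model") is supplied by the caller.
function onFailure({ status, attempt, category }) {
  if (status === 402) {
    // Insufficient credits: never retry.
    return { action: "stop", message: "Your API credits are exhausted." };
  }
  if (attempt === 1) {
    return { action: "retry", message: "Retrying once with a different model." };
  }
  return { action: "stop", message: `Analysis failed again (${category} issue).` };
}
```

Keeping the 402 check first means a credits error stops the run even on the first attempt.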


Related Skills

Related by shared tags or category signals (all in the General category):

  • audio-transcribe
  • youtube-download
  • image-gen
  • video-gen