agent-media

Agent-first media toolkit for image, video, and audio processing. Use it when you need to resize, convert, generate, edit, or upscale images; remove backgrounds; extend or crop canvases; extract audio from video; transcribe speech; or generate videos. All commands return deterministic, machine-readable JSON output.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-media" with this command: npx skills add agntswrm/agent-media/agntswrm-agent-media-agent-media

Agent Media

Agent Media is an agent-first media toolkit that provides CLI-accessible commands for image, video, and audio processing. All commands produce deterministic, machine-readable JSON output.

Available Commands

Image Commands

  • npx agent-media@latest image resize - Resize an image
  • npx agent-media@latest image convert - Convert image format
  • npx agent-media@latest image generate - Generate image from text
  • npx agent-media@latest image edit - Edit one or more images with a text prompt
  • npx agent-media@latest image remove-background - Remove image background
  • npx agent-media@latest image upscale - Upscale image with AI super-resolution
  • npx agent-media@latest image extend - Extend image canvas with padding
  • npx agent-media@latest image crop - Crop image to given dimensions around a focal point

Audio Commands

  • npx agent-media@latest audio extract - Extract audio from video
  • npx agent-media@latest audio transcribe - Transcribe audio to text

Video Commands

  • npx agent-media@latest video generate - Generate video from text or image

Output Format

All commands print a JSON object to stdout. On success:

{
  "ok": true,
  "media_type": "image",
  "action": "resize",
  "provider": "local",
  "output_path": "output_123.webp",
  "mime": "image/webp",
  "bytes": 12345
}

On error:

{
  "ok": false,
  "error": {
    "code": "INVALID_INPUT",
    "message": "input file not found"
  }
}
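
Because both shapes share the ok field, a caller can branch on it before reading anything else. The following is a minimal sketch of that check, assuming the command's stdout has already been captured as a string. The names parse_result and AgentMediaError are hypothetical, and the "UNKNOWN" fallback code is an assumption for malformed error objects, not something the tool is documented to emit.

```python
import json


class AgentMediaError(Exception):
    """Raised when a result has "ok": false (hypothetical wrapper)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_result(stdout_text: str) -> dict:
    """Parse one JSON result and return it, or raise on an error result."""
    result = json.loads(stdout_text)
    if result.get("ok") is True:
        return result
    # Error shape from this document: {"ok": false, "error": {"code", "message"}}
    error = result.get("error") or {}
    raise AgentMediaError(error.get("code", "UNKNOWN"),
                          error.get("message", ""))
```

On a successful result the caller would typically read output_path, mime, and bytes from the returned dictionary.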

Providers

  • local - Default provider using Sharp (resize, convert, extend, crop) and Transformers.js (remove-background, upscale, transcribe)
  • fal - fal.ai provider (generate, edit, remove-background, upscale, transcribe, video)
  • replicate - Replicate API (generate, edit, remove-background, upscale, transcribe, video)
  • runpod - Runpod API (generate, edit, video)
  • ai-gateway - Vercel AI Gateway (generate, edit)
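
The provider list above can be read as a lookup table from provider to supported actions. The sketch below encodes it that way so an agent can ask which providers cover a given action. The table copies the action labels exactly as listed (including the bare "video" label); the names PROVIDER_ACTIONS and providers_for are hypothetical and not part of the tool.

```python
# Actions per provider, copied from the list in this document.
# Labels are kept as written there, e.g. "video" rather than a full command.
PROVIDER_ACTIONS = {
    "local": {"resize", "convert", "extend", "crop",
              "remove-background", "upscale", "transcribe"},
    "fal": {"generate", "edit", "remove-background",
            "upscale", "transcribe", "video"},
    "replicate": {"generate", "edit", "remove-background",
                  "upscale", "transcribe", "video"},
    "runpod": {"generate", "edit", "video"},
    "ai-gateway": {"generate", "edit"},
}


def providers_for(action: str) -> list:
    """Return the providers that list the given action, in document order."""
    return [name for name, actions in PROVIDER_ACTIONS.items()
            if action in actions]
```

For example, providers_for("upscale") returns local, fal, and replicate, while providers_for("resize") returns only local.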

Provider Selection

  1. Explicit: --provider <name>
  2. Auto-detect from environment variables
  3. Fallback to local provider

Environment Variables

  • AGENT_MEDIA_DIR - Custom output directory
  • FAL_API_KEY - Enable fal provider
  • REPLICATE_API_TOKEN - Enable replicate provider
  • RUNPOD_API_KEY - Enable runpod provider
  • AI_GATEWAY_API_KEY - Enable ai-gateway provider
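
The selection order under Provider Selection, combined with the variables above, can be sketched as follows. This is an illustration only: the function select_provider is hypothetical, and the priority among environment variables when several are set is an assumption (it simply follows the order in which they are listed here). The real tool may also take the requested action into account.

```python
import os
from typing import Mapping, Optional

# Environment variable -> provider, in the order listed in this document.
# The precedence between these variables is an assumption of this sketch.
ENV_TO_PROVIDER = (
    ("FAL_API_KEY", "fal"),
    ("REPLICATE_API_TOKEN", "replicate"),
    ("RUNPOD_API_KEY", "runpod"),
    ("AI_GATEWAY_API_KEY", "ai-gateway"),
)


def select_provider(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider: explicit choice, then environment, then local."""
    env = os.environ if env is None else env
    # 1. An explicit --provider value wins.
    if explicit:
        return explicit
    # 2. Otherwise use the first provider whose variable is set and non-empty.
    for variable, provider in ENV_TO_PROVIDER:
        if env.get(variable):
            return provider
    # 3. Fall back to the local provider.
    return "local"
```

Passing env as a plain dictionary makes the function easy to exercise without touching the real process environment.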
