smolvlm

SmolVLM - Local Image Analysis

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "smolvlm" with this command: npx skills add tdimino/claude-code-minoan/tdimino-claude-code-minoan-smolvlm

Analyze images locally with SmolVLM-2B, a compact vision-language model that runs on Apple Silicon through mlx-vlm.

Quick Usage

Describe an Image

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png

Ask a Question About an Image

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"

Specific Tasks

Extract text (OCR)

python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"

UI analysis

python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"

Detailed description

python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed
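The commands above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: build_command and analyze are hypothetical helper names, the script path is the one used in the commands above, and it is an assumption that the script prints its answer to stdout and that a prompt and --detailed can be passed together.

```python
import subprocess
import sys
from pathlib import Path

# Script location taken from the commands above.
SCRIPT = Path.home() / ".claude" / "skills" / "smolvlm" / "scripts" / "view_image.py"


def build_command(image, prompt=None, detailed=False):
    """Build the argv list for view_image.py (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT), str(image)]
    if prompt:
        cmd.append(prompt)
    if detailed:
        cmd.append("--detailed")
    return cmd


def analyze(image, prompt=None, detailed=False):
    """Run the script and return its stdout.

    Assumption: the script writes the model's answer to stdout.
    """
    result = subprocess.run(
        build_command(image, prompt, detailed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For example, analyze("screenshot.png", "Extract all text") would mirror the OCR command above. Note that sys.executable is used in place of the bare python command so the script runs under the same interpreter as the caller.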

Effective Prompts

General Description

  • "Describe this image": basic description

  • "Describe this image in detail, including colors, composition, and any text": comprehensive description

Text Extraction (OCR)

  • "Extract all visible text from this image"

  • "What text appears in this screenshot?"

  • "Read the text in this document"

UI/Screenshot Analysis

  • "Describe the user interface elements"

  • "What buttons and controls are visible?"

  • "Identify the application and its current state"

Visual Question Answering

  • "How many [objects] are in this image?"

  • "What color is the [object]?"

  • "Is there a [object] in this image?"

Code/Technical

  • "What programming language is shown?"

  • "Describe what this code does"

  • "Identify any errors in this code screenshot"
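The bracketed placeholders in the Visual Question Answering prompts lend themselves to simple templating. A minimal sketch, assuming you want to fill them programmatically; VQA_TEMPLATES and fill_prompt are hypothetical names, and the template strings are copied from the list above.

```python
# Templates copied from the list above; [object] and [objects] are placeholders.
VQA_TEMPLATES = {
    "count": "How many [objects] are in this image?",
    "color": "What color is the [object]?",
    "presence": "Is there a [object] in this image?",
}


def fill_prompt(kind, noun):
    """Substitute the noun into the chosen template (hypothetical helper)."""
    template = VQA_TEMPLATES[kind]
    return template.replace("[objects]", noun).replace("[object]", noun)


print(fill_prompt("count", "cars"))  # How many cars are in this image?
print(fill_prompt("color", "door"))  # What color is the door?
```

The resulting string can be passed as the second argument to view_image.py, as in the question-answering command shown earlier.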

Model Details

Spec               Value
Model              SmolVLM-2B-Instruct
Size               ~4GB
Peak Memory        5.8GB
Speed              ~94 tok/s (M-series)
Supported Formats  PNG, JPG, JPEG, GIF, WebP
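Since only a handful of formats are listed, it can help to check the file extension before loading the model at all. A minimal sketch under that assumption; is_supported_image is a hypothetical helper, and it checks only the extension, not the file contents.

```python
from pathlib import Path

# Extensions derived from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def is_supported_image(path):
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("shot.PNG"))   # True
print(is_supported_image("scan.tiff"))  # False
```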

Requirements

  • macOS with Apple Silicon (M1/M2/M3)

  • Python 3.10+

  • mlx-vlm package: uv pip install mlx-vlm --system
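The requirements above can be verified with a short preflight check. This is a sketch rather than anything shipped with the skill: check_requirements is a hypothetical helper, it assumes the mlx-vlm package is importable as mlx_vlm, and it assumes platform.machine() reports arm64 on Apple Silicon (a Python build running under Rosetta would report x86_64 instead).

```python
import importlib.util
import platform
import sys


def check_requirements(system=None, machine=None, version=None, has_mlx_vlm=None):
    """Return a list of unmet requirements; arguments default to the current host."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else version
    if has_mlx_vlm is None:
        # Assumption: the mlx-vlm package is importable as "mlx_vlm".
        has_mlx_vlm = importlib.util.find_spec("mlx_vlm") is not None

    problems = []
    if system != "Darwin" or machine != "arm64":
        problems.append("macOS on Apple Silicon is required")
    if tuple(version) < (3, 10):
        problems.append("Python 3.10+ is required")
    if not has_mlx_vlm:
        problems.append("mlx-vlm is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing:", problem)
```

The explicit parameters exist only so the check can be exercised on any machine; calling it with no arguments inspects the current host.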

Troubleshooting

"Model not found": The first run downloads the model (~4GB). Wait for the download to complete.

Out of memory: Close other applications. Model needs ~6GB free RAM.

Slow first inference: Model loading takes 10-15s on first use; subsequent calls are faster.
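To set expectations for response time, the figures quoted in this document (10-15s load, ~94 tok/s) can be combined into a rough estimate. This is back-of-the-envelope arithmetic only: estimate_seconds is a hypothetical helper, the output token count is something you would have to guess, and image preprocessing and prompt processing time are ignored.

```python
# Figures quoted in this document; real numbers vary by machine and prompt.
TOKENS_PER_SECOND = 94.0
LOAD_SECONDS = (10.0, 15.0)  # first-use model loading range


def estimate_seconds(output_tokens, first_call=True):
    """Return a (low, high) estimate of wall time for one answer."""
    generation = output_tokens / TOKENS_PER_SECOND
    if not first_call:
        return (generation, generation)
    return (LOAD_SECONDS[0] + generation, LOAD_SECONDS[1] + generation)


low, high = estimate_seconds(188)
print(f"{low:.0f}-{high:.0f} s")  # 12-17 s
```

For a 188-token answer, generation itself takes about 2 seconds, so on first use the load time dominates.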

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
