omni-vu

omni.vu - Visual Understanding & Automation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "omni-vu" with this command: npx skills add eddiebe147/claude-settings/eddiebe147-claude-settings-omni-vu

omni.vu - Visual Understanding & Automation

Overview

omni.vu gives you eyes and hands on the user's macOS screen. Use it to:

  • See what the user sees (screen capture)

  • Understand UI state with AI vision

  • Detect changes and wait for events

  • Act with mouse and keyboard automation

When to Use This Skill

Proactively Use omni.vu When:

  • Debugging UI issues - "The button isn't working" → capture and analyze

  • Verifying changes - After modifying UI code, check if it rendered correctly

  • Waiting for operations - Build finishing, deployment completing, tests running

  • Understanding context - User describes something on screen you can't see

  • Automating repetitive tasks - Clicking through UI flows, filling forms

  • Documentation - Capturing screenshots of features

Do NOT Use When:

  • Reading/writing files (use Read/Write tools)

  • Running terminal commands (use Bash)

  • Making API calls (use appropriate tools)

  • User hasn't granted screen recording permission

Tool Reference

Capture Tools

vu

  • Full Screen Capture

vu(monitor=0, save_to_history=True)

Captures the entire screen. Returns base64 image.

Use when: You need to see everything on screen.

vu_window

  • Window Capture

vu_window(window_id=None, title="VS Code", include_frame=False)

Captures a specific window by ID or title (partial match).

Use when: You only need one application's content.

Workflow:

  • Call vu_list_windows to see available windows

  • Find the window_id or use title matching

  • Call vu_window with that ID/title

vu_region

  • Region Capture

vu_region(x=100, y=100, width=500, height=300)

Captures a specific rectangle of the screen.

Use when: You need a precise area (error message, specific component).

vu_list_windows

  • List Windows

vu_list_windows(filter_app="Chrome")

Lists all visible windows with metadata.

Returns: List of {window_id, title, owner, x, y, width, height}

vu_list_monitors

  • List Displays

vu_list_monitors()

Lists connected displays with resolution and scale factor.

Vision Tools

vu_describe

  • AI Vision Analysis

vu_describe( prompt="What errors are visible?", provider="claude", # or openai, gemini, ollama max_tokens=1024, capture_first=True )

Captures screen and analyzes with AI.

Best prompts:

  • "Describe what you see on screen"

  • "Are there any error messages visible?"

  • "What is the state of the build/test output?"

  • "Is the login form filled correctly?"

  • "What color is the status indicator?"

Providers:

Provider Speed Quality Cost

claude Medium Excellent $$

openai Fast Very Good $$

gemini Fast Good $

ollama Slow Varies Free

Utility Tools

vu_diff

  • Change Detection

vu_diff(threshold=0.02, monitor=0)

Detects if screen changed since last capture.

Returns: {changed: bool, diff_percentage: float}

Use for polling patterns:

Wait for build to finish

while True: result = vu_diff(threshold=0.05) if result["changed"]: # Screen changed, check what happened analysis = vu_describe(prompt="Did the build succeed or fail?") break # Wait before checking again

vu_history

  • Capture History

vu_history(limit=10, capture_type="full_screen")

Gets recent capture metadata.

vu_last

  • Last Capture

vu_last()

Returns the most recent capture with image data.

Use when: You need to re-analyze without re-capturing.

vu_status

  • System Status

vu_status()

Returns system info: monitors, providers, safety settings.

Automation Tools

vu_click

  • Mouse Click

vu_click(x=500, y=300, button="left", count=1)

Clicks at screen coordinates.

Buttons: left , right , middle

Count: 1 (single), 2 (double), 3 (triple)

Safety: Coordinates validated against screen bounds.

vu_move

  • Move Cursor

vu_move(x=500, y=300, duration_ms=0)

Moves cursor to position. Use duration_ms for animated movement.

vu_drag

  • Drag Operation

vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)

Drags from start to end position.

Safety: Maximum 2000px drag distance.

vu_scroll

  • Scroll

vu_scroll(direction="down", amount=3, x=None, y=None)

Scrolls at current or specified position.

Directions: up , down , left , right

Amount: Lines to scroll (1-20)

vu_type

  • Type Text

vu_type(text="Hello World", delay_between_ms=0)

Types text character by character.

Note: Click on target field first!

vu_hotkey

  • Keyboard Shortcut

vu_hotkey(keys="cmd+s")

Executes keyboard shortcuts.

Format: modifier+key (e.g., cmd+c , ctrl+shift+s , cmd+opt+i ) Modifiers: cmd , ctrl , alt /opt , shift

Blocked hotkeys (for safety):

  • cmd+q (quit)

  • cmd+shift+q (logout)

  • cmd+opt+esc (force quit)

Common Workflows

  1. Debug UI Issue

User: "The submit button doesn't do anything when I click it"

  1. vu_describe(prompt="Describe the form and submit button state")

  2. Analyze: Is button disabled? Is there validation error?

  3. If needed: vu_window(title="Chrome") for focused capture

  4. Check console: vu_describe(prompt="Are there any errors in the developer console?")

  5. Verify Build/Deploy

User: "Run the build and let me know when it's done"

  1. Run: npm run build (via Bash)

  2. Loop: vu_diff(threshold=0.05)

  3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")

  4. Report result

  5. Fill a Form

User: "Fill out the registration form with test data"

  1. vu_describe(prompt="What form fields are visible?")

  2. vu_click(x=field_x, y=field_y) # Click first field

  3. vu_type(text="testuser@example.com")

  4. vu_hotkey(keys="tab") # Move to next field

  5. vu_type(text="Test User")

  6. Continue for each field...

  7. vu_click(x=submit_x, y=submit_y) # Submit

  8. vu_describe(prompt="Was the form submitted successfully?")

  9. Navigate UI

User: "Open the settings panel in VS Code"

  1. vu_list_windows(filter_app="Code")

  2. vu_window(title="Code") # Capture VS Code

  3. vu_hotkey(keys="cmd+,") # Open settings

  4. vu_diff() # Wait for settings to open

  5. vu_describe(prompt="Are the VS Code settings now visible?")

  6. Monitor for Changes

User: "Watch the dashboard and tell me when the status changes"

  1. vu() # Initial capture
  2. Loop every 5 seconds:
    • vu_diff(threshold=0.02)
    • If changed: vu_describe(prompt="What changed on the dashboard?")
  3. Report changes to user

Best Practices

  1. Capture Before Acting

Always capture and analyze before clicking/typing:

❌ Bad: vu_click(x=500, y=300) # Hope it's the right spot

✅ Good:

  1. vu_describe(prompt="Where is the submit button?")

  2. Parse coordinates from description

  3. vu_click(x=parsed_x, y=parsed_y)

  4. Use Appropriate Capture Scope

Full screen → General context, multi-app workflows Window → Single app focus, cleaner output Region → Specific element, faster/smaller

  1. Specific Vision Prompts

❌ Vague: "What do you see?"

✅ Specific:

  • "Is there an error message visible? If so, what does it say?"
  • "What is the status of the test runner? Passing, failing, or running?"
  • "List all form fields visible and their current values"
  1. Rate Limit Automation

Don't spam clicks. Wait between actions:

vu_click(x=100, y=100)

Wait for UI response

vu_diff(threshold=0.01) vu_click(x=200, y=200)

  1. Verify After Actions

Always verify automation succeeded:

vu_type(text="hello@example.com") vu_describe(prompt="What text is now in the email field?")

Troubleshooting

"Permission denied" or blank captures

→ Grant Screen Recording permission: System Settings > Privacy > Screen Recording

Automation not working

→ Grant Accessibility permission: System Settings > Privacy > Accessibility

Vision analysis failing

→ Check API keys in environment variables → Try different provider: vu_describe(provider="openai")

Coordinates seem off

→ Retina display? Coordinates are in logical points, not pixels → Multi-monitor? Check vu_list_monitors() for display arrangement

Hotkey blocked

→ Some dangerous hotkeys (cmd+q) are blocked for safety → Unblock in config if absolutely necessary

Configuration

Environment variables:

OMNI_VU_ANTHROPIC_API_KEY=sk-ant-... # For Claude vision OMNI_VU_OPENAI_API_KEY=sk-... # For GPT-4 vision OMNI_VU_DEFAULT_PROVIDER=claude # Default AI provider OMNI_VU_SAFETY_LEVEL=medium # low/medium/high

History stored at: ~/.omni.vu/captures/

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

chatbot designer

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

agent workflow builder

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

git-workflow-designer

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

llc-ops

No summary provided by upstream source.

Repository SourceNeeds Review