gui-agent

GUI automation via visual detection: clicking, typing, reading content, navigating menus, and filling forms, all through a screenshot → detect → act workflow. Supports macOS and Linux.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "gui-agent" with this command: npx skills add alfredjamesli/gui-claw

GUI Agent

STEP 0: Activate Platform (MANDATORY FIRST STEP)

Before any GUI operation, run:

python3 {baseDir}/scripts/activate.py

This detects your OS, sets up the correct action commands, and outputs platform context. After running, {baseDir}/actions/_actions.yaml contains your platform's commands.
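
As a rough illustration of the kind of detection this step performs, the sketch below maps an OS name to a platform key and a screenshot command. It is not the contents of activate.py: the function name detect_platform, the SCREENSHOT_COMMANDS table, the output path, and the choice of screencapture (macOS) and scrot (Linux) are all assumptions made for the example.

```python
import platform

# Hypothetical mapping, for illustration only. The real commands are the
# ones written to {baseDir}/actions/_actions.yaml by activate.py.
SCREENSHOT_COMMANDS = {
    "macos": ["screencapture", "-x", "/tmp/screen.png"],
    "linux": ["scrot", "/tmp/screen.png"],
}


def detect_platform(system_name=None):
    """Return 'macos' or 'linux' for a platform.system()-style name."""
    name = system_name or platform.system()
    if name == "Darwin":
        return "macos"
    if name == "Linux":
        return "linux"
    # Only macOS and Linux are supported, so fail loudly on anything else.
    raise RuntimeError(f"Unsupported platform: {name}")
```

Failing on an unrecognised OS, rather than falling back to a default, matches the requirement that activation happens before any GUI operation.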

Workflow

OBSERVE → LEARN → ACT → VERIFY → SAVE
  1. OBSERVE — Take screenshot → run OCR + detector → understand current state → read {baseDir}/skills/gui-observe/SKILL.md

  2. LEARN — First time with an app? Save components to memory → read {baseDir}/skills/gui-learn/SKILL.md
     learn_from_screenshot() auto-outputs app tips if available

  3. ACT — Pick target → execute using _actions.yaml commands → verify → read {baseDir}/skills/gui-act/SKILL.md
     read {baseDir}/actions/_actions.yaml for available commands

  4. VERIFY — Screenshot again → confirm action succeeded

  5. SAVE — Record state transitions to memory → read {baseDir}/skills/gui-memory/SKILL.md for memory structure
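
The five steps above can be sketched as a single cycle. This is a minimal sketch under assumptions, not the skill's own code: run_step and its five callables (observe, learn, act, verify, save) are hypothetical stand-ins for the sub-skills, and saving only after a successful verify is a design choice of the example.

```python
def run_step(observe, learn, act, verify, save, target, known_app=True):
    """Run one OBSERVE → LEARN → ACT → VERIFY → SAVE cycle.

    All five callables are hypothetical stand-ins for the sub-skills.
    """
    before = observe()               # OBSERVE: screenshot + OCR/detector output
    if not known_app:
        learn(before)                # LEARN: first visit, store components
    act(target, before)              # ACT: act only on what was observed
    after = observe()                # VERIFY: screenshot again
    ok = verify(before, after, target)
    if ok:
        save(before, after, target)  # SAVE: record the state transition
    return ok
```

Passing the steps in as callables keeps the control flow visible and lets each one be swapped for a stub when testing the loop itself.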

Core Rules

  • Coordinates from detection only — OCR or GPA-GUI-Detector, NEVER from guessing
  • Look before you act — every action must be justified by what you observed
  • image tool = understanding only — use it to decide WHAT to click, get WHERE from OCR/detector
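
The first rule can be illustrated with a small lookup that turns detector output into a click point and refuses to answer when the target was not detected. This is a sketch under assumptions: the locate function and the detection format (a 'text' string plus a 'box' of left, top, width, height in screen pixels) are hypothetical and do not describe the actual output of the OCR or GPA-GUI-Detector.

```python
def locate(label, detections):
    """Return the centre (x, y) of the one detection whose text matches label.

    `detections` is assumed to be a list of dicts with a 'text' string and a
    'box' tuple (left, top, width, height) in screen pixels.
    """
    wanted = label.strip().lower()
    matches = [d for d in detections if d["text"].strip().lower() == wanted]
    if len(matches) != 1:
        # Zero or ambiguous matches: refuse rather than guess a coordinate.
        raise LookupError(f"{len(matches)} matches for {label!r}; observe again")
    left, top, width, height = matches[0]["box"]
    return (left + width // 2, top + height // 2)


detections = [
    {"text": "Save", "box": (100, 200, 60, 20)},
    {"text": "Cancel", "box": (180, 200, 60, 20)},
]
print(locate("save", detections))  # (130, 210)
```

Raising on a missing or ambiguous match forces a fresh observation instead of a guessed coordinate.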

Sub-Skills Reference

Sub-Skill                      When to read
skills/gui-observe/SKILL.md    Before screenshots or detection
skills/gui-learn/SKILL.md      Before learning a new app
skills/gui-act/SKILL.md        Before any click/type action
skills/gui-memory/SKILL.md     For memory structure details
skills/gui-workflow/SKILL.md   For multi-step navigation
skills/gui-setup/SKILL.md      For first-time machine setup
skills/gui-report/SKILL.md     For task performance reporting
