mano-cua

Computer use for GUI automation tasks via vision-language-action (VLA) models. Use this skill when the user describes a task in natural language that requires visual screen interaction and no API or CLI exists for the target app.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

To install, copy the following and send it to your AI assistant:

Install skill "mano-cua" with this command: npx skills add hanningwang/mano-cua

mano-cua

Desktop GUI automation driven by natural language. Captures screenshots, sends them to a cloud-based hybrid vision model, and executes the returned actions on the local machine — click, type, scroll, drag, and more.

Requirements

  • A system with a graphical desktop (macOS / Windows / Linux)
  • mano-cua binary installed

Installation

macOS / Linux (Homebrew):

brew install Mininglamp-AI/tap/mano-cua

Windows:

Download the latest mano-cua-windows.zip from GitHub Releases, extract it, and add the folder to your PATH.

Usage

# Run a task (cloud mode, default)
mano-cua run "your task description"

# Run with options
mano-cua run "task" --minimize --max-steps 10

# Open a URL in the browser before starting the task
mano-cua run "task" --url "https://example.com"

# Open an app before starting the task (use the macOS app name, e.g. 'Notes', 'Safari', 'Google Chrome')
mano-cua run "task" --app "Notes"

# Run in local mode (on-device inference, macOS Apple Silicon only)
mano-cua run "task" --local

# Stop the current running task
mano-cua stop

Run mano-cua --help or mano-cua <command> --help for full flags and options.

Note: Only one task can run at a time per device. If you need to start a new task, first stop the current one with mano-cua stop.

--app vs --url: Use one or the other, not both. --app launches a desktop application by its macOS name (as shown in Spotlight search). --url opens a URL in the default browser. Both bring the target to the foreground before the agent starts.
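The two constraints above (one task per device; `--app` and `--url` are mutually exclusive) are easy to enforce in a thin wrapper. A minimal sketch in Python; `build_run_argv` and `run_task` are hypothetical helpers for illustration, not part of mano-cua:

```python
import subprocess

def build_run_argv(task, app=None, url=None, local=False,
                   minimize=False, max_steps=None):
    """Build a `mano-cua run` argument list, rejecting --app together with --url."""
    if app and url:
        raise ValueError("use --app or --url, not both")
    argv = ["mano-cua", "run", task]
    if app:
        argv += ["--app", app]
    if url:
        argv += ["--url", url]
    if local:
        argv.append("--local")
    if minimize:
        argv.append("--minimize")
    if max_steps is not None:
        argv += ["--max-steps", str(max_steps)]
    return argv

def run_task(**kwargs):
    # Only one task can run per device, so stop any current task first.
    subprocess.run(["mano-cua", "stop"], check=False)
    subprocess.run(build_run_argv(**kwargs), check=True)
```

For example, `build_run_argv("create a note", app="Notes", local=True)` yields the same invocation as the `--local --app "Notes"` commands shown in this document.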

Tip for local mode: Write task descriptions with explicit step-by-step instructions for best results. For example, instead of "search for iphone on Xiaohongshu", write "click the search box at the top, type iphone, click the search button, then click the first result". Explicit steps significantly improve local model accuracy.

Local Mode

Runs Mano-P entirely on-device via MLX. No data leaves the machine. Requires macOS with Apple Silicon (M1+).

Setup:

mano-cua check
mano-cua install-sdk
mano-cua install-model

Run:

mano-cua run "click the search box, type openai, click search, click the first result to open OpenAI homepage" --local --url "https://www.google.com"
mano-cua run "click the search box, type iphone, click the search button, open the first post" --local --url "https://www.xiaohongshu.com" --minimize --max-steps 15
mano-cua run "create a new note and type hello world" --local --app "Notes"

Examples

# Cloud mode (default — no setup needed)
mano-cua run "Open WeChat and tell FTY that the meeting is postponed"
mano-cua run "Search for AI news in Xiaohongshu and show the first post" --minimize --max-steps 20

# Cloud mode with --app or --url
mano-cua run "Create a calendar event for Friday 20:00 named Team Meeting" --app "Microsoft Outlook"
mano-cua run "Compare available plans for the AeroAPI" --url "https://www.flightaware.com/"

# Local mode — use explicit step-by-step task descriptions for best accuracy
mano-cua run "click the editor area, select all and delete, type hello world" --local --url "https://mano.mininglamp.com/md2wechat/golden.html"
mano-cua run "click the search box, type openai, click search, click the first result" --local --url "https://www.google.com" --minimize
mano-cua run "create a new note and type hello world" --local --app "Notes"

# Stop the current task (use before starting a new one)
mano-cua stop

How It Works

The current screenshot is captured and sent to the cloud at each step. A hybrid vision system decides the next action:

  • Mano model — handles straightforward, lightweight tasks with rapid output.
  • Claude CUA model — handles complex tasks requiring deeper reasoning.

The system automatically selects the appropriate model based on task complexity.

In local mode (--local), a local Mano-P model runs on-device via MLX. No network calls for inference.
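The per-step loop described above can be sketched generically as follows. This is an illustration of the screenshot-to-action cycle, not mano-cua's actual internals; `capture_screenshot`, `query_model`, and `execute` are stand-ins for the real components:

```python
def run_agent(task, query_model, capture_screenshot, execute, max_steps=20):
    """Generic screenshot -> model -> action loop.

    Each step sends the current screenshot and the task description to a
    model (the cloud hybrid router, or local Mano-P under --local), then
    executes the returned action until the model reports completion or
    the step budget (cf. --max-steps) is exhausted.
    """
    for step in range(max_steps):
        shot = capture_screenshot()
        action = query_model(task, shot)   # e.g. {"type": "click", "x": 10, "y": 20}
        if action["type"] == "done":
            return step
        execute(action)
    return max_steps
```

The `--max-steps` flag maps onto the loop's step budget: a task that has not reported "done" by then is cut off.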

Supported Interactions

click · type · hotkey · scroll · drag · mouse move · screenshot · wait · app launch · URL navigation

Status Panel

A small UI panel is displayed in the top-right corner of the screen to track and manage the current session.

Data, Privacy & Safety

  • What is sent: Screenshots of the primary display and the task description are sent to mano.mininglamp.com — these are the minimal inputs required for the vision model to determine the next action.
  • What is NOT sent: No local files, clipboard content, or system credentials are read or transmitted. All network calls are in a single module (task_model.py) for easy review.
  • Local mode: All inference runs on-device using Mano-P (model weights). No data leaves the machine.
  • Authentication: No API key or credentials are required. The client identifies itself with a locally generated device ID (~/.myapp_device_id) — no secrets are embedded in the binary.
  • Supply chain: The full client is open source. The Homebrew formula builds directly from this public source, ensuring the installed binary is fully auditable.
  • User control: Users can stop any session at any time via the UI panel or mano-cua stop.
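The keyless authentication scheme above hinges on a persistent, locally generated device ID. A rough sketch of how such an ID file could work, assuming a random UUID format; the exact format mano-cua writes to `~/.myapp_device_id` is not specified here:

```python
import uuid
from pathlib import Path

def get_device_id(path=Path.home() / ".myapp_device_id"):
    """Return the persistent device ID, creating the file on first run.

    A random UUID carries no personal data, so the server can rate-limit
    or correlate sessions per device without any secret or credential.
    """
    path = Path(path)
    if path.exists():
        return path.read_text().strip()
    device_id = str(uuid.uuid4())
    path.write_text(device_id)
    return device_id
```

Because the ID is random rather than derived from hardware, deleting the file simply resets the device's identity.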

Important Notes

  • Do not use the mouse or keyboard during the task. Manual input while mano-cua is running may cause unexpected behavior.
  • Multiple displays: only the primary display is used. All mouse movements, clicks, and screenshots are restricted to that display.

Platform Support

macOS is the primary and most tested platform. Windows and Linux support is not yet complete; minor issues are expected.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
