claw-text-and-pics

Extract text and embedded images from scanned documents, PDFs, and photos via the Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten notes, or any image or PDF containing text.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "claw-text-and-pics" with this command: npx skills add photon78/claw-text-and-pics

Extract text and images from documents via Mistral OCR

Give your OpenClaw agent the ability to read scanned documents, PDFs, and images — extracting clean Markdown text and cropping out embedded images. Powered by Mistral's OCR API.

When to use

  • Extract text from scanned documents, invoices, receipts, contracts
  • Pull embedded images from PDFs or scans
  • Convert handwritten notes or photos to searchable text
  • Send extracted images directly to Telegram

Usage

# Extract text only
python3 ocr.py --input scan.jpg

# Extract text from PDF (3 pages)
python3 ocr.py --input document.pdf --pages 3

# Extract embedded images
python3 ocr.py --input scan.jpg --extract-images --output-dir ./images/

# Extract images and send to Telegram
python3 ocr.py --input scan.jpg --extract-images --send --target 123456789

# Works with URLs too
python3 ocr.py --input https://example.com/document.pdf
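For orientation, here is a minimal sketch of how a request body for Mistral's OCR endpoint could be assembled for the URL and local-file cases above. The helper name `build_ocr_payload` is hypothetical and is not taken from ocr.py. The assumptions are that remote sources are passed as plain URLs, local files are inlined as base64 data URIs, and the `mistral-ocr-latest` model is used. How the actual script builds its request may differ.

```python
import base64
import mimetypes
from pathlib import Path

# Mistral's public OCR endpoint; the body below would be POSTed here as JSON
# with an "Authorization: Bearer <MISTRAL_API_KEY>" header.
OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"


def build_ocr_payload(source: str, include_images: bool = False) -> dict:
    """Build a JSON body for an OCR request (hypothetical helper, not from ocr.py)."""
    if source.startswith(("http://", "https://")):
        # Remote input: pass the URL through unchanged.
        url = source
        is_pdf = source.lower().split("?", 1)[0].endswith(".pdf")
    else:
        # Local input: inline the file as a base64 data URI (an assumption).
        path = Path(source)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{encoded}"
        is_pdf = mime == "application/pdf"

    if is_pdf:
        document = {"type": "document_url", "document_url": url}
    else:
        document = {"type": "image_url", "image_url": url}

    return {
        "model": "mistral-ocr-latest",
        "document": document,
        # Ask for embedded images only when they are going to be extracted.
        "include_image_base64": include_images,
    }
```

Building the payload separately from the network call keeps this step easy to inspect, for example when comparing it against the raw response printed by --debug.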

Output

  • stdout: Extracted text as Markdown
  • Files: Cropped images saved to --output-dir (only with --extract-images)
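Because the extracted text goes to stdout, another Python program can capture it by running the script as a subprocess. The sketch below assumes ocr.py sits in the current working directory. The helper names `build_command` and `run_ocr` are hypothetical, and only flags documented on this page are used.

```python
import subprocess
import sys


def build_command(source, extract_images=False, output_dir=None, pages=None):
    """Assemble an argv list for ocr.py (hypothetical wrapper)."""
    cmd = [sys.executable, "ocr.py", "--input", source]
    if pages is not None:
        cmd += ["--pages", str(pages)]
    if extract_images:
        cmd.append("--extract-images")
        if output_dir:
            cmd += ["--output-dir", output_dir]
    return cmd


def run_ocr(source, **options) -> str:
    """Run ocr.py and return the Markdown it prints to stdout."""
    result = subprocess.run(
        build_command(source, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call such as `run_ocr("scan.jpg", extract_images=True, output_dir="./images/")` would then return the Markdown text, with any cropped images written to the given directory.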

Configuration

Set in ~/.openclaw/.env or as environment variables:

| Variable | Required | Description |
| --- | --- | --- |
| MISTRAL_API_KEY | Yes | Your Mistral API key |
| TELEGRAM_BOT_TOKEN | Only for --send | Your Telegram bot token |
| TELEGRAM_CHAT_ID | Optional | Default chat ID (overridable with --target) |

Environment Variables

MISTRAL_API_KEY=required        # Mistral API key — get one at console.mistral.ai
TELEGRAM_BOT_TOKEN=optional     # Required only when using --send
TELEGRAM_CHAT_ID=optional       # Default target chat ID (overridable with --target)

This skill reads ~/.openclaw/.env as a fallback for credentials. Ensure the file has restricted permissions: chmod 600 ~/.openclaw/.env
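The fallback described above can be pictured roughly as follows: the process environment is checked first, and the .env file is read only if the variable is missing. This is a sketch under assumptions, not the skill's actual loader. The helper names are hypothetical, the parser handles only simple `KEY=value` lines (no inline comments or multi-line values), and the permission check applies to POSIX systems.

```python
import os
import stat
from pathlib import Path


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_credential(name, env_path=Path.home() / ".openclaw" / ".env"):
    """Return the variable from the environment, else from the .env file, else None."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path)
    if path.is_file():
        return parse_env_file(path.read_text(encoding="utf-8")).get(name)
    return None


def is_private(path) -> bool:
    """True if neither group nor others have any access (as after chmod 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Checking `is_private` before reading the file is one way to warn when the credentials file is readable by other users.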

Requirements

  • Python 3.11+
  • Mistral API key (console.mistral.ai)
  • Optional (only for --extract-images): pip install pillow
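To illustrate why Pillow is needed only for image extraction, here is a rough sketch of cropping regions out of a scan. It assumes pixel bounding boxes in (left, top, right, bottom) order are already available, for instance from the OCR response. The function names and the output file naming are hypothetical. Pillow is imported inside the cropping function so that text-only runs work without it.

```python
from pathlib import Path


def clamp_box(box, width, height):
    """Clamp (left, top, right, bottom) to the image bounds; return None if empty."""
    left, top, right, bottom = box
    left, right = max(0, min(left, width)), max(0, min(right, width))
    top, bottom = max(0, min(top, height)), max(0, min(bottom, height))
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


def crop_regions(image_path, boxes, output_dir="./extracted-images"):
    """Crop each box out of the image and save it as a PNG; return the saved paths."""
    from PIL import Image  # lazy import: only required when extracting images

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with Image.open(image_path) as img:
        for index, box in enumerate(boxes):
            clamped = clamp_box(box, img.width, img.height)
            if clamped is None:
                continue  # skip boxes that fall entirely outside the image
            target = out / f"image-{index}.png"  # hypothetical naming scheme
            img.crop(clamped).save(target)
            saved.append(target)
    return saved
```

Clamping first avoids saving padded or empty crops when a reported box extends slightly past the page edge.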

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| --input | Yes | Local path or URL to image/PDF |
| --extract-images | No | Crop and save embedded images |
| --output-dir | No | Output directory (default: ./extracted-images) |
| --send | No | Send extracted images via Telegram |
| --target | No | Telegram chat ID (or TELEGRAM_CHAT_ID env var) |
| --pages | No | Number of PDF pages to process |
| --debug | No | Print raw API response |
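The parameters above map naturally onto an argparse definition. The following is a sketch of what such a parser might look like, not the script's actual source. The flag names and the --output-dir default come from this page, while the help strings, the integer type for --pages, and the `resolve_target` helper are assumptions.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Extract text and images from documents via Mistral OCR",
    )
    parser.add_argument("--input", required=True, help="Local path or URL to image/PDF")
    parser.add_argument("--extract-images", action="store_true", help="Crop and save embedded images")
    parser.add_argument("--output-dir", default="./extracted-images", help="Output directory")
    parser.add_argument("--send", action="store_true", help="Send extracted images via Telegram")
    parser.add_argument("--target", default=None, help="Telegram chat ID")
    parser.add_argument("--pages", type=int, default=None, help="Number of PDF pages to process")
    parser.add_argument("--debug", action="store_true", help="Print raw API response")
    return parser


def resolve_target(args, environ=os.environ):
    """Prefer --target, then fall back to TELEGRAM_CHAT_ID (hypothetical helper)."""
    return args.target or environ.get("TELEGRAM_CHAT_ID")
```

For example, `build_parser().parse_args(["--input", "scan.jpg", "--extract-images"])` yields `output_dir == "./extracted-images"`, matching the documented default.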

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
