silicon-paddle-ocr

OCR skill using PaddleOCR model via SiliconFlow API. This skill should be used when the user asks to "recognize text from an image", "extract text from a photo", "OCR this image", "read text from screenshot", or mentions "PaddleOCR", "image text recognition", "text extraction from images".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "silicon-paddle-ocr" with this command: npx skills add aotenjou/silicon-paddleocr/aotenjou-silicon-paddleocr-silicon-paddle-ocr

OCR - Image Text Recognition

Use PaddleOCR to extract text content from images. Supports single image or batch processing.

Overview

This skill provides optical character recognition (OCR) capabilities using the PaddlePaddle/PaddleOCR-VL-1.5 model via the SiliconFlow API. Extract text from JPG, PNG, WebP, BMP, and GIF images.

When to Use

Invoke this skill when:

  • User wants to extract text from an image
  • User asks to OCR a screenshot or photo
  • User needs to read text from an image file
  • User mentions text recognition from images

How to Use

Prerequisites

Ensure the SILICONFLOW_API_KEY environment variable is set:

export SILICONFLOW_API_KEY="your_api_key"

Basic Usage

Execute the OCR script:

python3 scripts/ocr_skill.py [options] image_path

Arguments

ArgumentDescription
imagesImage file path(s) or glob pattern (required)
-k, --api-keyAPI key (default: from SILICONFLOW_API_KEY env)
-m, --modelOCR model name (default: PaddlePaddle/PaddleOCR-VL-1.5)
-p, --promptRecognition prompt for custom behavior
-j, --jsonOutput results in JSON format
-o, --outputSave results to specified file
--max-tokensMaximum tokens in response (default: 2000)

Examples

Single image:

python3 scripts/ocr_skill.py /path/to/image.jpg

Multiple images with glob:

python3 scripts/ocr_skill.py /path/to/images/*.png

JSON output format:

python3 scripts/ocr_skill.py --json /path/to/image.jpg

Custom prompt for table extraction:

python3 scripts/ocr_skill.py -p "Please identify and format table content as Markdown" /path/to/table.jpg

Save to file:

python3 scripts/ocr_skill.py --json --output results.json /path/to/images/*.jpg

Output Format

Text output (default):

--- image.jpg ---
识别到的文字内容
识别到 X 处文字区域

JSON output:

{
  "image.jpg": {
    "image_path": "/path/to/image.jpg",
    "image_size": [width, height],
    "texts": [
      {
        "text": "识别的文字",
        "box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
      }
    ],
    "full_text": "所有文本的组合"
  },
  "image2.png": { ... }
}

Coordinates Explanation:

  • LOC values are normalized coordinates converted to pixel coordinates
  • Conversion: pixel = LOC × (image_size / LOC_max_value)
  • LOC max_value is approximately 972 (may vary by model/image)
  • The box field provides the four corner coordinates of each text region in pixel format

Supported Image Formats

  • JPG/JPEG
  • PNG
  • WebP
  • BMP
  • GIF

Error Handling

If processing fails:

  • Check that the image file exists
  • Verify the SILICONFLOW_API_KEY is valid
  • Ensure the API endpoint is reachable

Images that fail to process will show an error message, and other images will continue processing.

Additional Resources

Reference Files

  • references/api-configuration.md - API configuration details

Example Files

  • examples/sample-usage.sh - Example usage script

Scripts

  • scripts/ocr_skill.py - The main OCR implementation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

ll-feishu-audio

飞书语音交互技能。支持语音消息自动识别、AI 处理、语音回复全流程。需要配置 FEISHU_APP_ID 和 FEISHU_APP_SECRET 环境变量。使用 faster-whisper 进行语音识别,Edge TTS 进行语音合成,自动转换 OPUS 格式并通过飞书发送。适用于飞书平台的语音对话场景。

Archived SourceRecently Updated
General

test_skill

import json import tkinter as tk from tkinter import messagebox, simpledialog

Archived SourceRecently Updated
General

51mee-resume-profile

简历画像。触发场景:用户要求生成候选人画像;用户想了解候选人的多维度标签和能力评估。

Archived SourceRecently Updated
General

51mee-resume-parse

简历解析。触发场景:用户上传简历文件要求解析、提取结构化信息。

Archived SourceRecently Updated