read-source

Extract text from source documents (PDF, DOCX, PPTX, HTML, Markdown) for spreadsheet workflows. Use to understand source material before populating workbooks.

Install skill "read-source" with this command: npx skills add witanlabs/witan-cli/witanlabs-witan-cli-read-source

When to Use

Use witan read to convert source documents into LLM-ready text. This is for source material — PDFs, Word docs, presentations, HTML pages, and Markdown files that contain data you need to extract.

  • PDF → plain text
  • Word (.doc, .docx) → markdown
  • PowerPoint (.ppt, .pptx) → markdown
  • HTML → markdown
  • Markdown (.md) → outline support via --outline

This is not for reading spreadsheet data (.xlsx, .xls) — use spreadsheet-specific tools for that.

Setup

Files are cached server-side by content hash so repeated operations skip re-upload. If WITAN_STATELESS=1 is set (or --stateless is passed), files are processed but not stored.
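
To illustrate why content-hash caching lets repeated operations skip the upload, here is a minimal sketch. The hash function the service actually uses is not stated here; SHA-256 and the `cache_key` helper are assumptions made purely for illustration.

```python
import hashlib


def cache_key(data: bytes) -> str:
    """Return a hex digest that depends only on the bytes, not on the filename.

    SHA-256 is an assumption; the service's real hash function is not documented here.
    """
    return hashlib.sha256(data).hexdigest()


# Identical bytes give the same key, so a second upload could be skipped.
first = cache_key(b"%PDF-1.7 example bytes")
second = cache_key(b"%PDF-1.7 example bytes")
edited = cache_key(b"%PDF-1.7 example bytes, edited")
print(first == second, first == edited)
```

Renaming a file would not change its key under this scheme, while any change to its bytes would.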

The CLI automatically applies per-attempt request timeouts and retries transient API failures (408, 429, 500, 502, 503, 504, plus timeout/network errors). Non-retryable 4xx responses fail immediately.
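
The retry rule above can be sketched as follows. This is not the CLI's implementation: `call_with_retries`, the `send` callable, the attempt count, and the exponential backoff are all hypothetical. Only the list of retryable status codes comes from the text.

```python
import time

# Status codes the text lists as transient.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def is_retryable(status: int) -> bool:
    return status in RETRYABLE_STATUS


def call_with_retries(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that returns an HTTP status code, or raises
    TimeoutError / ConnectionError on a timeout or network failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            status = send()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
        else:
            if not is_retryable(status):
                return status  # success or a non-retryable failure: stop here
            last_error = RuntimeError(f"transient HTTP {status}")
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # backoff schedule is assumed
    raise last_error
```

A 404 or 400 returns on the first attempt, matching the statement that non-retryable 4xx responses fail immediately.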

Quick Reference

# Get document structure first
witan read report.pdf --outline
witan read slides.pptx --outline

# Read specific sections
witan read report.pdf --pages 1-5
witan read slides.pptx --slides 1-3
witan read notes.docx --offset 50 --limit 100

# Read from URLs
witan read https://example.com/report.pdf --outline
witan read https://example.com/data.csv

# JSON output for automation
witan read report.pdf --json
witan read report.pdf --outline --json

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Error (bad arguments, network failure, unsupported format) |
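
When calling the CLI from a script, the exit code is the signal to check. Below is a minimal sketch of one way to do that; `run_cli` and its `runner` parameter are hypothetical helpers, not part of the CLI.

```python
import subprocess


def run_cli(argv, runner=subprocess.run):
    """Run a command and return its stdout, raising if the exit code is not 0.

    runner defaults to subprocess.run; it is a parameter so the function can
    be exercised without the CLI installed.
    """
    result = runner(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Exit code 1 covers bad arguments, network failures, and unsupported
        # formats alike, so the stderr text is what tells them apart.
        message = result.stderr.strip()
        raise RuntimeError(f"{argv[0]} exited with {result.returncode}: {message}")
    return result.stdout
```

For example, `run_cli(["witan", "read", "report.pdf", "--outline"])` would return the outline text or raise with the CLI's error message.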

Navigation Strategy

When you already know where to look, go straight to the content with --pages, --slides, or --offset/--limit. When you don't, run --outline first: it shows the document structure so you can target the right section.

PDF workflow:

  1. witan read report.pdf --outline → see chapter/section structure with page ranges
  2. witan read report.pdf --pages 12-15 → read the section you need

PPTX workflow:

  1. witan read deck.pptx --outline → see slide titles
  2. witan read deck.pptx --slides 5-8 → read specific slides

Text/DOCX workflow:

  1. witan read notes.docx --outline → see heading structure with line offsets
  2. witan read notes.docx --offset 120 --limit 50 → read a section

Command Reference

witan read <file-or-url> [flags]
| Flag | Default | Description |
|------|---------|-------------|
| --pages | | PDF page range (e.g. 1-5, 1,3,5, 1-5,10-15) |
| --slides | | Presentation slide range (e.g. 1-3) |
| --offset | 1 | Start line (1-indexed) |
| --limit | 2000 | Maximum lines to return |
| --outline | false | Show document structure instead of content |
| --json | false | Output full JSON response |
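
For scripted use, the flags above can be assembled into an argument list. This is a small sketch; `build_read_args` is a hypothetical helper and does no validation of which flags apply to which file type.

```python
def build_read_args(source, pages=None, slides=None, offset=None,
                    limit=None, outline=False, json_output=False):
    """Build the argument list for `witan read` from the documented flags."""
    args = ["witan", "read", source]
    if pages is not None:
        args += ["--pages", pages]
    if slides is not None:
        args += ["--slides", slides]
    if offset is not None:
        args += ["--offset", str(offset)]
    if limit is not None:
        args += ["--limit", str(limit)]
    if outline:
        args.append("--outline")
    if json_output:
        args.append("--json")
    return args


print(build_read_args("report.pdf", outline=True, json_output=True))
print(build_read_args("notes.docx", offset=50, limit=100))
```

Passing the list to a process runner rather than joining it into a string avoids shell-quoting problems with paths and URLs.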

Pagination Limits

| Constraint | Value |
|------------|-------|
| Max PDF pages per read | 10 |
| Max PPTX slides per read | 10 |
| Default line limit | 2000 |
| Max file size | 25 MB |
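
Because a single read covers at most 10 pages or slides, a longer span has to be split into several reads. The sketch below computes the range strings; `page_batches` is a hypothetical helper, and the same arithmetic would apply to --slides.

```python
def page_batches(first, last, max_pages=10):
    """Split an inclusive page range into range strings of at most max_pages pages."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    batches = []
    start = first
    while start <= last:
        end = min(start + max_pages - 1, last)
        # A lone trailing page is written as a single number, e.g. "11".
        batches.append(str(start) if start == end else f"{start}-{end}")
        start = end + 1
    return batches


print(page_batches(1, 25))  # ['1-10', '11-20', '21-25']
```

Each string can then be passed as the value of --pages in a separate read.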

Pipeline: Source → Spreadsheet

The typical flow for reading source material and populating a spreadsheet:

  1. Explore: witan read source.pdf --outline to understand structure
  2. Read: witan read source.pdf --pages 3-8 to get the data
  3. Parse: extract values from the text (LLM or regex)
  4. Write: witan xlsx exec model.xlsx --input-json '...' to populate the spreadsheet
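
The Parse step can be done with a regular expression when the text is regular enough. Here is a minimal sketch, assuming lines of the form `Label: $1,234` with any line-number prefix already removed; `extract_amounts` and the pattern are illustrative, not part of the CLI.

```python
import re

# Matches lines such as "Q1: $1,250,000" (label, colon, dollar amount).
AMOUNT_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*\$(?P<amount>[\d,]+(?:\.\d+)?)\s*$")


def extract_amounts(text):
    """Return {label: amount} for every 'Label: $1,234' line in text."""
    values = {}
    for line in text.splitlines():
        match = AMOUNT_LINE.match(line)
        if match:
            amount = match.group("amount").replace(",", "")
            values[match.group("label").strip()] = float(amount)
    return values


sample = "Revenue Summary\n\nQ1: $1,250,000\nQ2: $1,380,000"
print(extract_amounts(sample))  # {'Q1': 1250000.0, 'Q2': 1380000.0}
```

Lines that do not fit the pattern, such as headings, are skipped rather than guessed at.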

Output Format

Content mode (default): line-numbered text to stdout, metadata to stderr.

     1	Revenue Summary
     2
     3	Q1: $1,250,000
     4	Q2: $1,380,000
text/plain  [15 pages, 10 read, 847 lines total, showing 1–847]
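
Before parsing values, the line-number prefix usually needs to come off. The sketch below assumes the number and the text are separated by a tab, as in the sample above; `split_numbered_lines` is a hypothetical helper.

```python
import re

# Optional leading spaces, the line number, then an optional tab and the text.
NUMBERED_LINE = re.compile(r"^\s*(\d+)(?:\t(.*))?$")


def split_numbered_lines(stdout_text):
    """Turn line-numbered output into a list of (line_number, text) pairs."""
    rows = []
    for raw in stdout_text.splitlines():
        match = NUMBERED_LINE.match(raw)
        if match:
            rows.append((int(match.group(1)), match.group(2) or ""))
    return rows


sample = "     1\tRevenue Summary\n     2\n     3\tQ1: $1,250,000\n"
print(split_numbered_lines(sample))
```

Keeping the line numbers alongside the text makes it easy to request the same region again with --offset and --limit.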

Outline mode (--outline): indented structure to stdout.

Introduction  [pages 1-2]
  Background  [pages 1-1]
  Methodology  [pages 2-2]
Results  [pages 3-8]
  Financial Summary  [pages 3-5]
  Projections  [pages 6-8]
Appendix  [pages 9-15]
[15 pages]
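
An outline in this shape can be turned into page ranges programmatically. The sketch below handles only the PDF-style lines shown above (`Title  [pages N-M]` with two-space indentation); outlines for slides or line offsets may look different, and `parse_outline` is a hypothetical helper.

```python
import re

OUTLINE_LINE = re.compile(
    r"^(?P<indent> *)(?P<title>.+?)\s+\[pages (?P<start>\d+)-(?P<end>\d+)\]$"
)


def parse_outline(text, indent_width=2):
    """Parse 'Title  [pages N-M]' lines into dicts with depth, title, and pages."""
    entries = []
    for line in text.splitlines():
        match = OUTLINE_LINE.match(line)
        if match:  # the trailing "[15 pages]" summary line does not match
            entries.append({
                "depth": len(match.group("indent")) // indent_width,
                "title": match.group("title"),
                "pages": f'{match.group("start")}-{match.group("end")}',
            })
    return entries


outline = "Results  [pages 3-8]\n  Financial Summary  [pages 3-5]\n[15 pages]"
entries = parse_outline(outline)
print(entries)
```

The `pages` value of the chosen entry can be passed directly to --pages for the follow-up read.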

Error Guide

| Error | Fix |
|-------|-----|
| cannot access file | Check file path exists and is readable |
| downloading URL: HTTP 4xx/5xx | Check the URL is accessible |
| payload_too_large | File exceeds 25 MB limit |
| missing_content_type | Set Content-Type header (API only) |
| Empty outline | Document has no bookmarks/headings; use offset/limit to navigate |
| Truncated text | Use --pages, --slides, or increase --limit |
