boof

Convert PDFs and documents to markdown, index them locally for RAG retrieval, and analyze them token-efficiently. Use when asked to: read/analyze/summarize a PDF, process a document, boof a file, extract information from papers/decks/NOFOs, or when you need to work with large documents without filling the context window. Supports batch processing and cross-document queries.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "boof" with this command: npx skills add chiefsegundo/boof

Boof 🍑

Local-first document processing: PDF → markdown → RAG index → token-efficient analysis.

Documents stay local. Only relevant chunks go to the LLM. Maximum knowledge absorption, minimum token burn.

Powered by opendataloader-pdf — #1 in PDF parsing benchmarks (0.90 overall, 0.93 table accuracy). CPU-only, no GPU required.

Quick Reference

Convert + index a document

bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf

Convert with custom collection name

bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf --collection my-project

Query indexed content

qmd query "your question" -c collection-name

Core Workflow

  1. Boof it: Run boof.sh on a PDF. This converts it to markdown via opendataloader-pdf (local Java engine, no API, no GPU) and indexes it into QMD for semantic search.

  2. Query it: Use qmd query to retrieve only the relevant chunks. Send those chunks to the LLM — not the entire document.

  3. Analyze it: The LLM sees focused, relevant excerpts. No wasted tokens, no lost-in-the-middle problems.

When to Use Each Approach

"Analyze this specific aspect of the paper" → Boof + query (cheapest, most focused)

"Summarize this entire document" → Boof, then read the markdown section by section. Summarize each section individually, then merge summaries. See advanced-usage.md.

"Compare findings across multiple papers" → Boof all papers into one collection, then query across them.

"Find where the paper discusses X"qmd search "X" -c collection for exact match, qmd query "X" -c collection for semantic match.

Output Location

Converted markdown files are saved to knowledge/boofed/ by default (override with --output-dir).

Setup

If boof.sh reports missing dependencies, see setup-guide.md for installation instructions (Java + opendataloader-pdf + QMD).

Environment

  • ODL_ENV — Path to opendataloader-pdf Python venv (default: ~/.openclaw/tools/odl-env)
  • QMD_BIN — Path to qmd binary (default: ~/.bun/bin/qmd)
  • BOOF_OUTPUT_DIR — Default output directory (default: ~/.openclaw/workspace/knowledge/boofed)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

Document Workflow

一键实现学术论文的搜索、下载、分块提取文本及结构化总结,支持按年份和引用数筛选。

Registry SourceRecently Updated
5111Profile unavailable
Research

Research To Wechat

A native research-first pipeline that turns a topic, notes, article, URL, or transcript into a sourced article with an evidence ledger, polished Markdown, in...

Registry SourceRecently Updated
5792Profile unavailable
Research

Paper Fetch

Use when the user wants to download a paper PDF from a DOI (or title, resolved to a DOI first). Tries Unpaywall, arXiv, bioRxiv/medRxiv, PubMed Central, Sema...

Registry SourceRecently Updated
2371Profile unavailable
Research

paper-reader (XuRuitian version)

精读学术文献的专家级 Skill。当用户上传 PDF、Word、Excel、PPT 或 TXT 格式的学术论文,并希望进行深度学术分析时使用本 Skill。支持中英双语文献, 可自动识别文件类型、提取全文内容,并按六大维度(研究目标、新方法、实验验证、 未来方向、批判分析、实用建议)输出结构化分析报告。触发词包括...

Registry SourceRecently Updated
1150Profile unavailable