large-document-reader

Splits long academic or technical documents into chapters, generates a structured JSON summary for each chapter, and builds a file tree with a global index. The result lets an AI retrieve and analyze the material efficiently, working around context window limits with an “overview via summaries, deep-dive on demand” workflow.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "large-document-reader" with this command: npx skills add mrchenkuan/large-document-reader

Literature Structuring Expert

Automatically decomposes long documents (papers, reports, books) into a structured, AI-friendly knowledge base: it splits by chapter, generates machine-readable summaries, and builds a navigable index to work around context limits.

When to Use This Skill

Use this skill when the user:

  • Has a document that is too long for the AI's context window.
  • Needs to perform cross-chapter analysis or get a high-level overview of a long text.
  • Wants to build a reusable, queryable knowledge base from a PDF, Markdown, or text file.
  • Asks: "How can I get my AI to read this whole book/paper?"

Quick Reference

| Situation | Action |
| --- | --- |
| User provides a long document | 1. Analyze and split it into chapters. 2. Generate a JSON summary for each chapter. 3. Create a master index file. |
| User asks a high-level, cross-chapter question | Provide the content of the MASTER_INDEX.md file to the AI. |
| User asks a detailed, chapter-specific question | Provide the corresponding single file from the ./chapters/ directory to the AI. |
| Task completed | Present the generated file tree and MASTER_INDEX.md preview to the user. |

Core Workflow

Phase 1: Intelligent Splitting

  1. Analyze Input: Receive the long document text or file path.
  2. Identify Structure: Detect the heading hierarchy (e.g., #, ##, 1., 1.1) to determine chapter boundaries. User-specified splitting preferences take priority over automatic detection.
  3. Execute Split: Split the document into one standalone Markdown file per chapter.
    • Naming Convention: {sequence_number}_{chapter_title}.md (e.g., 01_Introduction.md).
    • Storage Location: All chapter files are saved in the ./chapters/ directory.
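Phase 1 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: it assumes Markdown input, splits only on level-1 and level-2 `#` headings, and numbers chapters with a flat two-digit sequence rather than hierarchical IDs such as 02_1. The names `split_into_chapters`, `write_chapters`, and `slugify` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: chapters start at level-1 or level-2 Markdown headings.
# Numbered headings ("1.", "1.1") and '#' lines inside code blocks are not handled.
HEADING = re.compile(r"^(#{1,2})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Make a filename-safe fragment: 'Experimental Methods' -> 'Experimental_Methods'."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    return re.sub(r"\s+", "_", cleaned) or "Untitled"


def split_into_chapters(text: str) -> list[tuple[str, str]]:
    """Return (filename, body) pairs, one per detected chapter."""
    chapters: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            chapters.append((match.group(2), [line]))
        elif chapters:
            chapters[-1][1].append(line)
        elif line.strip():
            # Text before the first heading becomes its own chapter.
            chapters.append(("Front Matter", [line]))
    return [
        (f"{i:02d}_{slugify(title)}.md", "\n".join(body).strip() + "\n")
        for i, (title, body) in enumerate(chapters, start=1)
    ]


def write_chapters(text: str, root: Path) -> list[Path]:
    """Write each chapter to root/chapters/ and return the created paths."""
    out_dir = root / "chapters"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, body in split_into_chapters(text):
        path = out_dir / name
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_chapters(document_text, Path("Project_Root"))` would populate `Project_Root/chapters/`. A user-specified splitting preference could be supported by swapping in a different `HEADING` pattern.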

Phase 2: Summary Generation & Structuring

  1. Generate Summary per Chapter: For each file in ./chapters/, generate a corresponding JSON summary file.
    • Structured Fields (JSON format):
      {
        "chapter_id": "Unique identifier matching the filename's sequence prefix, e.g., 02_1",
        "chapter_title": "Chapter Title",
        "abstract": "Core summary of the chapter, 200-300 words.",
        "keywords": ["Keyword1", "Keyword2", "Keyword3"],
        "key_points": ["Key point one", "Key point two"],
        "related_sections": ["IDs of other chapters strongly related to this one"]
      }
      
    • Storage Location: JSON summary files are saved in the ./summaries/ directory (e.g., 01_Introduction.summary.json).
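Because the summary text itself comes from the AI, a script can mainly check the shape of each summary and store it in the right place. The sketch below assumes the six fields listed above are all required and checks only their top-level types. `validate_summary` and `save_summary` are hypothetical helper names, not part of the skill.

```python
import json
from pathlib import Path

# Field names follow the structured-fields listing; treating all of them
# as required is an assumption of this sketch.
REQUIRED_FIELDS = {
    "chapter_id": str,
    "chapter_title": str,
    "abstract": str,
    "keywords": list,
    "key_points": list,
    "related_sections": list,
}


def validate_summary(summary: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is as expected."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in summary:
            problems.append(f"missing field: {field}")
        elif not isinstance(summary[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


def save_summary(summary: dict, chapter_path: Path, root: Path) -> Path:
    """Validate, then write root/summaries/<chapter stem>.summary.json."""
    problems = validate_summary(summary)
    if problems:
        raise ValueError("; ".join(problems))
    out_dir = root / "summaries"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{chapter_path.stem}.summary.json"
    out_path.write_text(
        json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path
```

Deriving the output name from the chapter file's stem keeps `01_Introduction.md` and `01_Introduction.summary.json` paired without a separate lookup table.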

Phase 3: Create Global Index

  1. Aggregate Information: Collect data from all JSON files in ./summaries/.
  2. Generate Index: Create a global index file, MASTER_INDEX.md.
    • Content: Lists all chapters' IDs, titles, a short abstract preview, and keywords in a Markdown list or table.
    • Purpose: Provides a "bird's-eye view" for quick navigation and high-level Q&A.
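One way to aggregate the summaries into an index is sketched below. It assumes each summary file carries the fields from Phase 2 and renders the index as a Markdown table. The 160-character preview length, the "Master Index" heading, and the function names are illustrative choices, not something the skill specifies.

```python
import json
from pathlib import Path


def build_master_index(root: Path, preview_chars: int = 160) -> str:
    """Render a Markdown table from every *.summary.json under root/summaries."""
    rows = [
        "| ID | Title | Abstract preview | Keywords |",
        "| --- | --- | --- | --- |",
    ]
    # Sorting by filename keeps chapters in sequence-number order.
    for path in sorted((root / "summaries").glob("*.summary.json")):
        s = json.loads(path.read_text(encoding="utf-8"))
        abstract = " ".join(s["abstract"].split())  # collapse newlines and runs of spaces
        if len(abstract) > preview_chars:
            abstract = abstract[:preview_chars].rstrip() + "..."
        cells = [s["chapter_id"], s["chapter_title"], abstract, ", ".join(s["keywords"])]
        # Escape pipes so cell text cannot break the table layout.
        rows.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "# Master Index\n\n" + "\n".join(rows) + "\n"


def write_master_index(root: Path) -> Path:
    """Write MASTER_INDEX.md at the project root and return its path."""
    out = root / "MASTER_INDEX.md"
    out.write_text(build_master_index(root), encoding="utf-8")
    return out
```

Keeping only a short abstract preview per row is what makes the index cheap to pass to the AI; the full abstracts remain available in ./summaries/.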

Final Deliverables & File Structure

Upon completion, the following file tree is generated:

Project_Root/
├── chapters/           # 【Source Repository】Contains all split chapter texts (.md files)
│   ├── 01_Introduction.md
│   ├── 02_1_Experimental_Methods.md
│   └── ...
├── summaries/          # 【Summary Repository】Contains all structured JSON summaries
│   ├── 01_Introduction.summary.json
│   ├── 02_1_Experimental_Methods.summary.json
│   └── ...
└── MASTER_INDEX.md     # 【Global Navigation】Core document summary index

Usage Instructions for the User

For Global, Cross-Chapter Queries (e.g., “What is the paper's main thesis?”):

  • Provide the content of the MASTER_INDEX.md file to the AI. The index is far shorter than the full document, so this keeps token usage low.

For Specific, In-Depth Queries Within a Chapter (e.g., “What were the parameters in the 'Methods' section?”):

  • Provide the corresponding single chapter file from the chapters/ directory to the AI for full context.
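The two query modes above amount to a simple routing rule, which might be automated along these lines. `select_context` is a hypothetical helper, and matching a chapter by its ID prefix in the filename is an assumption based on the naming convention in the file tree.

```python
from pathlib import Path
from typing import Optional


def select_context(root: Path, chapter_id: Optional[str] = None) -> str:
    """Return MASTER_INDEX.md for global questions, or one chapter file for a specific one."""
    if chapter_id is None:
        # Global, cross-chapter query: the index alone is enough.
        return (root / "MASTER_INDEX.md").read_text(encoding="utf-8")
    # Chapter-specific query: find the file whose name starts with the ID.
    matches = sorted((root / "chapters").glob(f"{chapter_id}_*.md"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one chapter for id {chapter_id!r}, found {len(matches)}"
        )
    return matches[0].read_text(encoding="utf-8")
```

For example, `select_context(Path("Project_Root"))` returns the index text, while `select_context(Path("Project_Root"), "02_1")` returns the full text of the matching chapter. An ambiguous ID raises an error rather than guessing.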

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
