very-long-text-summarization

Very Long Text Summarization

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "very-long-text-summarization" with this command: npx skills add erichowens/some_claude_skills/erichowens-some-claude-skills-very-long-text-summarization

Very Long Text Summarization

Processes texts too large for a single context window using hierarchical multi-pass extraction with armies of cheap models. Produces structured knowledge maps, indexed summaries, and skill drafts — not just prose compression.

When to Use

✅ Use for:

  • Professional handbooks and textbooks (100-1000+ pages)

  • Career biographies and memoirs (extracting expertise patterns)

  • Large codebases (architecture-level understanding)

  • Research paper collections (synthesizing findings across papers)

  • Any text exceeding a single context window (~100K tokens)

❌ NOT for:

  • Short documents (<10 pages) — just read them directly

  • Real-time conversation summarization (use auto-compact patterns)

  • Code documentation generation (use technical-writer )

  • Simple TL;DR requests (not worth the multi-pass overhead)

Architecture: Three-Pass Hierarchical Extraction

flowchart TD D[Document] --> C[Chunk into segments] C --> P1["Pass 1: Haiku army\n(parallel extraction)"] P1 --> I[Intermediate summaries] I --> P2["Pass 2: Sonnet synthesis\n(merge + structure)"] P2 --> S[Structured knowledge map] S --> P3["Pass 3: Opus refinement\n(optional, for skill drafts)"] P3 --> O[Final output]

Pass 1: Chunked Extraction (Haiku Army)

Split the document into overlapping chunks (~4K tokens each, 500 token overlap). Deploy one Haiku call per chunk in parallel. Each extracts:

extraction_template: summary: "2-3 sentence summary of this section" key_claims: ["list of factual claims or assertions"] processes: ["any step-by-step procedures described"] decisions: ["any decision points or heuristics mentioned"] failures: ["any failures, mistakes, or anti-patterns described"] aha_moments: ["any insights, realizations, or conceptual breakthroughs"] metaphors: ["any metaphors or mental models used"] temporal: ["any 'things changed when...' or 'before X, after Y' patterns"] quotes: ["notable direct quotes worth preserving"] references: ["any citations, links, or cross-references"]

Cost: ~$0.001 per chunk. A 300-page book (~150K tokens) = ~38 chunks = ~$0.04 total for Pass 1.

Parallelism: All chunks run simultaneously. A 300-page book completes Pass 1 in ~3 seconds (wall clock), not 3 minutes.

Pass 2: Synthesis (Sonnet)

Feed all Pass 1 extractions into one or more Sonnet calls. Sonnet merges, deduplicates, and structures the knowledge.

synthesis_template: document_summary: "1-2 paragraph executive summary"

knowledge_map: core_concepts: - concept: "name" definition: "what it means in this domain" relationships: ["connects to concept X because..."]

processes:
  - name: "process name"
    steps: ["ordered steps"]
    decision_points: ["where choices are made"]
    common_mistakes: ["what goes wrong"]

expertise_patterns:
  - pattern: "what experts do differently"
    novice_mistake: "what novices do instead"
    aha_moment: "the insight that bridges the gap"

temporal_evolution:
  - period: "date range"
    paradigm: "what was believed/practiced"
    change_trigger: "what caused the shift"

key_metaphors:
  - metaphor: "how practitioners think about X"
    maps_to: "the underlying structure it represents"

index: - topic: "topic name" chunk_ids: [3, 7, 12] # Which original chunks cover this summary: "1 sentence"

Cost: ~$0.02-0.05 depending on extraction volume. The index preserves traceability back to specific book sections.

Pass 3: Refinement (Opus, Optional)

For skill-draft output mode: Opus takes the knowledge map and produces a SKILL.md following the skill-architect template. This is the "crystallize skill from handbook" pipeline.

Cost: ~$0.10. Only run when the output is a skill draft.

Chunking Strategy

Semantic Chunking (Preferred)

Split on document structure — chapter boundaries, section headings, paragraph breaks. Preserves semantic coherence within each chunk.

def semantic_chunk(text: str, max_tokens: int = 4000, overlap: int = 500) -> list[str]: """Split text on structural boundaries with overlap.""" # Split on headings, then merge short sections sections = split_on_headings(text) # ##, ###, etc.

chunks = []
current = ""

for section in sections:
    if count_tokens(current + section) > max_tokens:
        chunks.append(current)
        # Overlap: keep the last ~500 tokens
        current = get_last_n_tokens(current, overlap) + section
    else:
        current += section

if current:
    chunks.append(current)

return chunks

Fixed-Size Chunking (Fallback)

For unstructured text without headings. Split on paragraph boundaries, targeting ~4K tokens with 500-token overlap.

Why Overlap?

Concepts that span chunk boundaries need to appear in both chunks to be extracted. Without overlap, you lose cross-boundary knowledge.

Output Modes

Mode 1: Summary

Produces a structured summary with executive overview, key concepts, and index.

Use for: Quick understanding of a long document. Reading a handbook before a meeting.

Mode 2: Knowledge Map

Produces the full knowledge map: concepts, processes, expertise patterns, temporal evolution, metaphors. Machine-readable (YAML/JSON) for downstream processing.

Use for: Feeding into skill creation, domain meta-skill development, or cross-document analysis.

Mode 3: Skill Draft

Produces a SKILL.md following the skill-architect template, with the handbook's expertise encoded as decision trees, anti-patterns, and shibboleths.

Use for: Converting professional handbooks into Claude skills. The KE pipeline.

Cost Model

Document Size Pages Chunks Pass 1 (Haiku) Pass 2 (Sonnet) Pass 3 (Opus) Total

Article 10 4 $0.004 $0.01 — $0.014

Chapter 30 10 $0.01 $0.02 — $0.03

Handbook 300 38 $0.04 $0.05 $0.10 $0.19

Textbook 800 100 $0.10 $0.10 $0.10 $0.30

Encyclopedia 2000+ 250+ $0.25 $0.20 $0.10 $0.55

Processing time is dominated by the longest single Haiku call (~2-3s). With full parallelism, even a 2000-page text completes Pass 1 in under 5 seconds.

Anti-Patterns

Single-Pass Summarization

Wrong: Feed the entire document into one Opus call. Why: Exceeds context window, or attention dilution produces weak extraction on such long input. Right: Hierarchical multi-pass. Cheap parallel extraction → expensive synthesis.

Summarization Without Structure

Wrong: Produce a 2-paragraph prose summary of a 300-page handbook. Why: The structure IS the knowledge. A flat summary loses the decision trees, failure patterns, and temporal evolution that make skills valuable. Right: Structured knowledge map with indexed access back to source sections.

Skipping Overlap

Wrong: Chunk on hard boundaries with no overlap. Why: Cross-boundary concepts get split and lost. Right: 500-token overlap between chunks. Each chunk includes the tail of the previous chunk.

Ignoring Source Traceability

Wrong: Produce extractions without tracking which chunk they came from. Why: When a claim seems wrong, you need to verify it against the source. Without traceability, you can't. Right: Every extraction carries a chunk_id linking back to the original text segment.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

video-processing-editing

No summary provided by upstream source.

Repository SourceNeeds Review
General

cv-creator

No summary provided by upstream source.

Repository SourceNeeds Review
General

mobile-ux-optimizer

No summary provided by upstream source.

Repository SourceNeeds Review
General

personal-finance-coach

No summary provided by upstream source.

Repository SourceNeeds Review