Ebook Analysis: Non-Fiction Knowledge Extraction
You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:
- Concept Extraction - Extract ideas classified by abstraction (principle → tactic)
- Entity Extraction - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books
Core Principle
Every extraction must be traceable to its exact source. Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.
Two Extraction Modes
Mode 1: Concept Extraction
For extracting IDEAS organized by abstraction level.
Use when: Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
Output: JSON files (analysis.json, concepts.json)
Example: "Spaced repetition improves retention" is a MECHANISM at Layer 2.
Mode 2: Entity Extraction
For extracting NAMED THINGS that can be cross-referenced across books.
Use when: Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution—recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
Output: Markdown files in knowledge base structure
Example: "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.
Choosing a Mode
| If you want to... | Use Mode |
|---|---|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |
Entity Extraction Mode (Detailed)
Entity Types
| Type | What It Captures | Example |
|---|---|---|
| study | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| researcher | People and their contributions | Anders Ericsson, Robin Hogarth |
| framework | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| anecdote | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| concept | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |
Extended Entity Type Guidance
Some entities don't fit cleanly into the five types. Guidelines:
| Entity Kind | Use Type | Rationale |
|---|---|---|
| Simulations/Games (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| Institutions (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| Historical events (Challenger disaster) | anecdote | Stories that illustrate principles |
| Hypothetical scenarios | anecdote | Future scenarios from books like Imaginable |
| Thought experiments | framework | If systematic; otherwise concept |
When uncertain: Default to anecdote for narratives/events, concept for ideas, framework for systematic methods.
Author-as-Subject Pattern
When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
Create a researcher entity if:
- Author has notable prior work or institutional affiliation
- Author appears in Wikipedia or other reference sources
- Author's background/credentials are relevant to understanding the book
- Other books in your collection might reference them
Skip if:
- Author is primarily known only for this book
- No external sources to verify/enrich the entity
Template addition for author-subjects:
Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
Entity File Template
[Entity Name]
Type: study | researcher | framework | anecdote | concept
Status: stub | partial | solid | authoritative
Last Updated: YYYY-MM-DD
Aliases: alias1, alias2, alias3
Summary
[2-3 sentence synthesized understanding]
Key Findings / What It Illustrates
- [Claim or finding with source] — Source: [Book], Ch.[X]
- [Another claim] — Source: [Book], Ch.[X]
Key Quotes
"Quotable text here."
"Another memorable quote."
Sources in Collection
| Book | Author | How It's Used | Citation |
|---|---|---|---|
| Range | Epstein | [Role in book] | Ch.X |
Sources NOT in Collection
- [Book that would enrich this entity]
Related Entities
- Other Entity - Relationship description
Open Questions
- [What we don't yet know]
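The header fields of the template lend themselves to a mechanical check before a file is written. The sketch below is illustrative only: `EntityHeader` and `validateEntityHeader` are hypothetical names, not part of the skill's scripts, and the allowed values are copied from the Type and Status lines of the template.

```typescript
// Allowed values, copied from the template's Type and Status lines.
const ENTITY_TYPES = ["study", "researcher", "framework", "anecdote", "concept"];
const ENTITY_STATUSES = ["stub", "partial", "solid", "authoritative"];

// Hypothetical in-memory shape for the template's header block.
interface EntityHeader {
  name: string;
  type: string;
  status: string;
  lastUpdated: string; // expected as YYYY-MM-DD
  aliases: string[];
}

// Returns a list of problems; an empty list means the header looks well-formed.
function validateEntityHeader(h: EntityHeader): string[] {
  const problems: string[] = [];
  if (h.name.trim() === "") problems.push("name is empty");
  if (ENTITY_TYPES.indexOf(h.type) === -1) problems.push(`unknown type: ${h.type}`);
  if (ENTITY_STATUSES.indexOf(h.status) === -1) problems.push(`unknown status: ${h.status}`);
  // Format check only; this does not verify that the date is a real calendar date.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(h.lastUpdated)) problems.push("lastUpdated is not YYYY-MM-DD");
  return problems;
}
```

Whether an entity deserves the type it was given remains a judgment call; this check only catches misspelled or missing header values.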
Knowledge Base Structure
    /knowledge/
    ├── _index.md                  # Master registry
    ├── _entities.json             # Searchable index (generated)
    │
    ├── nonfiction/
    │   ├── _index.md              # Domain index
    │   ├── _[book]-quotes.md      # Book-specific quotes file
    │   ├── studies/
    │   │   ├── flynn-effect.md
    │   │   └── chase-simon-chunking.md
    │   ├── researchers/
    │   │   ├── hogarth-robin.md
    │   │   └── tetlock-philip.md
    │   ├── frameworks/
    │   │   ├── kind-vs-wicked-environments.md
    │   │   └── desirable-difficulties.md
    │   ├── anecdotes/
    │   │   ├── tiger-vs-roger.md
    │   │   └── challenger-disaster.md
    │   └── concepts/
    │       ├── cognitive-entrenchment.md
    │       └── match-quality.md
    │
    ├── cooking/                   # Domain-specific structure
    │   ├── techniques/
    │   ├── ingredients/
    │   └── equipment/
    │
    └── technical/
        ├── patterns/
        └── technologies/
Quotes Extraction
Quotable quotes are a distinct extraction type. For each book, create a quotes file:
File: _[book-slug]-quotes.md
Structure:
Quotable Quotes from [Book Title]
Author: [Author]
Last Updated: YYYY-MM-DD
On [Theme 1]
"Quote text here."
"Another quote on same theme."
On [Theme 2]
"Quote on different theme."
What makes a good quote:
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference
Entity Extraction Workflow
- Scan book - Read through identifying named studies, researchers, frameworks, illustrative stories
- Check existing entities - Use kb-resolve-entity.ts to see if entity already exists
- Create or update - New entity → create file; existing → add as source
- Add quotes - Extract memorable quotes to quotes file
- Cross-link - Add Related Entities sections
- Regenerate index - Run kb-generate-index.ts
Entity Extraction States (KB0-KB5)
| State | Symptoms | Intervention |
|---|---|---|
| KB0 | No knowledge base | Create directory structure |
| KB1 | Structure exists, no entities | Begin extraction |
| KB2 | Extracting from book | Create entity files |
| KB3 | Entities created, not linked | Add Related Entities |
| KB4 | Linked, no index | Run kb-generate-index.ts |
| KB5 | Complete for this book | Proceed to next book |
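The state table reads naturally as an ordered series of checks, where the first unmet condition determines the state. A minimal sketch follows, assuming a hypothetical `KbSnapshot` summary of the knowledge base; how those fields get populated (directory scan, index timestamp, and so on) is left open and is not something the skill's scripts are known to expose.

```typescript
type KbState = "KB0" | "KB1" | "KB2" | "KB3" | "KB4" | "KB5";

// Hypothetical summary of the knowledge base for one book.
interface KbSnapshot {
  structureExists: boolean;      // /knowledge/ directories are present
  entityCount: number;           // entity files created so far
  extractionInProgress: boolean; // still working through the book
  unlinkedCount: number;         // entities lacking Related Entities links
  indexUpToDate: boolean;        // _entities.json regenerated after last change
}

// Interventions, copied from the state table.
const KB_INTERVENTIONS: Record<KbState, string> = {
  KB0: "Create directory structure",
  KB1: "Begin extraction",
  KB2: "Create entity files",
  KB3: "Add Related Entities",
  KB4: "Run kb-generate-index.ts",
  KB5: "Proceed to next book",
};

// Checks run in table order; the first condition that applies wins.
function detectKbState(s: KbSnapshot): KbState {
  if (!s.structureExists) return "KB0";
  if (s.extractionInProgress) return "KB2";
  if (s.entityCount === 0) return "KB1";
  if (s.unlinkedCount > 0) return "KB3";
  if (!s.indexUpToDate) return "KB4";
  return "KB5";
}
```

For example, a snapshot with entities created and linked but a stale index resolves to KB4, and `KB_INTERVENTIONS["KB4"]` gives the next action.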
Cross-Book Synthesis Workflow
Triggered when: 2+ books have been extracted to the knowledge base.
Goals:
- Find entities that appear in multiple books
- Identify conceptual connections between books
- Surface contradictions or complementary perspectives
- Update entity files with multi-source synthesis
Process:
Entity overlap detection
    # Find entities with 2+ sources
    grep -l "Sources in Collection" knowledge/nonfiction/**/*.md |
      xargs grep -l "| .* | .* |" | head -20
Or manually review entities updated with new source.
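A text search for table rows cannot tell a one-source entity from a two-source one, so counting the data rows of the Sources in Collection table is a more precise alternative. The sketch below assumes entity files follow the template's pipe-table layout; `countCollectionSources` and `entitiesWithMultipleSources` are hypothetical helpers, not existing scripts.

```typescript
// Count data rows in the "Sources in Collection" table of one entity file.
function countCollectionSources(markdown: string): number {
  const lines = markdown.split(/\r?\n/);
  // Accept the heading with or without leading "#" markers.
  const start = lines.findIndex(
    (l) => l.trim().replace(/^#+\s*/, "") === "Sources in Collection",
  );
  if (start === -1) return 0;

  const rows: string[] = [];
  for (const line of lines.slice(start + 1)) {
    const t = line.trim();
    if (t.startsWith("|")) {
      rows.push(t);
      continue;
    }
    if (t === "" && rows.length === 0) continue; // blank lines before the table
    break; // anything else ends this section's table
  }
  // Drop the header row, then any |---|---| separator rows.
  const isSeparator = (r: string) => /^\|[\s:|-]+$/.test(r);
  return rows.slice(1).filter((r) => !isSeparator(r)).length;
}

// Given file contents keyed by path, list the entities cited by 2+ books.
function entitiesWithMultipleSources(files: Record<string, string>): string[] {
  return Object.keys(files).filter((path) => countCollectionSources(files[path]) >= 2);
}
```

Reading the files from disk is deliberately left out so the sketch stays independent of any particular runtime API.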
Conceptual connection mapping
- Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
- Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
- Look for complementary themes (prediction failure → preparation despite uncertainty)
Synthesis documentation
For entities appearing in 2+ books, update the Summary section:
Summary
[Synthesized understanding from BOTH sources, noting agreements and differences]
Cross-book insights
Document thematic connections in context/insights/cross-book-{theme}.md:
Cross-Book Insight: [Theme]
Books Contributing
- Range (Epstein) - [perspective]
- Imaginable (McGonigal) - [perspective]
Synthesis
[How the books complement or contradict each other]
Concept Extraction Mode (Detailed)
Concept Types (Abstract → Concrete)
| Type | Definition | Example |
|---|---|---|
| Principle | Foundational truth or axiom | "Communities form around shared identity" |
| Mechanism | How something works | "Reciprocity creates social bonds" |
| Pattern | Recurring structure or approach | "The community lifecycle pattern" |
| Strategy | High-level approach to achieve goals | "Build trust before asking for contribution" |
| Tactic | Specific actionable technique | "Send welcome emails within 24 hours" |
Abstraction Layers
| Layer | Name | Abstraction | Example |
|---|---|---|---|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
Relationship Types
| Relationship | Meaning | When to Use |
|---|---|---|
| INFLUENCES | A affects B | Causal or correlational connection |
| SUPPORTS | A provides evidence for B | Citation, example, validation |
| CONTRADICTS | A conflicts with B | Opposing claims |
| COMPOSED_OF | A contains B | Part-whole relationships |
| DERIVES_FROM | A is derived from B | Logical conclusions |
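The three tables above (types, layers, relationships) map directly onto a small set of types. The sketch below is an assumption about how they could be modeled; the real schema of concepts.json is not documented here, so the field names (`id`, `statement`, `quote`, `from`, `to`) are hypothetical.

```typescript
// Values taken from the Concept Types, Abstraction Layers, and Relationship Types tables.
type ConceptType = "principle" | "mechanism" | "pattern" | "strategy" | "tactic";
type Layer = 0 | 1 | 2 | 3 | 4;
type RelationshipType =
  | "INFLUENCES"
  | "SUPPORTS"
  | "CONTRADICTS"
  | "COMPOSED_OF"
  | "DERIVES_FROM";

// Hypothetical record shapes; field names are illustrative only.
interface Concept {
  id: string;
  statement: string;
  type: ConceptType;
  layer: Layer;
  quote: string; // exact supporting quote, for citation traceability
}

interface Relationship {
  from: string; // id of concept A
  to: string;   // id of concept B
  type: RelationshipType;
}

// Relationships that point at concept ids that do not exist.
function findDanglingRelationships(concepts: Concept[], rels: Relationship[]): Relationship[] {
  const ids = new Set(concepts.map((c) => c.id));
  return rels.filter((r) => !ids.has(r.from) || !ids.has(r.to));
}
```

Encoding the vocabularies as union types means a misspelled type or an out-of-range layer fails at compile time rather than surfacing later during validation.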
Concept Extraction States (EA0-EA7)
| State | Symptoms | Intervention |
|---|---|---|
| EA0 | No input file | Guide file preparation |
| EA1 | Raw file, not parsed | Run ea-parse.ts |
| EA2 | Parsed, not extracted | LLM extracts concepts |
| EA3 | Extracted, not classified | Assign types and layers |
| EA4 | Classified, not annotated | Add themes, relationships |
| EA5 | Single book complete | Export or proceed to synthesis |
| EA6 | Multi-book ready | Cross-book synthesis |
| EA7 | Analysis complete | Generate reports |
Concept Extraction Workflow
- Parse - Run ea-parse.ts to chunk book with position tracking
- Extract - Present chunks to LLM for concept identification with exact quotes
- Classify - Assign type (principle→tactic) and layer (0-4)
- Annotate - Add themes and functional analysis
- Link - Connect related concepts
- Export - Generate analysis.json, concepts.json, report.md
Available Tools
Parsing Tools
ea-parse.ts
Parse ebook files into chunks with metadata and position tracking.
    deno run --allow-read scripts/ea-parse.ts path/to/book.txt
    deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
    deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
Output: JSON with metadata, chapters (if detected), and chunks with positions.
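To make the interaction of chunk size, overlap, and position tracking concrete, here is a minimal sketch of overlapping chunking. It is not the implementation of ea-parse.ts: it assumes sizes are measured in characters (the script's unit is not stated), and `Chunk` and `chunkText` are hypothetical names.

```typescript
// Hypothetical chunk record: offsets refer to positions in the original text.
interface Chunk {
  index: number;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  text: string;
}

// Split text into windows of chunkSize characters, each sharing
// `overlap` characters with the previous window.
function chunkText(text: string, chunkSize: number, overlap: number): Chunk[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // last window reached the end of the text
  }
  return chunks;
}
```

Because every chunk carries its own `start` and `end`, a quote found inside a chunk can be mapped back to an absolute position in the book by adding the chunk's `start` to the quote's offset within the chunk.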
Knowledge Base Tools
kb-generate-index.ts
Scan knowledge base and generate searchable entity index.
    deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
Output: Creates _entities.json with all entities, aliases, and metadata.
kb-resolve-entity.ts
Search for existing entities before creating duplicates.
    deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
    deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
    deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
Options:
- --threshold <0-1> - Minimum match score (default: 0.3)
- --limit <n> - Maximum results (default: 5)
- --json - Output as JSON
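The threshold and limit options suggest a scored match against names and aliases. The following sketch shows one way such scoring could work, using token-set Jaccard similarity. This is an assumed technique for illustration; the actual scoring inside kb-resolve-entity.ts is not documented here, and `IndexedEntity`, `resolveEntity`, and the sample aliases are hypothetical.

```typescript
// Hypothetical index entry, loosely modeled on what _entities.json is said to hold.
interface IndexedEntity {
  name: string;
  aliases: string[];
  path: string;
}

interface EntityMatch {
  entity: IndexedEntity;
  score: number; // 0 to 1
}

// Lowercased alphanumeric tokens of a string.
function tokenSet(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Best score over the name and all aliases; defaults mirror the documented options.
function resolveEntity(
  query: string,
  index: IndexedEntity[],
  threshold = 0.3,
  limit = 5,
): EntityMatch[] {
  const q = tokenSet(query);
  return index
    .map((entity) => {
      const candidates = [entity.name, ...entity.aliases];
      const score = Math.max(...candidates.map((n) => jaccard(q, tokenSet(n))));
      return { entity, score };
    })
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Matching against aliases as well as the canonical name is what lets a query like "kind learning" land on an entity filed under a different title, which is the entity-resolution goal described earlier.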
Validation Tools
ea-validate.ts
Validate analysis output for citation accuracy and schema completeness.
    deno run --allow-read scripts/ea-validate.ts analysis.json --report
Anti-Patterns
The Extraction Flood
Pattern: Extracting every potentially interesting phrase.
Fix: Ask "Would I cite this?" before extracting. Quality over quantity.
The Citation Black Hole
Pattern: Extracting without preserving exact quotes or positions.
Fix: Always capture: exact quote, chapter reference, context.
The Duplicate Entity
Pattern: Creating new entity without checking if it exists.
Fix: Always run kb-resolve-entity.ts first.
The Orphan Entity
Pattern: Entities without Related Entities links.
Fix: Every entity should connect to at least 2 others.
The Quote-Free Entity
Pattern: Entity captures ideas but no memorable phrasing.
Fix: Include Key Quotes section with author's exact words.
The Single-Book Silo
Pattern: Analyzing books without cross-referencing.
Fix: After 2+ books, run synthesis to find connections.
Example Workflows
Full Entity Extraction (Range Example)
- Scan book chapter by chapter
- Identify all named studies, researchers, frameworks, anecdotes
- Create inventory document listing all potential entities
- For each entity:
  a. Run kb-resolve-entity.ts "[entity name]" to check existence
  b. Create markdown file in appropriate type directory
  c. Fill in template with findings and citations
  d. Add Key Quotes section
- Create _range-quotes.md with all memorable quotes
- Update _index.md with new entities
- Run kb-generate-index.ts to rebuild _entities.json
Quick Concept Scan
- Run ea-parse.ts book.txt --chunk-size 2000
- For each chunk, extract top 3-5 concepts
- Classify by type and layer
- Generate concepts.json and report.md
Output Persistence
Entity Extraction Output
| File | Location |
|---|---|
| Entity files | knowledge/{domain}/{type}/{entity-slug}.md |
| Quotes file | knowledge/{domain}/_[book]-quotes.md |
| Entity index | knowledge/_entities.json |
| Domain index | knowledge/{domain}/_index.md |
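The entity file location can be derived from the domain, type, and name. The sketch below assumes the plural directory names shown in the knowledge base tree (studies, researchers, and so on) and a lowercase, hyphen-separated slug; `slugify` and `entityPath` are hypothetical helpers rather than part of the skill's scripts.

```typescript
type EntityTypeName = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Directory names as they appear in the knowledge base tree.
const TYPE_DIRECTORIES: Record<EntityTypeName, string> = {
  study: "studies",
  researcher: "researchers",
  framework: "frameworks",
  anecdote: "anecdotes",
  concept: "concepts",
};

// Lowercase, strip accents, and collapse everything else to single hyphens.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build knowledge/{domain}/{type}/{entity-slug}.md
function entityPath(domain: string, type: EntityTypeName, name: string): string {
  return `knowledge/${domain}/${TYPE_DIRECTORIES[type]}/${slugify(name)}.md`;
}
```

Note that the researcher files in the tree are named surname first (hogarth-robin.md). This sketch does not reorder names, so a caller would pass "Hogarth Robin" to get that path.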
Concept Extraction Output
| File | Location |
|---|---|
| Full analysis | ebook-analysis/{author}-{title}/analysis.json |
| Concepts only | ebook-analysis/{author}-{title}/concepts.json |
| Citations | ebook-analysis/{author}-{title}/citations.json |
| Report | ebook-analysis/{author}-{title}/report.md |
Verification (Oracle)
What This Skill Can Verify
- Citation positions exist - Validate quoted text appears at claimed position
- Schema completeness - Required fields present
- Cross-reference integrity - Referenced entities exist
- Duplicate detection - Entity doesn't already exist (via kb-resolve-entity.ts)
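The first check in this list, that a quote appears at its claimed position, reduces to a substring comparison. A minimal sketch under stated assumptions: positions are character offsets into the parsed source text, and `Citation` and `verifyCitation` are hypothetical names, not the interface of ea-validate.ts.

```typescript
// Hypothetical citation record: character offsets into the source text.
interface Citation {
  quote: string;
  start: number; // inclusive
  end: number;   // exclusive
}

type CitationCheck =
  | { ok: true }
  | { ok: false; reason: string; foundAt: number }; // foundAt is -1 if the quote is absent

// Confirm the quoted text sits exactly at [start, end) in the source.
function verifyCitation(source: string, c: Citation): CitationCheck {
  const foundAt = source.indexOf(c.quote);
  if (c.start < 0 || c.end > source.length || c.start >= c.end) {
    return { ok: false, reason: "position out of range", foundAt };
  }
  if (source.slice(c.start, c.end) === c.quote) {
    return { ok: true };
  }
  return { ok: false, reason: "quote not at claimed position", foundAt };
}
```

Returning `foundAt` separates two failure cases: a quote that exists elsewhere in the text (a position error that can be repaired) and one that does not appear at all (a likely paraphrase or fabrication that should be dropped).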
What Requires Human Judgment
- Significance - Is this worth extracting?
- Classification - Is this really a "framework" vs "concept"?
- Relationship validity - Does A really influence B?
- Quote quality - Is this actually memorable?
Integration Graph
Inbound (From Other Skills)
| Source | Leads to |
|---|---|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |
Outbound (To Other Skills)
| From State | Leads to |
|---|---|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |
Complementary Skills
| Skill | Relationship |
|---|---|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |
Calibration Data (from Range + Imaginable extractions)
By Book Density
| Book Type | Expected Entities | Estimated Effort |
|---|---|---|
| Dense non-fiction (Range; Thinking, Fast and Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |
By Book Subtype
Different non-fiction subtypes yield different entity profiles:
| Subtype | Example | Entity Profile | Expected Count |
|---|---|---|---|
| Research synthesis | Range | Many studies, researchers, frameworks | 60-100 |
| Methodological/How-to | Imaginable | Many frameworks, few studies | 30-50 |
| Memoir/Narrative | Educated | Few frameworks, many anecdotes | 20-40 |
| Reference | Technical manuals | Many concepts, few anecdotes | Variable |
Research synthesis books cite many studies and researchers, connecting ideas across domains. Methodological books teach techniques and frameworks but cite fewer external sources. Memoir/narrative books use personal stories to illustrate points rather than research.
Metadata Reliability Warning
Book classification metadata (Calibre tags, library categories) is often:
- Wrong - Fiction/non-fiction misclassified
- Generic - "General Fiction" or "Self-Help" applied broadly
- Inconsistent - Same book categorized differently across sources
Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.
Reasoning Requirements
Standard Reasoning
- Single chunk concept extraction
- Type/layer classification
- Simple relationship identification
- Individual entity creation
Extended Reasoning (ultrathink)
Use extended thinking for:
- Multi-book synthesis - requires holding multiple networks simultaneously
- Contradiction detection - semantic comparison across sources
- Theme emergence - identifying patterns across large sets
- Knowledge gap identification - reasoning about what's missing
Trigger phrases: "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"
What You Do NOT Do
- Extract without citation traceability
- Create entities without checking for duplicates
- Skip the linking phase (orphan entities are not useful)
- Leave entities without quotes
- Treat fiction as non-fiction
- Use regex for semantic analysis (LLM judgment only)