RKnowledge - Knowledge Graph Builder
Build knowledge graphs from any text corpus using LLMs. This skill helps you extract concepts and relationships from documents and store them in a queryable graph database.
When to Use
Use this skill when you need to:
- Extract knowledge from documents (PDF, Markdown, HTML, TXT)
- Build a knowledge graph for Graph RAG applications
- Analyze relationships between concepts in a corpus
- Create visual representations of document content
- Query extracted knowledge using natural language or Cypher
Quick Start
- Initialize
Install and initialize rknowledge
rknowledge init
This creates a configuration file and starts Neo4j via Docker.
- Configure API Keys
Use the auth command to configure your LLM provider:
Interactive setup
rknowledge auth
Or specify provider directly
rknowledge auth --provider anthropic
Or pass the key directly
rknowledge auth --provider anthropic --key your-key-here
List configured providers
rknowledge auth --list
Alternatively, use environment variables:
export ANTHROPIC_API_KEY=your-key-here
or
export OPENAI_API_KEY=your-key-here
or
export GOOGLE_API_KEY=your-key-here # also accepts GEMINI_API_KEY
or use Ollama for local models (no API key needed)
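The environment-variable lookup described above can be sketched in Python. This is an illustrative model rather than rknowledge's actual code: the `PROVIDER_ENV_VARS` table and the `resolve_api_key` helper are hypothetical names, and the assumption that `GOOGLE_API_KEY` is checked before `GEMINI_API_KEY` is a guess.

```python
import os
from typing import Mapping, Optional

# Hypothetical lookup table built from the variables named in this guide.
PROVIDER_ENV_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "google": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],  # precedence is an assumption
    "ollama": [],  # local models, no API key needed
}


def resolve_api_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key found for the provider, or None."""
    for name in PROVIDER_ENV_VARS[provider]:
        value = env.get(name)
        if value:
            return value
    return None
```

Calling `resolve_api_key("anthropic")` before a build is a quick way to confirm that the shell running the command can actually see the key.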
- Build Knowledge Graph
Process a single document
rknowledge build ./document.pdf
Process a directory of documents
rknowledge build ./docs/
Specify provider and model
rknowledge build ./docs/ --provider anthropic --model claude-sonnet-4-20250514
- Query the Graph
Natural language search
rknowledge query "What concepts relate to authentication?"
Direct Cypher query
rknowledge query "cypher: MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 10"
- Export
Export to JSON
rknowledge export --format json --output graph.json
Export to CSV (creates nodes.csv and edges.csv)
rknowledge export --format csv --output graph
Export to GraphML
rknowledge export --format graphml --output graph.graphml
Export to Cypher statements
rknowledge export --format cypher --output import.cypher
- Visualize
Open interactive visualization in browser
rknowledge viz
The visualization features a premium glassmorphism dashboard with:
- Entity Filtering: Show/hide specific concept types.
- Neighborhood Search: Highlight concepts and their relations.
- Interactive Cards: Explore properties and connections.
Commands Reference
| Command | Description |
| --- | --- |
| rknowledge init | Initialize config and start Neo4j |
| rknowledge auth | Configure API keys for LLM providers |
| rknowledge build <path> | Process documents and build graph |
| rknowledge add | Manually insert relations into the graph |
| rknowledge query <query> | Search or query the graph |
| rknowledge path <from> <to> | Find shortest path between concepts |
| rknowledge stats | Show graph statistics and analytics |
| rknowledge communities | List detected communities and members |
| rknowledge export | Export graph to various formats |
| rknowledge viz | Open interactive visualization in browser |
| rknowledge doctor | Check system health and diagnose problems |
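To make the `rknowledge path` command more concrete, here is a minimal breadth-first-search sketch of a shortest path between two concepts. It assumes an unweighted, undirected view of the graph. The sample edges and the `shortest_path` helper are made up for illustration, and rknowledge itself may well delegate this work to Neo4j.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Made-up sample relations; a real graph would come from `rknowledge build`.
EDGES = [
    ("Authentication", "Token"),
    ("Token", "Session"),
    ("Session", "User"),
    ("Authentication", "Password"),
    ("Password", "User"),
]


def shortest_path(edges: Iterable[Tuple[str, str]], start: str, goal: str) -> Optional[List[str]]:
    """Return the concepts on a shortest path from start to goal, or None."""
    # Build an undirected adjacency map from the edge list.
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if start not in neighbors or goal not in neighbors:
        return None

    # Breadth-first search, remembering how each node was first reached.
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in sorted(neighbors[node]):  # sorted for deterministic output
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

With the sample edges, `shortest_path(EDGES, "Authentication", "User")` goes through "Password" (two hops) rather than through "Token" and "Session" (three hops).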
Build Options
| Option | Description | Default |
| --- | --- | --- |
| --provider | LLM provider (anthropic, openai, ollama, google) | anthropic |
| --model | Model to use | Provider default |
| --output | Output destination (neo4j, json, csv) | neo4j |
| --tenant | Tenant namespace for knowledge isolation | default |
| --domain | Domain name for specialized extraction | None |
| --context | Custom context for extraction prompts | None |
| --context-file | Path to file containing custom prompt | None |
| --chunk-size | Text chunk size in characters | 1500 |
| --chunk-overlap | Overlap between chunks | 150 |
| --concurrency, -j | Number of concurrent LLM requests | 4 |
| --append | Append to existing graph (incremental) | false |
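The interaction between --chunk-size and --chunk-overlap can be pictured with a small sliding-window sketch. It assumes plain character-based windows where each chunk repeats the last `overlap` characters of the previous one. The `chunk_text` helper is hypothetical, and rknowledge's real splitter may behave differently (for example, by respecting sentence boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 150) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the default values, a 4,000-character document yields three chunks starting at offsets 0, 1350, and 2700, so a relation that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.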
Supported File Types
- PDF (.pdf) - Extracts text from PDF documents
- Markdown (.md) - Parses and extracts text from Markdown
- HTML (.html, .htm) - Extracts text content from HTML
- Plain Text (.txt) - Direct text processing
LLM Providers
Anthropic (Recommended)
export ANTHROPIC_API_KEY=your-key
rknowledge build ./docs --provider anthropic --model claude-sonnet-4-20250514
OpenAI
export OPENAI_API_KEY=your-key
rknowledge build ./docs --provider openai --model gpt-4o
Google (Gemini)
Accepts either GOOGLE_API_KEY or GEMINI_API_KEY
export GEMINI_API_KEY=your-key
rknowledge build ./docs --provider google --model gemini-2.0-flash
Ollama (Local - Free)
Start Ollama and pull a model first
ollama pull mistral
rknowledge build ./docs --provider ollama --model mistral
OpenAI-Compatible APIs (Groq, DeepSeek, Mistral, Together, etc.)
The OpenAI provider works with any OpenAI-compatible API. Set base_url in your config file to point to the service:
~/.config/rknowledge/config.toml
Use only one [providers.openai] block at a time; TOML does not allow the same table to be defined twice.
Groq (fast inference)
[providers.openai]
api_key = "${GROQ_API_KEY}"
base_url = "https://api.groq.com/openai/v1"
model = "llama-3.3-70b-versatile"
DeepSeek
[providers.openai]
api_key = "${DEEPSEEK_API_KEY}"
base_url = "https://api.deepseek.com/v1"
model = "deepseek-chat"
Mistral
[providers.openai]
api_key = "${MISTRAL_API_KEY}"
base_url = "https://api.mistral.ai/v1"
model = "mistral-large-latest"
Together AI
[providers.openai]
api_key = "${TOGETHER_API_KEY}"
base_url = "https://api.together.xyz/v1"
model = "meta-llama/Llama-3-70b-chat-hf"
OpenRouter (access many models through one API)
[providers.openai]
api_key = "${OPENROUTER_API_KEY}"
base_url = "https://openrouter.ai/api/v1"
model = "anthropic/claude-sonnet-4-20250514"
LM Studio or vLLM (local)
[providers.openai]
api_key = "not-needed"
base_url = "http://localhost:1234/v1"
model = "local-model"
Then build with:
rknowledge build ./docs --provider openai
Advanced Features
Tenant Isolation
Isolate multiple projects or users within the same Neo4j instance using the --tenant flag:
Add data to project-alpha
rknowledge build ./docs-alpha --tenant alpha
Add data to project-beta
rknowledge build ./docs-beta --tenant beta
Queries only see data from the specified tenant
rknowledge stats --tenant alpha
rknowledge query "Who is the lead dev?" --tenant beta
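As a mental model for tenant isolation, the sketch below tags every relation with a tenant name and filters on it at query time. This is purely conceptual: the `Relation` class, the `relations_for_tenant` helper, and the sample data are hypothetical, and how rknowledge stores the tenant inside Neo4j is not specified in this guide.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Relation:
    source: str
    relation: str
    target: str
    tenant: str = "default"  # mirrors the documented default for --tenant


def relations_for_tenant(relations: Iterable[Relation], tenant: str) -> List[Relation]:
    """Return only the relations that belong to the given tenant."""
    return [r for r in relations if r.tenant == tenant]


# Made-up sample data covering two tenants plus the default namespace.
GRAPH = [
    Relation("Service A", "depends on", "Library B", tenant="alpha"),
    Relation("Alice", "leads", "Project Beta", tenant="beta"),
    Relation("Neo4j", "stores", "Graph"),
]
```

In this model, `relations_for_tenant(GRAPH, "alpha")` returns only the first relation, which matches the documented behaviour that queries see data from the specified tenant only.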
Domain-Aware Extraction
Customize LLM prompts for specialized domains (medical, legal, technical) using domain flags:
Specify domain-specific context directly
rknowledge build ./medical-docs --domain medical --context "Focus on clinical trials and drug interactions"
Or use a custom prompt file for complex instructions
rknowledge build ./codebase --context-file extraction_prompt.txt
Manual Relation Insertion
Add individual relations directly to the graph without document processing. This is useful for adding "ground truth" or linking concepts that weren't captured by the LLM:
Interactive mode (walks you through node names, types, and relation)
rknowledge add --interactive
Single relation with types
rknowledge add "Knowledge Graph" "contains" "Nodes" --type1 "concept" --type2 "structure"
Single relation for a specific tenant
rknowledge add "Project A" "depends on" "Library B" --tenant my-client --relation "dependency"
Batch import from JSON (see docs for schema)
rknowledge add --from-file relations.json --tenant alpha
Tenant-Aware Visualization
The redesigned visualization dashboard supports all v0.2.0 features:
Visualize a specific project
rknowledge viz --tenant project-x
- Filter Sidebar: Toggle entire entity types (e.g., hide all "proximity" nodes) to clean up the graph.
- Search & Focus: Use the search bar to find a node; it will automatically highlight and pull in its neighbors.
- Deep Exploration: Click any node to see its full property list and all related nodes in a dedicated detail panel.
Example Workflows
Build a Knowledge Base from Documentation
Clone a repo's docs
git clone https://github.com/example/project docs
Build knowledge graph
rknowledge build ./docs --provider anthropic
Query for specific topics
rknowledge query "How does authentication work?"
Analyze Research Papers
Process PDF papers
rknowledge build ./papers/ --chunk-size 2000
Export for further analysis
rknowledge export --format json --output research-graph.json
Create Graph RAG Backend
Build comprehensive graph
rknowledge build ./knowledge-base/
Query programmatically via Neo4j
Connect to bolt://localhost:7687 with neo4j/rknowledge
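A Graph RAG backend would typically read the graph through a Neo4j driver. The following sketch uses the official `neo4j` Python package (installed separately, for example with `pip install neo4j`) together with the connection details given in this guide. The `fetch_triples` helper is a hypothetical name. Because the node labels and property names that rknowledge writes are not documented here, the sketch returns raw node properties instead of assuming a schema.

```python
from typing import Dict, List

# Connection details taken from this guide's defaults.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "rknowledge")

# Same pattern as the Cypher example in the Quick Start, with the limit as a parameter.
TRIPLES_QUERY = "MATCH (n)-[r]->(m) RETURN n, r, m LIMIT $limit"


def fetch_triples(limit: int = 10) -> List[Dict[str, object]]:
    """Return up to `limit` source/relation/target rows from the graph."""
    # Imported lazily so this module loads even where the driver is not installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(NEO4J_URI, auth=NEO4J_AUTH) as driver:
        with driver.session() as session:
            result = session.run(TRIPLES_QUERY, limit=limit)
            # Consume the result while the session is still open.
            return [
                {
                    "source": dict(record["n"].items()),
                    "relation": record["r"].type,
                    "target": dict(record["m"].items()),
                }
                for record in result
            ]
```

With Neo4j running, `fetch_triples(5)` returns a list of dictionaries that can be formatted into retrieval context for an LLM prompt.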
Neo4j Access
After running rknowledge init, Neo4j is available at:
- Browser: http://localhost:7474
- Bolt: bolt://localhost:7687
- Credentials: neo4j / rknowledge
Troubleshooting
Run the built-in diagnostics first:
rknowledge doctor
This checks config, Docker, Neo4j connectivity, LLM providers, and graph status.
Neo4j Connection Failed
Check if Docker is running
docker ps
Restart Neo4j
cd ~/.config/rknowledge
docker compose up -d
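If the container appears to be running but connections still fail, it can help to check whether the two Neo4j ports accept TCP connections at all. The sketch below does only that. It is not what `rknowledge doctor` runs internally, and the `port_open` and `neo4j_status` helpers are hypothetical names.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def neo4j_status(host: str = "localhost") -> Dict[str, bool]:
    """Check the browser and Bolt ports listed in this guide."""
    return {
        "browser (7474)": port_open(host, 7474),
        "bolt (7687)": port_open(host, 7687),
    }
```

A result where the browser port is open but the Bolt port is closed usually points at a port-mapping problem rather than at Neo4j itself.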
API Key Issues
Verify API key is set
echo $ANTHROPIC_API_KEY
Or check config file
cat ~/.config/rknowledge/config.toml
Large Documents
For very large documents, increase the chunk size (and the overlap along with it) so the text is split into fewer chunks:
rknowledge build ./large-doc.pdf --chunk-size 3000 --chunk-overlap 300
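The effect of these two flags on the amount of work can be estimated with a little arithmetic. The sketch assumes a simple sliding window over characters, where each chunk after the first advances by `chunk_size - overlap`. The `estimate_chunks` helper is hypothetical, and the real chunk count may differ if rknowledge splits on other boundaries.

```python
def estimate_chunks(doc_chars: int, chunk_size: int, overlap: int) -> int:
    """Estimate how many chunks a document of doc_chars characters produces."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if doc_chars <= 0:
        return 0
    if doc_chars <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One initial chunk, plus enough steps to cover the remaining characters
    # (integer ceiling division).
    return 1 + -(-(doc_chars - chunk_size) // step)
```

Under this assumption, a 1,000,000-character document produces 741 chunks with the defaults (1500/150) and 371 chunks with `--chunk-size 3000 --chunk-overlap 300`, roughly halving the number of extraction passes.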
See Also
- REFERENCE.md - Detailed configuration reference
- Neo4j Documentation
- Cypher Query Language