pdf-read

PDF Read Skill

Extract text and metadata from PDF files.

When to Use

✅ USE this skill when:

User uploads a PDF and asks for a summary
Extract text from a PDF document
Read PDF metadata (author, title, pages)
Analyze PDF content

When NOT to Use

❌ DON'T use this skill when:

Creating or generating PDFs → use reporting tools
Editing existing PDFs → use PDF manipulation tools
OCR on scanned images → use OCR/tesseract tools
Password-protected PDFs → ask user to unlock first

Installation

    cd /job
    npm install pdf-parse

Usage

    const fs = require('fs');
    const pdf = require('pdf-parse');

    async function readPDF(filePath) {
      const dataBuffer = fs.readFileSync(filePath);
      const data = await pdf(dataBuffer);

      return {
        text: data.text,
        pages: data.numpages,
        info: data.info, // metadata
        version: data.version,
        metadata: data.metadata
      };
    }

    // Example (wrapped in a function because CommonJS has no top-level await)
    async function main() {
      const result = await readPDF('/path/to/document.pdf');
      console.log(`Pages: ${result.pages}`);
      console.log(`Text preview: ${result.text.substring(0, 500)}...`);
    }

    main().catch(console.error);

Extract Text by Page Range

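    const pdf = require('pdf-parse');
    const fs = require('fs');

    async function readPDFPages(filePath, startPage, endPage) {
      const dataBuffer = fs.readFileSync(filePath);
      const data = await pdf(dataBuffer, {
        max: endPage, // stop parsing after endPage
        version: 'v2.0.550',
        // max alone always starts at page 1, so pages before startPage are skipped here.
        // Text items are joined with spaces; line breaks within a page are not preserved.
        pagerender: async (pageData) => {
          if (pageData.pageIndex + 1 < startPage) return '';
          const content = await pageData.getTextContent({ normalizeWhitespace: true });
          return content.items.map((item) => item.str).join(' ');
        }
      });

      return data.text.trim();
    }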
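
Page numbers supplied by a user are worth checking before they reach the parser. The following is a minimal sketch, assuming 1-based inclusive page numbers and a total page count already known from an earlier read (for example data.numpages). validatePageRange is an illustrative helper, not part of pdf-parse.

```javascript
// Validate a 1-based, inclusive page range against the document length.
// Returns the range with endPage clamped to the last page, or throws on bad input.
// (Hypothetical helper for illustration; not a pdf-parse API.)
function validatePageRange(startPage, endPage, totalPages) {
  for (const n of [startPage, endPage, totalPages]) {
    if (!Number.isInteger(n)) {
      throw new TypeError('page numbers must be integers');
    }
  }
  if (startPage < 1 || endPage < startPage) {
    throw new RangeError(`invalid page range ${startPage}-${endPage}`);
  }
  if (startPage > totalPages) {
    throw new RangeError(`document has only ${totalPages} pages`);
  }
  return { startPage, endPage: Math.min(endPage, totalPages) };
}
```

Calling validatePageRange(2, 50, 10) yields pages 2 to 10, while a range that starts past the last page throws instead of silently returning empty text.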

Get Metadata

    // Run inside an async function
    const result = await readPDF('/path/to/document.pdf');
    console.log('Author:', result.info?.Author);
    console.log('Title:', result.info?.Title);
    console.log('Subject:', result.info?.Subject);
    console.log('Keywords:', result.info?.Keywords);
    console.log('Creator:', result.info?.Creator);
    console.log('Producer:', result.info?.Producer);
    console.log('Creation Date:', result.info?.CreationDate);
    console.log('Mod Date:', result.info?.ModDate);
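
CreationDate and ModDate are typically returned as raw PDF date strings of the form D:YYYYMMDDHHmmSS followed by an optional timezone offset, not as Date objects. The sketch below converts such a string to a JavaScript Date. parsePDFDate is an illustrative name, and treating a missing timezone as UTC is an assumption made here for simplicity.

```javascript
// Parse a PDF date string such as "D:20240115103000+01'00'" into a Date.
// Returns null when the value does not look like a PDF date.
// Assumption: a date without a timezone suffix is treated as UTC.
function parsePDFDate(value) {
  if (typeof value !== 'string') return null;
  const pattern =
    /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?/;
  const m = pattern.exec(value.trim());
  if (!m) return null;

  // Missing fields fall back to the defaults the format implies.
  const [, year, month = '01', day = '01', hour = '00', minute = '00',
    second = '00', sign, offsetHours = '00', offsetMinutes = '00'] = m;

  // Build the timestamp as if it were UTC, then remove the stated offset.
  let ms = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);
  if (sign === '+' || sign === '-') {
    const offsetMs = (+offsetHours * 60 + +offsetMinutes) * 60000;
    ms += sign === '+' ? -offsetMs : offsetMs;
  }
  return new Date(ms);
}
```

For example, parsePDFDate(result.info?.CreationDate) returns a Date when the field is present and well formed, and null otherwise, so the caller can fall back to printing the raw string.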

Search Text in PDF

    async function searchInPDF(filePath, searchTerm) {
      const result = await readPDF(filePath);
      const text = result.text;
      const lines = text.split('\n');

      const matches = [];
      lines.forEach((line, index) => {
        if (line.toLowerCase().includes(searchTerm.toLowerCase())) {
          matches.push({ line: index + 1, content: line.trim() });
        }
      });

      return {
        total_matches: matches.length,
        matches: matches.slice(0, 20) // limit results
      };
    }
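
Search hits are often easier to judge with a little surrounding text. The sketch below works on an already extracted string (such as result.text) and returns each hit with a window of context around it. findWithContext, radius, and limit are illustrative names, not part of pdf-parse.

```javascript
// Case-insensitive search over extracted text, returning each hit with
// `radius` characters of context on either side.
// Assumption: lowercasing does not change the string length, which holds
// for most text but not for a few special characters.
function findWithContext(text, searchTerm, radius = 40, limit = 20) {
  const hits = [];
  if (!searchTerm) return hits; // an empty term would match everywhere

  const haystack = text.toLowerCase();
  const needle = searchTerm.toLowerCase();
  let from = 0;

  while (hits.length < limit) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) break;
    const start = Math.max(0, index - radius);
    const end = Math.min(text.length, index + needle.length + radius);
    hits.push({
      index,
      // Collapse whitespace so the snippet fits on one line.
      context: text.slice(start, end).replace(/\s+/g, ' ').trim()
    });
    from = index + needle.length;
  }
  return hits;
}
```

Unlike the line-based search, this variant also reports several hits that fall on the same line.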

Extract All Text as Single String

    async function extractFullText(filePath) {
      const result = await readPDF(filePath);
      // Normalize whitespace for cleaner output
      return result.text.replace(/\s+/g, ' ').trim();
    }
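
Collapsing all whitespace also removes paragraph breaks, which can hurt summaries of longer documents. As an alternative sketch, the helper below cleans whitespace inside each paragraph but keeps blank lines between paragraphs. normalizeKeepingParagraphs is an illustrative name, and treating blank lines as paragraph boundaries is an assumption about the extracted text.

```javascript
// Normalize whitespace within paragraphs while keeping paragraph breaks.
// Assumption: one or more blank lines separate paragraphs in the input.
function normalizeKeepingParagraphs(text) {
  return text
    .replace(/\r\n?/g, '\n') // unify line endings
    .split(/\n\s*\n/) // split on blank lines
    .map((paragraph) => paragraph.replace(/\s+/g, ' ').trim())
    .filter((paragraph) => paragraph.length > 0)
    .join('\n\n');
}
```

It takes a plain string, so it can be applied to result.text from readPDF.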

Handling Large PDFs

For PDFs with many pages, cap the number of pages parsed with the max option:

    async function readPDFFirstNPages(filePath, maxPages = 10) {
      const dataBuffer = fs.readFileSync(filePath);
      const data = await pdf(dataBuffer, { max: maxPages });

      return {
        text: data.text,
        total_pages: data.numpages,
        pages_read: Math.min(maxPages, data.numpages)
      };
    }
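
Once text has been extracted, a long document can also be handed on in pieces instead of one large string. The following is a minimal sketch of splitting extracted text into size-limited chunks, preferring paragraph boundaries. chunkText and the 4000-character default are assumptions for illustration, not pdf-parse features.

```javascript
// Split text into chunks of at most `maxChars` characters, breaking at
// blank lines where possible. The default size is an arbitrary assumption.
function chunkText(text, maxChars = 4000) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  const chunks = [];
  let current = '';

  for (const paragraph of text.split(/\n\s*\n/)) {
    const piece = paragraph.trim();
    if (!piece) continue;

    // Flush the current chunk when this paragraph would not fit into it.
    if (current && current.length + piece.length + 2 > maxChars) {
      chunks.push(current);
      current = '';
    }

    if (piece.length > maxChars) {
      // A single oversized paragraph is cut at fixed offsets.
      for (let i = 0; i < piece.length; i += maxChars) {
        chunks.push(piece.slice(i, i + maxChars));
      }
      continue;
    }

    current = current ? `${current}\n\n${piece}` : piece;
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be summarized or searched separately, which keeps memory use and prompt sizes predictable.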

Error Handling

    async function safeReadPDF(filePath) {
      try {
        const result = await readPDF(filePath);
        return { success: true, ...result };
      } catch (error) {
        // Compare case-insensitively, since the wording of parser errors can vary
        const message = String(error?.message ?? error);
        const lower = message.toLowerCase();
        if (lower.includes('password')) {
          return { success: false, error: 'PDF is password-protected' };
        }
        if (lower.includes('parse') || lower.includes('invalid')) {
          return { success: false, error: 'Invalid or corrupted PDF' };
        }
        return { success: false, error: message };
      }
    }
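
Some failures can be caught before parsing starts. PDF files begin with a %PDF- header, so a quick check on the file's first bytes rejects obvious non-PDF uploads early. looksLikePDF is an illustrative helper, and scanning the first 1024 bytes rather than only offset 0 is a lenient choice made here to tolerate leading junk bytes.

```javascript
// Return true when the buffer contains the "%PDF-" marker near its start.
// This is a cheap sanity check, not a validation of the whole file.
function looksLikePDF(buffer) {
  if (!Buffer.isBuffer(buffer)) return false;
  return buffer.subarray(0, 1024).includes('%PDF-');
}
```

A caller could run this on the result of fs.readFileSync(filePath) and return an "Invalid or corrupted PDF" error without invoking the parser at all.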

Quick Response Template

"Read this PDF"

    // Inside an async handler
    const result = await readPDF(filePath);
    return [
      '📄 PDF Summary',
      '',
      `Pages: ${result.pages}`,
      `Title: ${result.info?.Title || 'N/A'}`,
      `Author: ${result.info?.Author || 'N/A'}`,
      '',
      'Preview (first 500 chars):',
      `${result.text.substring(0, 500)}...`
    ].join('\n');

Notes

pdf-parse works on most standard PDFs
Does NOT support OCR for scanned documents
Does NOT handle password-protected PDFs
For image-heavy PDFs, text extraction may be limited
Large PDFs (>100 pages) should be read in chunks
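
One way to spot the scanned or image-heavy case described above is to compare the amount of extracted text with the page count. The following is a rough heuristic sketch. looksScanned is an illustrative name, and the threshold of 20 characters per page is an assumption to tune, not a pdf-parse setting.

```javascript
// Heuristic: very little extractable text per page suggests a scanned or
// image-only PDF. `minCharsPerPage` is an assumed threshold.
function looksScanned(text, pages, minCharsPerPage = 20) {
  if (!pages || pages <= 0) return false; // nothing to judge
  const visibleChars = text.replace(/\s+/g, '').length;
  return visibleChars / pages < minCharsPerPage;
}
```

When looksScanned(result.text, result.pages) is true, it is usually better to tell the user that OCR is needed than to summarize near-empty text.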
