pdf-tools

PDF engineering for extraction, generation, modification, and form filling. Use when extracting text or tables from PDFs, generating PDFs with Puppeteer, modifying PDFs with pdf-lib, filling PDF forms, or implementing PDF security. Use for AI-assisted OCR, HTML-to-PDF conversion, and document processing pipelines.

Install skill "pdf-tools" with this command: npx skills add oakoss/agent-skills/oakoss-agent-skills-pdf-tools

PDF Tools

Full-lifecycle PDF engineering covering extraction, generation, modification, form filling, and security. Prioritizes JavaScript-first solutions (pdf-lib, unpdf, Puppeteer) with Python/CLI utilities for advanced scenarios.

When to use: Extracting structured data from PDFs, generating pixel-perfect PDFs from HTML/React, modifying existing PDFs, filling forms (fillable or non-fillable), or securing documents with encryption.

When NOT to use: Simple text file processing, image-only manipulation without PDF context, or tasks better handled by a word processor.

Quick Reference

| Task | Tool | Key Point |
| --- | --- | --- |
| Generate PDF from HTML | Puppeteer / Playwright | `page.pdf()`; use `networkidle0` (Puppeteer) or `networkidle` (Playwright) |
| Extract text (lightweight) | unpdf | Edge/serverless compatible |
| Extract tables (AI) | Vision model + Zod schema | Multi-column and merged cell support |
| Extract tables (non-AI) | pdfplumber (Python) | Precise cell boundary detection |
| Modify, merge, split | pdf-lib (or @pdfme/pdf-lib) | Byte-level PDF manipulation in JS |
| Fill fillable forms | pdf-lib (or @pdfme/pdf-lib) | Inspect AcroForm fields before writing |
| Fill non-fillable forms | Python annotation scripts | Visual analysis + bounding box annotations |
| Encrypt PDF | qpdf | AES-256: `qpdf --encrypt user owner 256 --` |
| Repair corrupted PDF | qpdf | `qpdf input.pdf --replace-input` |
| Fast text extraction (CLI) | poppler-utils | `pdftotext -layout input.pdf -` |
| Merge thousands of files | pypdf (Python) | Lighter than headless browser |
| Batch queue processing | BullMQ + unpdf | Redis-backed with retry, concurrency, progress tracking |
| PDF/A archival compliance | ghostscript + verapdf | `gs -dPDFA=2` for conversion; verapdf for validation |
| Tagged PDF (accessibility) | Puppeteer | `tagged: true` maps HTML semantics to PDF structure tags |
| Digital signatures | @signpdf/* | PKCS#7 signing with P12 certificates |
| PDF comparison | unpdf + diff / pixelmatch | Text diff or pixel-level visual diff between versions |
| Secure redaction | pymupdf (fitz) | `apply_redactions()` removes content bytes, not just visual overlay |
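
The "Fast text extraction (CLI)" and "PDF comparison" rows can be combined into a small text-diff sketch. This is a minimal illustration, not the skill's own implementation: it swaps in `pdftotext` (poppler-utils) for unpdf and Python's standard `difflib` for the diff step, and it assumes `pdftotext` is installed and on PATH. The helper names `pdftotext_args`, `extract_text`, and `diff_text`, and the default file labels, are hypothetical.

```python
import difflib
import subprocess


def pdftotext_args(pdf_path):
    """Build the argv from the Quick Reference row: pdftotext -layout <file> -"""
    # The trailing "-" tells pdftotext to write the text to stdout.
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text (requires poppler-utils)."""
    result = subprocess.run(
        pdftotext_args(pdf_path), check=True, capture_output=True, text=True
    )
    return result.stdout


def diff_text(old_text, new_text, old_name="old.pdf", new_name="new.pdf"):
    """Return unified-diff lines between two extracted texts (empty if equal)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )
```

Usage might look like `diff_text(extract_text("v1.pdf"), extract_text("v2.pdf"))`, where both file names are placeholders. A text diff of this kind only catches wording changes; layout or image changes would need the pixel-level route the table mentions.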

Common Mistakes

| Mistake | Correct Pattern |
| --- | --- |
| Using canvas drawing commands for PDF generation | Use Puppeteer/Playwright with HTML/CSS templates |
| Running Puppeteer in edge/serverless environments | Use unpdf for edge; Puppeteer requires full Node.js |
| Extracting complex layouts with basic text parsers | Use AI-assisted OCR or pdfplumber for multi-column text |
| Storing unencrypted PDFs with PII in public storage | Apply AES-256 encryption via qpdf before storage |
| Relying on window.print() for server-side generation | Use headless browser APIs (`page.pdf()`) for deterministic output |
| Using pypdf for complex layout extraction | Use pdfplumber or AI OCR for multi-column or overlapping text |
| Skipping font embedding in containerized environments | Embed Google Fonts or WOFF2 files with Puppeteer |
| Writing to flattened PDF form fields | Inspect AcroForm fields with pdf-lib before writing |
| Using unmaintained pdf-lib for encrypted PDFs | Use @cantoo/pdf-lib fork which adds encrypted PDF support |
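
The "apply AES-256 encryption via qpdf before storage" pattern can be sketched as a thin wrapper around the qpdf CLI. This is a minimal sketch under stated assumptions: qpdf is installed and on PATH, and the helper names `qpdf_encrypt_args` and `encrypt_pdf` are hypothetical. The argument order follows qpdf's `--encrypt user-password owner-password key-length -- infile outfile` form, with a key length of 256 selecting AES-256.

```python
import shutil
import subprocess


def qpdf_encrypt_args(src, dst, user_password, owner_password):
    """Build the argv for AES-256 encryption with qpdf."""
    return [
        "qpdf",
        "--encrypt", user_password, owner_password, "256",
        "--",  # marks the end of the encryption options
        str(src),
        str(dst),
    ]


def encrypt_pdf(src, dst, user_password, owner_password):
    """Encrypt src into dst; raises if qpdf is missing or exits non-zero."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf is not installed or not on PATH")
    # Passing a list (no shell) avoids quoting problems in passwords and paths.
    subprocess.run(
        qpdf_encrypt_args(src, dst, user_password, owner_password),
        check=True,
    )
    return dst
```

A pipeline would call something like `encrypt_pdf("report.pdf", "report.enc.pdf", user_pw, owner_pw)` and upload only the encrypted output; the file names here are placeholders.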

Delegation

  • Inspect PDF structure and diagnose extraction issues: Use Explore agent to examine AcroForm fields, encoding, and metadata
  • Build end-to-end document processing pipelines: Use Task agent to implement extraction, transformation, and generation workflows
  • Design PDF architecture for a new system: Use Plan agent to select tools and plan extraction, generation, or modification strategies
