PDF Tools
Tools for viewing, extracting, and editing PDF files using Python libraries (pdfplumber and PyPDF2).
Quick Start
All scripts require dependencies:
pip3 install pdfplumber PyPDF2
Core Operations
Extract Text
Extract text from PDF (all pages or specific pages):
scripts/extract_text.py document.pdf
scripts/extract_text.py document.pdf -p 1 3 5
scripts/extract_text.py document.pdf -o output.txt
Get PDF Info
View metadata and structure:
scripts/pdf_info.py document.pdf
scripts/pdf_info.py document.pdf -f json
Merge PDFs
Combine multiple PDFs into one:
scripts/merge_pdfs.py file1.pdf file2.pdf file3.pdf -o merged.pdf
Split PDF
Split into individual pages:
scripts/split_pdf.py document.pdf -o output_dir/
Split by page ranges:
scripts/split_pdf.py document.pdf -o output_dir/ -m ranges -r "1-3,5-7,10-12"
Rotate Pages
Rotate all pages or specific pages:
scripts/rotate_pdf.py document.pdf -o rotated.pdf -r 90
scripts/rotate_pdf.py document.pdf -o rotated.pdf -r 180 -p 1 3 5
Edit Text
Add text overlay on a page:
scripts/edit_text.py document.pdf -o edited.pdf --overlay "New Text" --page 1 --x 100 --y 700
scripts/edit_text.py document.pdf -o edited.pdf --overlay "Watermark" --page 1 --x 200 --y 400 --font-size 20
Replace text (limited, works best for simple cases):
scripts/edit_text.py document.pdf -o edited.pdf --replace "Old Text" "New Text"
Note: PDF text editing is complex due to the format. The overlay method is more reliable than replacement.
Workflow Patterns
Viewing PDF Content
- Get basic info:
scripts/pdf_info.py file.pdf - Extract text to preview:
scripts/extract_text.py file.pdf -p 1 - Extract full text if needed:
scripts/extract_text.py file.pdf -o content.txt
Reorganizing PDFs
- Split into pages:
scripts/split_pdf.py input.pdf -o pages/ - Merge selected pages:
scripts/merge_pdfs.py pages/page_1.pdf pages/page_3.pdf -o reordered.pdf
Extracting Sections
- Get page count:
scripts/pdf_info.py document.pdf - Split by ranges:
scripts/split_pdf.py document.pdf -o sections/ -m ranges -r "1-5,10-15"
Advanced Usage
For detailed library documentation and advanced patterns, see references/libraries.md.
Notes
- Page numbers are 1-indexed in all scripts (page 1 = first page)
- Text extraction works best with text-based PDFs (not scanned images)
- Rotation angles: 90, 180, 270, or -90 (counterclockwise)
- All scripts validate file existence before processing