office-docs

Office Docs Skills Library

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "office-docs" with this command: npx skills add vamseeachanta/workspace-hub/vamseeachanta-workspace-hub-office-docs

Office document automation, template generation, and document processing patterns

Version: 1.0.0 | Last Updated: 2026-01-17

Overview

This library contains 5 production-ready skills for automating Microsoft Office and PDF document workflows. Each skill covers a specific document type with patterns for generation, manipulation, and template-based automation. Skills follow the Anthropic Skills format with practical examples from real-world document processing pipelines.

Quick Start

    # Browse available skills
    ls skills/office-docs/

    # Read a skill
    cat skills/office-docs/python-docx/SKILL.md

    # Skills are documentation - implement patterns in your document workflows

Available Skills

    Skill           Description                              Key Features
    --------------  ---------------------------------------  -----------------------------------
    python-docx     Word document creation and manipulation  Paragraphs, tables, images, styles
    openpyxl        Excel workbook automation                Cells, formulas, charts, formatting
    python-pptx     PowerPoint presentation generation       Slides, shapes, layouts, animations
    pypdf           PDF manipulation and extraction          Merge, split, extract, watermark
    docx-templates  Template-based document generation       Jinja2 syntax, loops, conditionals

Skill Categories

Document Creation

  • python-docx - Create and modify Word documents programmatically

  • python-pptx - Generate PowerPoint presentations from data

Spreadsheet Automation

  • openpyxl - Full Excel workbook manipulation with formulas

PDF Processing

  • pypdf - Read, merge, split, and modify PDF files

Template Engines

  • docx-templates - Fill Word templates with dynamic data

Skill Selection Guide

Choose python-docx when:

  • Creating reports, contracts, or documentation from scratch

  • Modifying existing Word documents programmatically

  • Need full control over document structure and styling

  • Generating documents with complex formatting requirements

Choose openpyxl when:

  • Automating Excel report generation

  • Reading and processing spreadsheet data

  • Creating workbooks with formulas and charts

  • Batch processing multiple Excel files

Choose python-pptx when:

  • Generating presentations from data automatically

  • Creating slide decks with consistent branding

  • Building presentation templates with dynamic content

  • Automating reporting dashboards as slides

Choose pypdf when:

  • Merging multiple PDFs into one document

  • Extracting text or pages from PDF files

  • Adding watermarks or page numbers

  • Splitting large PDFs into smaller files

Choose docx-templates when:

  • Filling template documents with database records

  • Generating bulk documents (letters, invoices, contracts)

  • Non-developers need to maintain document templates

  • Jinja2-style syntax is preferred for templates
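As a rough illustration, the selection guide above can be condensed into a lookup keyed by output file type. This is only a sketch: `SKILL_BY_EXTENSION` and `pick_skill` are hypothetical names that do not exist in the library, and the `from_template` flag is an assumed way to express the docx-templates case.

```python
from pathlib import Path

# Hypothetical mapping from output file extension to the skill that covers it.
SKILL_BY_EXTENSION = {
    ".docx": "python-docx",
    ".xlsx": "openpyxl",
    ".pptx": "python-pptx",
    ".pdf": "pypdf",
}


def pick_skill(filename: str, from_template: bool = False) -> str:
    """Return the skill name suggested by the selection guide."""
    ext = Path(filename).suffix.lower()
    if ext == ".docx" and from_template:
        # Template-driven Word output is the docx-templates use case.
        return "docx-templates"
    try:
        return SKILL_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"No office-docs skill covers {ext!r} files") from None
```

A lookup like this is only a first filter; the bullet lists above remain the real criteria when a task spans several formats.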

Quick Examples

Python-docx Report Generation

    from docx import Document
    from docx.shared import Inches, Pt
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    # Create document
    doc = Document()

    # Add title
    title = doc.add_heading('Monthly Report', level=0)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER

    # Add paragraph with formatting
    para = doc.add_paragraph()
    run = para.add_run('Executive Summary: ')
    run.bold = True
    para.add_run('This report covers key metrics for January 2026.')

    # Add table
    table = doc.add_table(rows=4, cols=3)
    table.style = 'Table Grid'

    # Header row
    headers = ['Metric', 'Value', 'Change']
    for i, header in enumerate(headers):
        cell = table.rows[0].cells[i]
        cell.text = header
        cell.paragraphs[0].runs[0].bold = True

    # Data rows
    data = [
        ['Revenue', '$1.2M', '+15%'],
        ['Users', '45,000', '+8%'],
        ['Retention', '92%', '+3%']
    ]
    for row_idx, row_data in enumerate(data, start=1):
        for col_idx, value in enumerate(row_data):
            table.rows[row_idx].cells[col_idx].text = value

    # Add image
    doc.add_picture('chart.png', width=Inches(5))

    # Save
    doc.save('monthly_report.docx')

Openpyxl Excel Automation

    from openpyxl import Workbook
    from openpyxl.styles import Font, PatternFill, Alignment
    from openpyxl.chart import BarChart, Reference

    # Create workbook
    wb = Workbook()
    ws = wb.active
    ws.title = "Sales Data"

    # Headers with styling
    headers = ['Product', 'Q1', 'Q2', 'Q3', 'Q4', 'Total']
    header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
    header_font = Font(color="FFFFFF", bold=True)

    for col, header in enumerate(headers, start=1):
        cell = ws.cell(row=1, column=col, value=header)
        cell.fill = header_fill
        cell.font = header_font
        cell.alignment = Alignment(horizontal='center')

    # Data with formulas
    data = [
        ['Widget A', 100, 150, 200, 180],
        ['Widget B', 80, 120, 140, 160],
        ['Widget C', 200, 220, 250, 280]
    ]

    for row_idx, row_data in enumerate(data, start=2):
        for col_idx, value in enumerate(row_data, start=1):
            ws.cell(row=row_idx, column=col_idx, value=value)
        # Total formula
        ws.cell(row=row_idx, column=6, value=f'=SUM(B{row_idx}:E{row_idx})')

    # Create chart
    chart = BarChart()
    chart.title = "Quarterly Sales"
    chart.x_axis.title = "Product"
    chart.y_axis.title = "Units"

    data_ref = Reference(ws, min_col=2, max_col=5, min_row=1, max_row=4)
    cats_ref = Reference(ws, min_col=1, min_row=2, max_row=4)
    chart.add_data(data_ref, titles_from_data=True)
    chart.set_categories(cats_ref)

    ws.add_chart(chart, "H2")

    wb.save('sales_report.xlsx')

Python-pptx Presentation

    from pptx import Presentation
    from pptx.util import Inches, Pt
    from pptx.enum.text import PP_ALIGN
    from pptx.dml.color import RGBColor

    # Create presentation
    prs = Presentation()

    # Title slide
    title_layout = prs.slide_layouts[0]
    slide = prs.slides.add_slide(title_layout)
    slide.shapes.title.text = "Q1 2026 Results"
    slide.placeholders[1].text = "Company Performance Review"

    # Content slide with bullet points
    bullet_layout = prs.slide_layouts[1]
    slide = prs.slides.add_slide(bullet_layout)
    slide.shapes.title.text = "Key Highlights"

    body = slide.placeholders[1]
    tf = body.text_frame
    tf.text = "Revenue up 15% YoY"

    for point in ["Customer base grew 8%", "New product launch successful", "Expanded to 3 new markets"]:
        p = tf.add_paragraph()
        p.text = point
        p.level = 1

    # Slide with chart placeholder
    chart_layout = prs.slide_layouts[5]
    slide = prs.slides.add_slide(chart_layout)
    slide.shapes.title.text = "Sales Breakdown"

    # Add image
    slide.shapes.add_picture(
        'sales_chart.png',
        Inches(1), Inches(2),
        width=Inches(8)
    )

    prs.save('quarterly_results.pptx')

PyPDF Manipulation

    from pypdf import PdfReader, PdfWriter

    # Merge PDFs (PdfWriter.append supersedes the deprecated PdfMerger class)
    merger = PdfWriter()
    merger.append('report_part1.pdf')
    merger.append('report_part2.pdf')
    merger.append('appendix.pdf')
    merger.write('complete_report.pdf')
    merger.close()

    # Split PDF
    reader = PdfReader('large_document.pdf')
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        writer.write(f'page_{i+1}.pdf')

    # Extract text
    reader = PdfReader('document.pdf')
    text = ""
    for page in reader.pages:
        text += page.extract_text()

    # Add watermark
    reader = PdfReader('document.pdf')
    watermark = PdfReader('watermark.pdf')
    writer = PdfWriter()

    for page in reader.pages:
        page.merge_page(watermark.pages[0])
        writer.add_page(page)

    writer.write('watermarked_document.pdf')

Docx-Templates Bulk Generation

    from docxtpl import DocxTemplate

    # Load template
    doc = DocxTemplate("contract_template.docx")

    # Context for template
    context = {
        'client_name': 'Acme Corporation',
        'contract_date': '2026-01-17',
        'contract_value': '$50,000',
        'terms': '12 months',
        'services': [
            {'name': 'Consulting', 'hours': 100, 'rate': '$200'},
            {'name': 'Development', 'hours': 200, 'rate': '$150'},
            {'name': 'Support', 'hours': 50, 'rate': '$100'}
        ],
        'include_sla': True,
        'sla_response_time': '4 hours'
    }

    # Render and save
    doc.render(context)
    doc.save('acme_contract.docx')

    # Bulk generation
    # load_clients_from_database() is a placeholder for your own data access
    # code; the invoices/ directory must already exist.
    clients = load_clients_from_database()
    for client in clients:
        doc = DocxTemplate("invoice_template.docx")
        doc.render(client.to_dict())
        doc.save(f'invoices/{client.id}_invoice.docx')

Integration Patterns

Document Generation Pipeline

    Data Source --> Template Selection --> Rendering --> Post-processing --> Output
        |                  |                   |               |                |
        +-- Database       +-- By type         +-- Variables   +-- Convert      +-- Save
        +-- API            +-- By language     +-- Loops       +-- Combine      +-- Email
        +-- Form           +-- By recipient    +-- Images      +-- Sign         +-- Archive
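The pipeline stages can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `string.Template` stands in for a real Word template engine, and `TEMPLATES`, `Job`, and `run_pipeline` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from pathlib import Path
from string import Template

# Hypothetical in-memory templates keyed by document type (template selection).
TEMPLATES = {
    "invoice": Template("Invoice for $client_name: $amount"),
    "letter": Template("Dear $client_name,"),
}


@dataclass
class Job:
    doc_type: str        # drives template selection
    context: dict        # variables from the data source
    output_path: Path    # where the result is saved


def run_pipeline(job: Job) -> Path:
    """Run one record through select -> render -> post-process -> save."""
    template = TEMPLATES[job.doc_type]            # template selection by type
    rendered = template.substitute(job.context)   # rendering (variables only)
    final = rendered.strip() + "\n"               # post-processing placeholder
    job.output_path.parent.mkdir(parents=True, exist_ok=True)
    job.output_path.write_text(final, encoding="utf-8")  # output: save
    return job.output_path
```

Keeping each stage on its own line makes it easy to swap the stand-in renderer for docxtpl, or to add conversion and signing steps in the post-processing slot.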

Batch Processing Pattern

    Input Files --> Validation --> Processing --> Quality Check --> Output
        |               |              |               |              |
        +-- Queue       +-- Format     +-- Transform   +-- Verify     +-- Organize
        +-- Watch       +-- Content    +-- Extract     +-- Log        +-- Notify
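One way to read the batch pattern is as a loop that never lets a single bad file stop the run. The sketch below assumes the validation, processing, and quality-check steps are passed in as plain callables; `process_batch` and its result keys are hypothetical names, not part of any skill.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("batch")


def process_batch(
    items: Iterable[str],
    validate: Callable[[str], bool],
    transform: Callable[[str], str],
    verify: Callable[[str], bool],
) -> dict:
    """Run each item through validate -> transform -> verify and collect results."""
    results = {"ok": [], "rejected": [], "failed": []}
    for item in items:
        if not validate(item):
            results["rejected"].append(item)
            continue
        try:
            output = transform(item)
        except Exception as exc:  # keep the batch going if one item fails
            logger.warning("Processing failed for %r: %s", item, exc)
            results["failed"].append(item)
            continue
        if verify(output):
            results["ok"].append(output)
        else:
            results["failed"].append(item)
    return results
```

Returning the three buckets instead of raising gives the final stage enough information to organize outputs and send notifications.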

Template Management

    Template Repo --> Version Control --> Environment --> Runtime
        |                  |                  |              |
        +-- Design         +-- Review         +-- Dev/Prod   +-- Hot reload
        +-- Test           +-- Approve        +-- Variables  +-- Fallbacks
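The Environment and Fallbacks boxes could be implemented as a simple lookup order. The sketch below assumes a hypothetical layout of `<root>/<environment>/<template name>`; neither that layout nor `resolve_template` comes from the library.

```python
from pathlib import Path
from typing import Optional


def resolve_template(root: Path, name: str, environment: str,
                     fallback_environment: str = "prod") -> Optional[Path]:
    """Return the template for `environment`, else the fallback, else None."""
    for env in (environment, fallback_environment):
        candidate = root / env / name
        if candidate.is_file():
            return candidate
    return None
```

Because the file system is checked on every call, a template dropped into the directory is picked up on the next request, which is one cheap way to approximate hot reload.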

Common Patterns Across Skills

Error Handling

    import logging
    from pathlib import Path

    from docxtpl import DocxTemplate

    logger = logging.getLogger(__name__)

    def safe_document_generation(template_path, context, output_path):
        """Generate document with comprehensive error handling."""
        try:
            if not Path(template_path).exists():
                raise FileNotFoundError(f"Template not found: {template_path}")

            doc = DocxTemplate(template_path)
            doc.render(context)

            # Ensure output directory exists
            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            doc.save(output_path)

            return {"success": True, "path": output_path}

        except Exception as e:
            logger.error(f"Document generation failed: {e}")
            return {"success": False, "error": str(e)}

Temporary File Handling

    import tempfile
    from contextlib import contextmanager
    from pathlib import Path

    @contextmanager
    def temp_document():
        """Create temporary document that auto-cleans."""
        # Reserve a unique path, then close the handle so other code can write to it
        with tempfile.NamedTemporaryFile(suffix='.docx', delete=False) as f:
            temp_path = f.name
        try:
            yield temp_path
        finally:
            Path(temp_path).unlink(missing_ok=True)

Streaming Large Files

    def process_large_excel(file_path, chunk_size=1000):
        """Process large Excel files in chunks."""
        from openpyxl import load_workbook

        wb = load_workbook(file_path, read_only=True)
        ws = wb.active

        chunk = []
        for row in ws.iter_rows(values_only=True):
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []

        if chunk:
            yield chunk

        # Read-only workbooks keep the file open until closed explicitly
        wb.close()

Integration with Workspace-Hub

These skills power document automation across the workspace-hub ecosystem:

    workspace-hub/
    ├── documents/
    │   ├── templates/           # Uses: docx-templates
    │   │   ├── contracts/
    │   │   ├── invoices/
    │   │   └── reports/
    │   ├── generators/          # Uses: python-docx, openpyxl, python-pptx
    │   │   ├── report_builder.py
    │   │   ├── spreadsheet_gen.py
    │   │   └── presentation_gen.py
    │   └── processors/          # Uses: pypdf
    │       ├── pdf_merger.py
    │       └── text_extractor.py
    ├── output/
    │   ├── generated/
    │   └── archive/
    └── config/
        └── document_config.yaml

Best Practices

  1. Template Versioning

    templates/
    ├── v1/
    │   └── contract_template.docx
    ├── v2/
    │   └── contract_template.docx   # Updated branding
    └── current -> v2/               # Symlink to current version
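With the `current` symlink in place, calling code does not need to know the version number. A minimal sketch, assuming a platform where symlinks are available (creating them on Windows may need extra privileges); `current_template` is a hypothetical helper name.

```python
from pathlib import Path


def current_template(templates_dir: Path, name: str) -> Path:
    """Resolve `current/<name>` to the concrete versioned file it points at."""
    resolved = (templates_dir / "current" / name).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No current version of {name} in {templates_dir}")
    return resolved
```

Logging the resolved path (for example `.../v2/contract_template.docx`) alongside each generated document records which template version produced it.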

  2. Validation Before Generation

    def validate_context(context, required_fields):
        """Validate context before document generation."""
        missing = [f for f in required_fields if f not in context or not context[f]]
        if missing:
            raise ValueError(f"Missing required fields: {missing}")
        return True
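One caveat with a truthiness check such as `not context[f]`: legitimate values like `False` or `0` are reported as missing, which would affect a flag such as `include_sla`. A possible variant, assuming only `None` and blank strings should count as missing (`find_missing_fields` is a hypothetical name):

```python
def find_missing_fields(context: dict, required_fields: list) -> list:
    """Return required fields that are absent, None, or blank strings."""
    missing = []
    for field in required_fields:
        value = context.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing
```

Returning the list rather than raising leaves the caller free to decide whether a missing field aborts the whole batch or only skips one record.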

  3. Output Organization

    from datetime import datetime

    def get_output_path(doc_type, client_id, extension='docx'):
        """Generate organized output path."""
        date_str = datetime.now().strftime('%Y/%m/%d')
        return f"output/{doc_type}/{date_str}/{client_id}.{extension}"

  4. Logging and Audit Trail

    import logging
    import time

    audit_log = logging.getLogger('audit')

    def generate_with_audit(template, context, output_path):
        """Generate document with audit logging."""
        start_time = time.time()

        # generate_document is a placeholder for your own generation function
        result = generate_document(template, context, output_path)

        audit_log.info({
            'action': 'document_generated',
            'template': template,
            'output': output_path,
            'duration': time.time() - start_time,
            'context_keys': list(context.keys())
        })

        return result
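The audit wrapper only logs when generation succeeds. If failed attempts should also leave a trace, one option is a decorator with a `finally` block. This is a sketch under that assumption; `audited` is a hypothetical name and the log format is illustrative.

```python
import logging
import time
from functools import wraps

audit_log = logging.getLogger("audit")


def audited(action: str):
    """Decorator that logs duration and outcome of a generation function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "failed"
            try:
                result = func(*args, **kwargs)
                status = "ok"
                return result
            finally:
                # Runs on success and on exception, so every attempt is recorded.
                audit_log.info(
                    "action=%s status=%s duration=%.3fs",
                    action, status, time.perf_counter() - start,
                )
        return wrapper
    return decorator
```

Applied as `@audited("document_generated")` above a generation function, it leaves the function body free of logging code while the exception still propagates to the caller.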

Testing Document Generation

    import pytest
    from docx import Document
    from docxtpl import DocxTemplate
    from pypdf import PdfReader

    # generate_report, sample_data, and merge_pdfs are assumed to come from
    # the project under test; they are not defined in this library.

    def test_report_generation():
        """Test report document structure."""
        generate_report(sample_data, 'test_output.docx')

        doc = Document('test_output.docx')

        # Verify structure
        assert len(doc.paragraphs) > 0
        assert doc.paragraphs[0].text == 'Monthly Report'

        # Verify tables
        assert len(doc.tables) == 1
        assert len(doc.tables[0].rows) == 4

    def test_template_rendering():
        """Test template variable substitution."""
        context = {'name': 'Test Corp', 'amount': '$1000'}

        doc = DocxTemplate('template.docx')
        doc.render(context)
        doc.save('output.docx')

        result = Document('output.docx')
        full_text = '\n'.join([p.text for p in result.paragraphs])

        assert 'Test Corp' in full_text
        assert '$1000' in full_text

    def test_pdf_merge():
        """Test PDF merging preserves pages."""
        merge_pdfs(['doc1.pdf', 'doc2.pdf'], 'merged.pdf')

        reader = PdfReader('merged.pdf')
        assert len(reader.pages) == 4  # 2 + 2 pages
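A cheaper structural check is also possible without loading the document: a .docx file is a ZIP package whose main body is stored at `word/document.xml`. The helper below is a sketch using only the standard library; `looks_like_docx` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def looks_like_docx(path: Path) -> bool:
    """Return True if `path` is a ZIP archive containing word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()
```

This only confirms the container shape, so it suits smoke tests over large batches; content assertions still need python-docx as in the tests above.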

Related Resources

  • python-docx Documentation

  • openpyxl Documentation

  • python-pptx Documentation

  • pypdf Documentation

  • docxtpl Documentation

Version History

  • 1.0.0 (2026-01-17): Initial release with 5 office document skills

These skills represent patterns refined across document automation systems generating thousands of documents daily in production environments.
