Programmatic SEO
Production-grade framework for building SEO page sets at scale. Covers the full lifecycle from keyword pattern discovery through template design, data pipeline construction, quality assurance, and post-launch optimization. Designed for deployments ranging from 50 to 100,000+ pages.
Table of Contents
-
When to Use vs When Not To
-
Initial Assessment
-
The 14 Playbooks
-
Playbook Selection Matrix
-
Keyword Pattern Mining
-
Data Pipeline Architecture
-
Template Design System
-
Quality Control Framework
-
Internal Linking Architecture
-
Indexation Strategy
-
Launch Sequence
-
Post-Launch Optimization
-
Anti-Patterns and Penalty Avoidance
-
Decision Matrix: Build vs Skip
-
Output Artifacts
-
Related Skills
When to Use vs When Not To
Use this skill when:
-
You have a repeating keyword pattern with 50+ variations
-
You have (or can acquire) structured data to populate pages
-
The search intent is consistent across variations
-
Your domain has sufficient authority to compete
Do NOT use when:
-
Each page requires unique editorial content (use content-creator instead)
-
Total addressable pages < 30 (manual content is more effective)
-
You lack a data source and would be generating thin placeholder content
-
Your domain authority is below DR 20 and competitors are DR 60+
Initial Assessment
Before designing any pSEO strategy, answer these questions. Skip nothing.
- Opportunity Validation
Question Why It Matters Red Flag
What is the repeating keyword pattern? Defines the template structure Pattern is vague or inconsistent
What is the aggregate monthly search volume? Determines ROI ceiling < 5,000 aggregate monthly searches
How many unique pages can you generate? Scope the project < 50 pages (too few) or > 50K without data infrastructure
What does the SERP look like for sample queries? Competitive feasibility Page 1 dominated by DR 80+ editorial content
Is intent informational, navigational, or transactional? Template design Mixed intent across the same pattern
- Data Source Evaluation
Rate your data source on this scale:
Tier Source Type Defensibility Example
S Proprietary first-party Unbeatable Your product usage data, internal benchmarks
A Product-derived Strong Aggregated user analytics, customer outcomes
B User-generated Moderate Community reviews, submitted content
C Licensed exclusive Moderate Paid data feed no competitor has
D Public aggregated Weak Government data, public APIs
F Scraped commodity None Wikipedia rewrites, copied listings
Rule: Do not build pSEO on Tier F data. Google penalizes commodity rewrites. If your only data source is public and easily replicable, invest in acquiring Tier A-C data first.
- Competitive Moat Assessment
For 5 sample queries in your pattern, analyze page 1 results:
-
What is the average Domain Rating of ranking pages?
-
Are existing results programmatic or editorial?
-
What unique data do ranking pages provide?
-
What is the content depth (word count, data richness, UX quality)?
Go/No-Go threshold: If the average DR gap between you and page 1 is > 30 AND existing results have proprietary data, the opportunity requires either a differentiated approach or domain authority building first.
The 14 Playbooks
Playbook Pattern Example Data Requirement
1 Templates "[Type] template" "resume template", "invoice template" Template files + metadata
2 Curation "best [category]" "best CRM for startups" Product/service reviews + ratings
3 Conversions "[X] to [Y]" "100 USD to EUR" Conversion logic/API
4 Comparisons "[X] vs [Y]" "Notion vs Confluence" Feature data for both products
5 Examples "[type] examples" "landing page examples" Curated example collection
6 Locations "[service] in [city]" "coworking in Austin" Location-specific data
7 Personas "[product] for [audience]" "CRM for real estate" Audience-specific use cases
8 Integrations "[A] + [B] integration" "Slack Asana integration" Integration documentation
9 Glossary "what is [term]" "what is churn rate" Domain expertise
10 Translations Content in N languages Localized guides Translation + localization data
11 Directory "[category] tools" "AI writing tools" Tool listings + evaluations
12 Profiles "[entity name]" "Stripe company profile" Entity-level data
13 Statistics "[topic] statistics" "SaaS churn statistics 2026" Verified statistical data
14 Calculators "[topic] calculator" "LTV calculator" Calculation logic + inputs
Playbook Selection Matrix
If you have... Primary Playbook Secondary Layer
A product with many integrations Integrations Comparisons
A design/creative tool Templates + Examples Personas
A multi-segment audience Personas Comparisons
Local/regional presence Locations Directory
A tool/utility product Calculators + Conversions Glossary
Deep domain expertise Glossary + Statistics Curation
A competitor landscape to exploit Comparisons + Curation Directory
User-generated content Examples + Directory Profiles
Layering rule: Combine up to 2 playbooks per page set. Example: "Best coworking spaces in [city]" = Curation + Locations.
Keyword Pattern Mining
Step 1: Pattern Identification
Extract the repeating structure from seed keywords:
Seed: "react developer salary san francisco" Pattern: [role] salary [city] Variables: role (200+ options), city (500+ options) Max pages: 200 x 500 = 100,000
Step 2: Volume Distribution Analysis
Not all variable combinations have search volume. Map the distribution:
Tier Volume Range Typical % of Total Pages Strategy
Head 1,000+ monthly 2-5% Priority indexation, highest content quality
Torso 100-999 monthly 15-25% Standard template, full deployment
Long-tail 10-99 monthly 40-50% Template with conditional content blocks
Zero-volume < 10 monthly 20-40% Noindex OR skip unless data is uniquely valuable
Step 3: Intent Classification
For each pattern, verify intent consistency:
Intent Type Template Implications CTA Strategy
Informational Data-heavy, educational content Newsletter, related content
Commercial investigation Comparison tables, pros/cons Free trial, demo
Transactional Pricing, availability, features Buy now, sign up
Navigational Brand-specific, direct answer Product page link
Data Pipeline Architecture
Pipeline Design
[Data Source] → [Extraction] → [Transformation] → [Enrichment] → [Validation] → [Template Population] → [Quality Check] → [Publish]
Data Quality Gates
Every record must pass these gates before page generation:
Gate Check Failure Action
Completeness All required fields populated Skip page, log for manual review
Accuracy Data matches source, no staleness > 90 days Flag for refresh
Uniqueness No duplicate records Merge or deduplicate
Minimum richness Page will have > 300 words of unique content Skip or enrich
Legal compliance Data usage rights verified Block publication
Update Cadence
Data Type Recommended Update Frequency Staleness Penalty
Pricing data Weekly High (users notice immediately)
Company/product data Monthly Medium
Statistical data Quarterly Low if year-tagged
Glossary/educational Semi-annually Very low
Location data Monthly Medium (closures, address changes)
Template Design System
Page Architecture
Every programmatic page must have these zones:
┌─────────────────────────────────────┐ │ Zone 1: Unique Header │ H1 with target keyword, unique intro paragraph ├─────────────────────────────────────┤ │ Zone 2: Primary Data Section │ The core data/content for this specific page ├─────────────────────────────────────┤ │ Zone 3: Contextual Analysis │ Insights, comparisons, trends specific to this entity ├─────────────────────────────────────┤ │ Zone 4: Related Data │ Adjacent data points that add depth ├─────────────────────────────────────┤ │ Zone 5: Internal Navigation │ Related pages, breadcrumbs, category links ├─────────────────────────────────────┤ │ Zone 6: CTA │ Conversion element matched to intent └─────────────────────────────────────┘
Uniqueness Requirements
Each page MUST have at least 3 of these 5 uniqueness sources:
-
Unique data points -- Numbers, facts, or attributes specific to this entity
-
Conditional content blocks -- Sections that appear/disappear based on data attributes
-
Calculated insights -- Derived metrics (percentages, comparisons, rankings)
-
Contextual recommendations -- "If X, then Y" advice blocks based on the data
-
User-generated content -- Reviews, comments, or community contributions
URL Structure
Always use subfolders. Never subdomains for pSEO.
Pattern URL Template Example
Location /[service]/[city]/
/coworking/austin/
Comparison /compare/[a]-vs-[b]/
/compare/notion-vs-confluence/
Integration /integrations/[partner]/
/integrations/slack/
Glossary /glossary/[term]/
/glossary/churn-rate/
Persona /[product]-for-[audience]/
/crm-for-real-estate/
Quality Control Framework
Pre-Publication QA Checklist
Content Quality:
-
Each page has > 300 words of unique content (not counting shared template elements)
-
H1 is unique and contains the target keyword
-
Meta title is unique (< 60 chars) and meta description is unique (< 155 chars)
-
No broken data references (empty fields rendered as "N/A" or blank)
-
At least 2 conditional content blocks triggered per page
-
No duplicate pages targeting the same keyword
Technical SEO:
-
Canonical tag points to self
-
Hreflang tags if multilingual
-
Schema markup renders without errors
-
Page loads in < 3 seconds
-
Mobile responsive
Internal Linking:
-
Breadcrumb trail is complete
-
3-5 related pages linked contextually
-
Hub page links to this page
-
No orphan pages in the set
Thin Content Detection
Run this check against every generated page:
Signal Threshold Action
Unique word count < 200 unique words Block publication
Content similarity to another page in set
80% Jaccard similarity Merge or differentiate
Data fields populated < 60% of template fields Skip or enrich
User time-on-page (post-launch) < 15 seconds average Review and improve
Bounce rate (post-launch)
85% Review intent match
Internal Linking Architecture
Hub-and-Spoke Model
┌─────────┐
│ HUB │ /coworking/
│ PAGE │ (ranks for "coworking spaces")
└────┬────┘
┌──────────────┼──────────────┐
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ SPOKE 1 │ │ SPOKE 2 │ │ SPOKE 3 │
│ /austin/│ │ /denver/│ │ /seattle/│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
Cross-links between related spokes
Linking rules:
-
Hub links DOWN to every spoke (or top 50 spokes if > 200 pages)
-
Every spoke links UP to the hub
-
Spokes link ACROSS to 3-5 related spokes (geographic proximity, thematic similarity)
-
Deep pages link UP to their spoke AND the hub
-
Cross-silo links only when contextually genuine
Pagination for Large Sets
If a hub page has > 50 spokes, implement paginated sub-hubs:
/coworking/ → Top cities + browse by state /coworking/california/ → All California cities /coworking/california/page/2/ → Paginated if > 25 cities
Indexation Strategy
Crawl Budget Management
Page Set Size Strategy
< 500 pages Single XML sitemap, submit all
500-5,000 Segmented sitemaps by category
5,000-50,000 Segmented sitemaps + priority scoring + IndexNow
50,000+ Programmatic sitemap generation + crawl budget monitoring + strategic noindex
Indexation Priority
Priority Pages Action
P0 Hub pages Submit immediately, internal link from homepage
P1 Head-volume spokes (top 10%) Submit in first sitemap batch
P2 Torso-volume spokes Submit in second batch, 1-2 weeks later
P3 Long-tail spokes Submit gradually over 4-6 weeks
P4 Zero-volume pages Noindex unless data is uniquely valuable
IndexNow Integration
For large-scale updates, use IndexNow to notify search engines immediately:
POST https://api.indexnow.org/indexnow { "host": "yoursite.com", "key": "your-api-key", "urlList": ["https://yoursite.com/page1", "https://yoursite.com/page2"] }
Launch Sequence
Phase 1: Pilot (Week 1-2)
-
Deploy 20-50 pages from head-volume tier
-
Submit sitemap with pilot pages only
-
Monitor indexation rate daily
-
Check for crawl errors in Search Console
Phase 2: Scale (Week 3-6)
-
Deploy remaining torso-volume pages in batches of 100-500
-
Add cross-links between deployed pages
-
Monitor thin content warnings
-
Track impressions in Search Console
Phase 3: Long-Tail (Week 7-12)
-
Deploy long-tail pages
-
Noindex zero-volume pages (keep them crawlable but not indexed)
-
Begin link acquisition outreach for hub pages
Phase 4: Optimization (Ongoing)
-
A/B test template variations on head-volume pages
-
Refresh stale data quarterly
-
Add conditional content blocks based on engagement data
-
Monitor for keyword cannibalization across the set
Post-Launch Optimization
Metrics Dashboard
Metric Frequency Target
Indexation rate Weekly
90% of submitted pages indexed within 60 days
Organic impressions Weekly Trending up month-over-month
Average position (by tier) Bi-weekly Head pages: top 10; Torso: top 30
Click-through rate Monthly
3% for head pages
Bounce rate Monthly < 70%
Conversion rate Monthly
1% for transactional intent
Pages per session Monthly
1.5
Optimization Playbook
Signal Diagnosis Action
Indexed but not ranking Content quality or authority gap Enrich content, build links to hub
Ranking but low CTR Title/description not compelling A/B test meta titles
Ranking but high bounce Intent mismatch or thin content Audit against search intent, add data
Deindexed after initial indexing Thin content penalty Improve uniqueness, reduce similarity
Crawled but not indexed Quality threshold not met Add more unique content per page
Anti-Patterns and Penalty Avoidance
Anti-Pattern Why It Fails Prevention
City-name swapping Same content + different city = doorway page penalty Each location page needs unique local data
Keyword stuffing in templates Unnatural density triggers spam filters Keep keyword density 1-2%, write naturally
Generating pages for zero-demand queries Wastes crawl budget, signals low quality Validate demand before generating
No internal links to pSEO pages Orphan pages get deprioritized Connect every page to the hub-spoke structure
Stale data never refreshed Users lose trust, Google notices Set update cadence per data type
All pages identical structure Lack of variation signals automation to Google Use 3-5 template variants
Decision Matrix: Build vs Skip
Score each dimension 1-5, then apply the threshold.
Dimension Weight 1 (Skip) 5 (Build)
Search demand 30% < 1K aggregate monthly
50K aggregate monthly
Data quality 25% Public/scraped, easily replicated Proprietary, defensible
Competitive gap 20% DR gap > 40, strong incumbents DR gap < 15, weak/no incumbents
Template feasibility 15% Each page needs unique editorial Clean template fits all variations
Business alignment 10% No conversion path from these pages Direct path to core product
Scoring guide:
-
4.0+ weighted average: Build immediately
-
3.0-3.9: Build if resources allow, validate with pilot first
-
2.0-2.9: Invest in data quality or authority first
-
< 2.0: Do not build
Output Artifacts
Artifact Format Description
Opportunity Analysis Markdown table Keyword patterns x volume x data source x difficulty x business alignment
Playbook Recommendation Decision matrix If/then mapping with rationale and real-world examples
Page Template Specification Annotated wireframe (markdown) URL pattern, zone structure, uniqueness sources, conditional logic
Data Pipeline Spec Flow diagram (text) Source > extraction > transformation > validation > publication
Quality Scorecard Checklist + thresholds Pre-publication QA gates with pass/fail criteria
Indexation Plan Phased timeline Priority tiers, sitemap structure, crawl budget allocation
Post-Launch Dashboard Metric table KPIs, targets, review cadence, optimization triggers
Related Skills
-
seo-audit -- Run after pSEO pages are live to diagnose indexation issues, thin content warnings, or ranking problems across the page set.
-
schema-markup -- Add structured data to pSEO templates (Product, FAQ, LocalBusiness) for rich snippet eligibility at scale.
-
site-architecture -- Plan hub-and-spoke structure and crawl budget management for large pSEO deployments (500+ pages).
-
competitor-alternatives -- Use the Comparisons playbook when building "[X] vs [Y]" pages; competitor-alternatives has dedicated comparison page frameworks.
-
content-creator -- Use when individual pages in the set need editorial-quality unique content beyond template generation.