pseo-data

Design and implement the structured data architecture that powers programmatic SEO pages, including content models, data sources, slug generation, and data-fetching layers. Use when setting up or refactoring the data foundation for pSEO, designing content models, or building the data pipeline that feeds page templates.

Install skill "pseo-data" with this command: npx skills add lisbeth718/pseo-skills/lisbeth718-pseo-skills-pseo-data

pSEO Data Architecture

Design and implement the structured data layer that feeds all programmatic SEO pages. This is the foundation every other pSEO skill depends on.

Core Principles

  1. Single source of truth: All page data flows from one data layer
  2. SEO-complete models: Every content model includes all fields needed for metadata, schema markup, and linking
  3. Unique slugs by construction: Slug generation enforces uniqueness at the data level
  4. Type safety: All data models are fully typed (TypeScript interfaces/types)
  5. Separation of concerns: Data fetching is decoupled from page rendering

Implementation Steps

1. Define Content Models

Create TypeScript interfaces for each page type using a two-tier model. The lightweight index tier is safe to hold in memory for all pages; the heavy full tier is loaded per-page only.

// Index tier: safe to load all at once (~1KB per page)
interface PageIndex {
  slug: string;               // unique, URL-safe
  title: string;              // page title (50-60 chars target)
  metaDescription: string;    // meta description (150-160 chars target)
  h1: string;                 // primary heading (can differ from title)
  canonicalPath: string;      // canonical URL path
  category: string;           // for hub-spoke and breadcrumbs
  lastModified: string;       // ISO date for sitemap
}

// Full tier: extends PageIndex with heavy fields (~50-500KB per page)
interface BaseSEOContent extends PageIndex {
  introText: string;
  bodyContent: string;
  faqs?: FAQ[];
  relatedSlugs?: string[];
  featuredImage?: SEOImage;
}

Extend BaseSEOContent for each page type with domain-specific fields. The interfaces above show the minimum required fields. See references/content-models.md for the full definitions (which add subcategory, tags, publishedDate, status, and more) and extended type examples (LocationPage, ProductPage, ComparisonPage, CategoryPage).
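
As a rough illustration of extending the base model, the sketch below adds a location page type. The fields on FAQ, SEOImage, and LocationPage, the toPageIndex helper, and the sample values are all assumptions made for this example; the definitions in references/content-models.md take precedence.

```typescript
// Hypothetical helper types for this sketch only.
interface FAQ { question: string; answer: string; }
interface SEOImage { url: string; alt: string; }

// Index tier, repeated from above so the example is self-contained.
interface PageIndex {
  slug: string;
  title: string;
  metaDescription: string;
  h1: string;
  canonicalPath: string;
  category: string;
  lastModified: string;
}

// Full tier, repeated from above.
interface BaseSEOContent extends PageIndex {
  introText: string;
  bodyContent: string;
  faqs?: FAQ[];
  relatedSlugs?: string[];
  featuredImage?: SEOImage;
}

// Assumed domain-specific fields for a location page.
interface LocationPage extends BaseSEOContent {
  city: string;
  region: string;
  latitude: number;
  longitude: number;
}

// Project a full page down to the index tier so list views never carry heavy fields.
function toPageIndex(page: BaseSEOContent): PageIndex {
  const { slug, title, metaDescription, h1, canonicalPath, category, lastModified } = page;
  return { slug, title, metaDescription, h1, canonicalPath, category, lastModified };
}

// Made-up sample record.
const examplePage: LocationPage = {
  slug: "plumbers-austin-tx",
  title: "Plumbers in Austin, TX",
  metaDescription: "Compare local plumbers in Austin, TX.",
  h1: "Plumbers in Austin",
  canonicalPath: "/plumbers/plumbers-austin-tx",
  category: "plumbers",
  lastModified: "2024-01-15",
  introText: "Intro text",
  bodyContent: "Long body content",
  city: "Austin",
  region: "TX",
  latitude: 30.2672,
  longitude: -97.7431,
};
```

Because LocationPage extends BaseSEOContent, any function that accepts the base type (such as toPageIndex here) also accepts the extended type without casts.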

2. Build the Data-Fetching Layer

Create a centralized data module (e.g., lib/data.ts or src/data/index.ts) that exports:

  • getAllSlugs() - Returns all valid slugs for static generation. Must handle pagination internally when the data source has 1000+ records (fetch in batches, return the complete list).
  • getPageData(slug) - Returns full content for a single page
  • getPagesByCategory(category, opts?) - Returns pages in a category for hub pages. Accept optional limit and offset for paginated hub pages.
  • getRelatedPages(slug, limit?) - Returns related pages for internal linking
  • getAllCategories() - Returns all categories for navigation and hubs
  • getPageCount() - Returns total page count (useful for sitemap splitting and build diagnostics)

All functions must be:

  • Cached or memoized during build to avoid redundant reads
  • Typed with explicit return types
  • Guarded against missing or malformed data
  • Internally paginated when the data source imposes limits (e.g., CMS APIs with 100-item pages). The consumer should never need to handle pagination — the data layer abstracts it.
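
The memoization and internal-pagination requirements above can be sketched as follows. Only getAllSlugs() comes from this skill; Batch, FetchBatch, fetchAll, memoizeAsync, and the in-memory fakeFetchBatch source are hypothetical names for illustration, and a real project would replace the fake source with its CMS or database call.

```typescript
// Hypothetical shape of one page of results from a paginated source.
interface Batch<T> {
  items: T[];
  nextOffset: number | null; // null when there are no more batches
}

type FetchBatch<T> = (offset: number, limit: number) => Promise<Batch<T>>;

// Walk every batch and return the complete list, so callers never see pagination.
async function fetchAll<T>(fetchBatch: FetchBatch<T>, limit = 100): Promise<T[]> {
  const all: T[] = [];
  let offset: number | null = 0;
  while (offset !== null) {
    const batch: Batch<T> = await fetchBatch(offset, limit);
    all.push(...batch.items);
    offset = batch.nextOffset;
  }
  return all;
}

// Cache the in-flight promise so concurrent callers share a single read.
function memoizeAsync<T>(fn: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = fn().catch((err) => {
        cached = undefined; // do not cache failures; the next call retries
        throw err;
      });
    }
    return cached;
  };
}

// Stand-in data source: 250 slugs served in batches.
const SOURCE: string[] = Array.from({ length: 250 }, (_, i) => `page-${i}`);
let sourceCalls = 0;

const fakeFetchBatch: FetchBatch<string> = async (offset, limit) => {
  sourceCalls++;
  const items = SOURCE.slice(offset, offset + limit);
  const next = offset + limit;
  return { items, nextOffset: next < SOURCE.length ? next : null };
};

// Explicit return type, memoized, and paginated internally.
const getAllSlugs: () => Promise<string[]> = memoizeAsync(() => fetchAll(fakeFetchBatch, 100));
```

Caching the promise rather than the resolved value means two pages that request the slug list at the same moment during a build still trigger only one pass over the source.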

3. Implement Slug Generation

Design a slug strategy that:

  • Produces URL-safe, lowercase, hyphenated strings
  • Guarantees uniqueness across the entire dataset
  • Is deterministic (same input always produces same slug)
  • Includes a collision detection mechanism
  • Follows a consistent URL hierarchy (e.g., /category/page-slug)
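
One possible slug strategy that meets these criteria is sketched below. The slugify and assignSlugs functions, the SlugSource shape, the numeric-suffix collision rule, and the "page" fallback are assumptions for illustration, not this skill's prescribed implementation. Determinism here depends on every record having a stable id.

```typescript
// Normalize arbitrary text to a lowercase, hyphenated, URL-safe slug.
function slugify(input: string): string {
  return input
    .normalize("NFKD")                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}

// Hypothetical input record: a stable id plus the text the slug is derived from.
interface SlugSource {
  id: string;
  name: string;
}

// Assign slugs in stable id order so the same dataset always yields the same slugs.
// Collisions are detected with a Set and resolved by appending -2, -3, ...
function assignSlugs(records: SlugSource[]): Map<string, string> {
  const taken = new Set<string>();
  const slugById = new Map<string, string>();
  const ordered = [...records].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));

  for (const record of ordered) {
    const base = slugify(record.name) || "page"; // fallback when nothing URL-safe remains
    let candidate = base;
    let suffix = 2;
    while (taken.has(candidate)) {
      candidate = `${base}-${suffix}`;
      suffix++;
    }
    taken.add(candidate);
    slugById.set(record.id, candidate);
  }
  return slugById;
}
```

Sorting by id before assigning suffixes is what keeps the output independent of the order in which the data source happens to return records.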

4. Validate Data Integrity

Build a validation function or script that checks:

  • No duplicate slugs exist
  • All required fields are present and non-empty
  • Title and description lengths are within SEO targets
  • All category references resolve to valid categories
  • No orphan pages (pages not reachable through any category)
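
A minimal sketch of such a validation function follows. ValidationIssue, validatePages, and the helper names are hypothetical; the length ranges come from the PageIndex field comments in section 1. It assumes each page belongs to exactly one category, so a page whose category does not resolve is also treated as an orphan.

```typescript
interface PageIndex {
  slug: string;
  title: string;
  metaDescription: string;
  h1: string;
  canonicalPath: string;
  category: string;
  lastModified: string;
}

interface ValidationIssue {
  slug: string;
  message: string;
}

interface LengthRange { min: number; max: number; }

// Targets taken from the PageIndex field comments.
const TITLE_RANGE: LengthRange = { min: 50, max: 60 };
const DESCRIPTION_RANGE: LengthRange = { min: 150, max: 160 };

const REQUIRED_FIELDS: (keyof PageIndex)[] = [
  "slug", "title", "metaDescription", "h1", "canonicalPath", "category", "lastModified",
];

// Accepts unknown because loaded data may not match its declared type at runtime.
function lengthOutOfRange(value: unknown, range: LengthRange): boolean {
  return typeof value === "string" && (value.length < range.min || value.length > range.max);
}

function validatePages(pages: PageIndex[], categories: string[]): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  const seenSlugs = new Set<string>();
  const knownCategories = new Set<string>(categories);

  for (const page of pages) {
    const report = (message: string): void => {
      issues.push({ slug: String(page.slug), message });
    };

    if (seenSlugs.has(page.slug)) report("duplicate slug");
    seenSlugs.add(page.slug);

    for (const field of REQUIRED_FIELDS) {
      const value: unknown = page[field];
      if (typeof value !== "string" || value.trim() === "") {
        report(`missing or empty field: ${field}`);
      }
    }

    if (lengthOutOfRange(page.title, TITLE_RANGE)) report("title length outside 50-60 chars");
    if (lengthOutOfRange(page.metaDescription, DESCRIPTION_RANGE)) {
      report("metaDescription length outside 150-160 chars");
    }

    // Unresolved category: the page is unreachable from any hub (orphan).
    if (!knownCategories.has(page.category)) report(`unknown category: ${page.category}`);
  }
  return issues;
}
```

A build script could call validatePages() over the index tier and exit with a non-zero status when the returned array is not empty.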

5. Set Up Data Source Integration

Choose the integration based on the data source (passed in $ARGUMENTS, or detected from the project):

JSON files: Create a data/ directory with typed JSON, a loader, and build-time validation.

CMS (headless): Create API client with typed responses, implement caching, handle pagination for 1000+ items.

Database: Create a query layer with connection pooling, implement cursor-based pagination, add query caching.

MDX files: Set up frontmatter schema validation, create a content loader with gray-matter parsing.

API: Create a typed API client, implement rate limiting and retry logic, add response caching.
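
For the API case, the retry part can be sketched as a small wrapper with exponential backoff. The names withRetry, RetryOptions, and sleep, and the default values, are hypothetical. Rate limiting and response caching are not shown here.

```typescript
interface RetryOptions {
  retries: number;     // number of retries after the first attempt
  baseDelayMs: number; // delay before the first retry; doubles on each later retry
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run an async operation, retrying failures with exponential backoff.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions = { retries: 3, baseDelayMs: 200 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < opts.retries) {
        await sleep(opts.baseDelayMs * 2 ** attempt); // 200ms, 400ms, 800ms with the defaults
      }
    }
  }
  throw lastError;
}
```

A typed client call would then be wrapped as withRetry(() => client.fetchPage(slug)), where client.fetchPage stands for whatever request function the project defines. In practice it is usually worth retrying only transient failures (timeouts, HTTP 429 or 5xx) rather than every error, which this sketch does not distinguish.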

Scale Limits

The in-memory and file-based patterns in this skill work up to ~10K pages. Beyond that:

  • 10K-50K pages: Requires a database (PostgreSQL, MySQL). In-memory index tier becomes borderline at 50K (~50MB). File-based data sources are too slow.
  • 50K-100K+ pages: Requires database + cache layer (Redis) + cursor-based pagination. getAllSlugs() must iterate with a cursor instead of returning one array. Add data sufficiency gating so that records with too little data do not generate thin pages.

See pseo-scale for the complete database-backed data layer, sufficiency scoring, and scale-specific patterns.
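
To give a sense of what cursor iteration instead of an array return can look like, here is a minimal sketch using an async generator. CursorPage, FetchSlugPage, iterateSlugs, and the in-memory fakeFetchPage source are hypothetical; the database-backed version belongs to pseo-scale and may differ.

```typescript
// Hypothetical shape of one cursor-paginated result.
interface CursorPage {
  slugs: string[];
  nextCursor: string | null; // null when the source is exhausted
}

type FetchSlugPage = (cursor: string | null, limit: number) => Promise<CursorPage>;

// Yield slugs one at a time; only one batch is held in memory at any moment.
async function* iterateSlugs(fetchPage: FetchSlugPage, limit = 1000): AsyncGenerator<string> {
  let cursor: string | null = null;
  do {
    const page: CursorPage = await fetchPage(cursor, limit);
    for (const slug of page.slugs) {
      yield slug;
    }
    cursor = page.nextCursor;
  } while (cursor !== null);
}

// Stand-in source: 2,500 sorted slugs, paged by "everything after the cursor"
// (comparable to WHERE slug > $cursor ORDER BY slug LIMIT $limit in SQL).
const SORTED_SLUGS: string[] = Array.from(
  { length: 2500 },
  (_, i) => `page-${String(i).padStart(5, "0")}`,
);
let pageFetches = 0;

const fakeFetchPage: FetchSlugPage = async (cursor, limit) => {
  pageFetches++;
  const start = cursor === null ? 0 : SORTED_SLUGS.findIndex((s) => s > cursor);
  const slugs = start === -1 ? [] : SORTED_SLUGS.slice(start, start + limit);
  const last = slugs[slugs.length - 1];
  const hasMore = start !== -1 && start + limit < SORTED_SLUGS.length;
  return { slugs, nextCursor: hasMore && last !== undefined ? last : null };
};
```

A consumer would write for await (const slug of iterateSlugs(fakeFetchPage)) { ... } and process each slug as it arrives, so the full slug list never has to exist as a single array.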

Memory-Conscious Data Patterns

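At 1000+ pages, how data is loaded matters more than what is loaded. A full content model with body text, FAQs, and images can be 50-500KB per page, so 1,000 pages held at once is roughly 50-500MB and 10,000 pages is roughly 0.5-5GB. Loading all pages into memory simultaneously can exhaust the build process's heap (OOM).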

Two-tier data model:

Split the data layer into lightweight index data and full page data. The PageIndex and BaseSEOContent interfaces from section 1 define the two tiers:

  • getAllSlugs(), getRelatedPages(), getPagesByCategory(): return index-tier data only, either slugs or PageIndex[] (lightweight, ~1KB per page)
  • getPageData(): returns BaseSEOContent (or an extended type) for a single page (heavy, ~50-500KB per page, only one at a time)

Never do this:

// Loads ALL full content into memory — will OOM at scale
const allPages = await Promise.all(slugs.map(s => getPageData(s)));

Instead:

// Process pages one at a time or in small batches
for (const slug of slugs) {
  const page = await getPageData(slug);
  await processPage(page);
  // page goes out of scope here, so it becomes eligible for garbage collection
}
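
The small-batch variant mentioned in the comment above could look like the following sketch. processInBatches and its parameters are hypothetical names; load stands in for getPageData() and handle for whatever per-page work the build performs.

```typescript
// Process slugs in fixed-size batches so that at most batchSize full pages
// are held in memory at the same time.
async function processInBatches<T>(
  slugs: string[],
  batchSize: number,
  load: (slug: string) => Promise<T>,
  handle: (page: T) => Promise<void>,
): Promise<void> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  for (let i = 0; i < slugs.length; i += batchSize) {
    const batch = slugs.slice(i, i + batchSize);
    // Pages within a batch load concurrently; the next batch starts only
    // after every page in this one has been handled.
    await Promise.all(
      batch.map(async (slug) => {
        const page = await load(slug);
        await handle(page);
      }),
    );
  }
}
```

With pages of 50-500KB each, a batch size of 10 keeps peak full-page data at around 0.5-5MB while still overlapping I/O. The right batch size depends on the data source and is an assumption to tune.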

CMS/API pagination:

  • Fetch in batches of 100-250 records
  • Yield records from a generator, or push only their index fields to the result array; don't hold the full API responses in memory simultaneously
  • If using GraphQL, only request index fields in list queries, full fields in single-item queries

File Organization

lib/
  data/
    index.ts          # public API (re-exports)
    types.ts          # TypeScript interfaces
    fetcher.ts        # data source integration
    slugs.ts          # slug generation and validation
    validation.ts     # data integrity checks
    cache.ts          # build-time caching utilities

Quality Checks

Before considering this complete:

  • All content models extend BaseSEOContent (which extends PageIndex)
  • getAllSlugs() returns no duplicate slugs
  • Data validation passes with zero errors
  • Data layer exports are fully typed with no any
  • Fetching is memoized for build performance
  • A test or script can validate the full dataset
  • Two-tier data model implemented (index data vs. full page data)
  • No function loads all full page content into memory simultaneously
  • CMS/API fetching uses batched pagination internally

Relationship to Other Skills

This skill provides the data foundation for:

  • pseo-templates: Consumes getPageData() and getAllSlugs()
  • pseo-metadata: Reads title, description, canonical from content models
  • pseo-schema: Uses structured fields for JSON-LD generation
  • pseo-linking: Uses getRelatedPages() and category data
  • pseo-quality-guard: Validates against the content models
