manuscript-provenance

Computational provenance audit verifying that every number, table, figure, ordering, and term in a manuscript is derived from code and scripts — not manually entered. Cross-references LaTeX source against the codebase to detect hardcoded values, stale outputs, broken pipelines, and manual data entry. Companion to manuscript-review: that skill audits the document as prose; this skill audits whether the document is faithfully generated from code. Use when the user says "check provenance", "verify reproducibility", "audit my pipeline", "are my numbers from code", "check manuscript against scripts", "provenance audit", or any request to verify that manuscript content traces back to computational outputs.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "manuscript-provenance" with this command: npx skills add mathews-tom/praxis-skills/mathews-tom-praxis-skills-manuscript-provenance

Manuscript Provenance Audit

Purpose

Verify that a manuscript is a faithful rendering of computational outputs. Every number, table, figure, category label, ordering, and threshold in the document must trace to a specific script, config file, or pipeline output. Manual data entry in a manuscript is a reproducibility defect.

This skill produces a provenance map — a structured report linking each manuscript artifact to its generating code — and flags every break in the chain.

Companion skill: manuscript-review audits the document as prose (structure, argumentation, citations). This skill audits whether the document content is computationally grounded. Run both for complete pre-publication coverage.

Boundary Agreement with manuscript-review

Concern | manuscript-review | This skill (manuscript-provenance)
--- | --- | ---
Reproducibility | Does the paper describe enough to reproduce? (§6) | Does the code actually produce what the paper claims? (§1, §7)
Figures/Tables | Legible, accessible, well-formatted? (§12) | Generated by scripts, not manual entry? (§2, §3)
Rendered visuals | Readable at print scale? Floats near references? (§23) | Figure generation script produces correct format? (§3)
Hyperparameters | Listed in the paper with rationale? (§6) | Values trace to config files, not hardcoded? (§1, §8)
Code availability | Statement exists in the paper? (§17) | Repo URL valid, README accurate, pipeline works? (§11)
Terminology | Abbreviations consistent within document? (§14) | Terms match code identifiers? (§5)
Significant figures | Consistent precision within document? (§12) | Precision matches script output? (§2)
Figure format | Appropriate format for document quality? (§12) | Format generated by script, not manually exported? (§3)
Computational cost | Reported in the paper? (§7) | Values trace to benchmarking scripts? (§1)
Macro-prose coherence | Prose framing appropriate for injected value? (§24) | Value traced to code, macro manifest produced? (§4)
Cross-element consistency | Prose, captions, figures, tables mutually consistent? (§24) | All elements from same run/pipeline output? (§9)

Rule: This skill never judges prose quality. manuscript-review never opens the codebase. Each reads the other's report when available.

Integration point — Macro Manifest: This skill produces a macro manifest as part of the §4 audit: a structured list of every macro-injected value with:

  • Macro name (e.g., \bestf)
  • Resolved value (e.g., 0.847)
  • Source (script + output file that generates it)
  • Location(s) in manuscript text (file, line number, surrounding sentence)
  • Classification (TRACED / MACRO-TRACED / CONFIG-TRACED / UNTRACED / STALE)

manuscript-review's Pass 13 (Cross-Element Coherence, §24) consumes this manifest to check whether the prose surrounding each injected value is appropriate for the actual numeric value. Provenance owns "is this value computationally grounded?" Review owns "does the text wrapping this value make sense given what the value is?"

Scope

In scope:

  • Numbers, metrics, percentages in manuscript text
  • Tables (content, ordering, formatting)
  • Figures (generation scripts, data sources)
  • LaTeX macros (\newcommand, \def, \pgfmathsetmacro)
  • Terminology, mode names, mechanism labels, category names
  • Ordering of items in enumerations, tables, discussion
  • Config values (thresholds, hyperparameters, model names)
  • Pipeline completeness (raw data → final PDF)
  • Timestamp consistency (scripts vs outputs)

Out of scope:

  • Prose quality (→ manuscript-review)
  • Citation hygiene (→ manuscript-review)
  • Argumentation structure (→ manuscript-review)
  • Code quality/style (separate concern)

Inputs

This audit requires TWO artifacts:

  1. Manuscript source — LaTeX .tex files (preferred), or PDF/DOCX as fallback
  2. Codebase — the scripts, configs, and pipeline that generate manuscript content

If the user provides only one, ask for the other. LaTeX source is strongly preferred over compiled PDF — provenance auditing requires seeing the raw markup, macros, and input commands.

Workflow

Phase 1 — Inventory

1a. Manuscript Artifact Extraction

Read all .tex files (main + included via \input/\include). Extract:

  • Inline values: bare numbers in running text (percentages, counts, metrics, p-values, confidence intervals, thresholds, sizes)
  • LaTeX macros: all \newcommand, \def, \pgfmathsetmacro, and custom command definitions that carry data values
  • Tables: full content of every tabular/table environment — cell values, row/column ordering, headers
  • Figures: \includegraphics paths, caption content, referenced data
  • Input files: any \input{generated/*.tex} patterns that pull from script-generated LaTeX fragments
  • Labels and references: \label/\ref pairs for cross-referencing
  • Terminology: named modes, mechanisms, strategies, categories, method names used in prose
  • Ordered lists: any enumerated or ranked items (methods compared, features listed, results ordered)

Build an artifact registry — a flat list of every data-carrying element in the manuscript with its location (file, line number).
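A first pass over the extraction step above can be sketched with regular expressions. Both patterns are illustrative assumptions, not the skill's prescribed implementation; a robust audit needs a real LaTeX tokenizer to handle nested braces, comments, and math mode.

```python
import re

# First-pass patterns for Phase 1a extraction (illustrative only).
MACRO_DEF = re.compile(r"\\newcommand\{\\(\w+)\}\{([^{}]*)\}")
BARE_NUMBER = re.compile(r"(?<![\w\\.])\d+(?:\.\d+)?%?(?![\w.])")

def extract_artifacts(tex: str):
    """Return (macro definitions, bare numbers) found in LaTeX source."""
    return MACRO_DEF.findall(tex), BARE_NUMBER.findall(tex)
```

Each hit would then be recorded in the artifact registry together with its file and line number.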

1b. Codebase Mapping

Scan the project directory. Identify:

  • Pipeline entry points: Makefile, snakemake, dvc.yaml, run.sh, main.py, or equivalent orchestration
  • Analysis scripts: files that produce numbers, tables, figures
  • Config files: config.toml, config.yaml, .env, params.yaml, hyperparameter files
  • Output directories: where scripts write results (results/, output/, figures/, tables/, generated/)
  • Generated LaTeX fragments: .tex files in output directories that scripts produce for \input inclusion
  • Data files: CSVs, JSON, HDF5, and pickle files through which intermediate results flow

Build a source registry — a flat list of every code artifact that produces or configures manuscript content.

Phase 2 — Provenance Tracing

For each entry in the artifact registry, attempt to establish a provenance chain: manuscript value → generated output → script → input data/config.

2a. Value Provenance

For every number in the manuscript:

  1. Search for the value in script outputs (logs, result files, generated LaTeX)
  2. Trace the output back to the script that produces it
  3. Verify the script reads from data/config (not hardcoded)
  4. Record the full chain or flag as UNTRACED

Classification:

  • TRACED — full chain from manuscript value to generating code
  • MACRO-TRACED — value defined in a LaTeX macro that is generated by a script
  • CONFIG-TRACED — value comes from a config file read by scripts
  • UNTRACED — no provenance chain found; manually entered
  • STALE — provenance chain exists but output is older than generating script
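The decision logic behind these labels can be sketched as a pure function, collapsing the traced subtypes into TRACED (how the (text, mtime) pairs are gathered from disk is left out, and is an assumption about project layout):

```python
def classify_value(value: str, outputs, script_mtime: float) -> str:
    """Phase 2a sketch. outputs: iterable of (text, mtime) pairs read
    from candidate output files (logs, result files, generated LaTeX).
    MACRO-TRACED and CONFIG-TRACED refine TRACED by where the hit lives."""
    hits = [mtime for text, mtime in outputs if value in text]
    if not hits:
        return "UNTRACED"   # no provenance chain found
    if max(hits) < script_mtime:
        return "STALE"      # newest matching output predates the script
    return "TRACED"
```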

2b. Table Provenance

For each table:

  1. Is the table content generated by a script (CSV → LaTeX, or direct LaTeX generation)?
  2. Is the row/column ordering determined by code (sorted by metric, alphabetical, grouped by category) or manually arranged?
  3. Do header labels match code-defined names?
  4. Are formatting choices (bold for best, significant figures) applied by code?

Classification:

  • GENERATED — entire table produced by script
  • PARTIAL — some cells generated, some manual
  • MANUAL — no generation script found
  • ORDER-MANUAL — content generated but ordering is manually set
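A minimal cell-level check for the GENERATED/PARTIAL/MANUAL split, under the assumption that the table is meant to come from a CSV the generation script consumes:

```python
import csv
import io

def table_cells_traced(latex_cells, csv_text):
    """Phase 2b sketch: every data cell in the LaTeX table should appear
    in the CSV that the generation script consumes."""
    reader = csv.reader(io.StringIO(csv_text))
    csv_values = {cell.strip() for row in reader for cell in row}
    untraced = [c for c in latex_cells if c not in csv_values]
    if not untraced:
        return "GENERATED"
    return "PARTIAL" if len(untraced) < len(latex_cells) else "MANUAL"
```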

2c. Figure Provenance

For each figure:

  1. Does a script produce the exact file referenced by \includegraphics?
  2. Does the script use a deterministic seed for reproducibility?
  3. Is the figure output path in the script consistent with the LaTeX reference?
  4. Are figure parameters (colors, labels, axis ranges) set in code or manually edited post-generation?

Classification:

  • GENERATED — script produces the exact file
  • POST-EDITED — script generates base figure, but manual edits detected (e.g., Illustrator metadata, different checksum than script output)
  • MANUAL — no generating script found
  • STALE — generating script modified after figure file
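POST-EDITED detection can lean on checksums, as one hedged sketch: this only works if the generation script is deterministic (fixed seed, pinned library versions), since a non-deterministic script produces a mismatch even without manual edits.

```python
import hashlib

def is_post_edited(script_output: bytes, manuscript_figure: bytes) -> bool:
    """Flag POST-EDITED: the figure shipped with the manuscript differs
    byte-for-byte from what the generation script produces."""
    digest = lambda blob: hashlib.sha256(blob).hexdigest()
    return digest(script_output) != digest(manuscript_figure)
```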

2d. Terminology Provenance

For each named mode, mechanism, category, or method label:

  1. Is the term defined in code (enum, constant, config key, class name)?
  2. Does the manuscript term match the code term exactly?
  3. If the manuscript uses a display-friendly name, is there an explicit mapping in code or config?

Classification:

  • CODE-DEFINED — term matches code definition
  • MAPPED — explicit code→display mapping exists
  • UNMAPPED — term appears in manuscript but not in code
  • INCONSISTENT — term appears in both but differs (e.g., code says greedy_search, manuscript says "Greedy Search" in some places and "greedy approach" in others)
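One possible ordering of the §2d checks, sketched below; the display_map argument assumes the project keeps an explicit code-identifier to display-name mapping in config, which is not guaranteed:

```python
def classify_term(term, code_terms, display_map):
    """Phase 2d sketch: classify a manuscript term against code terms."""
    if term in code_terms:
        return "CODE-DEFINED"
    if term in display_map.values():
        return "MAPPED"
    normalized = term.lower().replace(" ", "_").replace("-", "_")
    if normalized in code_terms:
        return "INCONSISTENT"  # same concept, drifted casing or spelling
    return "UNMAPPED"
```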

2e. Ordering Provenance

For each ordered list, ranked comparison, or sequenced enumeration:

  1. Does code determine the ordering (sort by metric, alphabetical, enum order)?
  2. Does the manuscript ordering match the code-determined order?
  3. Are there items in the manuscript list not present in code output, or vice versa?

Classification:

  • CODE-ORDERED — ordering matches code output
  • MANUAL-ORDER — ordering differs from code output or no ordering logic in code
  • SUBSET-MISMATCH — manuscript lists different items than code produces
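The §2e comparison reduces to a three-way check between the manuscript's list and the order the pipeline emits, sketched here:

```python
def classify_ordering(manuscript_items, code_items):
    """Phase 2e sketch: compare an ordered manuscript list with the
    order the pipeline emits."""
    if set(manuscript_items) != set(code_items):
        return "SUBSET-MISMATCH"
    if manuscript_items == code_items:
        return "CODE-ORDERED"
    return "MANUAL-ORDER"
```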

Phase 3 — Infrastructure Audit

3a. LaTeX Macro Hygiene

  • Every data-carrying macro should be generated by a script, not hand-typed in the preamble
  • Pattern to detect: \newcommand{\someMetric}{42.7} defined directly in .tex files (bad) vs \input{generated/metrics.tex} where that file is script output (good)
  • Flag macros whose values appear nowhere in script outputs
  • Flag macros defined in main .tex files that carry numeric/data values
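In source, the bad and good patterns look like this (the macro name \meanAcc is hypothetical):

```latex
% BAD: data value hand-typed in the preamble
\newcommand{\meanAcc}{42.7}

% GOOD: value injected from a script-generated fragment
\input{generated/metrics.tex}  % written by the pipeline, never edited by hand
```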

3b. Pipeline Completeness

  • Does a single command reproduce all manuscript artifacts from raw data?
  • Is the pipeline documented (Makefile, README, CI config)?
  • Are intermediate steps cached or do they require full re-execution?
  • Are random seeds fixed for reproducibility?
  • Are software versions pinned (requirements.txt, environment.yml, lock files)?
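A single-command pipeline of the kind §3b asks for might look like this Makefile sketch; all file and script names are assumptions modeled on the examples elsewhere in this document:

```make
all: paper.pdf

results/metrics.json: scripts/run_experiments.py config/params.yaml
	python scripts/run_experiments.py --config config/params.yaml

generated/metrics.tex: scripts/generate_latex_macros.py results/metrics.json
	python scripts/generate_latex_macros.py

paper.pdf: paper.tex generated/metrics.tex
	latexmk -pdf paper.tex
```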

3c. Config/Code Separation

  • Are hyperparameters, thresholds, model names in config files?
  • Are file paths relative (portable) or absolute (fragile)?
  • Are credentials, API keys, or machine-specific paths absent from committed code?
  • Is there a single config entry point or are settings scattered across scripts?

3d. Stale Output Detection

  • Compare modification timestamps: script vs its output files
  • Flag outputs that are older than their generating scripts (stale)
  • Flag outputs with no corresponding script (orphaned)
  • Flag scripts with no corresponding output (dead code or unrun)
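The three §3d flags fall out of simple set and timestamp comparisons. Pairing scripts to outputs by a shared logical name is an assumption for the sketch; real pipelines need an explicit map (Makefile, dvc.yaml).

```python
def timestamp_audit(scripts, outputs):
    """Phase 3d sketch. scripts and outputs map a logical name to an
    mtime (e.g. from os.path.getmtime)."""
    shared = sorted(scripts.keys() & outputs.keys())
    return {
        "stale": [n for n in shared if outputs[n] < scripts[n]],
        "orphaned": sorted(outputs.keys() - scripts.keys()),  # output, no script
        "unrun": sorted(scripts.keys() - outputs.keys()),     # script never run
    }
```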

3e. Version Pinning

  • Are dependencies locked (requirements.txt with versions, conda environment.yml, poetry.lock, package-lock.json)?
  • Are data versions tracked (DVC, git-lfs, data checksums)?
  • Is the manuscript itself versioned alongside code (same repo, tagged releases)?

Phase 4 — Cross-Reference and Manifest Generation

4a. Macro Manifest Generation

Produce the macro manifest — the primary handoff artifact to manuscript-review. For every data-carrying macro identified in Phase 1a and traced in Phase 2a:

Macro: \bestf
Value: 0.847
Source: results/metrics.json → scripts/generate_latex_macros.py → generated/metrics.tex
Locations:
  - paper.tex:142 — "achieving an F1 score of \bestf{}"
  - paper.tex:287 — "The \bestf{} result represents a substantial improvement"
  - abstract.tex:8 — "...with \bestf{} F1 score"
Classification: MACRO-TRACED

Also include every bare number (not a macro) found in Phase 1a that carries data (metrics, counts, parameters) — these are values that SHOULD be macros but aren't:

Bare value: 50
Location: paper.tex:198 — "convergence after 50 epochs"
Should-be-macro: YES — this is a training parameter, should trace to config
Classification: UNTRACED (no macro, no provenance)

Save the manifest as [manuscript-name]-macro-manifest.json alongside the provenance report. This file is consumed by manuscript-review Pass 13 (Cross-Element Coherence) to verify prose-value appropriateness.

4b. Cross-Reference with manuscript-review

If a manuscript-review report exists for this manuscript, load it and:

  • Map UNTRACED values to manuscript-review §6 (Methodology) and §7 (Results) findings — provenance gaps often co-occur with reproducibility concerns
  • Flag terminology inconsistencies as potential §14 (Abbreviations) or §15 (Notation) issues in the manuscript-review framework
  • Feed HIGH-priority provenance issues as §6/§7 failures
  • Feed macro manifest into manuscript-review §24 (Cross-Element Coherence) findings — macro values whose surrounding prose uses inappropriate qualitative language ("marginal" for 14.3%, "dramatic" for 0.3%) are §24 failures

If no manuscript-review report exists, recommend running it as a companion audit and note that the macro manifest is available for its Pass 13.

Phase 5 — Report Generation

Load references/checklist.md and references/report-template.md.

Read references/checklist.md
Read references/report-template.md

Generate the provenance report following the template structure:

  1. Provenance Summary — overall score, breakdown by category
  2. Provenance Map — each manuscript artifact linked to its source
  3. Defect Registry — every UNTRACED, STALE, MANUAL, INCONSISTENT finding
  4. Infrastructure Assessment — pipeline, config, versioning status
  5. Remediation Queue — prioritized fixes
  6. Checklist Status — full checklist with pass/fail per checkpoint

Phase 6 — Output

Save two files in the manuscript directory:

  1. [manuscript-name]-provenance-report.md — the full provenance report
  2. [manuscript-name]-macro-manifest.json — the structured macro manifest for consumption by manuscript-review Pass 13

The macro manifest JSON structure:

{
  "macros": [
    {
      "name": "\\bestf",
      "value": "0.847",
      "source_chain": "results/metrics.json → scripts/gen_macros.py → generated/metrics.tex",
      "locations": [
        {
          "file": "paper.tex",
          "line": 142,
          "context": "achieving an F1 score of \\bestf{}"
        },
        {
          "file": "paper.tex",
          "line": 287,
          "context": "The \\bestf{} result represents a substantial improvement"
        }
      ],
      "classification": "MACRO-TRACED"
    }
  ],
  "bare_numbers": [
    {
      "value": "50",
      "location": {
        "file": "paper.tex",
        "line": 198,
        "context": "convergence after 50 epochs"
      },
      "section": "methodology",
      "should_be_macro": true,
      "rationale": "Training parameter — should trace to config",
      "classification": "UNTRACED"
    }
  ]
}

Present to the user:

  • Provenance coverage percentage (TRACED / total artifacts)
  • Count of UNTRACED / STALE / MANUAL findings by severity
  • Count of bare numbers that should be macros
  • Top 5 remediation actions
  • Pipeline completeness verdict
  • Note that macro manifest is available for manuscript-review Pass 13

Severity Classification

  • CRITICAL — Value in manuscript has no provenance chain AND is a key result (main finding, abstract metric, table headline number). This means the paper's core claims cannot be verified from code.

  • HIGH — Value/table/figure is untraced or stale, and appears in results or methodology sections. Reproducibility gap.

  • MEDIUM — Terminology mismatch, manual ordering, partial table generation, config values hardcoded in scripts. Maintenance and consistency risk.

  • LOW — Minor issues: display-name mapping missing but terms are close, non-critical figures without generation scripts, cosmetic post-editing of generated figures.

Core Principles

  • Binary provenance. Every artifact is either traced or not. No "partially reproducible" — partial means broken.

  • Code is truth. When manuscript and code disagree, the manuscript is wrong until proven otherwise. Flag the disagreement; do not assume the manuscript author "meant to" override code output.

  • Macros over magic numbers. Every data value in LaTeX should be a macro. Every macro should be generated. No exceptions for "obvious" values.

  • Pipeline as proof. If make (or equivalent) does not produce the PDF from raw data, the manuscript is not reproducible. Partial pipelines get partial credit, not a pass.

  • Config is not code. Hyperparameters, thresholds, model names, file paths — all belong in config files, not scattered through script bodies.

  • Ordering is data. The sequence of items in a table or enumeration is an assertion. It must come from code (sort order, enum definition), not from the author's sense of what "looks right."

  • Timestamps matter. A figure generated last month from a script modified yesterday is suspect. Stale outputs are provenance failures.

  • Companion, not replacement. This audit checks computational grounding. manuscript-review checks document quality. Both are needed. Neither subsumes the other.

Example Invocation Patterns

User says any of:

  • "Check provenance"
  • "Are my numbers from code"
  • "Audit my pipeline"
  • "Verify reproducibility"
  • "Check manuscript against scripts"
  • "Provenance audit"
  • "Are my tables generated"
  • "Do my figures come from scripts"
  • "/manuscript-provenance"

All trigger this skill.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
