table-schema

Table Schema (two layers: index vs Appendix)

Tables are not decorations; they are compression.

A common failure mode in this pipeline: the first table that gets generated looks like an internal index. So we separate tables into two layers:

outline/tables_index.md (internal)
purpose: coverage/debugging + fast evidence scan
allowed to be more exhaustive
should NOT be inserted into the paper
outline/tables_appendix.md (reader-facing)
purpose: publishable survey tables (clean layout + high information density)
can be inserted into the final PDF as Appendix

This skill designs both layers before filling.

Default mode: semantic (LLM-first)

Treat this as a design task, not running a script.

If a column cannot be filled from existing evidence packs without guessing, the schema is wrong.

Roles (use explicitly)

Table Designer (reader lens)

Mission: choose tables that answer reader questions, not pipeline questions.

Do:

make each table answer one question
keep tables small enough to fill (two good tables beat one impossible mega-table)

Avoid:

internal/log-style tables inside the paper
column labels that only make sense inside this repo

Evidence Steward (fillability)

Mission: refuse schemas that require invented facts.

Do:

map every column to concrete upstream fields
reject columns that would force long paragraph cells

Avoid:

TODO columns
placeholders ("TBD", "...", "(placeholder)")

Workflow (explicit inputs)

Use GOAL.md to keep the reader question and scope stable.
Use outline/outline.yml to align table row units with the paper structure.
Use outline/subsection_briefs.jsonl to ground table dimensions/axes in the approved structure.
Use outline/evidence_drafts.jsonl to ensure every planned column is fillable without guessing.

Inputs

outline/outline.yml
outline/subsection_briefs.jsonl
outline/evidence_drafts.jsonl
Optional: GOAL.md

Output

outline/table_schema.md

Non-negotiables (schema contract)

Minimum definitions:
Index tables: >=2
Appendix tables: >=2
Every table definition must include:
the question it answers
the row unit (H3 / benchmark / work / failure mode)
columns (with cell style constraints)
evidence mapping (which upstream fields fill which columns)
Cell style: short phrases; avoid paragraph cells.
Paper voice: Appendix table captions/columns must be publishable (no pipeline jargon).

Recommended defaults (arxiv-survey family)

Index tables (for table-filler -> outline/tables_index.md )

I1) Subsection map (axes + representative works)

Row unit: H3
Columns: subsection; axes; representative works
Evidence sources: subsection_briefs.axes

citations from evidence_drafts

I2) Concrete anchors (benchmarks / numbers / caveats)

Row unit: H3
Columns: subsection; anchor facts; representative works
Evidence sources: anchor_sheet.anchors

Appendix tables (for appendix-table-writer -> outline/tables_appendix.md )

A1) Method/architecture map (representative works)

Row unit: work/system line
Columns: work; core idea; loop + interface assumptions; key refs
Evidence sources: evidence_drafts (comparisons + definitions) + anchor_sheet

A2) Evaluation protocol / benchmark map

Row unit: benchmark or evaluation setting (fallback: protocol dimension)
Columns: benchmark/setting; task+metric; key protocol constraints; key refs
Evidence sources: anchor_sheet

evidence_drafts.evaluation_protocol

Positive / negative examples

Good (publishable question + fillable columns):

Question: "Which benchmarks anchor evaluation in this area, and what task/metric/constraints do they imply?"
Columns: Benchmark; Task+metric; Protocol constraints; Key refs
Evidence mapping: evidence_drafts.evaluation_protocol

anchor_sheet

Bad (internal/pipeline voice):

Question: "What is evidence readiness + verification needs?"
Columns: evidence levels, missing fields, TODO checklist

If you want internal diagnostics, put them in an audit report, not in reader-facing tables.

Script (optional bootstrap)

Quick Start

python .codex/skills/table-schema/scripts/run.py --help
python .codex/skills/table-schema/scripts/run.py --workspace workspaces/<ws>

All Options

--workspace <workspace_dir> (required)
--unit-id <id> (optional; used only for runner bookkeeping)
--inputs <outline;briefs;packs;goal> (optional; override inputs)
--outputs <relpath> (optional; defaults to outline/table_schema.md )
--checkpoint <C#> (optional; ignored by the bootstrapper)

Examples

Bootstrap a two-layer schema (index + Appendix):

python .codex/skills/table-schema/scripts/run.py --workspace workspaces/<ws>

Write to a custom schema path (rare):

python .codex/skills/table-schema/scripts/run.py --workspace workspaces/<ws> --outputs outline/table_schema.md

Notes:

Use the script as a starting point, then refine the schema as a paper artifact.
If a column cannot be filled from evidence packs without guessing, the schema is wrong.

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

pdf-text-extractor

latex-compile-qa

draft-polisher

citation-verifier