Citation Verifier
Generate citations/ref.bib and ensure every entry has a traceable verification record in citations/verified.jsonl.
When network access is restricted, prefer a “record now, verify later” workflow: keep URLs/titles consistent and leave a clear verification note.
Input
- papers/paper_notes.jsonl
Outputs
- citations/ref.bib
- citations/verified.jsonl
Workflow (heuristic)
- Collect bibkey, title, url, year, authors from papers/paper_notes.jsonl.
- Write/refresh citations/ref.bib:
  - Prefer arXiv-style fields when arxiv_id / primary_category exist (eprint, archivePrefix, primaryClass).
- Write one verification record per BibTeX entry to citations/verified.jsonl with at least:
  - bibkey, title, url, date
- If you cannot verify via network, record a clear notes field (e.g., “auto-generated; needs manual verification”) and/or request human confirmation depending on your policy.
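The workflow above can be sketched in Python. This is a minimal illustration, not the script's actual implementation: the function names, the @article entry type, the assumption that authors is a list of strings, and the ISO date format are all hypothetical choices made for the example. Text fields are written as-is here, without any LaTeX escaping.

```python
import json
from datetime import date


def make_bibtex_entry(note: dict) -> str:
    """Render one paper note as a BibTeX entry (hypothetical field mapping)."""
    fields = {
        "title": note.get("title", ""),
        # Assumption: authors is stored as a list of name strings.
        "author": " and ".join(note.get("authors", [])),
        "year": str(note.get("year", "")),
        "url": note.get("url", ""),
    }
    # Prefer arXiv-style fields when the note carries arXiv metadata.
    if note.get("arxiv_id"):
        fields["eprint"] = note["arxiv_id"]
        fields["archivePrefix"] = "arXiv"
        if note.get("primary_category"):
            fields["primaryClass"] = note["primary_category"]
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{note['bibkey']},\n{body}\n}}"


def make_verification_record(note: dict, notes: str = "") -> dict:
    """Build the minimal record: bibkey, title, url, date, plus a notes field."""
    return {
        "bibkey": note["bibkey"],
        "title": note.get("title", ""),
        "url": note.get("url", ""),
        "date": date.today().isoformat(),  # assumption: ISO 8601 date string
        "notes": notes,
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSON Lines row."""
    return json.dumps(record, ensure_ascii=False)
```

Calling make_verification_record(note, notes="auto-generated; needs manual verification") for each note and writing one to_jsonl_line result per line gives a file shaped like citations/verified.jsonl.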
Quality checklist
- Every BibTeX entry has a corresponding verified.jsonl record.
- No verification record is missing url, date, or title.
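Both checklist items can be checked mechanically. The sketch below is one possible approach and is not part of the documented script: check_citations is a hypothetical helper, and it extracts BibTeX keys with a simple regular expression rather than a full BibTeX parser.

```python
import json
import re

REQUIRED = ("url", "date", "title")


def check_citations(bibtex_text: str, jsonl_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checklist passes."""
    # Approximate key extraction: "@type{key," at the start of each entry.
    bibkeys = set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bibtex_text))
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    recorded = {r.get("bibkey") for r in records}

    problems = []
    # Item 1: every BibTeX entry needs a verification record.
    for key in sorted(bibkeys - recorded):
        problems.append(f"{key}: no verification record")
    # Item 2: no record may have an empty url, date, or title.
    for r in records:
        missing = [f for f in REQUIRED if not str(r.get(f) or "").strip()]
        if missing:
            problems.append(f"{r.get('bibkey')}: missing {', '.join(missing)}")
    return problems
```

To use it, read citations/ref.bib and citations/verified.jsonl as text, pass both strings in, and treat any returned item as a checklist failure.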
Offline Mode
When network access is restricted, run in offline mode to produce auditable records now, then verify later.
- Generate offline records: run with --offline, which writes verification_status: offline_generated
- Verify later (when network is available): --verify-only
verification_status
- offline_generated: record was generated without network verification (needs later verification)
- verified_online: URL/title verified successfully by the script
- verify_failed: verification was attempted but failed (network error or title mismatch)
- needs_manual_verification: missing/ambiguous fields (e.g., empty url/title)
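The four statuses can be read as a small decision procedure. The sketch below is an interpretation of the list above, not the script's code: classify and its parameters are hypothetical, a fetched_title of None stands in for a network error, and comparing titles case-insensitively after collapsing whitespace is an assumption about what counts as a match.

```python
from typing import Optional


def _normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as a mismatch."""
    return " ".join(title.lower().split())


def classify(record: dict, offline: bool, fetched_title: Optional[str] = None) -> str:
    """Pick a verification_status for one record (hypothetical logic)."""
    url = (record.get("url") or "").strip()
    title = (record.get("title") or "").strip()
    if not url or not title:
        # Nothing to verify against: a human has to fill in the gaps.
        return "needs_manual_verification"
    if offline:
        return "offline_generated"
    if fetched_title is None:
        # Assumption: None means the fetch itself failed (network error).
        return "verify_failed"
    if _normalize(fetched_title) == _normalize(title):
        return "verified_online"
    return "verify_failed"  # title mismatch
```

Checking for missing fields before the offline flag keeps incomplete records from being marked offline_generated, which would otherwise hide them until a later verification pass.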
Script
Quick Start
- python .codex/skills/citation-verifier/scripts/run.py --help
- Offline (record now, verify later): python .codex/skills/citation-verifier/scripts/run.py --workspace <workspace_dir> --offline
All Options
- --offline: do not attempt network verification; write verification_status=offline_generated
- --verify-only: verify existing citations/verified.jsonl records (does not rewrite BibTeX)
- --verification-note <text>: stored in the notes field of citations/verified.jsonl records
Examples
- Generate BibTeX + offline verification records:
  - python .codex/skills/citation-verifier/scripts/run.py --workspace <ws> --offline --verification-note "auto-generated; needs manual verification"
- Later, verify-only (when network is available):
  - python .codex/skills/citation-verifier/scripts/run.py --workspace <ws> --verify-only
Notes
- Minimal requirement for every verification record: url, date, title.
- The script sanitizes stray/unbalanced {} in titles to keep BibTeX parsing robust.
- The script escapes LaTeX special chars in text fields (& % $ # _) and rewrites superscript patterns like X^N or X$^N$ as X\textsuperscript{N} to keep LaTeX builds stable.
- URLs are kept raw in BibTeX url fields (BibTeX styles wrap them with \url{...}); @misc uses howpublished=\url{...}.
- In offline mode, records are not truly verified; treat offline_generated as a to-do for human/network verification.
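The escaping and superscript rewriting described in these notes might look roughly like the following. This is an illustrative sketch, not the script's code: latex_safe and both regular expressions are hypothetical, brace sanitizing is left out, and the function is meant for text fields such as titles only, never for URLs.

```python
import re

# Special characters named in the notes, skipped when already backslash-escaped.
_SPECIALS = re.compile(r"(?<!\\)([&%$#_])")
# Matches X$^N$ or X^N (optionally X^{N}); group 1 is X, group 2 or 3 is N.
_SUPERSCRIPT = re.compile(r"(\w)(?:\$\^\{?(\w+)\}?\$|\^\{?(\w+)\}?)")


def _superscript_repl(match: re.Match) -> str:
    base = match.group(1)
    exponent = match.group(2) or match.group(3)
    return base + "\\textsuperscript{" + exponent + "}"


def latex_safe(text: str) -> str:
    """Rewrite superscripts, then escape the remaining special characters."""
    # Superscripts go first so their $ delimiters are consumed before escaping.
    text = _SUPERSCRIPT.sub(_superscript_repl, text)
    return _SPECIALS.sub(r"\\\1", text)
```

The order matters: escaping first would turn X$^N$ into X\$^N\$, and the superscript pattern would no longer match the math-mode form.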
Troubleshooting
Common Issues
Issue: Missing bibkey / missing url in notes
Symptom:
- citations/ref.bib is missing entries, or verified.jsonl has empty url/title.
Causes:
- papers/paper_notes.jsonl lacks bibkey/url fields.
Solutions:
- Ensure each core paper note has a stable bibkey and a canonical url.
- Rerun citation generation after fixing notes.
Issue: verification_status=offline_generated
Symptom:
- Records exist but are not truly verified.
Causes:
- --offline was used, or network verification was unavailable.
Solutions:
- When network is available, run --verify-only to re-check the records and update their verification_status.
- Or manually verify and update citations/verified.jsonl, recording what was checked in notes.
Recovery Checklist
- Every BibTeX entry has a matching citations/verified.jsonl record.
- Verification records include url, date, title.