/bench-debug <doc_id>
Compares parsing output with ground-truth for a specific document and analyzes failure causes.
Usage
/bench-debug 01030000000189
Execution Steps
Run benchmark for the specific document
./scripts/bench.sh --doc-id <doc_id>
Compare files
- Ground-truth: tests/benchmark/ground-truth/markdown/<doc_id>.md
- Prediction: tests/benchmark/prediction/opendataloader/markdown/<doc_id>.md
- Original PDF: tests/benchmark/pdfs/<doc_id>.pdf
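The file comparison can be sketched in a few lines of Python. This is a minimal illustration, not part of the benchmark tooling: the directory layout comes from the paths listed above, while the helper names `benchmark_paths` and `diff_markdown` are hypothetical and exist only for this example.

```python
import difflib
from pathlib import Path


def benchmark_paths(doc_id: str, root: str = "tests/benchmark") -> dict:
    """Build the three file paths for one document (layout assumed from the list above)."""
    base = Path(root)
    return {
        "ground_truth": base / "ground-truth" / "markdown" / f"{doc_id}.md",
        "prediction": base / "prediction" / "opendataloader" / "markdown" / f"{doc_id}.md",
        "pdf": base / "pdfs" / f"{doc_id}.pdf",
    }


def diff_markdown(ground_truth: str, prediction: str) -> list:
    """Return a unified diff of two Markdown strings; empty when they are identical."""
    return list(
        difflib.unified_diff(
            ground_truth.splitlines(),
            prediction.splitlines(),
            fromfile="ground-truth",
            tofile="prediction",
            lineterm="",
        )
    )
```

To use it, read the `ground_truth` and `prediction` files with `Path.read_text(encoding="utf-8")` and pass both strings to `diff_markdown`. Lines prefixed with `-` appear only in the ground truth (missing text), and lines prefixed with `+` appear only in the prediction (extra text).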
Analyze differences
- Missing/extra text locations
- Table structure differences (TEDS score causes)
- Heading level mismatches (MHS score causes)
- Reading order errors (NID score causes)
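As one example of this kind of analysis, heading level mismatches can be located with a rough heuristic like the one below. It is only a sketch for finding candidate headings to inspect. It does not compute the MHS score, it assumes ATX-style (`#`) headings, and the function names are hypothetical. Lines starting with `#` inside code blocks would be misread as headings.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title (optional closing '#'s).
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def heading_levels(markdown: str) -> dict:
    """Map each heading title to the level of its first occurrence."""
    levels = {}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            levels.setdefault(match.group(2).strip(), len(match.group(1)))
    return levels


def heading_mismatches(ground_truth: str, prediction: str) -> list:
    """List (title, expected_level, predicted_level) tuples for headings that differ.

    predicted_level is None when the heading is absent from the prediction.
    """
    expected = heading_levels(ground_truth)
    predicted = heading_levels(prediction)
    report = []
    for title, level in expected.items():
        if title not in predicted:
            report.append((title, level, None))
        elif predicted[title] != level:
            report.append((title, level, predicted[title]))
    return report
```

Matching headings by exact title is a deliberate simplification: a heading whose text was also altered during parsing shows up as missing rather than as a level mismatch, so it is worth checking the diff for such cases.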
Identify root causes
- Which PDF elements caused the issue
- Which Java core components are involved
Suggest improvements
- Java classes/methods that need modification
- Expected impact scope
Reference Files
- ground-truth/reference.json: Per-document element info (categories, coordinates, etc.)
- java/opendataloader-pdf-core/: Core parsing logic
Example Output
Document 01030000000189 Analysis:
Overall: 0.2763 (one of the worst-performing documents)
Issues:
- 2 of 3 tables not detected (TEDS: 0.15)
  - Table boundary detection failed
  - Related code: TableDetector.java
- Reading order errors (NID: 0.45)
  - Multi-column layout handling failed
  - Related code: ColumnDetector.java
Recommended Actions:
- Adjust clustering threshold in TableDetector
- Improve multi-column detection logic