PDF Master Translator (V10 Architecture)

This skill provides a battle-tested pipeline for translating complex PDF documents that mix text, figures, tables, and formulas. It was developed through extensive trial and error on NASA engineering specifications.

Do NOT attempt to use simple OCR or zero-shot LLM translation for complex engineering documents. They will fail. Use the translator_engine_v10.py script provided in this skill.

Core Capabilities & The V10 Pipeline

This skill relies on a Python script (scripts/translator_engine_v10.py) that implements a specific, multi-agent workflow:

Layout & Physical Isolation (Masking):
- Never ask an LLM to "ignore the picture and translate the text" on a messy scan.
- The pipeline first detects figures and tables.
- It physically whites out (masks) these regions on a temporary image.
- The "clean" image is sent for translation, eliminating visual hallucinations.
- Original figures are extracted, converted to Base64, and safely appended to the final HTML/PDF.
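The masking and embedding steps above can be sketched roughly as follows. This is a minimal illustration, not the code in translator_engine_v10.py: a plain grayscale pixel grid stands in for the page image (a real pipeline would most likely use an imaging library), and the names `mask_regions` and `to_data_uri` are hypothetical.

```python
import base64

WHITE = 255  # grayscale value used to paint over masked regions


def mask_regions(pixels, boxes):
    """Return a copy of a grayscale pixel grid with each box painted white.

    pixels: list of rows, each row a list of 0-255 ints.
    boxes:  iterable of (left, top, right, bottom); right/bottom are exclusive.
    """
    masked = [row[:] for row in pixels]  # copy so the original page stays intact
    height = len(masked)
    width = len(masked[0]) if height else 0
    for left, top, right, bottom in boxes:
        # Clamp each box to the page so out-of-range detections cannot crash.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                masked[y][x] = WHITE
    return masked


def to_data_uri(image_bytes, mime="image/png"):
    """Encode raw image bytes as a Base64 data URI for embedding in HTML."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Working on a copy matters here: the untouched original is what the figure crops are taken from, while only the masked copy is sent for translation.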
Holographic Context Injection:
- Masking creates fragmented sentences around the masked areas.
- To prevent the translation Agent from producing out-of-context or broken translations, the pipeline injects the raw, unformatted text stream of the entire page as a reference dictionary. The Agent uses this context to seamlessly bridge the visual gaps.
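One way to picture the context injection is as prompt assembly: the raw page text is attached as a clearly delimited reference block next to the translation instruction. The sketch below is an assumption about how such a prompt could be built; the function name `build_prompt`, the `<REFERENCE_TEXT>` tag, and the wording of the instructions are all hypothetical and are not taken from the script.

```python
def build_prompt(raw_page_text: str, target_language: str) -> str:
    """Build a translation prompt that carries the full-page text as context.

    raw_page_text:   unformatted text stream extracted from the whole page.
    target_language: language to translate into (supplied by the caller).
    """
    reference = raw_page_text.strip()
    return (
        f"Translate the text visible in the attached page image into {target_language}.\n"
        "Figures and tables have been masked out, so some sentences may look cut off.\n"
        "Use the reference text below only to complete sentences interrupted by a "
        "masked region; do not translate it as a separate passage.\n\n"
        "<REFERENCE_TEXT>\n"
        f"{reference}\n"
        "</REFERENCE_TEXT>"
    )
```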
Protocol Downgrade (XML over JSON):
- Forcing LLMs to output thousands of words of Markdown inside a strict JSON structure is fragile and prone to escaping errors.
- The engine enforces simple XML tags (<HEADER>, <BODY>, <FOOTER>) for structural routing.
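A minimal sketch of how tag-based routing might be parsed, assuming each tag appears at most once per response. The helper name `split_sections` is hypothetical. Because the content between tags is taken verbatim, quotes and backslashes in the Markdown need no escaping, which is the advantage over JSON described above.

```python
import re

SECTION_TAGS = ("HEADER", "BODY", "FOOTER")


def split_sections(response: str) -> dict:
    """Extract the text between each structural tag in an LLM response.

    Missing tags map to an empty string instead of raising, so a partially
    formed response still yields whatever sections it does contain.
    """
    sections = {}
    for tag in SECTION_TAGS:
        # DOTALL lets a section span multiple lines; .*? stops at the first close tag.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", response, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else ""
    return sections
```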
Strict Math & Symbol Rendering:
- Standard PDF renderers (like WeasyPrint) cannot execute JavaScript, so MathJax cannot be used to typeset formulas.
- The script uses regex to intercept all LaTeX ($...$ or $$...$$) and calls an external API (math.vercel.app) to render them as high-quality, embeddable SVG images.
- The Prompt strictly mandates the format **$Variable$**: Description for symbol glossaries, ensuring visual consistency.
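The regex interception could look something like the sketch below. It only rewrites each formula into an image tag pointing at the rendering service and makes no network call, whereas the script is described as fetching the SVG itself. The query parameter names `from` (display math) and `inline` (inline math) are assumptions about the math.vercel.app interface and should be checked against its documentation; `latex_to_img_tags` is a hypothetical name.

```python
import re
from urllib.parse import quote

MATH_API = "https://math.vercel.app/"

# Display math ($$...$$) is listed first so it is tried before inline math ($...$).
LATEX_PATTERN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", flags=re.DOTALL)


def latex_to_img_tags(text: str) -> str:
    """Replace every LaTeX span in text with an <img> tag for a rendered SVG."""

    def replace(match):
        display, inline = match.group(1), match.group(2)
        if display is not None:
            param, latex = "from", display    # assumed parameter for block formulas
        else:
            param, latex = "inline", inline   # assumed parameter for inline formulas
        # Percent-encode everything so backslashes, carets, etc. survive in the URL.
        url = f"{MATH_API}?{param}={quote(latex.strip(), safe='')}"
        return f'<img src="{url}" alt="formula">'

    return LATEX_PATTERN.sub(replace, text)
```

A literal dollar sign in the text (for example a price) would be misread as a formula delimiter by this pattern, so a production version would need an escaping rule.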
Terminal Defense (Sanity Cleaner):
- The final step before PDF generation is a regex sweep to remove any leaked LLM artifacts (like ```markdown wrappers) or error placeholders (like RetryError[]) that might have survived the pipeline.
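A rough sketch of such a final sweep, assuming the two artifact shapes mentioned above: Markdown code-fence lines (with or without a language tag) and placeholders of the form RetryError[...]. The patterns and the name `sanity_clean` are illustrative guesses, not the script's actual rules.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# A fence line: optional indentation, the fence, an optional language tag.
FENCE_LINE = re.compile(rf"^[ \t]*{FENCE}[A-Za-z0-9_-]*[ \t]*\n?", flags=re.MULTILINE)

# Placeholder shape assumed from the example in the text: RetryError[...]
RETRY_PLACEHOLDER = re.compile(r"RetryError\[[^\]]*\]")


def sanity_clean(text: str) -> str:
    """Strip leaked code-fence lines and retry-error placeholders from text."""
    text = FENCE_LINE.sub("", text)
    text = RETRY_PLACEHOLDER.sub("", text)
    return text.strip()
```

Note that this removes every fence line, including fences around legitimate code samples in the translated body; a document that contains real code listings would need a narrower rule.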

Usage Instructions

To use this skill, execute the translator_engine_v10.py script.

Prerequisites

Make sure the required dependencies are installed (uv run typically resolves them automatically if the script declares inline metadata) and that the Gemini API key is set in the environment.

export GEMINI_API_KEY="your_api_key_here"
# If a proxy is required for your network:
export HTTPS_PROXY="http://127.0.0.1:10809"

Execution

Run the script, providing the path to the target PDF and the specific page range.

uv run ~/.npm-global/lib/node_modules/openclaw/skills/pdf-master-translator/scripts/translator_engine_v10.py /path/to/target.pdf --start <start_page> --end <end_page>

Important Operational Rules:

Always specify --start and --end explicitly.
For large documents (more than 20 pages), run the script in the background with nohup ... &, because the multi-agent cross-checking and the sleep cycles imposed by API rate limiting make this a long-running process.
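If the job is being started from Python rather than a shell, a detached launch can be approximated as below. This is a sketch under assumptions: the helper names `build_command` and `launch_in_background` and the log file are hypothetical, and `start_new_session=True` is a POSIX feature that plays the role nohup plays in the shell.

```python
import os
import subprocess

SCRIPT = "~/.npm-global/lib/node_modules/openclaw/skills/pdf-master-translator/scripts/translator_engine_v10.py"


def build_command(pdf_path, start, end, script=SCRIPT):
    """Build the argv list, always passing --start and --end explicitly."""
    return [
        "uv", "run", os.path.expanduser(script),
        pdf_path, "--start", str(start), "--end", str(end),
    ]


def launch_in_background(command, log_path):
    """Start the command detached from the terminal, appending output to a log."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the log handle after this block exits.
        return subprocess.Popen(
            command,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # detach from the controlling terminal (POSIX)
        )
```

A call such as `launch_in_background(build_command("/path/to/target.pdf", 1, 40), "translate.log")` returns immediately; progress can then be followed in the log file.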

Output

The script will generate a new PDF named [OriginalName]_V10_FINAL_P[start]-[end].pdf in the current working directory.
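For automation that needs to locate the result, the naming pattern above can be reproduced with a small helper. The sketch assumes that [OriginalName] means the input file name without its .pdf extension; `expected_output_path` is a hypothetical name.

```python
from pathlib import Path


def expected_output_path(pdf_path, start, end, cwd="."):
    """Predict where the translated PDF will be written for a given page range."""
    stem = Path(pdf_path).stem  # assumption: original name without the extension
    return Path(cwd) / f"{stem}_V10_FINAL_P{start}-{end}.pdf"
```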

This PDF will feature:

A clear --- Page X --- divider for continuous reading.
Consistent Header and Footer markdown tables.
SVG-rendered math formulas.
A dedicated [ 原文图表/示意图 ] (Original Figures/Diagrams) section at the bottom of relevant pages containing the extracted original diagrams.
(If applicable) A [ 图例符号说明 ] (Legend and Symbol Notes) section containing translations of text found inside the diagrams.