Deck Pipeline

A 4-stage, audit-trailed Chinese→English deck globalization system with a swappable project profile.

This skill bundles the generic deck-globalization engine (originally upstream DeckGlobalizer v2.1.1) and an editable PROFILE block (palette, fonts, glossary, style preferences). The two are separated by section so the profile can be swapped per project / brand without touching the engine.

For a marketing-style overview, see README.md in this directory. For implementation, see scripts/ and the per-stage runbooks below.

0. Modes & activation

| Mode | Trigger | Stages |
| --- | --- | --- |
| Full pipeline | CN deck (± EN draft) + user wants English output | 1 → 2 → 3 → 4 |
| Polish-only | Single-language deck + "layout / format only / skip translation" | 1 → 3 → 4 |
| Reverse-sync only | User hand-edited a PPT after a comparison Excel was generated | 3.5 (sync sub-routine) |

Detect the mode in the first turn. If ambiguous, ask one either/or question ("This deck is already in EN. Should I just polish the layout, or also rewrite it McKinsey-style?"). Do not guess silently.

1. PROFILE block — defaults (swappable)

Edit this block to retarget the skill for your project / brand. Everything below this block is profile-agnostic.

PROFILE:
  # ---- L1 Tokens ----
  palette:
    # Replace with your brand colors.
    ink:          "#1A1A1A"
    primary:      "#000000"   # accent / brand primary
    soft:         "#FFFFFF"   # soft fill behind banners
    page_bg:      "#FFFFFF"
  fonts:
    # Choose a serif title face + a sans-serif body face for best contrast.
    title:        "Georgia"
    body:         "Verdana"
    title_bold:   true
  unit_table:
    # Chinese number magnitudes → English. 亿 is 100M, NOT "billion".
    "百万":       "M"
    "千万":       "10M"
    "亿":         "100M"
    "十亿":       "1B"
    "百亿":       "10B"
    "千亿":       "100B"
    "万亿":       "1T"
    # Currency suffix is left to the user — append "$" / "RMB" / "€" as appropriate.

  # ---- L2 Constants ----
  size_ladder:           [22, 14, 10, 8, 6, 4]   # H1, H2, body, caption, footnote, source
  floors:
    body:     7
    caption:  6
    source:   4
  compression_step:      0.1     # discrete -0.1pt iterations only
  line_height_default:   1.25
  line_height_fallback:  1.15    # used before sub-floor compression
  quote_style:           "single"  # 'McKinsey' single quotes
  footer_format:         "Confidential · For Intended Recipients Only · {month} {year}"
  separator_in_footer:   "·"     # middle dot, NOT em-dash

  # ---- L1 Glossary (extensible) ----
  # Replace the example entries below with your project's locked terms.
  # Categories are illustrative; you can rename / add / remove.
  glossary:
    locked:
      people_orgs:
        # "<source term>": "<canonical translation>"
        # e.g. "John Smith": "John Smith"
        # e.g. "Acme Capital": "Acme Capital"
        {}
      business_terms:
        # Common Chinese business-deck idioms with industry-standard
        # English mappings. Edit / extend as needed.
        "流水":     "gross revenue"
        "私域":     "owned audience"
        "出海":     "global expansion"
      domain_specific:
        # Project / industry / domain terms.
        # "<source term>": "<canonical translation>"
        {}
    rejected_rewrites:
      # Entries the user vetoed during prior sessions.
      # Format: { source: "...", proposed: "...", reason: "..." }
      []
    pending: []
    session_added: []

  # ---- Style rules ----
  # McKinsey is the default baseline. Additional style references can be
  # uploaded and distilled via scripts/style_distill.py; their rules layer
  # ON TOP of the McKinsey base.
  style_baseline: "mckinsey"
  mckinsey:
    title_is_takeaway:      true   # title = the so-what, not the topic
    lead_with_so_what:      true
    parallel_structure:     true   # bullets share tense, opening part-of-speech
    strong_action_verbs:    true   # cut "is/has", prefer concrete verb
    cut_filler:
      - "in order to → to"
      - "a number of → many"
      - "due to the fact that → because"
      - "at this point in time → now"
    case:                   "sentence"   # lowercase unless proper noun or locked term
    em_dash_policy:         "use em-dash for parentheticals; use · (middle dot) in lists/footers"
  style_references:
    # Each entry is a PDF / .pptx reference. style_distill.py reads it and
    # emits rules (cadence, signature phrases, paragraph length, tone) that
    # layer on top of the McKinsey base. Conflicts: more recent entry wins;
    # user is asked at first conflict.
    # Example:
    # - path: "/path/to/sample.pdf"
    #   weight: 0.7
    []

  # ---- Structural anchor heuristics ----
  anchor_detection:
    min_pages: 3              # appears on ≥3 slides
    match_on:                 # signature components
      - position_xy
      - fill_color
      - font_size_class
    auto_protect: true

  # ---- Overflow estimator ----
  overflow:
    severity:
      high: 1.5
      med:  1.15
      low:  1.0
    surface_only: "high"      # surface MED/LOW only when explicitly asked
    defer_to_user_threshold: 10   # if HIGH > 10 → ask user to render externally

  # ---- CN ↔ EN slide alignment ----
  # Default is 1:1 (EN slide N maps to CN slide N).
  # Set overrides only when the two decks have been restructured.
  # Pass this config to excel_sync.py via `--cn-offset <yaml>`.
  cn_en_slide_offset:
    default: 0           # offset added to EN slide number (0 = 1:1)
    overrides: {}        # e.g. {"9-26": -1, "20": null}
                         # int = relative offset; null = no CN counterpart

Profile-agnostic note: all sections below treat PROFILE as an opaque dict. Do not hardcode project-specific values anywhere outside the PROFILE block.
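
As an illustration of how the cn_en_slide_offset block might be interpreted, here is a minimal sketch. The function name `resolve_cn_slide` is hypothetical and is not taken from excel_sync.py. It also assumes that a single-slide override (such as "20") takes precedence over a range that contains it (such as "9-26"), which the profile comments do not state.

```python
from typing import Optional


def resolve_cn_slide(en_slide: int, offset_cfg: dict) -> Optional[int]:
    """Map an EN slide number to its CN counterpart, or None if it has none."""
    overrides = offset_cfg.get("overrides") or {}

    # Assumption: an exact single-slide key beats any enclosing range.
    exact_key = str(en_slide)
    if exact_key in overrides:
        value = overrides[exact_key]
        return None if value is None else en_slide + value

    # Range keys look like "9-26" and are treated as inclusive.
    for key, value in overrides.items():
        low, sep, high = str(key).partition("-")
        if sep and int(low) <= en_slide <= int(high):
            return None if value is None else en_slide + value

    # No override applies: fall back to the default offset (0 means 1:1).
    return en_slide + offset_cfg.get("default", 0)


cfg = {"default": 0, "overrides": {"9-26": -1, "20": None}}
print(resolve_cn_slide(9, cfg), resolve_cn_slide(20, cfg), resolve_cn_slide(30, cfg))
```

With the example overrides from the profile comment, EN slide 9 maps to CN slide 8, EN slide 20 has no CN counterpart, and EN slide 30 maps 1:1.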

2. Pipeline stages

Each stage has: inputs · what it does · outputs · stop-and-ask conditions.

Stage 1 — Sense Pass

Inputs: one or two .pptx paths (CN, optional EN draft)

What it does:

Run scripts/sense_pass.py to extract:
- palette (top fill colors)
- font usage histogram
- size distribution
- title-zone shapes (top ≤ 600K EMU)
- layout heuristics
Cross-check sensed values against PROFILE.palette / PROFILE.fonts. If a sensed font is NOT in the whitelist AND NOT in SKIP_POLLUTION, record it as font pollution.
Surface candidate glossary entries: any CN noun phrase that appears ≥2 times and isn't already in glossary.locked.

Outputs:

Style_Manifest.md (in-memory; not written to disk unless requested)
pollution_report (slide → font → count)
candidate_glossary (term → count → sample context)

Stop-and-ask:

Candidate glossary surfaces a term Claude can't confidently translate → ask user, write answer to glossary.session_added.
Sensed primary palette color differs from PROFILE.palette.primary → ask whether to update profile or keep existing.
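
The font-pollution cross-check in this stage can be sketched as a pure function over already-extracted runs. This is an illustration only: `build_pollution_report` and its input shape (pairs of slide number and font name) are assumptions, not the actual interface of sense_pass.py, and the contents of SKIP_POLLUTION are passed in because this document does not list them.

```python
from collections import Counter


def build_pollution_report(runs, whitelist, skip_pollution):
    """Return {slide: Counter(font -> count)} for fonts outside both sets.

    runs: iterable of (slide_no, font_name) pairs; font_name may be None
    when a run inherits its font from the theme or layout.
    """
    report = {}
    for slide_no, font_name in runs:
        if font_name is None:
            continue  # inherited font: nothing explicit to flag
        if font_name in whitelist or font_name in skip_pollution:
            continue
        report.setdefault(slide_no, Counter())[font_name] += 1
    return report


runs = [(1, "Georgia"), (1, "Arial"), (1, "Arial"), (2, "SimSun"), (2, "Calibri")]
print(build_pollution_report(runs, {"Georgia", "Verdana"}, {"SimSun"}))
```

The result has the slide → font → count shape listed for pollution_report above.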

Stage 2 — McKinsey Translation (skipped in polish-only mode)

Inputs: Stage 1 outputs + the CN deck (and optional EN draft for diff context) + any uploaded style_references.

Style layering: McKinsey base rules (PROFILE.mckinsey) apply first. If PROFILE.style_references is non-empty, run scripts/style_distill.py on each reference before translation begins; the distilled rules (cadence, signature phrases, paragraph length, tone) layer on top. More recent entry wins on conflict; ask user at the first conflict.

Page-by-page execution (hard requirement):

Overall confirmation first — after Stage 1, show the user the planned per-page edit count + sample of style rules in effect; wait for "go".
Then loop slides 1 → N, one at a time:
- Collect paragraph-level CN text on this slide via scripts/extract.py.
- For each paragraph, produce EN per the layered style rules:
  - lowercase by default; title = so-what; parallel bullets; strong verbs; filler-word table applied; glossary locked inline
  - any unknown term → STOP, ask user, write to session_added
- Build the slide's edit batch as a JSON object.
- Run scripts/apply.py with the slide's batch → writes that slide's changes into <file>-en-polished-<date>.pptx AND appends rows to <file>-bilingual-diff-<date>.xlsx immediately.
- Checkpoint: print "P{n} done — N changes applied. Continue?" and wait for user OK before moving to P{n+1}.
- User can interject "back to P{n-1}" or "stop here" between pages.
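
To make the "filler-word table applied" step concrete, here is a minimal sketch that parses the PROFILE.mckinsey.cut_filler strings and applies them to one paragraph. The helper names `compile_rules` and `cut_filler` are hypothetical, and preserving a leading capital on the replacement is an assumption rather than documented behavior.

```python
import re

# Rules copied from PROFILE.mckinsey.cut_filler ("phrase → replacement").
CUT_FILLER_RULES = [
    "in order to → to",
    "a number of → many",
    "due to the fact that → because",
    "at this point in time → now",
]


def compile_rules(rules):
    """Turn 'phrase → replacement' strings into (pattern, replacement) pairs."""
    compiled = []
    for rule in rules:
        phrase, replacement = (part.strip() for part in rule.split("→"))
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        compiled.append((pattern, replacement))
    return compiled


def cut_filler(text, compiled):
    """Apply every filler rule, keeping a leading capital where one was present."""
    for pattern, replacement in compiled:
        def swap(match, replacement=replacement):
            starts_upper = match.group(0)[0].isupper()
            return replacement.capitalize() if starts_upper else replacement
        text = pattern.sub(swap, text)
    return text


rules = compile_rules(CUT_FILLER_RULES)
print(cut_filler("Due to the fact that costs rose, a number of stores closed", rules))
```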

Why per-page (not all-at-once):

The user can review and steer mid-stream.
A bad assumption on P3 doesn't propagate to P27 unnoticed.
Excel grows incrementally — survives any mid-session interruption.
Token-efficient: only one slide's context in active scratchpad.

Stage 3 — Layout Audit

Inputs: the post-translation deck (or, in polish-only mode, the raw deck)

What it does:

3a. Font pollution cleanup

Run scripts/layout_audit.py --fix:

For every run whose font.name is NOT in the title/body whitelist OR ends in a style suffix (Bold / Regular / Italic / Light):
- Strip the suffix
- Set font.name to the pure family
- Set font.bold / font.italic attributes accordingly
Skip any face in the configured SKIP_POLLUTION set.
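
The suffix-stripping rule can be sketched as follows. `split_font_name` is a hypothetical helper, not the code in layout_audit.py. It assumes suffixes are separated by spaces, and that "Regular" and "Light" set neither bold nor italic, since the rule above only says the attributes are set "accordingly".

```python
STYLE_SUFFIXES = ("Bold", "Regular", "Italic", "Light")


def split_font_name(name):
    """Split a face name such as 'Georgia Bold Italic' into (family, bold, italic)."""
    parts = name.split()
    bold = False
    italic = False
    # Peel style suffixes off the end, but never consume the family name itself.
    while len(parts) > 1 and parts[-1] in STYLE_SUFFIXES:
        suffix = parts.pop()
        if suffix == "Bold":
            bold = True
        elif suffix == "Italic":
            italic = True
        # Assumption: 'Regular' and 'Light' map to bold=False, italic=False.
    return " ".join(parts), bold, italic


print(split_font_name("Verdana Bold Italic"))
```

The returned family would then be written to font.name and the two flags to font.bold / font.italic on the run.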

3b. Structural-anchor detection

Run scripts/anchor_detect.py:

For each shape, compute a signature: (rounded_position, fill_color, font_size_class).
Group across slides. Any signature occurring on ≥ PROFILE.anchor_detection.min_pages pages becomes an anchor.
Build per_page_protect[page] = [anchor_shape_ids...].
Surface the anchor list to the user. They can add/remove.
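
A minimal sketch of the signature grouping described above, assuming shapes have already been extracted into plain dicts. The field names, the `detect_anchors` function, and the 10,000 EMU rounding grid are illustrative assumptions, not values taken from anchor_detect.py.

```python
from collections import defaultdict


def detect_anchors(shapes, min_pages=3, grid=10000):
    """Return per_page_protect: {page: [shape_id, ...]} for recurring signatures.

    shapes: iterable of dicts with keys page, shape_id, left, top (EMU),
    fill_color, size_class.
    """
    pages_by_sig = defaultdict(set)
    ids_by_sig_page = defaultdict(list)
    for shape in shapes:
        # Round positions to a coarse grid so near-identical placements match.
        signature = (
            round(shape["left"] / grid),
            round(shape["top"] / grid),
            shape["fill_color"],
            shape["size_class"],
        )
        pages_by_sig[signature].add(shape["page"])
        ids_by_sig_page[(signature, shape["page"])].append(shape["shape_id"])

    per_page_protect = defaultdict(list)
    for signature, pages in pages_by_sig.items():
        if len(pages) >= min_pages:  # PROFILE.anchor_detection.min_pages
            for page in pages:
                per_page_protect[page].extend(ids_by_sig_page[(signature, page)])
    return dict(per_page_protect)
```

A banner that sits at nearly the same position with the same fill and size class on three slides would be protected on each of them, while a one-off shape would not.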

3c. Overflow estimation

Run scripts/overflow_recheck.py:

Honor auto_size (skip if SHAPE_TO_FIT_TEXT or TEXT_TO_FIT_SHAPE).
Read actual margin_*.
Start with PROFILE.line_height_default (1.25). If a shape is flagged, retry with PROFILE.line_height_fallback (1.15) as a what-if before flagging it as HIGH.
Per-character width by class (narrow iIl, wide MW, digits, upper, space).
Greedy word-wrap simulation.
Emit only HIGH (ratio > PROFILE.overflow.severity.high) by default.
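
The estimator steps above (per-class character widths, greedy word-wrap, severity tiers) can be sketched as below. The width factors are illustrative guesses rather than measured font metrics, and the function names are hypothetical. The sketch expects the usable width and height in points with margins already subtracted, and leaves the auto_size check to the caller.

```python
NARROW = set("iIl")
WIDE = set("MW")


def char_width(ch, font_pt):
    """Approximate glyph width in points; the factors are assumptions."""
    if ch == " ":
        factor = 0.25
    elif ch in NARROW:
        factor = 0.3
    elif ch in WIDE:
        factor = 0.85
    elif ch.isdigit():
        factor = 0.55
    elif ch.isupper():
        factor = 0.65
    else:
        factor = 0.5
    return factor * font_pt


def count_lines(text, font_pt, usable_width):
    """Greedy word-wrap: start a new line when the next word does not fit."""
    lines = 0
    for paragraph in text.split("\n"):
        lines += 1
        used = 0.0
        for word in paragraph.split():
            word_width = sum(char_width(c, font_pt) for c in word)
            space = char_width(" ", font_pt) if used else 0.0
            if used and used + space + word_width > usable_width:
                lines += 1
                used = word_width
            else:
                used += space + word_width
    return lines


def overflow_ratio(text, font_pt, usable_width, usable_height, line_height=1.25):
    """Needed text height divided by available height."""
    needed = count_lines(text, font_pt, usable_width) * font_pt * line_height
    return needed / usable_height


def severity(ratio, tiers):
    """Map a ratio to HIGH / MED / LOW using PROFILE.overflow.severity, else None."""
    for name in ("high", "med", "low"):
        if ratio > tiers[name]:
            return name.upper()
    return None
```

Under these assumed factors, three four-letter lowercase words at 10pt in a 45pt-wide, 20pt-tall box wrap to two lines, so the ratio is 1.25 at the default line height and about 1.15 at the fallback.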

If HIGH count > PROFILE.overflow.defer_to_user_threshold:

Do not dump 30+ rendered PNGs into the session.
Tell the user: "Render to PDF/PNG via Keynote or PowerPoint, tell me which pages look broken, I'll fix those targeted pages."

3d. Compression (when user OKs a fix)

For each shape needing fix:

Is it in per_page_protect[page]? → SKIP (it's an anchor).
Try widening: increase shape width until ratio < 1.0 OR shape collides.
Still > 1.0? Try line-height 1.25 → 1.15.
Still > 1.0? Iterate font.size -= PROFILE.compression_step (0.1pt) until floor (PROFILE.floors.<body|caption|source>) hit.
Still > 1.0 at floor? STOP. Escalate to user. List the shape, its current size, the calculated ratio, and ask whether to break the floor.
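
The line-height and font-size steps of this ladder can be sketched as a small loop. This is an assumption-laden illustration: `compress` is a hypothetical function, `ratio_at` stands in for whatever overflow estimator is in use, the anchor check and the widening step are left to the caller, and a ratio of exactly 1.0 is treated as fitting.

```python
def compress(ratio_at, font_pt, floor_pt, line_heights=(1.25, 1.15)):
    """Return (font_pt, line_height, fits) after the discrete compression ladder.

    ratio_at(font_pt, line_height) -> overflow ratio for the shape.
    Assumes the 0.1pt step from PROFILE.compression_step.
    """
    # Step 1: try the default line height, then the fallback, at full size.
    for line_height in line_heights:
        if ratio_at(font_pt, line_height) <= 1.0:
            return font_pt, line_height, True

    # Step 2: shrink in 0.1pt steps, counted in integer tenths to avoid float drift.
    line_height = line_heights[-1]
    tenths = round(font_pt * 10)
    floor_tenths = round(floor_pt * 10)
    while tenths - 1 >= floor_tenths:
        tenths -= 1
        if ratio_at(tenths / 10, line_height) <= 1.0:
            return tenths / 10, line_height, True

    # Step 3: floor reached and still overflowing, so the caller should escalate.
    return tenths / 10, line_height, False
```

A `fits` value of False corresponds to the STOP-and-escalate case: the caller would report the shape, its current size, and the ratio, and ask whether to break the floor.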

3e. Late-stage glossary re-scan

Run scripts/glossary_audit.py:

For each text run in the deck, check against glossary.locked:
- If a CN-side phrase exists locked but a non-canonical EN translation appears → flag.
- If the same source term is translated two different ways in the deck (wavering) → flag.
Surface flagged rows. Auto-fix if all flags point to the same canonical translation; ask otherwise.
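
A minimal sketch of the two checks, assuming the deck has already been reduced to term-level pairs of (slide, source term, English rendering). That input shape, the function name `audit_glossary`, and the sample term 用户池 are illustrative assumptions; glossary_audit.py may align terms differently.

```python
from collections import defaultdict


def audit_glossary(pairs, locked):
    """Return (non_canonical, wavering) for term-level translation pairs.

    pairs: iterable of (slide, source_term, en_rendering).
    locked: {source_term: canonical_translation}, e.g. a flattened glossary.locked.
    """
    non_canonical = []
    renderings = defaultdict(set)
    for slide, term, rendering in pairs:
        renderings[term].add(rendering.lower())
        canonical = locked.get(term)
        # Check 1: a locked term rendered with anything but its canonical form.
        if canonical is not None and rendering.lower() != canonical.lower():
            non_canonical.append((slide, term, rendering, canonical))
    # Check 2: wavering, meaning one source term with more than one rendering.
    wavering = {term: sorted(seen) for term, seen in renderings.items() if len(seen) > 1}
    return non_canonical, wavering


locked = {"流水": "gross revenue", "出海": "global expansion"}
pairs = [
    (1, "流水", "gross revenue"),
    (4, "流水", "cash flow"),
    (6, "用户池", "user pool"),
    (7, "用户池", "audience pool"),
]
print(audit_glossary(pairs, locked))
```

Here slide 4 is flagged against the locked entry, and both 流水 and the unlocked 用户池 are reported as wavering.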

3f. Reverse sync (sub-routine, also Mode 3.5 entry point)

Run scripts/excel_sync.py --reverse:

Diff current PPT against the Excel's en_optimized column.
For each mismatched row:
- Try ordinal-position match (slide + paragraph-index).
- If no match, try difflib.get_close_matches against same-slide texts.
- Update Excel cell on success.
Report any leftover unmatched rows.
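
The matching order above can be sketched with the standard library's difflib. The row and text structures, the `reverse_sync` name, and the 0.6 similarity cutoff are assumptions for illustration. The sketch also assumes that an ordinal match only counts when the paragraph at that position is still similar to the old text, which is one possible reading of "no match".

```python
import difflib


def reverse_sync(rows, ppt_texts, cutoff=0.6):
    """Update rows in place from the current PPT text; return unmatched rows.

    rows: list of dicts with keys slide, para_idx, en_optimized.
    ppt_texts: {slide: [paragraph text, ...]} read from the edited deck.
    """
    unmatched = []
    for row in rows:
        old = row["en_optimized"]
        texts = ppt_texts.get(row["slide"], [])
        idx = row["para_idx"]
        if idx < len(texts) and texts[idx] == old:
            continue  # already in sync

        new = None
        # 1) Ordinal-position match (slide + paragraph index).
        if idx < len(texts):
            similarity = difflib.SequenceMatcher(None, old, texts[idx]).ratio()
            if similarity >= cutoff:
                new = texts[idx]
        # 2) Fuzzy fallback against every text on the same slide.
        if new is None:
            close = difflib.get_close_matches(old, texts, n=1, cutoff=cutoff)
            new = close[0] if close else None

        if new is None:
            unmatched.append(row)
        else:
            row["en_optimized"] = new
    return unmatched
```

Rows returned as unmatched are the leftovers to report to the user rather than overwrite.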

Outputs:

<file>-final-<date>.pptx (same file name in full-pipeline and polish-only modes)
Updated Excel (if applicable)

Stage 4 — Handoff

Inputs: all prior-stage outputs

What it does:

Write HANDOFF.md to the same directory as the deck — see scripts/handoff.py.
Print a one-paragraph deliverables summary to the user.

Stop-and-ask: none.

3. Operational rules (apply across stages)

3.1 File-write discipline

Before writing any .pptx or .xlsx:

Check for ~$<filename> lock file in the same directory.
If present → STOP. Tell the user: "<filename> is open in PowerPoint/Excel. Save and close it, then say 'go' to continue."
After writing, immediately readback-verify (next rule).
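
The lock-file check can be sketched with pathlib. The helper names are hypothetical, and the sketch follows the ~$<filename> naming stated above without handling any other lock-file naming an application might use.

```python
from pathlib import Path


def lock_file_for(path):
    """Return the expected lock-file path: '~$' + file name, same directory."""
    target = Path(path)
    return target.with_name("~$" + target.name)


def is_locked(path):
    """True when the lock file exists, meaning the file is likely open."""
    return lock_file_for(path).exists()


def stop_message(path):
    """Message to show the user before refusing to write."""
    name = Path(path).name
    return (f"{name} is open in PowerPoint/Excel. "
            "Save and close it, then say 'go' to continue.")
```

A writer would call `is_locked(path)` immediately before saving and, when it returns True, print `stop_message(path)` and wait instead of writing.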

3.2 Excel companion: three guard-rails

Pre-write check: load the existing Excel (if any) and confirm the header row is [page, kind, cn, en_original, en_optimized, notes]. If any columns are missing → rebuild the header before writing data.
Post-write readback: immediately reload the saved file and assert max_column ≥ 6 (one per header column) and that the header is intact.
Reverse sync available on demand: see Stage 3f.

3.3 Font compression discipline

See Stage 3d. The single rule: never bulk-reduce font sizes. Always discrete -0.1pt, always after exhausting widening + line-height fallback, always with anchor protection.

3.4 Glossary discipline

Ask once per session per unknown term. Then it's in session_added for the rest of the session.
At handoff, promote session_added to a glossary_proposed_additions.yaml file next to the deck. The user can copy them into PROFILE for the next run.
Never silently apply a translation Claude is unsure about. Stop and ask.

3.5 Magnitude verification

Any number with a CN magnitude word (百万 / 千万 / 亿 / 十亿 / 百亿 / 千亿 / 万亿) must be re-verified against PROFILE.unit_table before being written to EN. Treat this as a HARD CHECK; do NOT take prior-session translations on faith.
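
As a worked example of the check, the sketch below converts a CN figure to an English magnitude using the values implied by PROFILE.unit_table (亿 is 100M, so 3.5亿 is 350M, not 3.5B). The function name and the choice to normalize output to M / B / T are assumptions; currency handling is left out, as in the profile.

```python
import re
from decimal import Decimal

# Numeric values implied by PROFILE.unit_table.
UNIT_VALUE = {
    "百万": 10**6,
    "千万": 10**7,
    "亿": 10**8,
    "十亿": 10**9,
    "百亿": 10**10,
    "千亿": 10**11,
    "万亿": 10**12,
}

_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(百万|千万|十亿|百亿|千亿|万亿|亿)")


def to_english_magnitude(cn_number):
    """Convert e.g. '3.5亿' to '350M'. Raises ValueError on unrecognized input."""
    match = _PATTERN.fullmatch(cn_number.strip())
    if match is None:
        raise ValueError(f"unrecognized CN magnitude: {cn_number!r}")
    value = Decimal(match.group(1)) * UNIT_VALUE[match.group(2)]
    for suffix, scale in (("T", 10**12), ("B", 10**9), ("M", 10**6)):
        if value >= scale:
            # normalize() drops trailing zeros; the 'f' format avoids exponents.
            return f"{(value / scale).normalize():f}{suffix}"
    return f"{value.normalize():f}"


print(to_english_magnitude("3.5亿"))
```

Comparing this computed value with the EN text already on the slide is one way to catch a figure that was translated as "3.5 billion" in a prior session.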

3.6 CN-alignment confidence

When auto-aligning the Excel's cn column by paragraph ordinal:

Slides with > 15 changes → auto-tag notes column as needs-review.
Always present this as best-effort, never as ground truth.

4. Scripts (in `scripts/`)

| Script | Role | Stage |
| --- | --- | --- |
| `sense_pass.py` | extract design DNA, font usage, palette | 1 |
| `extract.py` | paragraph-level text extraction | 1, 2, 3 |
| `apply.py` | apply EN edits + write Excel with highlight | 2 |
| `layout_audit.py` | font pollution cleanup, suffix audit | 3a |
| `anchor_detect.py` | cross-page anchor signature detection | 3b |
| `overflow_recheck.py` | overflow estimator with severity tiers | 3c |
| `glossary_audit.py` | late-stage glossary re-scan + wavering | 3e |
| `excel_sync.py` | bidirectional PPT ↔ Excel sync (configurable slide offset) | 3f |
| `handoff.py` | write HANDOFF.md | 4 |
| `style_distill.py` | distill style fingerprint from a reference PDF/.pptx | pre-2 |

Each script is invokable standalone; the skill wires them together.

5. Deliverables (recap)

Full pipeline (4 files):

<file>-en-polished-<date>.pptx
<file>-final-<date>.pptx
<file>-bilingual-diff-<date>.xlsx
HANDOFF.md

Polish-only (3 files):

<file>-final-<date>.pptx
<file>-layout-changes-<date>.xlsx
HANDOFF.md

Mode 3.5 (reverse-sync only):

updated <file>-bilingual-diff-<date>.xlsx

6. Known limitations

Overflow estimator is a hint, not a verdict — final visual check requires external rendering (Keynote / PowerPoint export to PDF).
python-pptx cannot render slides. There is no built-in preview.
CN auto-alignment by paragraph ordinal can drift on heavily-restructured pages — configure PROFILE.cn_en_slide_offset.overrides for known cases.
The skill assumes the CN source is semantic ground truth — typos in CN propagate to EN unless the user catches them.
File-lock collisions can silently corrupt output. The pre-write ~$<filename> check (see 3.1) is the only line of defense.

7. Changelog

See CHANGELOG.md.

8. Credits

Generic deck-globalization engine derived from upstream DeckGlobalizer v2.1.1 by tinadu-ai (https://clawhub.ai/tinadu-ai/deckglobalizer). Original three-phase architecture (Visual Audit / Semantic Alignment / Page-by-Page Execution) credited and retained.
