/dm:multilingual-score
Purpose
Score translated or localized content across multiple quality dimensions to determine whether it is ready for publishing, needs native speaker review, or requires re-translation. Combines technical translation scoring (length ratios, formatting preservation, placeholder integrity, key term consistency) with content quality evaluation, brand voice consistency checking, and market-specific compliance verification into a single composite multilingual quality score.
Use this command after any translation or localization workflow to validate quality before content goes live. It replaces subjective "looks good" assessments with a structured, repeatable scoring methodology that catches issues automated translation often introduces — brand voice drift, formatting damage, missing do-not-translate terms, compliance gaps in the target market, and length distortion that signals missing or added content. The composite score provides a clear publish/review/re-translate classification so the team knows exactly what action to take.
Input Required
The user must provide (or will be prompted for):
- Original content: The source text that was translated — provided as inline text, a file path, or a URL to the source content. This serves as the reference for translation accuracy scoring. Required for technical translation scoring; if omitted, only content quality, brand voice, and compliance dimensions are scored
- Translated content: The translated or localized text to score — provided as inline text, a file path, or a URL. This is the primary content being evaluated. Required
- Source language: The language code of the original content (e.g., en-US, en-GB, de-DE). Defaults to the brand's primary language from the language configuration if not specified
- Target language: The language code of the translated content (e.g., de-DE, fr-FR, hi-IN, ja-JP). Required; determines which compliance rules apply and which translation service benchmarks to reference
- Do-not-translate terms (optional): Specific terms that must appear unchanged in the translation. Defaults to the brand profile's language.do_not_translate list. Additional terms can be provided to supplement the brand list for this specific scoring run
- Content type (optional): The type of content being scored: blog, email, ad, landing_page, social, product_description, legal, technical. Affects content quality scoring weights and brand voice expectations. Defaults to auto-detection based on content structure
Process
- Load brand context: Read ~/.claude-marketing/brands/_active-brand.json for the active slug, then load ~/.claude-marketing/brands/{slug}/profile.json. Apply brand voice dimensions, compliance rules for target markets (skills/context-engine/compliance-rules.md), and industry context. Load the language configuration, specifically the do-not-translate terms from language.do_not_translate and any translation quality baselines from past scoring runs. Also check for guidelines at ~/.claude-marketing/brands/{slug}/guidelines/_manifest.json; if present, load restrictions and voice-and-tone rules that apply across all languages. Check for agency SOPs at ~/.claude-marketing/sops/. If no brand exists, ask "Set up a brand first (/dm:brand-setup)?" or proceed with defaults.
- Run technical translation scoring: Execute language-router.py --action score with the original content, translated content, source language, target language, and do-not-translate terms. This produces four sub-scores: length ratio (translated content length versus expected length for the language pair; for example, German typically runs 20-30% longer than English and Japanese typically runs 10-20% shorter, so deviations beyond the expected range indicate missing or added content), formatting preservation (markdown structure, HTML tags, merge tags like {{first_name}}, UTM parameters, and link structures survived translation intact), key term consistency (every do-not-translate term from the brand profile and any additional specified terms appears exactly as specified in the translation), and placeholder integrity (all variables, template tokens, and dynamic content markers are present and correctly positioned in the translated version).
- Run content quality evaluation: Execute eval-runner.py --action run-quick on the translated content alone, scoring it as standalone content in the target language. This assesses structural quality, readability for the target audience, completeness, and coherence, catching cases where a translation is technically accurate but reads poorly as native content. The eval-runner scores content against the plugin's standard quality dimensions regardless of whether it was translated or originally authored.
- Run brand voice consistency check: Execute brand-voice-scorer.py --brand {slug} --text "{translated_content}" to score how well the translated content matches the brand's voice profile. Brand voice should survive translation: the brand should sound recognizably like itself in every language, adapted for local expectations but maintaining its core personality dimensions (formality, energy, humor, authority). Score the translated content against the same voice dimensions as the original to detect voice drift introduced during translation.
- Check compliance for target market: Based on the target language-region code, identify applicable regulatory requirements from skills/context-engine/compliance-rules.md. For EU languages: GDPR consent language, cookie consent, right-to-erasure references. For hi-IN and other Indian languages: DPDPA compliance. For pt-BR: LGPD. For ko-KR: PIPA. For ja-JP: APPI. For en-US: CAN-SPAM, CCPA/CPRA where applicable. Verify that required compliance elements are present and correctly localized in the translated content, not just copied in English. Score as compliant, partially compliant (elements present but not fully localized), or non-compliant (required elements missing).
- Compute multilingual composite score: Calculate the weighted composite score across all four dimensions: translation technical score (40% weight, reflecting the core accuracy of the translation), content quality score (25% weight, assessing readability and structural quality in the target language), brand voice score (20% weight, measuring voice consistency across languages), and compliance score (15% weight, verifying regulatory requirements for the target market). Each dimension is scored 0-100, and the composite is the weighted average. If original content is not provided (skipping technical translation scoring), redistribute weights: content quality 40%, brand voice 35%, compliance 25%.
- Classify the result: Based on the composite score, assign a clear action classification — 85 and above: Publish Ready (content meets quality standards for the target market, no blocking issues, can proceed to publishing workflow), 70-84: Native Speaker Review Recommended (content is functional but has quality gaps that a native speaker should review and correct before publishing — list the specific issues requiring review), Below 70: Re-translate (content has significant quality issues that spot corrections cannot fix — recommend re-translation with specific guidance on what went wrong and which dimensions need the most improvement).
- Generate improvement suggestions: For any dimension scoring below its threshold (technical < 85, content quality < 80, brand voice < 75, compliance < 100), generate specific, actionable improvement suggestions. For technical issues: cite the exact problem (e.g., "Do-not-translate term 'BrandName Pro' was translated to 'BrandName Profi' on line 3 — must appear as 'BrandName Pro'"). For voice issues: cite the dimension and specific text (e.g., "Formality is at 8 in the translation but brand targets 5 — replace 'Wir freuen uns, Ihnen mitzuteilen' with 'Wir sind gespannt, euch zu zeigen'"). For compliance: cite the missing requirement and the regulation (e.g., "Missing GDPR-compliant unsubscribe link — required for all EU-targeted email content per Article 7(3)").
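The weighting, classification, and threshold rules in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the plugin's actual implementation: the function names (composite_score, classify, dimensions_needing_fixes) and the dictionary keys are hypothetical, and the sketch assumes the redistributed weights apply whenever the technical score is absent from the input.

```python
# Weights from the "Compute multilingual composite score" step.
FULL_WEIGHTS = {
    "technical": 0.40,
    "content_quality": 0.25,
    "brand_voice": 0.20,
    "compliance": 0.15,
}
# Redistributed weights used when no original content was supplied.
NO_SOURCE_WEIGHTS = {
    "content_quality": 0.40,
    "brand_voice": 0.35,
    "compliance": 0.25,
}
# A dimension scoring below its threshold triggers improvement suggestions.
THRESHOLDS = {
    "technical": 85,
    "content_quality": 80,
    "brand_voice": 75,
    "compliance": 100,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted average of 0-100 dimension scores (hypothetical helper)."""
    weights = FULL_WEIGHTS if "technical" in scores else NO_SOURCE_WEIGHTS
    total = sum(scores[name] * weight for name, weight in weights.items())
    # Round to one decimal so float noise cannot flip a boundary case.
    return round(total, 1)


def classify(score: float) -> str:
    """Map a composite score to the action classification."""
    if score >= 85:
        return "Publish Ready"
    if score >= 70:
        return "Native Speaker Review Recommended"
    return "Re-translate"


def dimensions_needing_fixes(scores: dict[str, float]) -> list[str]:
    """Dimensions that fall below their per-dimension threshold."""
    return [name for name, value in scores.items() if value < THRESHOLDS[name]]


if __name__ == "__main__":
    example = {
        "technical": 90,
        "content_quality": 80,
        "brand_voice": 70,
        "compliance": 100,
    }
    total = composite_score(example)
    print(total, classify(total), dimensions_needing_fixes(example))
```

For the example scores, the composite is 0.40 × 90 + 0.25 × 80 + 0.20 × 70 + 0.15 × 100 = 85, which sits exactly on the Publish Ready boundary, while brand voice (70, below its threshold of 75) would still receive improvement suggestions.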
Output
A structured multilingual quality scorecard containing:
- Multilingual composite score: The weighted overall score (0-100) with letter grade (A+ through F) and publish/review/re-translate classification, providing an immediate actionable verdict
- Translation technical score breakdown: Overall technical score plus the four sub-scores — length ratio (with expected range for the language pair and actual ratio), formatting preservation (with count of preserved versus damaged elements), key term consistency (with list of any violated do-not-translate terms), and placeholder integrity (with list of any missing or modified placeholders)
- Content quality score: The eval-runner quality score for the translated content as standalone text in the target language, with dimension breakdown (structure, readability, completeness, coherence) showing how the translation performs as native content
- Brand voice score: Voice consistency score from brand-voice-scorer.py with per-dimension breakdown (formality, energy, humor, authority, etc.) for the translated version, compared against the brand profile targets — highlighting any voice dimensions that drifted during translation
- Compliance status: Per-regulation compliance check results for the target market — compliant, partially compliant, or non-compliant for each applicable regulation, with specific missing or incorrectly localized elements identified
- Classification and action: Clear verdict — Publish Ready, Native Speaker Review Recommended, or Re-translate — with the reasoning based on score thresholds and any blocking issues
- Specific issues and fix suggestions: Every issue found across all dimensions, with the exact location in the content, a description of what is wrong, and a specific suggested fix. Grouped by dimension and ordered by severity within each group
- Baseline comparison (if available): How this score compares to the brand's multilingual quality baseline — average scores for this language pair from previous scoring runs, trend direction (improving, stable, declining), and whether this particular piece is above or below the brand's typical quality for this language
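Two of the technical sub-scores reported in the scorecard, length ratio and placeholder integrity, can be illustrated with a short sketch. It does not describe how language-router.py works internally, which this document does not show. The function names, the regular expression, the use of base language codes as keys, and the choice to measure length in characters are all assumptions of the sketch; the expected ranges come from the examples in the Process section (German 20-30% longer than English, Japanese 10-20% shorter).

```python
import re
from collections import Counter

# Expected translated/source length ratios, taken from the examples in
# this document. Measuring in characters is an assumption of this sketch.
EXPECTED_RATIO = {
    ("en", "de"): (1.20, 1.30),
    ("en", "ja"): (0.80, 0.90),
}

# Matches merge tags such as {{first_name}}. Other token styles
# (e.g. %s or ${var}) would need additional patterns.
MERGE_TAG = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def length_ratio_check(source: str, translated: str, pair: tuple[str, str]) -> dict:
    """Compare the actual length ratio with the expected range for the pair."""
    if not source:
        raise ValueError("source text must not be empty")
    ratio = len(translated) / len(source)
    low, high = EXPECTED_RATIO[pair]
    return {
        "ratio": round(ratio, 2),
        "expected": (low, high),
        "in_range": low <= ratio <= high,
    }


def placeholder_diff(source: str, translated: str) -> dict:
    """List merge tags that were dropped from or added to the translation."""
    src = Counter(MERGE_TAG.findall(source))
    dst = Counter(MERGE_TAG.findall(translated))
    return {
        "missing": sorted((src - dst).elements()),
        "unexpected": sorted((dst - src).elements()),
    }


if __name__ == "__main__":
    source = "Hi {{first_name}}, your {{offer}} is ready."
    translated = "Hallo {{first_name}}, dein {{angebot}} ist bereit."
    print(placeholder_diff(source, translated))
```

Counting tags with a Counter rather than a set means a placeholder that appears twice in the source but only once in the translation is still reported as missing. A translated tag name such as {{angebot}} shows up as both a missing and an unexpected entry, which is the "modified placeholder" case the scorecard lists.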
Agents Used
- localization-specialist — Leads the multilingual scoring workflow. Executes translation technical scoring via language-router.py (length ratio, formatting, key terms, placeholders), interprets results in the context of the specific language pair's characteristics (expected expansion/contraction ratios, formatting conventions), validates do-not-translate term preservation, assesses cultural adaptation quality beyond mechanical accuracy, and synthesizes all dimension scores into the composite multilingual quality score with classification and actionable improvement suggestions
- quality-assurance — Runs the content quality evaluation via eval-runner.py on the translated content, scoring it as standalone content in the target language. Applies the standard quality evaluation framework (structure, readability, completeness, coherence) to determine whether the translation reads well as native content, independent of translation accuracy. Checks eval configuration for brand-specific quality thresholds and ensures scoring is logged for baseline tracking via quality-tracker.py