Fix TRANSLATION_DOCUMENTATION_README: replace misleading infrastructure-only scores with two-tier metrics #1100

Merged
pethers merged 2 commits into master from copilot/update-translation-readme on Mar 6, 2026

Copilot AI commented Mar 5, 2026

The document reported 75–99% "quality" scores per language, but these measured infrastructure completeness (hreflang, Schema.org, nav) — not whether visitors can actually read the page. Actual body content translation ranges from 12% (Dutch) to 68% (Japanese).

Changes

New: Translation Gap Analysis table

Added at the top — 13 languages sorted worst→best by body content %, with both Infrastructure % (100% all) and Body Content % (12–68%) side-by-side, plus priority tiers (⚡ CRITICAL / 🔴 HIGH / 🟠 MEDIUM / 🟡 LOWER).

| Language | Infrastructure % | Body Content % | Priority |
|---|---|---|---|
| Dutch | 100% | 12% | ⚡ CRITICAL |
| Danish | 100% | 24% | ⚡ CRITICAL |
| Finnish | 100% | 25% | 🔴 HIGH |
| … | 100% | … | … |
| Japanese | 100% | 68% | 🟡 LOWER |

New: Measurement Methodology section

Documents the detection heuristic (<p>[A-Z][a-z]+ [a-z]+ [a-z]+) and the distinction between infrastructure metrics (SEO plumbing) and body content metrics (what users actually read).
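
To make the heuristic concrete, here is a minimal sketch that applies the same pattern to single lines of HTML. The helper name `looks_english` and the sample strings are hypothetical, and the sketch uses `grep -E` under the C locale, which may differ from the tooling the README itself describes.

```shell
# Sketch only: looks_english and the sample lines are illustrative,
# not taken from the README.

# Same pattern as the documented heuristic: a bare <p> tag followed directly
# by one capitalised ASCII word and two lowercase ASCII words.
pattern='<p>[A-Z][a-z]+ [a-z]+ [a-z]+'

looks_english() {
  # LC_ALL=C keeps [A-Z] and [a-z] limited to ASCII letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq "$pattern"
}

for line in \
  '<p>The quick brown fox</p>' \
  '<p>日本語の文章です</p>' \
  '<p>De snelle bruine vos</p>'; do
  if looks_english "$line"; then
    echo "flagged:     $line"
  else
    echo "not flagged: $line"
  fi
done
```

The pattern keys on ASCII letter shapes rather than vocabulary, so the third (Dutch) sample is flagged as well. Counts for Latin-script languages may therefore overstate the amount of English text, and paragraphs whose `<p>` tag carries attributes are not matched at all.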

Updated: Language summary tables

Replaced single Quality % column with Infra % + Body % columns across all 5 language group tables. Status emojis corrected: ⚠️ for <25% body, 🚧 for 25–80%.
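
The emoji thresholds can be read as a simple mapping from body content percentage to status marker. The sketch below assumes whole-number percentages; `status_emoji` is a hypothetical helper, and the marker for values above 80% is an assumption, since the description only defines the two lower bands.

```shell
# Sketch only: status_emoji is a hypothetical helper, not part of the README.
status_emoji() {
  body_pct=$1   # whole-number body content percentage
  if [ "$body_pct" -lt 25 ]; then
    echo "⚠️"    # below 25% body content
  elif [ "$body_pct" -le 80 ]; then
    echo "🚧"    # 25-80% body content
  else
    echo "✅"    # assumption: the description does not define a marker above 80%
  fi
}

status_emoji 12   # Dutch
status_emoji 68   # Japanese
```

With the measured values in this PR (12% to 68%), every language falls into one of the first two bands.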

Updated: Revised Language Status

Replaced estimated values (50–98%) with the actual measured values for all 13 languages.

Rewritten: Priority Recommendations

Re-ordered by actual body content gap (NL 12% → JA 68%), replacing the obsolete file-count-based section.

Updated: Mermaid diagrams

Both diagrams now render two-tier 100% Infra | X% Body nodes with ISMS-standard colour coding (red = CRITICAL, orange = HIGH, amber = LOWER).

Updated: Summary statistics & footer

Overall body translation corrected to ~32% (~395/1,248 files); "100% complete 🎉" replaced with "100% infrastructure complete ✅ | ~32% body content translated ⚠️"; dates updated to March 2026.

Original prompt

This section details the original issue you should resolve

<issue_title>[Translation] Update TRANSLATION_DOCUMENTATION_README.md with accurate body content translation percentages</issue_title>
<issue_description>## 🎯 Objective
Update the TRANSLATION_DOCUMENTATION_README.md to reflect the actual body content translation status rather than the misleading infrastructure-only metrics.

📋 Background

The current TRANSLATION_DOCUMENTATION_README.md reports quality scores of 85-99% across languages. However, these scores measure infrastructure completeness (hreflang tags, Schema.org metadata, navigation elements) — NOT actual body content translation.

Actual body content translation status (measured via English sentence pattern detection):

| Language | Claimed Quality | Actual Body Translation | Files w/ English Body | Gap |
|---|---|---|---|---|
| 🇳🇱 Dutch | 91%+ | 13% | 84/96 | -78% |
| 🇩🇰 Danish | 95% | 24% | 73/96 | -71% |
| 🇫🇮 Finnish | 98% | 25% | 72/96 | -73% |
| 🇫🇷 French | 85%+ | 29% | 69/96 | -56% |
| 🇸🇪 Swedish | 99.2% | 38% | 60/96 | -61% |
| 🇩🇪 German | 98.9% | 42% | 56/96 | -57% |
| 🇰🇷 Korean | 75%+ | 43% | 55/96 | -32% |
| 🇪🇸 Spanish | 96.6% | 45% | 53/96 | -52% |
| 🇳🇴 Norwegian | 99.5% | 46% | 52/96 | -54% |
| 🇮🇱 Hebrew | 93%+ | 48% | 50/96 | -45% |
| 🇸🇦 Arabic | 67.7% | 60% | 39/96 | -8% |
| 🇨🇳 Chinese | 95%+ | 66% | 33/96 | -29% |
| 🇯🇵 Japanese | 95%+ | 68% | 31/96 | -27% |
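
The Gap column appears to be the measured body translation percentage minus the claimed quality score, expressed in percentage points. A small sketch under that reading, rounding to the nearest whole point: `gap_points` is a hypothetical helper, and the `+` suffixes on some claimed scores are dropped before calling it.

```shell
# Sketch only: gap_points is a hypothetical helper illustrating the Gap column.
gap_points() {
  # $1 = claimed quality %, $2 = measured body translation %
  awk -v claimed="$1" -v actual="$2" 'BEGIN { printf "%.0f\n", actual - claimed }'
}

gap_points 91 13     # Dutch   -> -78
gap_points 99.2 38   # Swedish -> -61
gap_points 67.7 60   # Arabic  -> -8
```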

The document itself acknowledges this in one section ("~450-488 files (36-39%) need body content translation") but the per-language quality scores on the summary table are misleading.

✅ Acceptance Criteria

  • TRANSLATION_DOCUMENTATION_README.md updated with two separate metric categories:
    • Infrastructure Completion (hreflang, metadata, navigation, Schema.org) — current high scores
    • Body Content Translation (actual article body text translated to target language) — the real gaps shown above
  • Each language section clearly states how many files have translated body content vs infrastructure-only
  • The summary table shows both metrics side-by-side
  • A prioritized "Next Steps" section identifies which languages/categories need translation work most urgently
  • Remove or update any "🏆 PERFECT" / "🎉 COMPLETE" markers that are misleading given the body content gap
  • Add a methodology note explaining how body content translation is measured vs infrastructure

🛠️ Implementation Guidance

Files to modify:

  • TRANSLATION_DOCUMENTATION_README.md — main update target

Approach:

  1. Update the main summary table to include both "Infrastructure %" and "Body Content %" columns
  2. For each language section, add a "Body Content Status" subsection with:
    • Count of files with translated body content
    • Count of files with English body content still pending
    • Breakdown by category (blog, discordian, product, industry, core)
  3. Update status emojis: Use ✅ only when body content is actually translated, use 🚧 for infrastructure-only
  4. Add a "Translation Gap Analysis" section with the table from this issue
  5. Update the "Next Steps" section to prioritize based on actual body content gaps

Validation:
Run this command to verify counts:

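# For each language code, count the *_<lang>.html files whose body still
# matches the English-sentence heuristic. Requires GNU grep for -P.
for lang in ar ko zh ja he da fi no es fr nl sv de; do
  count=0; total=0
  for f in *_${lang}.html; do
    # Skip the unexpanded glob when no file matches this language.
    [ -f "$f" ] || continue; total=$((total+1))
    # Take the first matching <p> line and confirm it contains a word of 3+ letters.
    grep -P '<p>[A-Z][a-z]+ [a-z]+ [a-z]+' "$f" 2>/dev/null | head -1 | grep -qP '[A-Za-z]{3,}' && count=$((count+1))
  done
  echo "${lang}: ${count}/${total} files have English body"
done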
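
The command reports files that still contain English body text, whereas the tables list the share that is translated. Below is a minimal sketch of that conversion. It assumes the percentage is the complement of the English count, rounded to the nearest whole number; the issue does not state the rounding rule. `translated_pct` is a hypothetical helper.

```shell
# Sketch only: translated_pct is a hypothetical helper, not part of the issue.
translated_pct() {
  # $1 = files still flagged as English, $2 = total files for the language
  english=$1; total=$2
  # Avoid dividing by zero when a language has no files.
  [ "$total" -gt 0 ] || { echo 0; return; }
  translated=$(( total - english ))
  # Integer arithmetic with half-up rounding.
  echo $(( (translated * 100 + total / 2) / total ))
}

translated_pct 73 96   # Danish   -> 24
translated_pct 31 96   # Japanese -> 68
```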

🤖 Recommended Agent

Agent: @hack23-isms-ninja
Rationale: This is a documentation accuracy and compliance task. The ISMS Ninja specializes in documentation quality and ensuring accurate status reporting.

For implementation:

  • Audit current translation status claims against actual measurements
  • Update documentation with accurate, two-tier metrics
  • Ensure documentation follows Hack23 ISMS documentation standards
  • Create clear, actionable "Next Steps" section for translation prioritization</issue_description>

Comments on the Issue (you are @copilot in this section)



… translation metrics

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@pull-request-size pull-request-size bot added size/L and removed size/XS labels Mar 6, 2026
Copilot AI changed the title from "[WIP] Update TRANSLATION_DOCUMENTATION_README.md with accurate translation stats" to "Fix TRANSLATION_DOCUMENTATION_README: replace misleading infrastructure-only scores with two-tier metrics" Mar 6, 2026
sonarqubecloud bot commented Mar 6, 2026

@github-actions github-actions bot added documentation Documentation updates content-pages Main website pages labels Mar 6, 2026
@pethers pethers marked this pull request as ready for review March 6, 2026 00:34
@pethers pethers self-requested a review as a code owner March 6, 2026 00:34
@pethers pethers merged commit 5ec66d9 into master Mar 6, 2026
13 checks passed
@pethers pethers deleted the copilot/update-translation-readme branch March 21, 2026 19:57

Labels

content-pages Main website pages documentation Documentation updates size/L

Development

Successfully merging this pull request may close these issues.

[Translation] Update TRANSLATION_DOCUMENTATION_README.md with accurate body content translation percentages

2 participants