[Translation] Update TRANSLATION_DOCUMENTATION_README.md with accurate body content translation percentages #1098

@pethers

🎯 Objective

Update the TRANSLATION_DOCUMENTATION_README.md to reflect the actual body content translation status rather than the misleading infrastructure-only metrics.

📋 Background

The current TRANSLATION_DOCUMENTATION_README.md reports quality scores of 85-99% across languages. However, these scores measure infrastructure completeness (hreflang tags, Schema.org metadata, navigation elements) — NOT actual body content translation.

Actual body content translation status (measured via English sentence pattern detection):

| Language | Claimed Quality | Actual Body Translation | Files w/ English Body | Gap |
|---|---|---|---|---|
| 🇳🇱 Dutch | 91%+ | 13% | 84/96 | -78% |
| 🇩🇰 Danish | 95% | 24% | 73/96 | -71% |
| 🇫🇮 Finnish | 98% | 25% | 72/96 | -73% |
| 🇫🇷 French | 85%+ | 29% | 69/96 | -56% |
| 🇸🇪 Swedish | 99.2% | 38% | 60/96 | -61% |
| 🇩🇪 German | 98.9% | 42% | 56/96 | -57% |
| 🇰🇷 Korean | 75%+ | 43% | 55/96 | -32% |
| 🇪🇸 Spanish | 96.6% | 45% | 53/96 | -52% |
| 🇳🇴 Norwegian | 99.5% | 46% | 52/96 | -54% |
| 🇮🇱 Hebrew | 93%+ | 48% | 50/96 | -45% |
| 🇸🇦 Arabic | 67.7% | 60% | 39/96 | -8% |
| 🇨🇳 Chinese | 95%+ | 66% | 33/96 | -29% |
| 🇯🇵 Japanese | 95%+ | 68% | 31/96 | -27% |

The document itself acknowledges this in one section ("~450-488 files (36-39%) need body content translation"), but the per-language quality scores in the summary table remain misleading because they report infrastructure completeness only.
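The derived columns in the table above can be recomputed from the raw counts. The following is a minimal sketch, not the script used to produce the issue's figures: `body_pct` and `gap` are hypothetical helper names, and round-half-up integer rounding is an assumption.

```shell
# body_pct ENGLISH_FILES TOTAL_FILES
# Prints the rounded percentage of files NOT flagged as having an English body.
# Rounding is integer round-half-up (an assumption; the issue does not state its rounding).
body_pct() {
  echo $(( (($2 - $1) * 100 + $2 / 2) / $2 ))
}

# gap CLAIMED_PCT ACTUAL_PCT
# Prints actual minus claimed, in percentage points, rounded to an integer.
# awk is used because the claimed scores can be fractional (e.g. 98.9).
gap() {
  awk -v c="$1" -v a="$2" 'BEGIN { printf "%.0f\n", a - c }'
}

body_pct 84 96   # Dutch row: 84 of 96 files flagged as English
gap 91 13        # Dutch row: claimed 91%, measured 13%
```

For the Dutch row this prints `13` and `-78`, matching the table. With this rounding the French and Arabic rows come out at 28% and 59%, one point below the listed values, so the original figures may have used a different rounding or file total.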

✅ Acceptance Criteria

  • TRANSLATION_DOCUMENTATION_README.md updated with two separate metric categories:
    • Infrastructure Completion (hreflang, metadata, navigation, Schema.org) — current high scores
    • Body Content Translation (actual article body text translated to target language) — the real gaps shown above
  • Each language section clearly states how many files have translated body content vs infrastructure-only
  • The summary table shows both metrics side-by-side
  • A prioritized "Next Steps" section identifies which languages/categories need translation work most urgently
  • Remove or update any "🏆 PERFECT" / "🎉 COMPLETE" markers that are misleading given the body content gap
  • Add a methodology note explaining how body content translation is measured vs infrastructure

🛠️ Implementation Guidance

Files to modify:

  • TRANSLATION_DOCUMENTATION_README.md — main update target

Approach:

  1. Update the main summary table to include both "Infrastructure %" and "Body Content %" columns
  2. For each language section, add a "Body Content Status" subsection with:
    • Count of files with translated body content
    • Count of files with English body content still pending
    • Breakdown by category (blog, discordian, product, industry, core)
  3. Update status emojis: use ✅ only when body content is actually translated, and 🚧 when only the infrastructure is complete
  4. Add a "Translation Gap Analysis" section with the table from this issue
  5. Update the "Next Steps" section to prioritize based on actual body content gaps
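Steps 1 and 3 could be sketched as a small generator for the side-by-side summary rows. This is a sketch under assumptions: `summary_row` is a hypothetical helper, the README is assumed to use pipe tables, the column names are illustrative, and "✅ only when no file is flagged as English" is one possible reading of the emoji rule.

```shell
# summary_row LANG INFRA_PCT ENGLISH_FILES TOTAL_FILES
# Prints one pipe-table row with both metrics and a status emoji.
summary_row() {
  lang=$1; infra=$2; english=$3; total=$4
  translated=$((total - english))
  # Integer round-half-up percentage of translated files (rounding is an assumption).
  body=$(( (translated * 100 + total / 2) / total ))
  # Assumed rule: ✅ only if no file is still flagged as English, otherwise 🚧.
  if [ "$english" -eq 0 ]; then status="✅"; else status="🚧"; fi
  printf '| %s | %s%% | %s%% (%s/%s) | %s |\n' \
    "$lang" "$infra" "$body" "$translated" "$total" "$status"
}

echo '| Language | Infrastructure % | Body Content % | Status |'
echo '|---|---|---|---|'
summary_row sv 99.2 60 96   # Swedish figures from this issue
summary_row de 98.9 56 96   # German figures from this issue
```

Passing counts rather than precomputed percentages keeps the translated/total figure visible in each row, which also covers the acceptance criterion about stating file counts per language.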

Validation:
Run this command to verify counts:

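# For each language, count the *_<lang>.html files that still appear to have an English body.
for lang in ar ko zh ja he da fi no es fr nl sv de; do
  count=0; total=0
  for f in *_"${lang}".html; do
    [ -f "$f" ] || continue; total=$((total+1))
    # Heuristic: the first <p> that opens with a capitalized ASCII word plus two lowercase
    # ASCII words must also contain a run of 3+ letters. Requires GNU grep (-P).
    grep -P '<p>[A-Z][a-z]+ [a-z]+ [a-z]+' "$f" 2>/dev/null | head -1 | grep -qP '[A-Za-z]{3,}' && count=$((count+1))
  done
  echo "${lang}: ${count}/${total} files have English body"
done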
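To fill in the per-language "Body Content Status" subsections, it may help to list the flagged files rather than only count them. The sketch below uses a simplified form of the heuristic (the first pattern only, without the second filter); `has_english_body` and `list_pending` are hypothetical helper names, and `LC_ALL=C` is an added assumption so that `[a-z]` matches ASCII letters only.

```shell
# has_english_body FILE
# Succeeds if FILE has a <p> that opens with a capitalized ASCII word
# followed by two lowercase ASCII words (simplified heuristic, assumption).
has_english_body() {
  LC_ALL=C grep -qE '<p>[A-Z][a-z]+ [a-z]+ [a-z]+' "$1"
}

# list_pending LANG
# Prints each *_LANG.html file in the current directory that the heuristic flags.
list_pending() {
  for f in *_"$1".html; do
    [ -f "$f" ] || continue
    if has_english_body "$f"; then echo "$f"; fi
  done
  return 0
}
```

Running `list_pending sv` in the site root would print the Swedish files still flagged. The pattern matches any paragraph that starts with three plain ASCII words, so a sentence in another Latin-script language without diacritics in its first three words (for example Dutch "Dit is een ...") is also flagged; counts for such languages may be overstated, which is worth stating in the methodology note.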

🤖 Recommended Agent

Agent: @hack23-isms-ninja
Rationale: This is a documentation accuracy and compliance task. The ISMS Ninja specializes in documentation quality and ensuring accurate status reporting.

For implementation:

  • Audit current translation status claims against actual measurements
  • Update documentation with accurate, two-tier metrics
  • Ensure documentation follows Hack23 ISMS documentation standards
  • Create clear, actionable "Next Steps" section for translation prioritization
