Skip to content

Validate target-language in XLF files#119

Merged
konradmichalik merged 7 commits intomove-elevator:mainfrom
maikschneider:target-language-check
May 3, 2026
Merged

Validate target-language in XLF files#119
konradmichalik merged 7 commits intomove-elevator:mainfrom
maikschneider:target-language-check

Conversation

@maikschneider
Copy link
Copy Markdown
Contributor

@maikschneider maikschneider commented Apr 29, 2026

This PR adds an additional check to the XliffSchemaValidator. Translated language files should have a target-language attribute that matches it's filename. Otherwise an error should be thrown.

Attention: AI generated code.

resolves #118

Summary by CodeRabbit

  • New Features

    • Filename-encoded locales are validated against XLIFF target-language declarations.
  • Improvements

    • Improved handling and logging for unreadable/invalid XML while preserving validation results.
    • Parser detects XLIFF version, extracts target language, supports filename prefix/suffix and region variants (case-insensitive compare).
  • Tests

    • Added tests for filename locale extraction and target-language checks across XLIFF 1.x and 2.0.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Walkthrough

Validator refactors error handling and adds filename-driven target-language checks: it validates XLIFF v1.2 target-language or XLIFF v2 trgLang against a locale inferred from the filename, logging errors for missing or mismatched attributes while preserving schema-validation errors; parser exposes version and filename-derived language/target extraction.

Changes

Cohort / File(s) Summary
Validator
src/Validator/XliffSchemaValidator.php
Refactored processFile() control flow (early returns for missing/unreadable files and XML parse failures), separated XML parse vs schema-validation try/catch, preserved captured schema $errors, and added post-schema checks that validate filename-inferred locale vs target-language (v1.x) or trgLang (v2.x), emitting ERROR entries for missing/mismatched values.
Parser
src/Parser/XliffParser.php
Added getLanguageFromFileName() and getTargetLanguage(), exposed isVersion2(), changed getLanguage() to prefer filename-derived locale (fallback to source), and normalized language casing/region handling.
Tests — Validator
tests/src/Validator/XliffSchemaValidatorTest.php
Updated invalid-XML expectation and added tests covering missing/mismatched target-language (XLIFF 1.x), skipping when no filename locale, region-handling, and missing/mismatched trgLang (XLIFF 2.x).
Tests — Parser
tests/src/Parser/XliffParserTest.php
Added tests for filename-based language extraction (prefix/suffix, region formats), getLanguageFromFileName() null behavior when no locale, and getTargetLanguage() behavior across XLIFF v1/v2 and region stripping.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐇
I hop through filenames, sniff the locale trail,
If header tag is missing or off, I wag my tail.
A mismatch or a gap — I stamp my foot and say,
"Fix the tag or name, then I’ll let it stay!" 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Linked Issues check ❓ Inconclusive The PR addresses issue #118 by implementing target-language validation, but review feedback indicates region-normalization bugs and test gaps that should be resolved before merging. Fix region-stripping to apply consistently to both filename and target-language values, replace assertCount assertions, sharpen XLIFF 2.x trgLang tests, and verify the behavioral change in getLanguage() fallback.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Validate target-language in XLF files' is directly related to the main change, which adds validation that target-language attributes in XLF files match their filenames.
Out of Scope Changes check ✅ Passed All changes align with the target-language validation objective; refactors to XliffParser and try/catch improvements support the core feature without introducing unrelated scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 PHPStan (2.1.51)

PHPStan was skipped because the config uses disallowed bootstrapFiles, bootstrapFile, or includes directives.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/src/Validator/XliffSchemaValidatorTest.php (2)

365-373: Strengthen v2 assertion to lock trgLang behavior.

This test currently checks only a generic target-language substring. If you adopt v2-correct messaging, assert trgLang (and ideally <xliff>) so the test guards the intended v2 path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 365 - 373, The
current test loop in XliffSchemaValidatorTest.php only looks for
'target-language' in $result messages; update the check to assert the XLIFF
v2-specific wording by looking for 'trgLang' (and optionally the '<xliff>' tag)
instead: inside the foreach over $result errors examine $error['message'] for
the substring 'trgLang' (or both 'trgLang' and '<xliff>') and set
$hasTargetLangError accordingly, then keep the
$this->assertTrue($hasTargetLangError, ...) but update the failure message to
mention trgLang to lock the v2 behavior.

305-307: Avoid asserting total issue count in mismatch test.

assertCount(1, $result) is brittle if schema validation starts returning additional errors. Filter by the custom mismatch message (like the missing-target test) and assert that specific entry instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 305 - 307, The
test currently asserts a fixed count with $this->assertCount(1, $result) which
is brittle; instead, remove the count assertion and locate the specific mismatch
entry in $result by filtering for the custom mismatch message (e.g. search
$result items where $item['message'] contains the expected mismatch text or the
language markers '"fr"' and '"de"'), then assert against that filtered entry
(use $this->assertNotEmpty on the filtered array and
$this->assertStringContainsString('"fr"', $entry['message']) and
$this->assertStringContainsString('"de"', $entry['message'])). This targets the
XliffSchemaValidatorTest mismatch check and avoids failing when other schema
errors are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Validator/XliffSchemaValidator.php`:
- Around line 67-69: Change the misleading log message so it explicitly states
that only schema validation is being skipped (not the whole validation), e.g.
update the notice in XliffSchemaValidator (where the exception is caught and you
call $this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(),
$e->getMessage()))) to something like "Skipping schema validation for %s: %s" or
"Schema validation skipped for %s: %s" so it clearly references schema
validation and still includes $this->getShortName() and $e->getMessage().
- Around line 83-103: The error messages in XliffSchemaValidator (when
$isVersion2 is true and $targetLang is read from
$xliffRoot->getAttribute('trgLang')) incorrectly refer to the "target-language"
attribute on the <file> node; update the two message strings produced in that
branch (the missing-attribute error and the mismatch error) to reference the
XLIFF 2.x attribute name "trgLang" and the <xliff> node (or generic XLIFF 2.x
location) instead of "target-language" on <file>, keeping the existing variables
($targetLang, $expectedLanguage) and logic intact.

---

Nitpick comments:
In `@tests/src/Validator/XliffSchemaValidatorTest.php`:
- Around line 365-373: The current test loop in XliffSchemaValidatorTest.php
only looks for 'target-language' in $result messages; update the check to assert
the XLIFF v2-specific wording by looking for 'trgLang' (and optionally the
'<xliff>' tag) instead: inside the foreach over $result errors examine
$error['message'] for the substring 'trgLang' (or both 'trgLang' and '<xliff>')
and set $hasTargetLangError accordingly, then keep the
$this->assertTrue($hasTargetLangError, ...) but update the failure message to
mention trgLang to lock the v2 behavior.
- Around line 305-307: The test currently asserts a fixed count with
$this->assertCount(1, $result) which is brittle; instead, remove the count
assertion and locate the specific mismatch entry in $result by filtering for the
custom mismatch message (e.g. search $result items where $item['message']
contains the expected mismatch text or the language markers '"fr"' and '"de"'),
then assert against that filtered entry (use $this->assertNotEmpty on the
filtered array and $this->assertStringContainsString('"fr"', $entry['message'])
and $this->assertStringContainsString('"de"', $entry['message'])). This targets
the XliffSchemaValidatorTest mismatch check and avoids failing when other schema
errors are present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96869412-f767-4587-a364-873e794f6c6a

📥 Commits

Reviewing files that changed from the base of the PR and between 55827aa and f59930a.

📒 Files selected for processing (2)
  • src/Validator/XliffSchemaValidator.php
  • tests/src/Validator/XliffSchemaValidatorTest.php

Comment on lines 67 to 69
if (str_contains($e->getMessage(), 'No support implemented for loading XLIFF version')) {
$this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(), $e->getMessage()));
} else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Notice wording is misleading (Skipping vs actual behavior).

Line 68 logs “Skipping …”, but the method continues and may still return custom target-language issues. Please make the notice explicit that only schema validation is skipped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Validator/XliffSchemaValidator.php` around lines 67 - 69, Change the
misleading log message so it explicitly states that only schema validation is
being skipped (not the whole validation), e.g. update the notice in
XliffSchemaValidator (where the exception is caught and you call
$this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(),
$e->getMessage()))) to something like "Skipping schema validation for %s: %s" or
"Schema validation skipped for %s: %s" so it clearly references schema
validation and still includes $this->getShortName() and $e->getMessage().

Comment thread src/Validator/XliffSchemaValidator.php Outdated
@konradmichalik
Copy link
Copy Markdown
Contributor

Thanks for the PR and for refactoring the try/catch flow — it makes the code much more readable.

One point before merging:

Convention coverage gap: The regex ^([a-z]{2})\. only matches the prefix pattern (TYPO3 style: de.locallang.xlf). Symfony/Laravel use the suffix pattern (messages.de.xlf), and those files will never trigger the check. Since the plugin explicitly supports both conventions, it would be cleaner to derive the language from the FileSet / FileDetector (PrefixFileDetector / SuffixFileDetector already know the locale) instead of re-parsing the filename here.

Minor:

  • The new catch block around XmlUtils::parse() logs "Failed to validate XML schema", but it's actually a parse error — please rename.
  • array_values(XliffUtils::validateSchema($dom)) is redundant; the method already returns a list.
  • A test for XLIFF 2.x with a mismatched trgLang (not just missing) would be worth adding.

Schema errors are now accumulated together with target-language errors instead of returning early — IMO better, but worth noting in the PR description as an intentional behavior change.

@maikschneider
Copy link
Copy Markdown
Contributor Author

Good point. I tried to re-use something to derive the language, but couldn't find a reliable method. XliffParser::getLanguage() mixes filename language (just TYPO3 style de.locallang.xlf) and XML source language - which would cause conflicts. I did a little refactoring to clean things up.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Parser/XliffParser.php`:
- Around line 126-147: getLanguageFromFileName() currently returns locales in
all-lowercase and preserves underscores (e.g. de_de) which doesn't match the
expected target-language/trgLang format (e.g. de-DE); update the function so
after extracting $matches[1] it normalizes the locale to the canonical form:
replace any '_' with '-', split on '-' into language and optional region, make
the language lowercase and the region uppercase, then rejoin with '-' (return
language alone if no region). Apply this normalization in both the prefix and
suffix match branches before returning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89fb0ed3-79b1-4180-b105-ee19c869d1fe

📥 Commits

Reviewing files that changed from the base of the PR and between f59930a and f5d1012.

📒 Files selected for processing (4)
  • src/Parser/XliffParser.php
  • src/Validator/XliffSchemaValidator.php
  • tests/src/Parser/XliffParserTest.php
  • tests/src/Validator/XliffSchemaValidatorTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/src/Validator/XliffSchemaValidatorTest.php

Comment thread src/Parser/XliffParser.php
@konradmichalik
Copy link
Copy Markdown
Contributor

Looks great, moving the detection into XliffParser (prefix + suffix + region) is exactly the right path, and the v2 message reads correctly now. Worth picking up the remaining CodeRabbit nitpicks on the tests (replace assertCount with a filter, sharpen the v2 test on trgLang); after that this is good to merge from my side.

@maikschneider
Copy link
Copy Markdown
Contributor Author

Oh, there was indeed an issue. But with base language extraction, not formatting.

@konradmichalik
Copy link
Copy Markdown
Contributor

One thing worth fixing before merging:

Region mismatch produces false positives

With commit c8a2bfe the region is stripped only from the filename, but the comparison runs against the full target-language attribute.

Example:

  • File: de_AT.locallang.xlf$expectedLanguage = 'de'
  • XML: target-language="de-AT"strtolower($targetLang) = 'de-at'
  • Result: 'de-at' !== 'de' → mismatch error, even though file and attribute are consistent.

Suggestion: normalize both sides the same way (strip region on both), or compare both sides in full. The first option fits the current approach — e.g. apply the same trimming to getTargetLanguage() via a small helper.

Optional / low priority

  • The instanceof XliffParser guard in the validator is redundant if supportsParser() is enforced at runtime — worth a quick check.
  • XliffParser::getLanguage() now falls back to getSourceLanguage() when no prefix/suffix matches. That's a behavior change — worth checking who consumes the return value.
  • The 'Skipping %s: %s' notice could be more specific about schema validation (CodeRabbit point), but purely cosmetic.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/src/Validator/XliffSchemaValidatorTest.php (2)

329-332: ⚡ Quick win

Assert the targeted mismatch error, not the total error array size.

This assertion is a bit brittle because processFile() can legitimately include additional non-mismatch errors in the future. Filter for the target-language mismatch message first, then assert count/content on that subset.

Proposed change
-        $this->assertCount(1, $result);
-        $this->assertStringContainsString('"fr"', $result[0]['message']);
-        $this->assertStringContainsString('"de"', $result[0]['message']);
+        $mismatchErrors = array_filter(
+            $result,
+            static fn ($e) => isset($e['message']) && str_contains($e['message'], 'target-language'),
+        );
+        $this->assertCount(1, $mismatchErrors);
+        $error = array_values($mismatchErrors)[0];
+        $this->assertStringContainsString('"fr"', $error['message']);
+        $this->assertStringContainsString('"de"', $error['message']);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 329 - 332, The
test currently asserts the total $result array size which is brittle; change it
to filter $result for entries whose ['message'] contains the target-language
mismatch (both "fr" and "de") and then assert on that filtered subset (e.g.,
assertCount(1, $mismatches) and assertStringContainsString on
$mismatches[0]['message']); locate the assertion block in
XliffSchemaValidatorTest where processFile() is called and replace the direct
count/asserts with a filtered check against the specific mismatch message.

257-264: ⚡ Quick win

Use unique temp locations and guaranteed cleanup in file-backed tests.

These tests use fixed filenames under sys_get_temp_dir() and manual unlink(). That can cause collisions/flakes in parallel runs and leak files if an assertion fails before cleanup.

Suggested pattern
-        $tempDir = sys_get_temp_dir();
-        $tempFile = $tempDir.'/de.locallang.xlf';
-        file_put_contents($tempFile, $xliff);
-
-        $parser = new XliffParser($tempFile);
-        $result = $this->validator->processFile($parser);
-
-        unlink($tempFile);
+        $tempDir = sys_get_temp_dir().'/xliff_test_'.uniqid('', true);
+        mkdir($tempDir);
+        $tempFile = $tempDir.'/de.locallang.xlf';
+        file_put_contents($tempFile, $xliff);
+
+        try {
+            $parser = new XliffParser($tempFile);
+            $result = $this->validator->processFile($parser);
+        } finally {
+            `@unlink`($tempFile);
+            `@rmdir`($tempDir);
+        }

Also applies to: 292-299, 320-327, 349-356, 376-383, 411-418

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 257 - 264,
Tests create files at a fixed path under sys_get_temp_dir() and call unlink()
manually, which risks collisions and leaked files on failure; change to create
unique temp files with tempnam(sys_get_temp_dir(), 'xliff_') (or PHPUnit
temporary helpers), write with file_put_contents, and wrap the parse/processFile
calls (XliffParser and $this->validator->processFile) in a try/finally block
that unlinks the file if it exists to guarantee cleanup; apply the same tempnam
+ try/finally pattern for the other similar test blocks (the ones around the
other occurences) to avoid flakes and leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/src/Validator/XliffSchemaValidatorTest.php`:
- Around line 329-332: The test currently asserts the total $result array size
which is brittle; change it to filter $result for entries whose ['message']
contains the target-language mismatch (both "fr" and "de") and then assert on
that filtered subset (e.g., assertCount(1, $mismatches) and
assertStringContainsString on $mismatches[0]['message']); locate the assertion
block in XliffSchemaValidatorTest where processFile() is called and replace the
direct count/asserts with a filtered check against the specific mismatch
message.
- Around line 257-264: Tests create files at a fixed path under
sys_get_temp_dir() and call unlink() manually, which risks collisions and leaked
files on failure; change to create unique temp files with
tempnam(sys_get_temp_dir(), 'xliff_') (or PHPUnit temporary helpers), write with
file_put_contents, and wrap the parse/processFile calls (XliffParser and
$this->validator->processFile) in a try/finally block that unlinks the file if
it exists to guarantee cleanup; apply the same tempnam + try/finally pattern for
the other similar test blocks (the ones around the other occurences) to avoid
flakes and leaks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a36f3e51-7d31-44c4-ba85-ad7b4be4b542

📥 Commits

Reviewing files that changed from the base of the PR and between f5d1012 and 6dae368.

📒 Files selected for processing (3)
  • src/Parser/XliffParser.php
  • tests/src/Parser/XliffParserTest.php
  • tests/src/Validator/XliffSchemaValidatorTest.php
✅ Files skipped from review due to trivial changes (1)
  • tests/src/Parser/XliffParserTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Parser/XliffParser.php

@maikschneider
Copy link
Copy Markdown
Contributor Author

Okay, this should be done. However, I'm not sure how "strict" the checks should be. Maybe raising a warning about a mismatch of "de_DE" an "de_AT" could be an option. The whole locale parsing and comparison could be worth a separate utility - no idea if other validators would profit from it.

@konradmichalik
Copy link
Copy Markdown
Contributor

Going with Option 1 (lenient): both sides are normalized to the base language, region suffixes (de_AT, de-DE) are stripped before comparison. This keeps the validator pragmatic for projects that use region suffixes inconsistently, while still catching real mismatches like de.xlf with target-language="fr".

A region-level warning (e.g. de_DE.xlf vs target-language="de-AT") is tracked as a follow-up in #122.

LGTM from my side once the test nit is picked up — thanks for the clean refactor!

@konradmichalik konradmichalik merged commit 50f189e into move-elevator:main May 3, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Validate missing target-language

2 participants