Validate target-language in XLF files#119
Conversation
📝 WalkthroughWalkthroughValidator refactors error handling and adds filename-driven target-language checks: it validates XLIFF v1.2 Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 PHPStan (2.1.51)PHPStan was skipped because the config uses disallowed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/src/Validator/XliffSchemaValidatorTest.php (2)
365-373: Strengthen v2 assertion to locktrgLangbehavior.This test currently checks only a generic
target-languagesubstring. If you adopt v2-correct messaging, asserttrgLang(and ideally<xliff>) so the test guards the intended v2 path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 365 - 373, The current test loop in XliffSchemaValidatorTest.php only looks for 'target-language' in $result messages; update the check to assert the XLIFF v2-specific wording by looking for 'trgLang' (and optionally the '<xliff>' tag) instead: inside the foreach over $result errors examine $error['message'] for the substring 'trgLang' (or both 'trgLang' and '<xliff>') and set $hasTargetLangError accordingly, then keep the $this->assertTrue($hasTargetLangError, ...) but update the failure message to mention trgLang to lock the v2 behavior.
305-307: Avoid asserting total issue count in mismatch test.
assertCount(1, $result)is brittle if schema validation starts returning additional errors. Filter by the custom mismatch message (like the missing-target test) and assert that specific entry instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 305 - 307, The test currently asserts a fixed count with $this->assertCount(1, $result) which is brittle; instead, remove the count assertion and locate the specific mismatch entry in $result by filtering for the custom mismatch message (e.g. search $result items where $item['message'] contains the expected mismatch text or the language markers '"fr"' and '"de"'), then assert against that filtered entry (use $this->assertNotEmpty on the filtered array and $this->assertStringContainsString('"fr"', $entry['message']) and $this->assertStringContainsString('"de"', $entry['message'])). This targets the XliffSchemaValidatorTest mismatch check and avoids failing when other schema errors are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Validator/XliffSchemaValidator.php`:
- Around line 67-69: Change the misleading log message so it explicitly states
that only schema validation is being skipped (not the whole validation), e.g.
update the notice in XliffSchemaValidator (where the exception is caught and you
call $this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(),
$e->getMessage()))) to something like "Skipping schema validation for %s: %s" or
"Schema validation skipped for %s: %s" so it clearly references schema
validation and still includes $this->getShortName() and $e->getMessage().
- Around line 83-103: The error messages in XliffSchemaValidator (when
$isVersion2 is true and $targetLang is read from
$xliffRoot->getAttribute('trgLang')) incorrectly refer to the "target-language"
attribute on the <file> node; update the two message strings produced in that
branch (the missing-attribute error and the mismatch error) to reference the
XLIFF 2.x attribute name "trgLang" and the <xliff> node (or generic XLIFF 2.x
location) instead of "target-language" on <file>, keeping the existing variables
($targetLang, $expectedLanguage) and logic intact.
---
Nitpick comments:
In `@tests/src/Validator/XliffSchemaValidatorTest.php`:
- Around line 365-373: The current test loop in XliffSchemaValidatorTest.php
only looks for 'target-language' in $result messages; update the check to assert
the XLIFF v2-specific wording by looking for 'trgLang' (and optionally the
'<xliff>' tag) instead: inside the foreach over $result errors examine
$error['message'] for the substring 'trgLang' (or both 'trgLang' and '<xliff>')
and set $hasTargetLangError accordingly, then keep the
$this->assertTrue($hasTargetLangError, ...) but update the failure message to
mention trgLang to lock the v2 behavior.
- Around line 305-307: The test currently asserts a fixed count with
$this->assertCount(1, $result) which is brittle; instead, remove the count
assertion and locate the specific mismatch entry in $result by filtering for the
custom mismatch message (e.g. search $result items where $item['message']
contains the expected mismatch text or the language markers '"fr"' and '"de"'),
then assert against that filtered entry (use $this->assertNotEmpty on the
filtered array and $this->assertStringContainsString('"fr"', $entry['message'])
and $this->assertStringContainsString('"de"', $entry['message'])). This targets
the XliffSchemaValidatorTest mismatch check and avoids failing when other schema
errors are present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96869412-f767-4587-a364-873e794f6c6a
📒 Files selected for processing (2)
src/Validator/XliffSchemaValidator.phptests/src/Validator/XliffSchemaValidatorTest.php
| if (str_contains($e->getMessage(), 'No support implemented for loading XLIFF version')) { | ||
| $this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(), $e->getMessage())); | ||
| } else { |
There was a problem hiding this comment.
Notice wording is misleading (Skipping vs actual behavior).
Line 68 logs “Skipping …”, but the method continues and may still return custom target-language issues. Please make the notice explicit that only schema validation is skipped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Validator/XliffSchemaValidator.php` around lines 67 - 69, Change the
misleading log message so it explicitly states that only schema validation is
being skipped (not the whole validation), e.g. update the notice in
XliffSchemaValidator (where the exception is caught and you call
$this->logger?->notice(sprintf('Skipping %s: %s', $this->getShortName(),
$e->getMessage()))) to something like "Skipping schema validation for %s: %s" or
"Schema validation skipped for %s: %s" so it clearly references schema
validation and still includes $this->getShortName() and $e->getMessage().
|
Thanks for the PR and for refactoring the try/catch flow — it makes the code much more readable. One point before merging: Convention coverage gap: The regex Minor:
Schema errors are now accumulated together with target-language errors instead of returning early — IMO better, but worth noting in the PR description as an intentional behavior change. |
|
Good point. I tried to re-use something to derive the language, but couldn't find a reliable method. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Parser/XliffParser.php`:
- Around line 126-147: getLanguageFromFileName() currently returns locales in
all-lowercase and preserves underscores (e.g. de_de) which doesn't match the
expected target-language/trgLang format (e.g. de-DE); update the function so
after extracting $matches[1] it normalizes the locale to the canonical form:
replace any '_' with '-', split on '-' into language and optional region, make
the language lowercase and the region uppercase, then rejoin with '-' (return
language alone if no region). Apply this normalization in both the prefix and
suffix match branches before returning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 89fb0ed3-79b1-4180-b105-ee19c869d1fe
📒 Files selected for processing (4)
src/Parser/XliffParser.phpsrc/Validator/XliffSchemaValidator.phptests/src/Parser/XliffParserTest.phptests/src/Validator/XliffSchemaValidatorTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/src/Validator/XliffSchemaValidatorTest.php
|
Looks great, moving the detection into |
|
Oh, there was indeed an issue. But with base language extraction, not formatting. |
|
One thing worth fixing before merging: Region mismatch produces false positivesWith commit Example:
Suggestion: normalize both sides the same way (strip region on both), or compare both sides in full. The first option fits the current approach — e.g. apply the same trimming to Optional / low priority
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/src/Validator/XliffSchemaValidatorTest.php (2)
329-332: ⚡ Quick winAssert the targeted mismatch error, not the total error array size.
This assertion is a bit brittle because
processFile()can legitimately include additional non-mismatch errors in the future. Filter for the target-language mismatch message first, then assert count/content on that subset.Proposed change
- $this->assertCount(1, $result); - $this->assertStringContainsString('"fr"', $result[0]['message']); - $this->assertStringContainsString('"de"', $result[0]['message']); + $mismatchErrors = array_filter( + $result, + static fn ($e) => isset($e['message']) && str_contains($e['message'], 'target-language'), + ); + $this->assertCount(1, $mismatchErrors); + $error = array_values($mismatchErrors)[0]; + $this->assertStringContainsString('"fr"', $error['message']); + $this->assertStringContainsString('"de"', $error['message']);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 329 - 332, The test currently asserts the total $result array size which is brittle; change it to filter $result for entries whose ['message'] contains the target-language mismatch (both "fr" and "de") and then assert on that filtered subset (e.g., assertCount(1, $mismatches) and assertStringContainsString on $mismatches[0]['message']); locate the assertion block in XliffSchemaValidatorTest where processFile() is called and replace the direct count/asserts with a filtered check against the specific mismatch message.
257-264: ⚡ Quick winUse unique temp locations and guaranteed cleanup in file-backed tests.
These tests use fixed filenames under
sys_get_temp_dir()and manualunlink(). That can cause collisions/flakes in parallel runs and leak files if an assertion fails before cleanup.Suggested pattern
- $tempDir = sys_get_temp_dir(); - $tempFile = $tempDir.'/de.locallang.xlf'; - file_put_contents($tempFile, $xliff); - - $parser = new XliffParser($tempFile); - $result = $this->validator->processFile($parser); - - unlink($tempFile); + $tempDir = sys_get_temp_dir().'/xliff_test_'.uniqid('', true); + mkdir($tempDir); + $tempFile = $tempDir.'/de.locallang.xlf'; + file_put_contents($tempFile, $xliff); + + try { + $parser = new XliffParser($tempFile); + $result = $this->validator->processFile($parser); + } finally { + `@unlink`($tempFile); + `@rmdir`($tempDir); + }Also applies to: 292-299, 320-327, 349-356, 376-383, 411-418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/src/Validator/XliffSchemaValidatorTest.php` around lines 257 - 264, Tests create files at a fixed path under sys_get_temp_dir() and call unlink() manually, which risks collisions and leaked files on failure; change to create unique temp files with tempnam(sys_get_temp_dir(), 'xliff_') (or PHPUnit temporary helpers), write with file_put_contents, and wrap the parse/processFile calls (XliffParser and $this->validator->processFile) in a try/finally block that unlinks the file if it exists to guarantee cleanup; apply the same tempnam + try/finally pattern for the other similar test blocks (the ones around the other occurences) to avoid flakes and leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/src/Validator/XliffSchemaValidatorTest.php`:
- Around line 329-332: The test currently asserts the total $result array size
which is brittle; change it to filter $result for entries whose ['message']
contains the target-language mismatch (both "fr" and "de") and then assert on
that filtered subset (e.g., assertCount(1, $mismatches) and
assertStringContainsString on $mismatches[0]['message']); locate the assertion
block in XliffSchemaValidatorTest where processFile() is called and replace the
direct count/asserts with a filtered check against the specific mismatch
message.
- Around line 257-264: Tests create files at a fixed path under
sys_get_temp_dir() and call unlink() manually, which risks collisions and leaked
files on failure; change to create unique temp files with
tempnam(sys_get_temp_dir(), 'xliff_') (or PHPUnit temporary helpers), write with
file_put_contents, and wrap the parse/processFile calls (XliffParser and
$this->validator->processFile) in a try/finally block that unlinks the file if
it exists to guarantee cleanup; apply the same tempnam + try/finally pattern for
the other similar test blocks (the ones around the other occurences) to avoid
flakes and leaks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a36f3e51-7d31-44c4-ba85-ad7b4be4b542
📒 Files selected for processing (3)
src/Parser/XliffParser.phptests/src/Parser/XliffParserTest.phptests/src/Validator/XliffSchemaValidatorTest.php
✅ Files skipped from review due to trivial changes (1)
- tests/src/Parser/XliffParserTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Parser/XliffParser.php
|
Okay, this should be done. However, I'm not sure how "strict" the checks should be. Maybe raising a warning about a mismatch of "de_DE" an "de_AT" could be an option. The whole locale parsing and comparison could be worth a separate utility - no idea if other validators would profit from it. |
|
Going with Option 1 (lenient): both sides are normalized to the base language, region suffixes (de_AT, de-DE) are stripped before comparison. This keeps the validator pragmatic for projects that use region suffixes inconsistently, while still catching real mismatches like de.xlf with target-language="fr". A region-level warning (e.g. de_DE.xlf vs target-language="de-AT") is tracked as a follow-up in #122. LGTM from my side once the test nit is picked up — thanks for the clean refactor! |
This PR adds an additional check to the
XliffSchemaValidator. Translated language files should have atarget-languageattribute that matches it's filename. Otherwise an error should be thrown.Attention: AI generated code.
resolves #118
Summary by CodeRabbit
New Features
Improvements
Tests