Skip to content

feat: add EncodingValidator for file encoding and JSON syntax validation#33

Merged
konradmichalik merged 3 commits intomainfrom
encoding-validator
Jul 14, 2025
Merged

feat: add EncodingValidator for file encoding and JSON syntax validation#33
konradmichalik merged 3 commits intomainfrom
encoding-validator

Conversation

@konradmichalik
Copy link
Copy Markdown
Contributor

@konradmichalik konradmichalik commented Jul 14, 2025

Summary by CodeRabbit

  • New Features
    • Introduced a new encoding validator to detect file encoding issues, invisible characters, Unicode normalization problems, and JSON syntax errors in translation files.
  • Documentation
    • Simplified documentation by removing detailed PHP translation file examples and added information about the new encoding validator.
  • Tests
    • Added comprehensive tests for the new encoding validator and updated related test assertions for improved type safety.
  • Chores
    • Updated project dependencies to require the PHP Intl and Mbstring extensions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jul 14, 2025

Walkthrough

A new EncodingValidator was introduced to check encoding-related issues in translation files, including UTF-8 validity, BOM presence, invisible characters, Unicode normalization, and JSON syntax. Documentation was updated to reflect this addition. Tests and the validator registry were updated accordingly. Composer requirements now mandate the intl and mbstring PHP extensions.

Changes

File(s) Change Summary
README.md Updated documentation: removed PHP translation file examples and added EncodingValidator description.
src/Validator/EncodingValidator.php Added new EncodingValidator class for encoding checks on translation files.
src/Validator/ValidatorRegistry.php Registered EncodingValidator in the list of available validators.
composer.json Added ext-intl and ext-mbstring as required PHP extensions.
tests/src/Validator/EncodingValidatorTest.php Added comprehensive unit tests for EncodingValidator.
tests/src/Validator/ValidatorRegistryTest.php Updated to assert presence of EncodingValidator in registry and adjusted expected validator count.
tests/src/Command/ValidateTranslationCommandTest.php
tests/src/Parser/ParserCacheTest.php
tests/src/Result/ValidationResultCliRendererTest.php
tests/src/Utility/ClassUtilityTest.php
tests/src/Validator/SchemaValidatorTest.php
Updated assertions to cast tested values to strings for compatibility and reliability.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ValidatorRegistry
    participant EncodingValidator
    participant Parser
    participant FileSystem

    User->>ValidatorRegistry: getAvailableValidators()
    ValidatorRegistry-->>User: [ ..., EncodingValidator, ... ]

    User->>EncodingValidator: processFile(Parser)
    EncodingValidator->>FileSystem: Read file content
    FileSystem-->>EncodingValidator: File content
    EncodingValidator->>EncodingValidator: Check UTF-8 validity
    EncodingValidator->>EncodingValidator: Check BOM presence
    EncodingValidator->>EncodingValidator: Detect invisible/control characters
    EncodingValidator->>EncodingValidator: Check Unicode normalization (if intl)
    EncodingValidator->>EncodingValidator: If JSON, check JSON syntax
    EncodingValidator-->>User: Array of detected issues (if any)
Loading

Poem

A validator hopped in, so spry and bright,
Sniffing out BOMs and bytes not quite right.
It peeks at your files, both JSON and PHP,
Spots sneaky spaces you might never see.
With tests all around and docs up to date,
Encoding is safe—let’s celebrate!
🐰✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/src/Validator/EncodingValidatorTest.php (1)

188-189: Remove unused variable.

The variable $method is assigned but never used, as flagged by static analysis.

-    $reflection = new \ReflectionClass($this->validator);
-    $method = $reflection->getMethod('processFile');
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 598d067 and a82bef0.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • src/Validator/EncodingValidator.php (1 hunks)
  • src/Validator/ValidatorRegistry.php (1 hunks)
  • tests/src/Fixtures/config/unsupported.txt (1 hunks)
  • tests/src/Validator/EncodingValidatorTest.php (1 hunks)
  • tests/src/Validator/ValidatorRegistryTest.php (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
tests/src/Validator/ValidatorRegistryTest.php (1)
src/Validator/EncodingValidator.php (1)
  • EncodingValidator (14-165)
src/Validator/ValidatorRegistry.php (1)
src/Validator/EncodingValidator.php (1)
  • EncodingValidator (14-165)
tests/src/Validator/EncodingValidatorTest.php (6)
src/Parser/JsonParser.php (1)
  • JsonParser (7-101)
src/Parser/PhpParser.php (1)
  • PhpParser (7-122)
src/Parser/XliffParser.php (1)
  • XliffParser (7-81)
src/Parser/YamlParser.php (1)
  • YamlParser (9-93)
src/Result/Issue.php (1)
  • Issue (7-55)
src/Validator/EncodingValidator.php (5)
  • EncodingValidator (14-165)
  • supportsParser (87-90)
  • resultTypeOnValidationFailure (92-95)
  • processFile (19-64)
  • formatIssueMessage (66-82)
🪛 PHPMD (2.15.0)
tests/src/Validator/EncodingValidatorTest.php

189-189: Avoid unused local variables such as '$method'. (Unused Code Rules)

(UnusedLocalVariable)

🔇 Additional comments (21)
tests/src/Fixtures/config/unsupported.txt (1)

1-1: LGTM!

Removing the trailing newline from the test fixture is fine.

src/Validator/ValidatorRegistry.php (1)

21-21: LGTM!

The EncodingValidator is correctly added to the validator registry.

tests/src/Validator/ValidatorRegistryTest.php (1)

9-9: Test coverage properly updated!

The test correctly verifies the presence of EncodingValidator and updates the expected validator count.

Also applies to: 27-28

README.md (1)

88-88: Documentation looks good!

The EncodingValidator description is comprehensive and accurately reflects its functionality.

src/Validator/EncodingValidator.php (7)

19-64: Well-structured validation logic!

The processFile method provides comprehensive encoding checks with proper error handling and logging.


66-82: Clean message formatting implementation!

Good use of type checking and consistent formatting with color support.


87-95: Configuration methods are correctly implemented!

Appropriate parser support and warning level for encoding issues.


97-106: UTF-8 and BOM detection correctly implemented!

Using mb_check_encoding and checking for the correct UTF-8 BOM sequence.


111-139: Comprehensive invisible character detection!

Excellent coverage of problematic Unicode characters and control characters with proper whitespace exclusions.


141-152: Unicode normalization check with proper extension handling!

Good practice to check for intl extension availability before using Normalizer.


154-164: JSON validation with BOM handling!

Smart approach to remove BOM before JSON validation to prevent parsing errors.

tests/src/Validator/EncodingValidatorTest.php (10)

21-38: Well-structured setup and teardown methods.

The test setup creates a temporary directory structure and proper cleanup, which is good practice for file-based tests.


40-49: Comprehensive parser support verification.

The test correctly verifies that all expected parsers are supported and the count is exact, preventing regressions.


67-77: Good test for invalid UTF-8 detection.

The test properly creates invalid UTF-8 content and validates the correct error message is returned.


91-103: Thorough invisible character testing.

The test correctly validates detection of multiple invisible Unicode characters and verifies specific character types are identified.


117-133: Proper handling of optional dependency.

The test correctly skips when the Intl extension is not available and tests Unicode normalization detection effectively.


135-149: Appropriate use of mocking for JSON syntax testing.

Using a mock parser here is the correct approach since the real JsonParser would fail during construction with invalid JSON, preventing the validator from being tested.


151-165: Smart test design for BOM handling in JSON.

The test correctly verifies that JSON with BOM triggers a BOM warning but not a JSON syntax error, showing the validator handles BOM stripping for JSON validation.


196-209: Excellent message formatting validation.

The test thoroughly validates the formatted output including warning level, prefix usage, and multiple issue handling.


211-224: Comprehensive multiple issues testing.

The test effectively validates that multiple encoding issues can be detected simultaneously in a single file.


226-235: Essential baseline test for clean files.

This test ensures that valid files don't trigger false positives, which is crucial for validator reliability.

Comment on lines +179 to +194
public function testFileReadError(): void
{
// Test with a file that exists but can't be read (we'll mock this)
$filePath = $this->testFilesPath.'/readable.yaml';
file_put_contents($filePath, 'key: value');

$parser = new YamlParser($filePath);

// Mock file_get_contents failure by using a non-readable path
$reflection = new \ReflectionClass($this->validator);
$method = $reflection->getMethod('processFile');

// For this test, we'll just verify the method handles missing files gracefully
$issues = $this->validator->processFile($parser);
$this->assertEmpty($issues);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incomplete test implementation for file read errors.

The test method doesn't actually simulate a file read error condition. It creates a readable file and expects no issues, but the test name and comment suggest it should test unreadable files.

Consider this approach to properly test file read error handling:

 public function testFileReadError(): void
 {
-    // Test with a file that exists but can't be read (we'll mock this)
-    $filePath = $this->testFilesPath.'/readable.yaml';
-    file_put_contents($filePath, 'key: value');
-
-    $parser = new YamlParser($filePath);
-
-    // Mock file_get_contents failure by using a non-readable path
-    $reflection = new \ReflectionClass($this->validator);
-    $method = $reflection->getMethod('processFile');
-
-    // For this test, we'll just verify the method handles missing files gracefully
-    $issues = $this->validator->processFile($parser);
-    $this->assertEmpty($issues);
+    // Test with a non-existent file path
+    $filePath = $this->testFilesPath.'/non-existent.yaml';
+    
+    $mockParser = $this->createMock(YamlParser::class);
+    $mockParser->method('getFilePath')->willReturn($filePath);
+    
+    $issues = $this->validator->processFile($mockParser);
+    $this->assertEmpty($issues); // Should handle gracefully and return empty array
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public function testFileReadError(): void
{
// Test with a file that exists but can't be read (we'll mock this)
$filePath = $this->testFilesPath.'/readable.yaml';
file_put_contents($filePath, 'key: value');
$parser = new YamlParser($filePath);
// Mock file_get_contents failure by using a non-readable path
$reflection = new \ReflectionClass($this->validator);
$method = $reflection->getMethod('processFile');
// For this test, we'll just verify the method handles missing files gracefully
$issues = $this->validator->processFile($parser);
$this->assertEmpty($issues);
}
public function testFileReadError(): void
{
// Test with a non-existent file path
$filePath = $this->testFilesPath.'/non-existent.yaml';
$mockParser = $this->createMock(YamlParser::class);
$mockParser->method('getFilePath')->willReturn($filePath);
$issues = $this->validator->processFile($mockParser);
$this->assertEmpty($issues); // Should handle gracefully and return empty array
}
🧰 Tools
🪛 PHPMD (2.15.0)

189-189: Avoid unused local variables such as '$method'. (Unused Code Rules)

(UnusedLocalVariable)

🤖 Prompt for AI Agents
In tests/src/Validator/EncodingValidatorTest.php around lines 179 to 194, the
testFileReadError method does not simulate a file read error as intended; it
creates a readable file but does not test unreadable file scenarios. To fix
this, modify the test to create a file and then change its permissions to make
it unreadable or mock the file reading function to simulate a failure, then
verify that processFile handles this error gracefully by asserting the expected
behavior or returned issues.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
composer.json (1)

15-16: Re-evaluate making ext-intl and ext-mbstring hard install-time requirements

Declaring these extensions under "require" means Composer will abort the install on systems where either extension is missing (typical in many slim Docker / Alpine or Windows setups).
If the new EncodingValidator is the only code path that needs them and it is optional (e.g. can be disabled via CLI flag), consider:

  1. Moving the two entries to a "suggest" block and adding a runtime guard:
    if (!extension_loaded('intl') || !extension_loaded('mbstring')) {
        // skip EncodingValidator registration / emit warning
    }
  2. Or, keep them in "require" but document the new mandatory platform deps prominently in the README / CHANGELOG to avoid surprise build breaks.

Either way, double-check the CI matrix to ensure these extensions are present, or the pipeline will start failing.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a82bef0 and eb943a8.

⛔ Files ignored due to path filters (1)
  • composer.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • composer.json (1 hunks)
  • tests/src/Command/ValidateTranslationCommandTest.php (2 hunks)
  • tests/src/Parser/ParserCacheTest.php (1 hunks)
  • tests/src/Result/ValidationResultCliRendererTest.php (1 hunks)
  • tests/src/Utility/ClassUtilityTest.php (2 hunks)
  • tests/src/Validator/SchemaValidatorTest.php (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/src/Utility/ClassUtilityTest.php
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/src/Parser/ParserCacheTest.php (1)
src/Parser/YamlParser.php (1)
  • YamlParser (9-93)
🔇 Additional comments (6)
tests/src/Validator/SchemaValidatorTest.php (1)

99-99: LGTM: Improved assertion robustness with explicit string casting.

This defensive change ensures the assertion works correctly even if the message value implements Stringable interface or is another type convertible to string.

tests/src/Command/ValidateTranslationCommandTest.php (2)

139-139: LGTM: Robust handling of JSON-decoded message values.

Explicit string casting ensures the assertion works correctly with JSON-decoded output, preventing potential type-related issues.


180-180: LGTM: Consistent string casting for JSON output validation.

This change maintains consistency with the similar casting applied in other JSON format tests and ensures reliable assertions.

tests/src/Result/ValidationResultCliRendererTest.php (2)

354-354: LGTM: Defensive string casting for message formatting tests.

This ensures the assertion works correctly regardless of whether formatIssueMessage returns a string or an object implementing Stringable.


358-358: LGTM: Consistent string casting for verbose mode testing.

This maintains consistency with the non-verbose mode test above and ensures reliable assertion behavior.

tests/src/Parser/ParserCacheTest.php (1)

131-132: LGTM: Enhanced cache key validation with explicit string casting.

Both changes ensure reliable assertion behavior when validating cache key contents, handling cases where cache keys might be objects implementing Stringable or other convertible types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant