Skip to content

Conversation

@stloyd
Copy link
Member

@stloyd stloyd commented Nov 1, 2025

Resolves: #xxx

Change Log


Added

Fixed

Changed

  • Make `HTMLDocument` reliable

Removed

Deprecated

Security

@codecov
Copy link

codecov bot commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.67%. Comparing base (2342f96) to head (967a78d).
⚠️ Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1948      +/-   ##
==========================================
+ Coverage   79.65%   79.67%   +0.02%     
==========================================
  Files         832      832              
  Lines       24757    24748       -9     
==========================================
- Hits        19719    19717       -2     
+ Misses       5038     5031       -7     
Components Coverage Δ
etl 88.82% <ø> (ø)
cli 85.96% <ø> (ø)
lib-array-dot 95.00% <ø> (ø)
lib-azure-sdk 60.39% <ø> (ø)
lib-doctrine-dbal-bulk 95.14% <ø> (ø)
lib-filesystem 80.68% <ø> (ø)
lib-types 88.34% <88.23%> (+0.35%) ⬆️
lib-parquet 68.35% <ø> (ø)
lib-parquet-viewer 83.04% <ø> (ø)
lib-snappy 89.71% <ø> (ø)
bridge-filesystem-async-aws 91.00% <ø> (ø)
bridge-filesystem-azure 89.47% <ø> (ø)
bridge-monolog-http 96.91% <ø> (ø)
bridge-openapi-specification 94.50% <ø> (ø)
symfony-http-foundation 73.17% <ø> (ø)
adapter-chartjs 86.36% <ø> (ø)
adapter-csv 89.08% <ø> (ø)
adapter-doctrine 90.97% <ø> (ø)
adapter-elasticsearch 97.17% <ø> (ø)
adapter-google-sheet 91.40% <ø> (ø)
adapter-http 60.56% <ø> (ø)
adapter-json 89.80% <ø> (ø)
adapter-logger 83.33% <ø> (ø)
adapter-meilisearch 97.87% <ø> (ø)
adapter-parquet 78.91% <ø> (ø)
adapter-text 88.09% <ø> (ø)
adapter-xml 83.07% <ø> (ø)
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@stloyd stloyd force-pushed the bugfix/html-document branch from 8f9836f to 434fc69 Compare November 2, 2025 10:45
@stloyd stloyd force-pushed the bugfix/html-document branch 2 times, most recently from dc185c0 to caa17e5 Compare November 2, 2025 10:48
@stloyd stloyd force-pushed the bugfix/html-document branch from caa17e5 to ead4313 Compare November 2, 2025 10:55
@stloyd stloyd marked this pull request as ready for review November 2, 2025 11:02
@stloyd stloyd requested a review from norberttech as a code owner November 2, 2025 11:02
@norberttech
Copy link
Member

norberttech commented Nov 2, 2025

It looks much easier now!

What about scenarios where there are some spaces between html elements? Like:

<!DOCTYPE html><html>   <head><title></title></head>    <body><p>invalid</p>  </body>  </html>

You are removing \n and \t but will this regexp work for the above code?

@stloyd
Copy link
Member Author

stloyd commented Nov 2, 2025

While detecting type_html has no problems. When it comes to HTMLDocument, it will work only with a string in tests, cause otherwise it will be dependent on libxml when tested with objects... and that thing goes mad 😂 (it sometimes removes spaces, sometimes just a few of them ;))

Details
Expected :'<!DOCTYPE html><html>   <head><title></title></head>    <body><p>invalid</p>  </body>  </html>'
Actual   :'<!DOCTYPE html><html><head><title></title></head>    <body><p>invalid</p>    </body></html>'

But I added a test for that case with a string.

@norberttech
Copy link
Member

norberttech commented Nov 2, 2025

While detecting type_html has no problems. When it comes to HTMLDocument, it will work only with a string in tests, cause otherwise it will be dependent on libxml when tested with objects... and that thing goes mad 😂 (it sometimes removes spaces, sometimes just a few of them ;))

Expected :'<!DOCTYPE html><html>   <head><title></title></head>    <body><p>invalid</p>  </body>  </html>'
Actual   :'<!DOCTYPE html><html><head><title></title></head>    <body><p>invalid</p>    </body></html>'

But I added a test for that case with a string.

As long as those regexps will detect <!DOCTYPE html><html> <head><title></title></head> <body><p>invalid</p> </body> </html> as valid html, I would not even test the output to make tests less fragile.

All I would test is that following string: <!DOCTYPE html><html> <head><title></title></head> <body><p>invalid</p> </body> </html> is not throwing InvalidArgumentException

@stloyd
Copy link
Member Author

stloyd commented Nov 2, 2025

There is a test:

    public function test_create_with_string_with_multiple_spaces() : void
    {
        $html = '<!DOCTYPE html><html>   <head><title></title></head>    <body><p>invalid</p>  </body>  </html>';

        $document = new HTMLDocument($html);

        self::assertSame(
            $html,
            $document->toString(),
        );
    }

@norberttech norberttech merged commit 9e7c5f7 into flow-php:1.x Nov 2, 2025
17 checks passed
@stloyd stloyd deleted the bugfix/html-document branch November 2, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants