Description
When upgrading from PHP 8.1.22 to PHP 8.3.4, the output differs between versions, specifically in how blank (whitespace-only) nodes are handled.
Example
Input
<table>
<caption>
Cool table
</caption>
<tfoot>
<tr>
<th>I can do so much!</th>
</tr>
</tfoot>
<tr>
<td style="font-size:16pt;
color:#F00;font-family:sans-serif;
text-align:center;">Wow</td>
</tr>
</table>
PHP8.1.22 output
<table><caption>
Cool table
</caption>
<tfoot><tr><th>I can do so much!</th>
</tr></tfoot><tr><td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
</tr></table>
PHP8.3.4 output
<table>
<caption>
Cool table
</caption>
<tfoot>
<tr>
<th>I can do so much!</th>
</tr>
</tfoot>
<tr>
<td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
</tr>
</table>
Impact
A strong case can be made that the PHP 8.3.4 output is "more correct", and I wouldn't argue. The issue is that a lot of existing code and applications may be relying on the old behavior in order to "work". An optional backwards-compatible solution would ease the transition as more users upgrade beyond PHP 8.1.
Investigation
These steps have been performed:
- verified that both PHP versions used for testing have the same version of libxml (2.9.1)
- localized the behavior change to the loadHtml call here
- verified that passing the LIBXML_NOBLANKS option fixed the output discrepancy
I think this php-src commit changed the default behavior of "blank" parsing from "don't keep" to "keep".
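The comparison described in the steps above can be reproduced with a small standalone script. This is only a sketch: `renderHtml` is a hypothetical helper written for illustration, not a function from the library in question, and the exact whitespace in the output depends on the PHP and libxml versions in use.

```php
<?php
// Hypothetical helper (not part of any library): parse an HTML string with
// caller-supplied libxml options and return the serialized document.
function renderHtml(string $html, int $libxmlOptions = 0): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, $libxmlOptions);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return (string) $doc->saveHTML();
}

$input = "<table>\n<tr>\n<td>Wow</td>\n</tr>\n</table>";

// Default parsing: whether blank text nodes are kept depends on the PHP version.
$default = renderHtml($input);

// Explicitly ask libxml to drop blank nodes.
$noBlanks = renderHtml($input, LIBXML_NOBLANKS);

echo $default, "\n---\n", $noBlanks, "\n";
```

Running this under both PHP versions and diffing the two halves of the output should show whether the flag restores the older formatting in a given environment.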
Suggested Fix
Much like LIBXML_PARSEHUGE is an optional configuration value that can be supplied here, I propose adding LIBXML_NOBLANKS as an optional value in order to better handle backwards compatibility as mentioned above without impacting existing use cases.
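As a rough illustration of the proposal, the sketch below combines opt-in flags into a single options bitmask for `loadHTML`. The function name `buildLibxmlFlags` and the configuration keys `parse_huge` and `no_blanks` are assumptions made up for this example; the library's actual configuration mechanism may look different.

```php
<?php
// Hypothetical config keys; the real library's option names may differ.
function buildLibxmlFlags(array $config): int
{
    $flags = 0;

    if (!empty($config['parse_huge'])) {
        $flags |= LIBXML_PARSEHUGE; // existing opt-in behavior
    }
    if (!empty($config['no_blanks'])) {
        $flags |= LIBXML_NOBLANKS;  // proposed opt-in: drop blank nodes
    }

    return $flags;
}

// Callers who set nothing get 0, so existing use cases are unaffected.
$flags = buildLibxmlFlags(['no_blanks' => true]);

$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>', $flags);
```

Because the flag is off unless requested, users who prefer the PHP 8.3 output would see no change.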
Similar issues
#237
#269