The behavior for blank node parsing changed in later versions of PHP. #403

@charlie-curtis

Description

When upgrading from PHP8.1.22 to PHP8.3.4, the same input produces different output, specifically in how blank (whitespace-only) nodes are handled.

Example

Input

<table>
	<caption>
		Cool table
	</caption>
	<tfoot>
	<tr>
		<th>I can do so much!</th>
	</tr>
	</tfoot>
	<tr>
		<td style="font-size:16pt;
      color:#F00;font-family:sans-serif;
      text-align:center;">Wow</td>
	</tr>
</table>

PHP8.1.22 output

<table><caption>
		Cool table
	</caption>
	<tfoot><tr><th>I can do so much!</th>
	</tr></tfoot><tr><td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
	</tr></table>

PHP8.3.4 output

<table>
	<caption>
		Cool table
	</caption>
	<tfoot>
	<tr>
		<th>I can do so much!</th>
	</tr>
	</tfoot>
	<tr>
		<td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
	</tr>
</table>

Impact

A strong case can be made that the PHP8.3.4 output is "more correct", and I wouldn't argue. The issue is that a lot of existing code and applications may be relying on the old behavior in order to "work". An optional backwards-compatible setting would ease the transition as many users upgrade beyond PHP8.1.

Investigation

These steps have been performed:

  • verified that both PHP versions used for testing have the same version of libxml (2.9.1)
  • localized the behavior change to the loadHtml call here
  • verified that passing the LIBXML_NOBLANKS option fixed the output discrepancy

I think this php-src commit changed the default behavior of "blank" parsing from "don't keep" to "keep".
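
The check described above can be reproduced with a small script along these lines. This is only a sketch: `roundTrip` is a hypothetical helper written for this issue, not part of the library, and the input is a shortened version of the table from the example. Running it under both PHP versions and comparing the two dumps should show whether `LIBXML_NOBLANKS` removes the discrepancy.

```php
<?php
// Hypothetical helper (not part of the library): parse an HTML string with
// DOMDocument::loadHTML() using the given libxml options, then serialize it back.
function roundTrip(string $html, int $options = 0): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return (string) $doc->saveHTML();
}

$input = "<table>\n\t<tr>\n\t\t<td style=\"color:#F00;\">Wow</td>\n\t</tr>\n</table>";

// Default options: whitespace handling here is what differs between PHP versions.
$default = roundTrip($input);

// With LIBXML_NOBLANKS: asks libxml to drop blank nodes while parsing.
$noBlanks = roundTrip($input, LIBXML_NOBLANKS);

var_dump($default);
var_dump($noBlanks);
```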

Suggested Fix

Much like LIBXML_PARSEHUGE is an optional configuration value that can be supplied here, I propose adding LIBXML_NOBLANKS as an optional value in order to better handle backwards compatibility as mentioned above without impacting existing use cases.
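
A rough sketch of what the opt-in could look like is below. The function name and its boolean parameters are made up for illustration and do not reflect the library's actual configuration API; the only real pieces are the `LIBXML_PARSEHUGE` and `LIBXML_NOBLANKS` constants, which are combined into the bitmask passed to `loadHTML`.

```php
<?php
// Hypothetical options builder: the name and parameters are illustrative only.
// Both flags default to off, so existing callers would see no change.
function buildLibxmlOptions(bool $parseHuge = false, bool $noBlanks = false): int
{
    $options = 0;

    if ($parseHuge) {
        // Existing optional flag mentioned above.
        $options |= LIBXML_PARSEHUGE;
    }

    if ($noBlanks) {
        // Proposed optional flag: drop blank nodes while parsing.
        $options |= LIBXML_NOBLANKS;
    }

    return $options;
}

// The resulting bitmask would be passed as the second argument to loadHTML().
$options = buildLibxmlOptions(false, true);
```

Keeping the flag off by default means callers who want the newer whitespace-preserving output are unaffected, while those depending on the older output can opt in.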

Similar issues

#237
#269
