Make WordPress Core

Changeset 60616


Timestamp:
08/07/2025 07:58:33 AM
Author:
jonsurrell
Message:

KSES: Prevent normalization from unescaping escaped numeric character references.

Fixes an issue where wp_kses_normalize_entities would transform inputs like "&amp;#x27;" into "&#x27;", changing the intended HTML text.

This behavior has been present since the initial version of KSES was introduced in [649].

[2896] applied the normalization to post content for users without the "unfiltered_html" capability.

Developed in https://github.com/WordPress/wordpress-develop/pull/9099.

Props jonsurrell, dmsnell, sirlouen.
Fixes #63630.
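
The ordering problem described in the message can be reproduced outside WordPress. What follows is a minimal standalone sketch, not the code from this changeset: the `sketch_*` function names are hypothetical, and the validity checks done by the real callbacks (`valid_unicode()`, the allowed-entity list, zero-padding) are left out. Only the regular expressions and the order of the passes mirror the diff.

```php
<?php
// Hypothetical stand-in for the numeric passes: "&amp;#46;" -> "&#46;",
// "&amp;#x2E;" -> "&#x2E;". No code point validation is performed here.
function sketch_decode_numeric( string $content ): string {
	$content = preg_replace( '/&amp;#(0*[0-9]{1,7});/', '&#$1;', $content );
	return preg_replace( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', '&#x$1;', $content );
}

// Hypothetical stand-in for the named pass: "&amp;amp;" -> "&amp;".
// Any name-shaped reference is accepted; the real code checks a list.
function sketch_decode_named( string $content ): string {
	return preg_replace( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', '&$1;', $content );
}

// Escape every "&", then undo the double-encoding in the chosen order.
function sketch_normalize( string $content, bool $numeric_first ): string {
	$content = str_replace( '&', '&amp;', $content );
	if ( $numeric_first ) {
		// Order after this changeset: numeric passes, then named pass.
		return sketch_decode_named( sketch_decode_numeric( $content ) );
	}
	// Order before this changeset: named pass, then numeric passes.
	return sketch_decode_numeric( sketch_decode_named( $content ) );
}

foreach ( array( '&#x2E;', '&amp;#x2E;' ) as $input ) {
	printf(
		"%-12s named-first: %-12s numeric-first: %s\n",
		$input,
		sketch_normalize( $input, false ),
		sketch_normalize( $input, true )
	);
}
```

With the named pass first, both inputs collapse to `&#x2E;`, so the escaped text `&amp;#x2E;` is silently turned into a character reference. With the numeric passes first, `&amp;#x2E;` comes back unchanged.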

Location:
trunk
Files:
2 edited

  • trunk/src/wp-includes/kses.php

    r60405 r60616  
    19591959    $content = str_replace( '&', '&amp;', $content );
    19601960
    1961     // Change back the allowed entities in our list of allowed entities.
     1961    /*
     1962     * Decode any character references that are now double-encoded.
     1963     *
     1964     * It's important that the following normalizations happen in the correct order.
     1965     *
     1966     * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
     1967     * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
     1968     *
     1969     * First, numeric (decimal and hexadecimal) character references must be handled so that
     1970     * `&amp;#09;` becomes `&#09;`. If the named character references were handled first, there
     1971     * would be no way to know whether the double-encoded character reference had been produced
     1972     * in this function or was the original input.
     1973     *
     1974     * Consider the two examples, first with named entity decoding followed by numeric
     1975     * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
     1976     * string processing from left to right:
     1977     *
     1978     * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
     1979     * | ------------ | ---------------- | ------------------------- | -------------------------- |
     1980     * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
     1981     * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
     1982     *
     1983     * Notice in the example above that different inputs result in the same result. The second case
     1984     * was not normalized and produced HTML that is semantically different from the input.
     1985     *
     1986     * | Input        | &-encoded        |  Numeric ref double-decoded | Named ref double-decoded |
     1987     * | ------------ | ---------------- | --------------------------- | ------------------------ |
     1988     * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
     1989     * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
     1990     *
     1991     * Here, each input is normalized to an appropriate output.
     1992     */
     1993    $content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
     1994    $content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
    19621995    if ( 'xml' === $context ) {
    19631996        $content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
     
    19651998        $content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
    19661999    }
    1967     $content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
    1968     $content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
    19692000
    19702001    return $content;
  • trunk/tests/phpunit/tests/kses.php

    r60486 r60616  
    598598            'Encoded named ref &'        => array( '&', '&' ),
    599599            'Encoded named ref &'       => array( '&', '&' ),
     600            'Encoded numeric ref ''      => array( ''', ''' ),
     601            'Encoded numeric ref ''      => array( ''', ''' ),
     602            'Encoded numeric ref ''     => array( ''', ''' ),
     603            'Encoded hex ref ''         => array( ''', ''' ),
     604            'Encoded hex ref ''         => array( ''', ''' ),
     605            'Encoded hex ref ''        => array( ''', ''' ),
    600606
    601607            /*
     
    610616    /**
    611617     * @ticket 26290
     618     * @ticket 63630
    612619     *
    613620     * @dataProvider data_normalize_entities