Adapt two more tests for chardet 6.x compatibility by musicinmybrain · Pull Request #1016 · pydantic/httpx2

musicinmybrain · 2026-06-02T10:06:44Z

Summary

See encode/httpx#3773, which was merged into httpx before httpx2 was forked; this extends that approach to similar tests that were added in httpx2, allowing the entire test suite to pass with chardet 6.x while preserving backwards-compatibility with chardet 5.x. This is helpful in Fedora, where our python-chardet package is at 6.0.0.post1.

While this makes the tests compatible with both chardet 5.x and 6.x, I did not update the version of chardet in the dev dependency group, which is still at ==5.2.0.

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change. N/A; this is a testing change.
I've updated the documentation accordingly. Nothing to update

Additional information

This fixes the following two test failures with chardet==6.0.0.post1.

========================================================================================= FAILURES =========================================================================================
_________________________________________________________________________ test_client_decode_text_using_autodetect _________________________________________________________________________

    def test_client_decode_text_using_autodetect() -> None:
        # Ensure that a 'default_encoding=autodetect' on the response allows for
        # encoding autodetection to be used when no "Content-Type: text/plain; charset=..."
        # info is present.
        #
        # Here we have some french text encoded with ISO-8859-1, rather than UTF-8.
        text = (
            "Non-seulement Despréaux ne se trompait pas, mais de tous les écrivains "
            "que la France a produits, sans excepter Voltaire lui-même, imprégné de "
            "l'esprit anglais par son séjour à Londres, c'est incontestablement "
            "Molière ou Poquelin qui reproduit avec l'exactitude la plus vive et la "
            "plus complète le fond du génie français."
        )

        def cp1252_but_no_content_type(request: httpx2.Request) -> httpx2.Response:
            content = text.encode("ISO-8859-1")
            return httpx2.Response(200, content=content)

        transport = httpx2.MockTransport(cp1252_but_no_content_type)
        with httpx2.Client(transport=transport, default_encoding=autodetect) as client:
            response = client.get("http://www.example.com")

            assert response.status_code == 200
            assert response.reason_phrase == "OK"
>           assert response.encoding == "ISO-8859-1"
E           AssertionError: assert 'WINDOWS-1252' == 'ISO-8859-1'
E             
E             - ISO-8859-1
E             + WINDOWS-1252

tests/httpx2/client/test_client.py:432: AssertionError
_____________________________________________________________________ test_client_decode_text_using_explicit_encoding ______________________________________________________________________

    def test_client_decode_text_using_explicit_encoding() -> None:
        # Ensure that a 'default_encoding="..."' on the response is used for text decoding
        # when no "Content-Type: text/plain; charset=..."" info is present.
        #
        # Here we have some french text encoded with ISO-8859-1, rather than UTF-8.
        text = (
            "Non-seulement Despréaux ne se trompait pas, mais de tous les écrivains "
            "que la France a produits, sans excepter Voltaire lui-même, imprégné de "
            "l'esprit anglais par son séjour à Londres, c'est incontestablement "
            "Molière ou Poquelin qui reproduit avec l'exactitude la plus vive et la "
            "plus complète le fond du génie français."
        )

        def cp1252_but_no_content_type(request: httpx2.Request) -> httpx2.Response:
            content = text.encode("ISO-8859-1")
            return httpx2.Response(200, content=content)

        transport = httpx2.MockTransport(cp1252_but_no_content_type)
        with httpx2.Client(transport=transport, default_encoding=autodetect) as client:
            response = client.get("http://www.example.com")

            assert response.status_code == 200
            assert response.reason_phrase == "OK"
>           assert response.encoding == "ISO-8859-1"
E           AssertionError: assert 'WINDOWS-1252' == 'ISO-8859-1'
E             
E             - ISO-8859-1
E             + WINDOWS-1252

tests/httpx2/client/test_client.py:459: AssertionError
================================================================================= short test summary info ==================================================================================
SKIPPED [1] tests/httpx2/client/test_auth.py:253: netrc files without a password are valid from Python >= 3.11
======================================================================== 2 failed, 1658 passed, 1 skipped in 4.09s =========================================================================

See encode/httpx#3773, which was merged into httpx before httpx2 was forked; this commit extends that approach to tests that were added in httpx2.

cubic-dev-ai

No issues found across 1 file

_{Re-trigger cubic}

codspeed-hq · 2026-06-02T10:08:22Z

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 7 skipped benchmarks¹

_{Comparing musicinmybrain:chardet6 (44f6684) with main (2b0daf5)²}

7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
No successful run was found on main (30b3837) during the generation of this report, so 2b0daf5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

codspeed-hq · 2026-06-02T10:08:22Z

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 7 skipped benchmarks¹

_{Comparing musicinmybrain:chardet6 (4dab3c2) with main (2b0daf5)²}

7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
No successful run was found on main (30b3837) during the generation of this report, so 2b0daf5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

musicinmybrain · 2026-06-02T10:18:17Z

We haven’t updated Fedora’s python-chardet package to the (LLM-rewritten and controversially relicensed) 7.x series. It hasn’t been ruled out, but I’m not eager to do so right now.

I did do a quick check in the virtualenv, and observed that testing with the latest chardet 7.4.3 would bring several new regressions:

The three tests from this PR and from Adapt test_response_decode_text_using_autodetect for chardet 6.0 encode/httpx#3773, which had their detected encodings change from ISO-8859-1 to WINDOWS-1252 between chardet 5 and 6, are detected as iso8859-15 in chardet 7.
Three tests involving the euro currency symbol € fail, below. The discussion in chardet 6.0.0 bug? - change since chardet 5.2.0 chardet/chardet#317, while it concerns chardet 6.x, may still be somewhat relevant here. All of these seem to have to do with changes in the detected encodings for strings that are short enough that their encodings could be considered ambiguous.

test_response_no_charset_with_cp_1252_content

E       AssertionError: assert 'Euro Currenc...nopqrstuzwxyz' == 'Euro Currenc...nopqrstuzwxyz'
E         
E         - Euro Currency: € abcdefghijklmnopqrstuzwxyz
E         ?                ^
E         + Euro Currency: א abcdefghijklmnopqrstuzwxyz
E         ?                ^

test_text_decoder_with_autodetect[asyncio-data2-cp1252]

E       AssertionError: assert 'Euro charact...nopqrstuvwxyz' == 'Euro charact...nopqrstuvwxyz'
E         
E         - Euro character: ˆ! abcdefghijklmnopqrstuvwxyz
E         ?                 ^
E         + Euro character: à! abcdefghijklmnopqrstuvwxyz
E         ?                 ^

test_text_decoder_with_autodetect[trio-data2-cp1252]

E       AssertionError: assert 'Euro charact...nopqrstuvwxyz' == 'Euro charact...nopqrstuvwxyz'
E         
E         - Euro character: ˆ! abcdefghijklmnopqrstuvwxyz
E         ?                 ^
E         + Euro character: à! abcdefghijklmnopqrstuvwxyz
E         ?                 ^

I therefore decided not to try to add chardet 7.x compatibility to the test suite at this time.

Kludex · 2026-06-02T11:03:43Z

I don't think we should have "in" in assertions. Can you bump chardet instead? That version is 3 years old anyway 😬

musicinmybrain · 2026-06-02T12:51:14Z

I don't think we should have "in" in assertions. Can you bump chardet instead? That version is 3 years old anyway 😬

Sure, gladly. I’ve opened a new PR, #1017, with this approach.

Adapt two more tests for chardet 6.x compatibility

44f6684

See encode/httpx#3773, which was merged into httpx before httpx2 was forked; this commit extends that approach to tests that were added in httpx2.

musicinmybrain force-pushed the chardet6 branch from 4dab3c2 to 44f6684 Compare June 2, 2026 10:07

cubic-dev-ai Bot reviewed Jun 2, 2026

View reviewed changes

musicinmybrain mentioned this pull request Jun 2, 2026

Test with chardet 6.0.0.post1 #1017

Merged

3 tasks

musicinmybrain closed this Jun 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt two more tests for chardet 6.x compatibility#1016

Adapt two more tests for chardet 6.x compatibility#1016
musicinmybrain wants to merge 1 commit into
pydantic:mainfrom
musicinmybrain:chardet6

musicinmybrain commented Jun 2, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

codspeed-hq Bot commented Jun 2, 2026

Uh oh!

codspeed-hq Bot commented Jun 2, 2026

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Kludex commented Jun 2, 2026

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

musicinmybrain commented Jun 2, 2026

Summary

Checklist

Additional information

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codspeed-hq Bot commented Jun 2, 2026

Merging this PR will not alter performance

Footnotes

Uh oh!

codspeed-hq Bot commented Jun 2, 2026

Merging this PR will not alter performance

Footnotes

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Kludex commented Jun 2, 2026

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants