Skip to content

Adapt two more tests for chardet 6.x compatibility#1016

Closed
musicinmybrain wants to merge 1 commit into
pydantic:mainfrom
musicinmybrain:chardet6
Closed

Adapt two more tests for chardet 6.x compatibility#1016
musicinmybrain wants to merge 1 commit into
pydantic:mainfrom
musicinmybrain:chardet6

Conversation

@musicinmybrain

Copy link
Copy Markdown
Contributor

Summary

See encode/httpx#3773, which was merged into httpx before httpx2 was forked; this extends that approach to similar tests that were added in httpx2, allowing the entire test suite to pass with chardet 6.x while preserving backwards-compatibility with chardet 5.x. This is helpful in Fedora, where our python-chardet package is at 6.0.0.post1.

While this makes the tests compatible with both chardet 5.x and 6.x, I did not update the version of chardet in the dev dependency group, which is still at ==5.2.0.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change. N/A; this is a testing change.
  • I've updated the documentation accordingly. Nothing to update

Additional information

This fixes the following two test failures with chardet==6.0.0.post1.

========================================================================================= FAILURES =========================================================================================
_________________________________________________________________________ test_client_decode_text_using_autodetect _________________________________________________________________________

    def test_client_decode_text_using_autodetect() -> None:
        # Ensure that a 'default_encoding=autodetect' on the response allows for
        # encoding autodetection to be used when no "Content-Type: text/plain; charset=..."
        # info is present.
        #
        # Here we have some french text encoded with ISO-8859-1, rather than UTF-8.
        text = (
            "Non-seulement Despréaux ne se trompait pas, mais de tous les écrivains "
            "que la France a produits, sans excepter Voltaire lui-même, imprégné de "
            "l'esprit anglais par son séjour à Londres, c'est incontestablement "
            "Molière ou Poquelin qui reproduit avec l'exactitude la plus vive et la "
            "plus complète le fond du génie français."
        )

        def cp1252_but_no_content_type(request: httpx2.Request) -> httpx2.Response:
            content = text.encode("ISO-8859-1")
            return httpx2.Response(200, content=content)

        transport = httpx2.MockTransport(cp1252_but_no_content_type)
        with httpx2.Client(transport=transport, default_encoding=autodetect) as client:
            response = client.get("http://www.example.com")

            assert response.status_code == 200
            assert response.reason_phrase == "OK"
>           assert response.encoding == "ISO-8859-1"
E           AssertionError: assert 'WINDOWS-1252' == 'ISO-8859-1'
E             
E             - ISO-8859-1
E             + WINDOWS-1252

tests/httpx2/client/test_client.py:432: AssertionError
_____________________________________________________________________ test_client_decode_text_using_explicit_encoding ______________________________________________________________________

    def test_client_decode_text_using_explicit_encoding() -> None:
        # Ensure that a 'default_encoding="..."' on the response is used for text decoding
        # when no "Content-Type: text/plain; charset=..."" info is present.
        #
        # Here we have some french text encoded with ISO-8859-1, rather than UTF-8.
        text = (
            "Non-seulement Despréaux ne se trompait pas, mais de tous les écrivains "
            "que la France a produits, sans excepter Voltaire lui-même, imprégné de "
            "l'esprit anglais par son séjour à Londres, c'est incontestablement "
            "Molière ou Poquelin qui reproduit avec l'exactitude la plus vive et la "
            "plus complète le fond du génie français."
        )

        def cp1252_but_no_content_type(request: httpx2.Request) -> httpx2.Response:
            content = text.encode("ISO-8859-1")
            return httpx2.Response(200, content=content)

        transport = httpx2.MockTransport(cp1252_but_no_content_type)
        with httpx2.Client(transport=transport, default_encoding=autodetect) as client:
            response = client.get("http://www.example.com")

            assert response.status_code == 200
            assert response.reason_phrase == "OK"
>           assert response.encoding == "ISO-8859-1"
E           AssertionError: assert 'WINDOWS-1252' == 'ISO-8859-1'
E             
E             - ISO-8859-1
E             + WINDOWS-1252

tests/httpx2/client/test_client.py:459: AssertionError
================================================================================= short test summary info ==================================================================================
SKIPPED [1] tests/httpx2/client/test_auth.py:253: netrc files without a password are valid from Python >= 3.11
======================================================================== 2 failed, 1658 passed, 1 skipped in 4.09s =========================================================================

See encode/httpx#3773, which was merged into
httpx before httpx2 was forked; this commit extends that approach to
tests that were added in httpx2.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Re-trigger cubic

@codspeed-hq

codspeed-hq Bot commented Jun 2, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 7 skipped benchmarks1


Comparing musicinmybrain:chardet6 (44f6684) with main (2b0daf5)2

Open in CodSpeed

Footnotes

  1. 7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (30b3837) during the generation of this report, so 2b0daf5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@codspeed-hq

codspeed-hq Bot commented Jun 2, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 7 skipped benchmarks1


Comparing musicinmybrain:chardet6 (4dab3c2) with main (2b0daf5)2

Open in CodSpeed

Footnotes

  1. 7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (30b3837) during the generation of this report, so 2b0daf5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@musicinmybrain

Copy link
Copy Markdown
Contributor Author

We haven’t updated Fedora’s python-chardet package to the (LLM-rewritten and controversially relicensed) 7.x series. It hasn’t been ruled out, but I’m not eager to do so right now.

I did do a quick check in the virtualenv, and observed that testing with the latest chardet 7.4.3 would bring several new regressions:

test_response_no_charset_with_cp_1252_content

E       AssertionError: assert 'Euro Currenc...nopqrstuzwxyz' == 'Euro Currenc...nopqrstuzwxyz'
E         
E         - Euro Currency: € abcdefghijklmnopqrstuzwxyz
E         ?                ^
E         + Euro Currency: א abcdefghijklmnopqrstuzwxyz
E         ?                ^

test_text_decoder_with_autodetect[asyncio-data2-cp1252]

E       AssertionError: assert 'Euro charact...nopqrstuvwxyz' == 'Euro charact...nopqrstuvwxyz'
E         
E         - Euro character: ˆ! abcdefghijklmnopqrstuvwxyz
E         ?                 ^
E         + Euro character: à! abcdefghijklmnopqrstuvwxyz
E         ?                 ^

test_text_decoder_with_autodetect[trio-data2-cp1252]

E       AssertionError: assert 'Euro charact...nopqrstuvwxyz' == 'Euro charact...nopqrstuvwxyz'
E         
E         - Euro character: ˆ! abcdefghijklmnopqrstuvwxyz
E         ?                 ^
E         + Euro character: à! abcdefghijklmnopqrstuvwxyz
E         ?                 ^

I therefore decided not to try to add chardet 7.x compatibility to the test suite at this time.

@Kludex

Kludex commented Jun 2, 2026

Copy link
Copy Markdown
Member

I don't think we should have "in" in assertions. Can you bump chardet instead? That version is 3 years old anyway 😬

@musicinmybrain musicinmybrain mentioned this pull request Jun 2, 2026
3 tasks
@musicinmybrain

Copy link
Copy Markdown
Contributor Author

I don't think we should have "in" in assertions. Can you bump chardet instead? That version is 3 years old anyway 😬

Sure, gladly. I’ve opened a new PR, #1017, with this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants