Test with chardet 6.0.0.post1 by musicinmybrain · Pull Request #1017 · pydantic/httpx2

musicinmybrain · 2026-06-02T12:50:31Z

Summary

Run the tests with chardet version 6.0.0.post1 instead of 5.2.0. Adjust expectations in three tests, in which the encoded text is short enough that the encoding may be detected ambiguously, and chardet may validly report a different encoding than the one that was specified when encoding the text.

This is an alternative approach to #1016. Based on feedback, it drops support for testing with chardet 5.x and focuses solely on 6.x.

This also tightens the assertion that was loosened in encode/httpx#3773; in #1016 (comment), @Kludex wrote that “I don't think we should have "in" in assertions.” All three tests now use identical assertions and are documented with identical comments.

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change. N/A; this is a testing change.
I've updated the documentation accordingly. Nothing to update

Additional information

Further adjustments would be needed to use the 7.x branch of chardet, which was rewritten with an LLM and controversially relicensed. See #1016 (comment) for more details.

codspeed-hq · 2026-06-02T12:52:01Z

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 7 skipped benchmarks¹

_{Comparing musicinmybrain:chardet6-again (2591068) with main (2b0daf5)²}

7 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
No successful run was found on main (30b3837) during the generation of this report, so 2b0daf5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

cubic-dev-ai

No issues found across 4 files

_{Re-trigger cubic}

The previous French sample used only accented letters in bytes 0xA0-0xFF, where ISO-8859-1 and WINDOWS-1252 are byte-identical, so chardet was free to report either encoding. Adding curly quotes and an em dash (bytes 0x80-0x9F, control characters in ISO-8859-1) makes the byte string unambiguously WINDOWS-1252 across chardet versions, so the expectation no longer needs to change on upgrade.

Kludex

I think it's more understandable if it's unambiguous - so I modify the input a bit.

Thanks!

musicinmybrain · 2026-06-02T15:46:02Z

I think it's more understandable if it's unambiguous - so I modify the input a bit.

I think that this is an excellent improvement. Thanks!

The new texts also detect as Windows-1252 with chardet 7.x, which is nice. Unfortunately, chardet 7.x uses the string 'Windows-1252', vs. chardet 6.x’s 'WINDOWS-1252', so the expectations would still have to be adjusted again, but at least the discrepancy is smaller.

Kludex · 2026-06-02T16:04:23Z

chardet 7.x uses the string 'Windows-1252', vs. chardet 6.x’s 'WINDOWS-1252'

😢

Happy to review yours PRs at any time! 🙏

Test with chardet 6.0.0.post1

7babf38

musicinmybrain mentioned this pull request Jun 2, 2026

Adapt two more tests for chardet 6.x compatibility #1016

Closed

3 tasks

cubic-dev-ai Bot reviewed Jun 2, 2026

View reviewed changes

Kludex approved these changes Jun 2, 2026

View reviewed changes

Kludex merged commit 26dd31e into pydantic:main Jun 2, 2026
12 checks passed

Kludex mentioned this pull request Jun 11, 2026

Version 2.4.0 #1031

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test with chardet 6.0.0.post1#1017

Test with chardet 6.0.0.post1#1017
Kludex merged 2 commits into
pydantic:mainfrom
musicinmybrain:chardet6-again

musicinmybrain commented Jun 2, 2026

Uh oh!

codspeed-hq Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot left a comment

Uh oh!

Kludex left a comment

Uh oh!

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Kludex commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

musicinmybrain commented Jun 2, 2026

Summary

Checklist

Additional information

Uh oh!

codspeed-hq Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Kludex left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

musicinmybrain commented Jun 2, 2026

Uh oh!

Kludex commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codspeed-hq Bot commented Jun 2, 2026 •

edited

Loading