`<format>` assumes strings are encoded in the active code page by CaseyCarter · Pull Request #1834 · microsoft/STL

CaseyCarter · 2021-04-14T03:06:29Z

Refactors all `format` functionality with knowledge of character encodings into a new class `_Fmt_codec`. `_Fmt_codec` internally caches the `_Cvtvec` structure that other functions previously passed around, and makes use of a new "pure C ABI function" `__std_get_cvt` to retrieve character conversion info for the active codepage.

The ABI function reuses the `__std_code_page` and `__std_win_error` types from `<xfilesystem_abi.h>`. These could be extracted into a more general `__msvc_win_abi.hpp` header, but I think `<xfilesystem_abi.h>` is lightweight enough to just include directly for now.
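To illustrate why it helps to keep encoding knowledge in one place, here is a minimal sketch of a codec-style class. It is not the STL's `_Fmt_codec`: the names `encoding_sketch`, `codec_sketch`, `char_length`, and `find_char` are hypothetical, the encoding is passed in by hand rather than queried from the active code page, and the Shift-JIS lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are an assumption for demonstration only. The point it shows is that in a double-byte code page a trail byte can have the same value as `}`, so a byte-wise search can stop in the middle of a character.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical set of encodings; a real implementation would derive this
// from the active code page instead of taking it as a parameter.
enum class encoding_sketch { single_byte, utf8, shift_jis };

class codec_sketch {
public:
    explicit codec_sketch(encoding_sketch enc) noexcept : enc_(enc) {}

    // Number of code units in the character that starts at s[0],
    // clamped to s.size(). Returns 0 for an empty string.
    std::size_t char_length(std::string_view s) const noexcept {
        if (s.empty()) {
            return 0;
        }
        const auto lead = static_cast<unsigned char>(s.front());
        std::size_t n = 1;
        switch (enc_) {
        case encoding_sketch::single_byte:
            break;
        case encoding_sketch::utf8:
            if (lead >= 0xF0) {
                n = 4;
            } else if (lead >= 0xE0) {
                n = 3;
            } else if (lead >= 0xC0) {
                n = 2;
            }
            break;
        case encoding_sketch::shift_jis:
            // ASSUMPTION: lead-byte ranges used here for illustration.
            if ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC)) {
                n = 2;
            }
            break;
        }
        return n < s.size() ? n : s.size();
    }

    // Finds the first ASCII `target` that begins a character, stepping over
    // whole characters so that trail bytes are never mistaken for `target`.
    std::size_t find_char(std::string_view s, char target) const noexcept {
        std::size_t i = 0;
        while (i < s.size()) {
            if (s[i] == target) {
                return i;
            }
            i += char_length(s.substr(i));
        }
        return std::string_view::npos;
    }

private:
    encoding_sketch enc_;
};
```

With the bytes `"\x83\x7D}"`, the Shift-JIS codec reports the closing brace at index 2, while a single-byte codec reports index 1, which is the trail byte of the two-byte character.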

Drive-by:

* BUG: `_Parse_precision` was not skipping the trailing `}` in a dynamic precision.
* Improvement: `test_parse_helper` now verifies that the parse consumes the entire input when passed an expected length of `npos`.
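The two drive-by items can be sketched with a simplified precision parser. This is not the STL's `_Parse_precision` or `test_parse_helper`: `precision_sketch`, `parse_precision_sketch`, and `check_parse_sketch` are hypothetical names, and the parser ignores integer overflow and everything in the format spec other than the precision. It shows the step the bug fix is about (advancing past the `}` that closes a dynamic precision) and a checker that, when given `npos`, requires the whole input to be consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type; not the STL's internal representation.
struct precision_sketch {
    int value = -1;           // literal precision, or -1 if none
    int arg_id = -1;          // explicit argument index in ".{N}", or -1
    bool dynamic = false;     // true for ".{}" and ".{N}"
    std::size_t consumed = 0; // characters consumed; 0 means no match
};

// Parses ".N", ".{}", or ".{N}" at the start of `s`.
inline precision_sketch parse_precision_sketch(std::string_view s) {
    precision_sketch out;
    std::size_t i = 0;
    if (s.empty() || s[0] != '.') {
        return out;
    }
    ++i;

    // Reads decimal digits at position i; returns false if there are none.
    auto parse_int = [&](int& dest) {
        bool any = false;
        int v = 0;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            v = v * 10 + (s[i] - '0');
            ++i;
            any = true;
        }
        if (any) {
            dest = v;
        }
        return any;
    };

    if (i < s.size() && s[i] == '{') {
        ++i;
        out.dynamic = true;
        parse_int(out.arg_id); // optional; ".{}" uses automatic indexing
        if (i >= s.size() || s[i] != '}') {
            return precision_sketch{}; // unterminated dynamic precision
        }
        ++i; // skip the trailing '}' of the dynamic precision
    } else if (!parse_int(out.value)) {
        return precision_sketch{}; // '.' must be followed by digits or '{'
    }
    out.consumed = i;
    return out;
}

// With expected == npos, the parse must consume the entire input.
inline bool check_parse_sketch(std::string_view s,
                               std::size_t expected = std::string_view::npos) {
    const auto result = parse_precision_sketch(s);
    const std::size_t want = expected == std::string_view::npos ? s.size() : expected;
    return result.consumed == want;
}
```

For example, `check_parse_sketch(".{}")` succeeds, while `check_parse_sketch(".{}}")` fails because one character is left over; passing an explicit length of 3 makes the second call succeed.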

tests/std/tests/P0645R10_text_formatting_legacy_text_encoding/test.cpp

tests/std/tests/P0645R10_text_formatting_utf8/test.cpp

stl/inc/format

miscco

The fact that you cooked that up in a few days makes my imposter syndrome super happy

stl/src/format.cpp

stl/CMakeLists.txt

barcharcraz

Looks pretty good to me, although the hack to hook into format.cpp to change the codepage needs to change.

I do observe that the UCRT uses the `_setmbcp` and `_getmbcp` functions, which seem to set a global (not great).

CaseyCarter · 2021-04-14T18:42:48Z

> Looks pretty good to me, although the hack to hook into format.cpp to change the codepage needs to change.

No lie: this looked even jankier to me today than it did yesterday. I won't mourn its passing.

StephanTLavavej · 2021-04-15T08:17:29Z

Now that #1821 has merged, I've force-pushed and retargeted this to `main`. I inspected the history carefully: `acp` had accumulated 3 commits beyond `feature/format`. We took `feature/format`, added more commits to it in `merging_format`, and then squashed everything into `main`. (Squashing is awesome for linear history and I wouldn't give it up for anything, but it means that situations like this need extra care.) I therefore determined that the easiest and safest thing to do was to start from `main` and cherry-pick `acp`'s 3 unique commits. This cherry-picked 99.9% cleanly; the only conflict was this PR adding the inclusion of `<xfilesystem_abi.h>` next to the removal of `<variant>`, which was trivial to resolve.

Apologies for the force-push! 😸

StephanTLavavej · 2021-04-20T07:22:59Z

Thanks for fixing this bug! 😻 🎉

CaseyCarter added the cxx20 (C++20 feature) and format (C++20/23 format) labels Apr 14, 2021

CaseyCarter requested a review from a team as a code owner April 14, 2021 03:06

CaseyCarter commented Apr 14, 2021 on tests/std/tests/P0645R10_text_formatting_legacy_text_encoding/test.cpp (outdated, resolved)

CaseyCarter commented Apr 14, 2021 on tests/std/tests/P0645R10_text_formatting_utf8/test.cpp (outdated, resolved)

miscco reviewed Apr 14, 2021: stl/inc/format (outdated, resolved)

miscco reviewed Apr 14, 2021: stl/inc/format (resolved)

miscco reviewed Apr 14, 2021: stl/inc/format (outdated, resolved)

miscco approved these changes Apr 14, 2021

StephanTLavavej reviewed Apr 14, 2021

barcharcraz suggested changes Apr 14, 2021

CaseyCarter commented Apr 14, 2021 on tests/std/tests/P0645R10_text_formatting_legacy_text_encoding/test.cpp (resolved)

CaseyCarter requested review from StephanTLavavej and barcharcraz April 14, 2021 18:43

miscco reviewed Apr 14, 2021: stl/inc/format (outdated, resolved)

miscco approved these changes Apr 14, 2021

barcharcraz reviewed Apr 14, 2021: stl/src/format.cpp (resolved)

CaseyCarter added 3 commits April 15, 2021 01:06

More hygienic testing method

eaa454f

Review comments

9494a9c

StephanTLavavej force-pushed the acp branch from 86f9f2c to 9494a9c Compare April 15, 2021 08:08

StephanTLavavej changed the base branch from feature/format to main April 15, 2021 08:08

CaseyCarter changed the title ~~format assumes strings are encoded in the active code page~~ <format> assumes strings are encoded in the active code page Apr 15, 2021

barcharcraz approved these changes Apr 15, 2021

Remove confusing comment

799a55a

StephanTLavavej requested changes Apr 15, 2021: stl/inc/format (two threads, both outdated and resolved)

StephanTLavavej mentioned this pull request Apr 16, 2021

P0645R10 <format> Text Formatting #30

Closed

Add test coverage for width estimation

c6efbdb

... fix 2 bugs exposed by that coverage, and address review comments. Drive-by: Remove the `_FORMAT_CODEPAGE` override from the `formatting_utf8` test which is unnecessary since we compile it with `/utf-8`.
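For readers unfamiliar with the width estimation this commit adds coverage for, the following is a rough sketch of the general idea, not the code under test. It assumes the rule from C++20 [format.string.std] as I understand it: code points in a fixed set of ranges count as width 2 and everything else as width 1. The range table is an assumption that should be checked against the standard, `width_sketch` and `estimated_width` are hypothetical names, and the sketch treats every code point as its own grapheme cluster, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace width_sketch {
    struct range {
        char32_t first;
        char32_t last;
    };

    // ASSUMPTION: width-2 ranges as listed in C++20 [format.string.std];
    // verify against the standard before relying on them.
    inline constexpr range wide_ranges[] = {
        {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},   {0x3040, 0xA4CF},
        {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},   {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},
        {0xFF00, 0xFF60},   {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
        {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
    };

    // Estimated display width of a single code point: 2 if it falls in one
    // of the ranges above, otherwise 1.
    constexpr int estimated_width(char32_t cp) noexcept {
        for (const range& r : wide_ranges) {
            if (cp >= r.first && cp <= r.last) {
                return 2;
            }
        }
        return 1;
    }

    // Sum of per-code-point widths. Simplification: each code point is
    // treated as a separate grapheme cluster.
    constexpr std::size_t estimated_width(std::u32string_view s) noexcept {
        std::size_t total = 0;
        for (const char32_t cp : s) {
            total += static_cast<std::size_t>(estimated_width(cp));
        }
        return total;
    }
} // namespace width_sketch
```

Taking UTF-32 input keeps the sketch independent of the narrow encoding; a real implementation working on `char` strings would first have to decode according to the source encoding, which is where compiling the UTF-8 test with `/utf-8` comes in.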

CaseyCarter requested a review from StephanTLavavej April 16, 2021 18:37

*sigh* commit the fix for the second bug

16c599c

StephanTLavavej requested changes Apr 16, 2021

More review comments

3f2944f

StephanTLavavej approved these changes Apr 16, 2021

StephanTLavavej assigned StephanTLavavej and CaseyCarter and unassigned StephanTLavavej Apr 17, 2021

StephanTLavavej merged commit ccc5aaa into microsoft:main Apr 20, 2021

CaseyCarter removed their assignment Apr 20, 2021

CaseyCarter deleted the acp branch April 20, 2021 14:18

StephanTLavavej mentioned this pull request Apr 21, 2021

<chrono> Formatting: C++20's Final Boss #1870

Merged

barcharcraz mentioned this pull request Apr 27, 2021

<format>: Avoid unnecessary calls to _Getcvt() #1825

Closed

StephanTLavavej mentioned this pull request Apr 29, 2021

<format>: Incorrect handling of UTF-8 encoded format strings #1820

Closed

mnatsuhara mentioned this pull request May 4, 2021

<chrono>: [time.format] may assume that strings are encoded in the active code page #1908

Open

5 participants