<format>: Format string parsing needs to be charset aware #1576

@statementreply

Per discussion on Discord.

Some multibyte charsets use 0x7b and 0x7d (the same byte values as '{' and '}', respectively) as the second byte of a double-byte character. We need to be aware of this when implementing #30, and properly parse format strings containing such characters.

For example, the string "日本地図" (meaning "map of Japan" in Japanese) is encoded as "\x93\xfa\x96\x7b\x92\x6e\x90\x7d" in Shift JIS (code page 932), which contains both 0x7b and 0x7d. When running under code page 932, it shouldn't be parsed as "\x93\xfa\x96" + "{\x92n\x90}".

AFAIK, code pages 932 (Shift JIS), 936 (GBK), 950 (Big5), and 54936 (GB18030) contain such encodings.
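
To illustrate the difference, here is a minimal sketch of a byte-wise brace scan next to one that skips trail bytes. It is not the STL's implementation: the helper names (`is_sjis_lead_byte`, `find_brace_naive`, `find_brace_sjis`) are hypothetical, and the lead byte ranges 0x81-0x9F and 0xE0-0xFC are hard-coded as an assumption that only covers code page 932.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: in Shift JIS (code page 932), lead bytes of double-byte
// characters fall in 0x81-0x9F and 0xE0-0xFC.
constexpr bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Byte-wise scan: reports every 0x7b / 0x7d byte as a brace, including
// trail bytes that merely share the value.
constexpr std::size_t find_brace_naive(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '{' || s[i] == '}') {
            return i;
        }
    }
    return std::string_view::npos;
}

// Charset-aware scan: after a lead byte, the next byte is a trail byte and
// is skipped without being inspected.
constexpr std::size_t find_brace_sjis(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if (is_sjis_lead_byte(b)) {
            ++i; // skip the trail byte of this double-byte character
            continue;
        }
        if (b == '{' || b == '}') {
            return i;
        }
    }
    return std::string_view::npos;
}
```

For the string above, the byte-wise scan stops at index 3 (the 0x7b trail byte of "本"), while the trail-byte-skipping scan finds no brace at all. A real implementation would have to ask the active code page which bytes are lead bytes rather than hard-coding one charset; on Windows, `IsDBCSLeadByteEx` is one API that could answer that question.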

Command-line test case

D:\Temp>type format_sjis.cpp
#include <format>
#include <iostream>
#include <string_view>

using namespace std;

int main() {
    constexpr auto str = "\x93\xfa\x96\x7b\x92\x6e\x90\x7d"sv;
    cout << str << "\n";
    cout << format(str) << "\n";
}
D:\Temp>chcp 932
Active code page: 932

D:\Temp>cl /EHsc /W4 /WX /std:c++latest format_sjis.cpp
[...]

D:\Temp>.\format_sjis.exe
[...]

Expected behavior

Should print:

日本地図
日本地図

Labels: bug, fixed, format
