Use narrow UTF-8 as the canonical encoding for paths on Windows #1397

The Native SDK uses narrow UTF-8 encoding throughout the code base as the canonical string encoding. 

The only exception to this rule is strings representing paths on Windows. These use wide characters, whose rendering depends on the system or application code-page settings when they reach our output boundary (i.e., when writing to files or the console). Windows paths were handled separately because the wide-character Win32 APIs expect this encoding.

By making wide chars the canonical path encoding on Windows, we avoid converting back and forth. However, for this to work we must use the `%ls` or `%S` format specifiers wherever a path reaches an output boundary. Those specifiers require the application to configure the CRT locale or the console code page (to match the system settings); otherwise, paths containing non-ASCII characters produce encoding errors.
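The difference between the two formatting routes can be sketched in a few lines of C. The function names below are hypothetical and not part of the SDK. The sketch assumes a standard `snprintf`: `%s` copies the UTF-8 bytes unchanged, while `%ls` converts every wide character through the current locale, so the result for non-ASCII input depends on how the application configured that locale (the conversion may fail or substitute characters, depending on the CRT).

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats a narrow UTF-8 path.
 * %s copies the bytes verbatim, so no locale setup is involved. */
static int format_utf8_path(char *buf, size_t len, const char *utf8_path)
{
    return snprintf(buf, len, "path: %s", utf8_path);
}

/* Hypothetical helper: formats a wide-character path.
 * %ls converts each wchar_t to multibyte form using the current locale,
 * so non-ASCII characters only survive if the locale can represent them. */
static int format_wide_path(char *buf, size_t len, const wchar_t *wide_path)
{
    return snprintf(buf, len, "path: %ls", wide_path);
}
```

With the UTF-8 route, the only remaining requirement is that whatever consumes the output interprets the bytes as UTF-8.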

This became evident in a recent issue (#1388). At its core, that issue was about a different topic related to path encoding on Windows, but it also showed that our _logging_ fails to render paths containing Cyrillic characters correctly when applications do not maintain correct locale and console code-page settings. The problem goes beyond logging, however, because we use the same mechanism for serialization.

Let's make narrow UTF-8 the canonical encoding on Windows too, to eliminate any platform-specific issues in the output. To do so, we would:

* eliminate all uses of `%S`/`%ls` format specifiers in the code base
* use a UTF-8 `char*` as the internal path representation, like we do on all other platforms
* introduce wide-char conversion where it is necessary at the boundary
* provide a cached accessor for the Windows code base so that we do not have to convert back to wide char on every Win32 or public interface boundary
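A minimal sketch of the last two points, assuming a hypothetical `example_path_t` type (none of these names exist in the SDK): the UTF-8 string is canonical, and the UTF-16 form is built on first request and then reused. To keep the sketch portable it uses a simplified hand-written decoder and `uint16_t` code units; on Windows the conversion would more likely call `MultiByteToWideChar` with `CP_UTF8` and store a `wchar_t *`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical path type: UTF-8 is canonical, UTF-16 is a lazy cache. */
typedef struct {
    char *utf8;      /* canonical representation, owned */
    uint16_t *utf16; /* NULL until first requested, owned */
} example_path_t;

/* Decodes one code point; returns bytes consumed, or 0 if malformed.
 * Simplified: overlong forms and encoded surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, uint32_t *out)
{
    uint32_t cp;
    size_t len;
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;
    for (size_t i = 1; i < len; i++) {
        /* Also stops at the NUL terminator of a truncated sequence. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp > 0x10FFFF) return 0;
    *out = cp;
    return len;
}

/* Converts NUL-terminated UTF-8 to NUL-terminated UTF-16; NULL on error. */
static uint16_t *utf8_to_utf16(const char *utf8)
{
    const unsigned char *s = (const unsigned char *)utf8;
    /* Every UTF-8 byte yields at most one UTF-16 unit. */
    uint16_t *out = malloc((strlen(utf8) + 1) * sizeof *out);
    size_t n = 0;
    if (!out) return NULL;
    while (*s) {
        uint32_t cp;
        size_t used = decode_utf8(s, &cp);
        if (!used) { free(out); return NULL; }
        if (cp >= 0x10000) { /* encode as a surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
        s += used;
    }
    out[n] = 0;
    return out;
}

example_path_t *example_path_new(const char *utf8)
{
    size_t size = strlen(utf8) + 1;
    example_path_t *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->utf8 = malloc(size);
    if (!p->utf8) { free(p); return NULL; }
    memcpy(p->utf8, utf8, size);
    p->utf16 = NULL;
    return p;
}

/* Cached accessor: converts once, then returns the same buffer.
 * Not thread-safe in this sketch. */
const uint16_t *example_path_wide(example_path_t *p)
{
    if (!p->utf16) p->utf16 = utf8_to_utf16(p->utf8);
    return p->utf16;
}

void example_path_free(example_path_t *p)
{
    if (!p) return;
    free(p->utf16);
    free(p->utf8);
    free(p);
}
```

Code that logs or serializes the path reads `utf8` directly; only call sites that hand the path to a wide-character API go through the accessor, so the conversion cost is paid at most once per path object.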
