Regex options match with inconsistent string encodings #26070
Closed
Labels
team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc), type: bug, untriaged
Description of the bug:
Bazel stores both its regex options, such as --remote_download_regex, and file paths in the special "internal" string encoding, which represents strings as raw UTF-8 byte arrays. This ensures that any literal Unicode character in a pattern matches that character in a path, but it breaks any kind of Unicode-aware pattern logic (case-insensitive matching, character ranges, etc.), because the regex engine treats each UTF-8 byte as a separate character.
Instead, both patterns and file paths should be transformed with StringEncoding.internalToUnicode before matching.
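A minimal sketch of the mismatch, outside of Bazel. It assumes the internal encoding can be modeled as a Java String in which every char holds one UTF-8 byte (Latin-1 decoding of the UTF-8 bytes). The helpers toInternal and toUnicode are hypothetical stand-ins written for this illustration, not Bazel's actual StringEncoding methods, and the file names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class RegexEncodingSketch {
  // Hypothetical stand-in for the internal encoding (an assumption of this
  // sketch): each UTF-8 byte of the input becomes one Latin-1 char.
  static String toInternal(String unicode) {
    return new String(unicode.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
  }

  // Hypothetical stand-in for StringEncoding.internalToUnicode: reinterpret
  // the per-byte chars as UTF-8 to recover a regular Unicode string.
  static String toUnicode(String internal) {
    return new String(internal.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
  }

  // Full-string match with Unicode-aware case folding enabled.
  static boolean matches(String regex, String path) {
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
        .matcher(path)
        .matches();
  }

  public static void main(String[] args) {
    // Case-insensitive pattern vs. a path that differs only in case.
    String casePattern = toInternal("ä\\.txt");
    String casePath = toInternal("Ä.txt");
    // Character range vs. a path whose first character lies inside the range.
    String rangePattern = toInternal("[α-ω]\\.txt");
    String rangePath = toInternal("β.txt");

    // Matching directly on the internal strings: the engine sees bytes.
    System.out.println("case, internal:  " + matches(casePattern, casePath));
    System.out.println("range, internal: " + matches(rangePattern, rangePath));

    // Matching after converting both sides back to Unicode.
    System.out.println("case, unicode:   " + matches(toUnicode(casePattern), toUnicode(casePath)));
    System.out.println("range, unicode:  " + matches(toUnicode(rangePattern), toUnicode(rangePath)));
  }
}
```

Under these assumptions, a literal pattern still matches an identical path in the internal form, which is why the problem only shows up once a pattern relies on case folding, ranges, or anything else that needs to see whole code points.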
Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
No response
What is the output of bazel info release?
No response
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response