Skip to content

Draft: Path::is_normalized and is_lexically_bounded#152740

Draft
MikkelPaulson wants to merge 2 commits intorust-lang:mainfrom
MikkelPaulson:path_normalize
Draft

Draft: Path::is_normalized and is_lexically_bounded#152740
MikkelPaulson wants to merge 2 commits intorust-lang:mainfrom
MikkelPaulson:path_normalize

Conversation

@MikkelPaulson
Copy link
Contributor

@MikkelPaulson MikkelPaulson commented Feb 17, 2026

Draft proposal for #134694. See parent task for discussion.

  • Update normalize_lexically to be infallible and return Cow
  • Add (for discussion) is_normalized to shortcut the distinction between Cow::Borrowed and Cow::Owned
  • Add (for discussion) is_lexically_bounded to broadly handle the use case of the Result previously returned by normalize_lexically

To do (if the draft is approved):

  • add unit tests, and move some of the excessive doctests there
  • add tests for non-Unix environments, and fix the inevitable bugs therefrom
  • clean up is_normalized, including Const path separators libs-team#744
    • some logic from is_normalized may need to be moved to Components to keep things from being too brittle

Draft proposal for rust-lang#134694.

* Update `normalize_lexically` to be infallible and return `Cow`
* Add (for discussion) `is_normalized` to shortcut the distinction
  between `Cow::Borrowed` and `Cow::Owned`
* Add (for discussion) `is_lexically_bounded` to broadly handle the use
  case of the `Result` previously returned by `normalize_lexically`

TODO: doctests are handled (maybe excessively), but unit tests are needed
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 17, 2026
@rust-log-analyzer

This comment has been minimized.

Update behaviour of Path::is_lexically_bounded to consider the working
directory unbounded, as promised by the doc comment description but not
by the doctest.
@rust-log-analyzer
Copy link
Collaborator

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
fmt: checked 6703 files
tidy check
tidy [rustdoc_json (src)]: `rustdoc-json-types` modified, checking format version
tidy: Skipping binary file check, read-only filesystem
tidy [style (library)]: /checkout/library/std/src/path.rs:3361: TODO is used for tasks that should be done before merging a PR; If you want to leave a message in the codebase use FIXME
tidy [style (library)]: /checkout/library/std/src/path.rs:3389: TODO is used for tasks that should be done before merging a PR; If you want to leave a message in the codebase use FIXME
tidy [style (library)]: /checkout/library/std/src/path.rs:3413: TODO is used for tasks that should be done before merging a PR; If you want to leave a message in the codebase use FIXME
tidy [style (library)]: FAIL
removing old virtual environment
creating virtual environment at '/checkout/obj/build/venv' using 'python3.10' and 'venv'
creating virtual environment at '/checkout/obj/build/venv' using 'python3.10' and 'virtualenv'
Requirement already satisfied: pip in ./build/venv/lib/python3.10/site-packages (25.3)
Collecting pip
---
info: ES-Check: there were no ES version matching errors!  🎉
typechecking javascript files
tidy: The following check failed: style (library)
Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Command `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools-bin/rust-tidy --root-path=/checkout --cargo-path=/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo --output-dir=/checkout/obj/build --concurrency=4 --npm-path=/node/bin/yarn --extra-checks=py,cpp,js,spellcheck` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1612:23
Executed at: src/bootstrap/src/core/build_steps/test.rs:1365:29

Command has failed. Rerun with -v to see more details.
Build completed unsuccessfully in 0:02:54
  local time: Tue Feb 17 04:57:33 UTC 2026
  network time: Tue, 17 Feb 2026 04:57:33 GMT
##[error]Process completed with exit code 1.
##[group]Run echo "disk usage:"

@rust-bors
Copy link
Contributor

rust-bors bot commented Mar 4, 2026

☔ The latest upstream changes (presumably #153377) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@ChrisDenton ChrisDenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally in favour of returning a Cow from normalize_lexically. Or even for PathBuf to have a version that mutates in place.

As for is_normalized and is_lexically_bounded, I think it's hard to evaluate in isolation. I'd like to see the motivation use cases and how that fits in overall. Usually when it's important to only use the result if a condition is true, we have the API return an Option or Result (or in some cases a closure but that seems less relevant here). Also usually we'd return a separate type so they can't be mixed up but I don't know if people would consider that overkill here.

View changes since this review

///
/// * Starts with either
/// * a prefix followed by root (`C:\`); or
/// * a prefix (`C:`); or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise this is a pre-existing issue but I think a "normalized" path shouldn't have some of the weirder Windows prefixes because they're harder to handle. For example, C: is technically relative but not relative to the current directory. This can break people's assumptions of what is_relative means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My interpretation of "normalized" is just "is expressed in simplest possible terms". There is no path, however strange, that can't have a boolean response to that question. Although this is a good point for normalize_lexically, which I think in its present implementation will normalize C:..\ to C:\.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think C:foo is expressed In it's simplest term. It's roughly equivalent to doing %=C:%\foo in a command prompt. Simplifying it would require expanding it to get the real path.

/// assert!(Path::new("abc/../def").is_lexically_bounded());
///
/// assert!(!Path::new("").is_lexically_bounded());
/// assert!(!Path::new(".").is_lexically_bounded());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one feels weird to me; should this not be bounded? I would expect anything below the current directory to not be bounded, but at or above to be bounded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a last-minute change on my part; I previously had it implemented as you describe. But I thought the most likely use case for this would be recreating a directory structure from an untrusted source (eg. extracting at archive), where you wouldn't want the source data to be able to address the pwd directly. I'm not sure if there's a compelling use case for pwd-or-lower.

Copy link
Contributor

@clarfonthey clarfonthey Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair; perhaps is_strictly_lexically_bounded and is_lexically_bounded could make that distinction? They could use the same code internally.

In the extracting files case, extracting multiple files would require a strict bound, but extracting a single file could allow for a loose bound. At least, when talking about the directory the files are in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But even when extracting a single file you wouldn't want to allow a "file" called ".".

The more elegant solution might be an enum along the lines of Inner/Pwd/Outer/Absolute. Which could ultimately stand in for is_absolute() as well. But I'm not sure we gain much from the extra complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading your first paragraph before you added the second, I actually wasn't sure if is_strictly_lexically_bounded was meant to be the case with or without .. Which implies some naming confusion if nothing else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, was typing up on the bus so I was trying to brief and then correcting myself as I realised how not-detailed it was.

Your suggestion of having an enum actually does sound like a good idea, since it feels similar to how classify works on floats: you have one method to get every possible class, and other methods to explicitly check which class it is. is_absolute is a special case because it can short-circuit whereas the others have to read the whole path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to clarify on the strictness: I was thinking it in terms of the way "strict" is used in mathematics: a subset of a set could be the set itself, but a strict subset explicitly forbids this.

So, in this sense, it would be whether the path is a "strictly" bounded within the current directory or not, lexically. So, a strict bound excludes the bound, whereas a loose bound allows the bound itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants