Draft: Path::is_normalized and is_lexically_bounded#152740
Draft: Path::is_normalized and is_lexically_bounded#152740MikkelPaulson wants to merge 2 commits intorust-lang:mainfrom
Conversation
Draft proposal for rust-lang#134694. * Update `normalize_lexically` to be infallible and return `Cow` * Add (for discussion) `is_normalized` to shortcut the distinction between `Cow::Borrowed` and `Cow::Owned` * Add (for discussion) `is_lexically_bounded` to broadly handle the use case of the `Result` previously returned by `normalize_lexically` TODO: doctests are handled (maybe excessively), but unit tests are needed
This comment has been minimized.
This comment has been minimized.
Update behaviour of Path::is_lexically_bounded to consider the working directory unbounded, as promised by the doc comment description but not by the doctest.
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
☔ The latest upstream changes (presumably #153377) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
I'm generally in favour of returning a Cow from normalize_lexically. Or even for PathBuf to have a version that mutates in place.
As for is_normalized and is_lexically_bounded, I think it's hard to evaluate in isolation. I'd like to see the motivation use cases and how that fits in overall. Usually when it's important to only use the result if a condition is true, we have the API return an Option or Result (or in some cases a closure but that seems less relevant here). Also usually we'd return a separate type so they can't be mixed up but I don't know if people would consider that overkill here.
| /// | ||
| /// * Starts with either | ||
| /// * a prefix followed by root (`C:\`); or | ||
| /// * a prefix (`C:`); or |
There was a problem hiding this comment.
I realise this is a pre-existing issue but I think a "normalized" path shouldn't have some of the weirder Windows prefixes because they're harder to handle. For example, C: is technically relative but not relative to the current directory. This can break people's assumptions of what is_relative means.
There was a problem hiding this comment.
My interpretation of "normalized" is just "is expressed in simplest possible terms". There is no path, however strange, that can't have a boolean response to that question. Although this is a good point for normalize_lexically, which I think in its present implementation will normalize C:..\ to C:\.
There was a problem hiding this comment.
I don't think C:foo is expressed In it's simplest term. It's roughly equivalent to doing %=C:%\foo in a command prompt. Simplifying it would require expanding it to get the real path.
| /// assert!(Path::new("abc/../def").is_lexically_bounded()); | ||
| /// | ||
| /// assert!(!Path::new("").is_lexically_bounded()); | ||
| /// assert!(!Path::new(".").is_lexically_bounded()); |
There was a problem hiding this comment.
This one feels weird to me; should this not be bounded? I would expect anything below the current directory to not be bounded, but at or above to be bounded.
There was a problem hiding this comment.
That was a last-minute change on my part; I previously had it implemented as you describe. But I thought the most likely use case for this would be recreating a directory structure from an untrusted source (eg. extracting at archive), where you wouldn't want the source data to be able to address the pwd directly. I'm not sure if there's a compelling use case for pwd-or-lower.
There was a problem hiding this comment.
That's fair; perhaps is_strictly_lexically_bounded and is_lexically_bounded could make that distinction? They could use the same code internally.
In the extracting files case, extracting multiple files would require a strict bound, but extracting a single file could allow for a loose bound. At least, when talking about the directory the files are in.
There was a problem hiding this comment.
But even when extracting a single file you wouldn't want to allow a "file" called ".".
The more elegant solution might be an enum along the lines of Inner/Pwd/Outer/Absolute. Which could ultimately stand in for is_absolute() as well. But I'm not sure we gain much from the extra complexity.
There was a problem hiding this comment.
Reading your first paragraph before you added the second, I actually wasn't sure if is_strictly_lexically_bounded was meant to be the case with or without .. Which implies some naming confusion if nothing else.
There was a problem hiding this comment.
Yeah, sorry, was typing up on the bus so I was trying to brief and then correcting myself as I realised how not-detailed it was.
Your suggestion of having an enum actually does sound like a good idea, since it feels similar to how classify works on floats: you have one method to get every possible class, and other methods to explicitly check which class it is. is_absolute is a special case because it can short-circuit whereas the others have to read the whole path.
There was a problem hiding this comment.
Also to clarify on the strictness: I was thinking it in terms of the way "strict" is used in mathematics: a subset of a set could be the set itself, but a strict subset explicitly forbids this.
So, in this sense, it would be whether the path is a "strictly" bounded within the current directory or not, lexically. So, a strict bound excludes the bound, whereas a loose bound allows the bound itself.
Draft proposal for #134694. See parent task for discussion.
normalize_lexicallyto be infallible and returnCowis_normalizedto shortcut the distinction betweenCow::BorrowedandCow::Ownedis_lexically_boundedto broadly handle the use case of theResultpreviously returned bynormalize_lexicallyTo do (if the draft is approved):
is_normalized, including Const path separators libs-team#744is_normalizedmay need to be moved toComponentsto keep things from being too brittle