Allow filesystem commands to access files with glob metachars in name#10557
Allow filesystem commands to access files with glob metachars in name#10557bobhy wants to merge 33 commits intonushell:mainfrom
Conversation
|
Our friendly neighborhood windows CI strikes again. Without looking further, I have 2 guesses why it's failing.
Generally, I think this PR is cool. |
|
I see a couple of problems. Firing up my Windows dev environment, which is a performance hit all of its own... |
crates/nu-cmd-base/src/arg_glob.rs
Outdated
| let processed_pattern = pattern | ||
| .item | ||
| .replace('\\', "/") | ||
| .replace(r#":/"#, r#"\:/"#) // must come after above | ||
| .replace('(', r#"\("#) // when path includes Program Files (x86) | ||
| .replace(')', r#"\)"#); |
There was a problem hiding this comment.
I'm wondering what effect this will have on paths with spaces like Program Files and Program Files (x86). I see that lots of things are being replaced here but I'm not sure how that affects things here with these particular paths.
There was a problem hiding this comment.
Embedded spaces won't be affected by these changes. A pattern or simple path with an embedded space would have to be enclosed in quotes anyway. It has been tested and works with C:\Program Files (x86)\*!
There was a problem hiding this comment.
ok, thanks for the explanation. I'm wondering if it matters that this is different than how we cd to folders with spaces in them now, and that is with surrounding the path in backticks, which I think, does some bare word magic. I don't really follow how JT made it work. Just wondering if it matters here.
esp rooted patterns with device or unc share into legit glob paths.
|
We have to be careful with adding glob syntax. Anything that's part of a valid filename on any of our platforms can't be used. Can you confirm that |
|
A key point of this change is to be able to use a simple file path or a glob pattern as argument to a filesystem command like That's a problem on Windows, were backslashes and colons in absolute (rooted) patterns will cause glob parse errors. This is a well-known issue: For now, I'm including a hack for path/pattern arguments on windows only that covers many cases. // Sanitize a glob pattern for use on windows.
// this allows filesystem commands like `cp` to accept an arg that can be a path or a glob.
// Deal with 3 aspects of windows paths that can confound glob patterns:
// * path separator: change `\` to `/`. Rust file API on windows supports this in folder separators and in UNC paths.
// * drive letter colon: change `C:` to `C\:`
// * quote parens (such as the infamous `Program Files (x86)`).
// But this means user cannot use `\` to quote other metachars.
// (Windows) users of glob should adopt the habit of quoting with single char classes, e.g instead of `\*`, use `[*]`.
// e.g transform something like: `C:\Users\me\.../{abc,def}*.txt` to `C\:/Users/me/.../{abc,def}*.txt`
//todo: investigate enhancing [wax] to support this more elegantly.
pub fn windows_pattern_hack(pat: &str) -> String {
pat.replace('\\', "/")
.replace(r#":/"#, r#"\:/"#) // must come after above
.replace('(', r#"\("#) // when path includes Program Files (x86)
.replace(')', r#"\)"#)
}But there will always be paths you might want to copy data from that are unnatural / impossible to quote so they parse as a glob. And telling users that they have to do glob quoting to access a file with Maybe we should change arg parsing to check first whether the arg has any glob metachars? But that's expensive (needs a fat regex to cover all the cases). Maybe just I'd like to hear how far this hack gets us... Windows users, please give this a try (remember, |
due to recent code change!
but metachar left brace is.
|
@jntrnr On the contrary, I can confirm that But paths with these special characters to cause a problem: if you do have a path with these glob chars and you specify a similar glob. Example: User could reasonably complain that the command should have copied file |
|
... Looking into this situation a bit more, it's a bigger problem. Opening #10571 to discuss. I propose we merge this PR (since it's a step in a direction we (or at least, I) want to go and doesn't make the new issue any worse), then focus on resolving the new issue. |
Then let's not go this direction. I'd rather make the built-in glob conservative, and if people want a fancier glob capability, that they would use the #10498 is not a bug. (I comment on why in the issue) |
|
the last checkin fixes the problem of #10571, but reverts to the problem of absolute glob patterns on Windows: e.g |
|
3rd step done!
|
|
Now that I'm down to the bit of the project that brought me here, I'm pretty sure it's a bad idea. So, bottom line, I'm submitting this PR for review as it now stands, updating the title to match what it actually accomplishes and moving on to other work. |
{} glob metachars in command arguments|
Last note: behavior of |
|
@bobhy can you update the issue body to explain what exactly this does. there have been quite a few changes and a reverse of direction here. It's difficult to consume this PR as it sits because it's hard to wade through all the subsequent steps you went through to get where you're at here. I'd like to point the core-team here and ask them to make a decision based on what the body/description says and then reviewing the code. Maybe you've already done that. Please let me know. |
|
Squashing the history and reposting as a new clean PR #10694. |
…#10694) (squashed version of #10557, clean commit history and review thread) Fixes #10571, also potentially: #10364, #10211, #9558, #9310, # Description Changes processing of arguments to filesystem commands that are source paths or globs. Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a different globbing interface) or `glob` (because it uses a different globbing library). The core of the change is to lookup the argument first as a file and only glob if it is not. That way, a path containing glob metacharacters can be referenced without glob quoting, though it will have to be single quoted to avoid nushell parsing. Before: A file path that looks like a glob is not matched by the glob specified as a (source) argument and takes some thinking about to access. You might say the glob pattern shadows a file with the same spelling. ``` > ls a* ╭───┬────────┬──────┬──────┬────────────────╮ │ # │ name │ type │ size │ modified │ ├───┼────────┼──────┼──────┼────────────────┤ │ 0 │ a[bc]d │ file │ 0 B │ 34 seconds ago │ │ 1 │ abd │ file │ 0 B │ now │ │ 2 │ acd │ file │ 0 B │ now │ ╰───┴────────┴──────┴──────┴────────────────╯ > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd > ## Note -- a[bc]d *not* copied, and seemingly hard to access. > cp --verbose 'a\[bc\]d' dest Error: × No matches found ╭─[entry #33:1:1] 1 │ cp --verbose 'a\[bc\]d' dest · ─────┬──── · ╰── no matches found ╰──── > #.. but is accessible with enough glob quoting. > cp --verbose 'a[[]bc[]]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d ``` Before_2: if file has glob metachars but isn't a valid pattern, user gets a confusing error: ``` > touch 'a[b' > cp 'a[b' dest Error: × Pattern syntax error near position 30: invalid range pattern ╭─[entry #13:1:1] 1 │ cp 'a[b' dest · ──┬── · ╰── invalid pattern ╰──── ``` After: Args to cp, mv, etc. are tried first as literal files, and only as globs if not found to be files. ``` > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d > cp --verbose '[a][bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd ``` After_2: file with glob metachars but invalid pattern just works. (though Windows does not allow file name to contain `*`.). ``` > cp --verbose 'a[b' dest copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b ``` So, with this fix, a file shadows a glob pattern with the same spelling. If you have such a file and really want to use the glob pattern, you will have to glob quote some of the characters in the pattern. I think that's less confusing to the user: if ls shows a file with a weird name, s/he'll still be able to copy, rename or delete it. # User-Facing Changes Could break some existing scripts. If user happened to have a file with a globbish name but was using a glob pattern with the same spelling, the new version will process the file and not expand the glob. # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use std testing; testing run-tests --path crates/nu-std"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> --------- Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
…nushell#10694) (squashed version of nushell#10557, clean commit history and review thread) Fixes nushell#10571, also potentially: nushell#10364, nushell#10211, nushell#9558, nushell#9310, # Description Changes processing of arguments to filesystem commands that are source paths or globs. Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a different globbing interface) or `glob` (because it uses a different globbing library). The core of the change is to lookup the argument first as a file and only glob if it is not. That way, a path containing glob metacharacters can be referenced without glob quoting, though it will have to be single quoted to avoid nushell parsing. Before: A file path that looks like a glob is not matched by the glob specified as a (source) argument and takes some thinking about to access. You might say the glob pattern shadows a file with the same spelling. ``` > ls a* ╭───┬────────┬──────┬──────┬────────────────╮ │ # │ name │ type │ size │ modified │ ├───┼────────┼──────┼──────┼────────────────┤ │ 0 │ a[bc]d │ file │ 0 B │ 34 seconds ago │ │ 1 │ abd │ file │ 0 B │ now │ │ 2 │ acd │ file │ 0 B │ now │ ╰───┴────────┴──────┴──────┴────────────────╯ > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd > ## Note -- a[bc]d *not* copied, and seemingly hard to access. > cp --verbose 'a\[bc\]d' dest Error: × No matches found ╭─[entry nushell#33:1:1] 1 │ cp --verbose 'a\[bc\]d' dest · ─────┬──── · ╰── no matches found ╰──── > #.. but is accessible with enough glob quoting. > cp --verbose 'a[[]bc[]]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d ``` Before_2: if file has glob metachars but isn't a valid pattern, user gets a confusing error: ``` > touch 'a[b' > cp 'a[b' dest Error: × Pattern syntax error near position 30: invalid range pattern ╭─[entry nushell#13:1:1] 1 │ cp 'a[b' dest · ──┬── · ╰── invalid pattern ╰──── ``` After: Args to cp, mv, etc. are tried first as literal files, and only as globs if not found to be files. ``` > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d > cp --verbose '[a][bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd ``` After_2: file with glob metachars but invalid pattern just works. (though Windows does not allow file name to contain `*`.). ``` > cp --verbose 'a[b' dest copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b ``` So, with this fix, a file shadows a glob pattern with the same spelling. If you have such a file and really want to use the glob pattern, you will have to glob quote some of the characters in the pattern. I think that's less confusing to the user: if ls shows a file with a weird name, s/he'll still be able to copy, rename or delete it. # User-Facing Changes Could break some existing scripts. If user happened to have a file with a globbish name but was using a glob pattern with the same spelling, the new version will process the file and not expand the glob. # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use std testing; testing run-tests --path crates/nu-std"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> --------- Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
…nushell#10694) (squashed version of nushell#10557, clean commit history and review thread) Fixes nushell#10571, also potentially: nushell#10364, nushell#10211, nushell#9558, nushell#9310, # Description Changes processing of arguments to filesystem commands that are source paths or globs. Applies to `cp, cp-old, mv, rm, du` but not `ls` (because it uses a different globbing interface) or `glob` (because it uses a different globbing library). The core of the change is to lookup the argument first as a file and only glob if it is not. That way, a path containing glob metacharacters can be referenced without glob quoting, though it will have to be single quoted to avoid nushell parsing. Before: A file path that looks like a glob is not matched by the glob specified as a (source) argument and takes some thinking about to access. You might say the glob pattern shadows a file with the same spelling. ``` > ls a* ╭───┬────────┬──────┬──────┬────────────────╮ │ # │ name │ type │ size │ modified │ ├───┼────────┼──────┼──────┼────────────────┤ │ 0 │ a[bc]d │ file │ 0 B │ 34 seconds ago │ │ 1 │ abd │ file │ 0 B │ now │ │ 2 │ acd │ file │ 0 B │ now │ ╰───┴────────┴──────┴──────┴────────────────╯ > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd > ## Note -- a[bc]d *not* copied, and seemingly hard to access. > cp --verbose 'a\[bc\]d' dest Error: × No matches found ╭─[entry nushell#33:1:1] 1 │ cp --verbose 'a\[bc\]d' dest · ─────┬──── · ╰── no matches found ╰──── > #.. but is accessible with enough glob quoting. > cp --verbose 'a[[]bc[]]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d ``` Before_2: if file has glob metachars but isn't a valid pattern, user gets a confusing error: ``` > touch 'a[b' > cp 'a[b' dest Error: × Pattern syntax error near position 30: invalid range pattern ╭─[entry nushell#13:1:1] 1 │ cp 'a[b' dest · ──┬── · ╰── invalid pattern ╰──── ``` After: Args to cp, mv, etc. are tried first as literal files, and only as globs if not found to be files. ``` > cp --verbose 'a[bc]d' dest copied /home/bobhy/src/rust/work/r4/a[bc]d to /home/bobhy/src/rust/work/r4/dest/a[bc]d > cp --verbose '[a][bc]d' dest copied /home/bobhy/src/rust/work/r4/abd to /home/bobhy/src/rust/work/r4/dest/abd copied /home/bobhy/src/rust/work/r4/acd to /home/bobhy/src/rust/work/r4/dest/acd ``` After_2: file with glob metachars but invalid pattern just works. (though Windows does not allow file name to contain `*`.). ``` > cp --verbose 'a[b' dest copied /home/bobhy/src/rust/work/r4/a[b to /home/bobhy/src/rust/work/r4/dest/a[b ``` So, with this fix, a file shadows a glob pattern with the same spelling. If you have such a file and really want to use the glob pattern, you will have to glob quote some of the characters in the pattern. I think that's less confusing to the user: if ls shows a file with a weird name, s/he'll still be able to copy, rename or delete it. # User-Facing Changes Could break some existing scripts. If user happened to have a file with a globbish name but was using a glob pattern with the same spelling, the new version will process the file and not expand the glob. # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use std testing; testing run-tests --path crates/nu-std"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> --------- Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
MEGA-EDIT:
Completely changing goals of this PR!
Work to date has found a fix for a good bug, but the original idea of supporting
{}brace expansion likebashno longer seems achievable as a globbing enhancement. So this PR is now all about:Fix problem #10571.
Description
before: Args to
cp,mv,rmare assumed to be globs within each command. That means a file that looks like a glob is not matched by the glob and takes some thinking about to access. That is, the glob pattern shadows a file with the same spelling.Before_2: if file has glob metachars but isn't a valid pattern, user gets a confusing error:
After: Args to
cp,mv, etc. are tried first as literal files, and only as globs if not found to be files.After_2: file with glob metachars but invalid pattern just works (though the name probably requires string quotes)
So, with this fix, a file shadows a glob pattern with the same spelling. You must invent some other spelling for the glob pattern to match the desired files. I think that's less confusing to the user: if
lsshows a file with a weird name, s/he'll still be able to copy, rename or delete it.;tl;dr
Original title: Enable {} glob metachars in command arguments
... To make
nufeel more likecsh.Fixes #10498.
So far, only
ucpcommand has been touched, as a proof of concept. But we should proceed through the rest of the filesystem commands.This PR uses crate
waxfor globbing instead of fixing the internal cratenu_glob. Once all the clients are upgraded,nu_globcould be deprecated.Description
Enables glob alternation to be used in command line arguments (where arguments are supposed to refer to paths in filesystem).
e.g:
{and}are nu parser tokens, so must be quoted if they are not embedded within the glob.Note that alternation works for more than 2 alternatives, as shown above.
User-Facing Changes
Currently, the new metachars can only be used in
ucpcommand.All uses of the old metachars should give same results as they used to, so no breaking changes.
Small changes to error messages when a glob fails to match anything.
Tests + Formatting
toolkit fmttoolkit clippytoolkit testtoolkit test stdlibAfter Submitting