Unify glob behavior on open, rm, cp-old, mv, umv, cp and du commands#11621
Unify glob behavior on open, rm, cp-old, mv, umv, cp and du commands#11621WindSoilder merged 11 commits intonushell:mainfrom
open, rm, cp-old, mv, umv, cp and du commands#11621Conversation
|
This is awesome! For me, the exact semantics are not as important as the fact that the behavior is unified, meaning that if we need to reconsider the globbing behavior at some point, we'd change it in one place instead of 10.
|
|
Hmmm,
We're trying to solve a problem here that traditional shells could never
solve, but this particular solution doesn't feel right.
In a traditional shell, the commands don't glob and rely on the shell to
expand globs. The shell itself makes no attempt to be aware of the
particular command it's expanding, it just expands whatever unescaped glob
characters it sees. And we've all internalized the peculiarities: we
accept, for example, that, in bash, `cp *` could be a perfectly legit
command (but would be hard pressed to name the new file that is created).
Nu already handles the `cp` example in a more natural way. The Nu `cp`
command insists on at least 2 arguments and just knows that the last
argument is a single path, not a glob.
But now we have a new situation where we want to suppress expanding a glob
appearing *anywhere* in the (input arguments of the) command line. It's
not just the last argument. As I say, this is not a situation we have
experience with, and our experience from traditional shells doesn't give us
a model, so we have to be careful what we innovate.
I think it's a mistake to use quotes to suppress globbing. We already want
to use quotes as the natural way to specify paths containing spaces or
other characters special for the parsing of the command line, like
whitespace and `;`.
I wish Nu allowed named parameters to be positional, so we could have
(command-specific) syntax like:
```
cp able --noglob bak*er charlie
```
Where the `--noglob` named param could appear anywhere in the argument list
and apply to just that argument. That feels like a very natural way to
specify special handling for the argument.
As it is right now though, you could support this as the *first and only*
input argument to the file system commands, e.g `cp --noglob able charlie`.
I actually feel that's good enough -- dealing with files whose names
contain glob characters should be rare and the commands for dealing with
them can be a bit limited. It's OK (with me) if the user has to review the
command help when dealing with them.
But if you wanted to provide full generality, I'd suggest some convention
other than quoting, perhaps protocol-ish syntax?
```
cp able noglob://bak*er charlie
```
This would be specific to arguments of type "path" only.
…On Tue, Jan 23, 2024 at 11:36 AM Jakub Žádník ***@***.***> wrote:
This is awesome! For me, the exact semantics are not as important as the
fact that the behavior is *unified*, meaning that if we need to
reconsider the globbing behavior at some point, we'd change it in one place
instead of 10.
glob escape sounds good as well as a way to escape globbing. We could
even land just the unification and leave glob escape to a separate PR.
—
Reply to this email directly, view it on GitHub
<#11621 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPBIZEK4X47CMA237PZENDYP7RHXAVCNFSM6AAAAABCHHDWLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBWGQ3DAMJUGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
I thought double/single quoting was the standard way of telling any shell "this is a string. don't mess with it"? |
28e1a93 to
bad1b1e
Compare
3c9b193 to
46141cd
Compare
Actually it remains the same, the last argument is still a single path. I think something like: |
I agree with this too. However, double quotes still have to escape things to be json compliant I believe. |
|
As I mentioned, the main value of this PR is the unification of globbing across commands. If we need to change how globbing works, this PR makes it easier to do by changing it only in one place. I'd vote for landing this. |
|
I'm fine with landing it too,
touch 1.txt
let a = '1'
rm $"($a).txt" |
It shouldn't work, because we treated backtick quoted as a bare word.
For such file, it's because we auto strip ansi string, maybe we need to find a way to fix such issue |
I am thinking it probably should because backticks are what we use with paths with spaces in them. I think that's why backticks work in this situation, because there is a space in the filename.
I think this is a special issue that could be handled in a separate PR but it's interesting to me that other shells do not just create a folder literally, the encode the escape from |
Maybe we can discuss it in next meeting :-)
That's interesting, maybe we can fix it in that direction, like modify |
ya, i think the fix needs to be like what you're saying. we intercept escapes and encode them somehow before trying to create a file. I just texted this again and rm with backticks works like i thought it should. maybe i did something different yesterday? ❯ touch "[a] b.c"
❯ rm `[a] b.c` |
|
Yeah, I agree, we can iterate on the glob semantic in separate PRs. Now it's at least uniform. IMO, backticked word is bare word and thus should be glob-expanded. That makes sense to me. |
# Description Fix a breaking change which is introduced by #11621 `rm -f /tmp/aaa` shouldn't return error if `/tmp/aaa/` doesn't exist. # User-Facing Changes NaN # Tests + Formatting Done
# Description This pr is a follow up to #11621, it introduces a `str escape-glob` command as a workaround for the case: ```nushell let f = "a[123]b" ls $f ``` It will glob `a[123]b`, we can get rid of the behavior through `str escape-glob` command: ```nushll let f = "a[123]b" ls ($f | str escape-glob) ``` It's more useful in the `each` context: `ls | get name | str escape-glob | each {|it| ls $it}` # User-Facing Changes NaN # Tests + Formatting Done # After Submitting --------- Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
|
Drive by comment, but I was reading the last changelog, and
Feels very surprising. It seems to me that this behavior is very error prone? To put it in other way, I posit that with the current semantics:
Is there any way to make it work the opposite way? Such that Note: although I feel very strongly about the isssue, my level of confidence here is low, I might be misunderstanding something about nu which makes this pattern not as problematic as it seems. |
|
In supporting globbin of `rm` arguments, nushell is just following the lead
of traditional shells (both linux and windows). Technically it could work
the other way, but then nu would be odd man out.
It is a valid concern that `rm *` might do more than you expected, and that
you might not realize it till later.
Perhaps nushell `rm` should integrate with the "trash" mechanism on the
underlying platform?
…On Sun, Feb 11, 2024 at 4:09 AM Alex Kladov ***@***.***> wrote:
Drive by comment, but I was reading the last changelog, and
So I'll introduce a new command called glob escape, then if user doesn't
want to auto-expand the glob pattern, he can just do this: rm ($path | glob
escape)
Feels *very* surprising.
It seems to me that this behavior is very error prone? rm $file looks
like it removes a single file. What’s more, it *works* as if it removes
only a single file most of the time, so the user won’t learn about mistakes
in their mental model until the day the file is named *.
To put it in other way, I posit that with the current semantics:
- most arguments should be using | str glob-escape
- almost none will be
Is there any way to make it work the opposite way? Such that rm $file is
never a glob, and rm ($file | str glob) is?
Note: although I feel very strongly about the isssue, my level of
confidence here is low, I might be misunderstanding something about nu
which makes this pattern not as problematic as it seems.
—
Reply to this email directly, view it on GitHub
<#11621 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPBIZDY2UTKYKTAZFDVFJLYTCDF7AVCNFSM6AAAAABCHHDWLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZXGQ4DIMZSGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Indeed, and my understanding is that this behavior is a confirmed footgun in those shells. For example shellcheck lints against unquoted variables: https://www.shellcheck.net/wiki/SC2086 So, for a shell language which isn’t trying to be compatible with POSIX shell, I’d expect a different, safer semantics by default. |
|
IMHO, this change breaks the principle of least astonishment. Considering the three following snippets:
The first two are equivalent, but the last one isn't. I would expect (looking strictly at quotes) that the 1st and 3rd statement are equivalent, but instead they behave differently. Given a variable
Which looks very differently from 3. Nu seems to pride itself in approaching things differently. Notably, given a string with spaces like let a = "-rm ."Then this command is safe to execute and will delete a file named rm $aHowever, this assumption that using a variable to execute a command is safe is now broken following this change; additional arguments/flags will not be given to
IMO this doesn't solve it. |
|
@matklad How should the safer semantics work in practice? The rest of the world seems to have decided that the best balance between "convenience" and "safety" here is to allow globbing but to enable wastebasket or similarly named mechanism to recover from mistakes. Building on this, I just discovered that |
|
We've been asked before to make |
|
@bobhy I personally find the behavior of fish shell quite reasonable: |
|
Numbering and labeling these to reference them. Let me know if I interpreted them wrong.
|
|
Yup, that's correct, just "removes nothing" -> "removes only the file whose name is literally '*'": |
The spread operator could be made to work, e.g.
A principled interpretation of the behavior would be to consistently distinguish between "bareword string" and "quoted string" shapes (the The most directly equivalent constructs written in Nu would be (checked means expands glob):
Nu-specific fun:
|
|
Thank you for all feedback! I can try to adjust the behavior when we meet variables |
|
Maybe look at how My crazy-concept idea would be for a |
…`du` commands (nushell#11621) # Description This pr is a follow up to [nushell#11569](nushell#11569 (comment)) > Revert the logic in nushell#10694 and apply the logic in this pr to mv, cp, rv will require a larger change, I need to think how to achieve the bahavior And sorry @bobhy for reverting some of your changes. This pr is going to unify glob behavior on the given commands: * open * rm * cp-old * mv * umv * cp * du So they have the same behavior to `ls`, which is: If given parameter is quoted by single quote(`'`) or double quote(`"`), don't auto-expand the glob pattern. If not quoted, auto-expand the glob pattern. Fixes: nushell#9558 Fixes: nushell#10211 Fixes: nushell#9310 Fixes: nushell#10364 # TODO But there is one thing remains: if we give a variable to the command, it will always auto-expand the glob pattern, e.g: ```nushell let path = "a[123]b" rm $path ``` I don't think it's expected. But I also think user might want to auto-expand the glob pattern in variables. So I'll introduce a new command called `glob escape`, then if user doesn't want to auto-expand the glob pattern, he can just do this: `rm ($path | glob escape)` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> # Tests + Formatting Done # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> ## NOTE This pr changes the semantic of `GlobPattern`, before this pr, it will `expand path` after evaluated, this makes `nu_engine::glob_from` have no chance to glob things right if a path contains glob pattern. e.g: [nushell#9310 ](nushell#9310 (comment)) nushell#10211 I think changing the semantic is fine, because it makes glob works if path contains something like '*'. It maybe a breaking change if a custom command's argument are annotated by `: glob`.
…l#11656) # Description Fix a breaking change which is introduced by nushell#11621 `rm -f /tmp/aaa` shouldn't return error if `/tmp/aaa/` doesn't exist. # User-Facing Changes NaN # Tests + Formatting Done
# Description This pr is a follow up to nushell#11621, it introduces a `str escape-glob` command as a workaround for the case: ```nushell let f = "a[123]b" ls $f ``` It will glob `a[123]b`, we can get rid of the behavior through `str escape-glob` command: ```nushll let f = "a[123]b" ls ($f | str escape-glob) ``` It's more useful in the `each` context: `ls | get name | str escape-glob | each {|it| ls $it}` # User-Facing Changes NaN # Tests + Formatting Done # After Submitting --------- Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
…ariables or string interpolation (#11886) # Description This is a follow up to #11621 (comment) Also Fixes: #11838 ## About the code change It applys the same logic when we pass variables to external commands: https://github.com/nushell/nushell/blob/0487e9ffcbc57c2d5feca606e10c3f8221ff5e00/crates/nu-command/src/system/run_external.rs#L162-L170 That is: if user input dynamic things(like variables, sub-expression, or string interpolation), it returns a quoted `NuPath`, then user input won't be globbed # User-Facing Changes Given two input files: `a*c.txt`, `abc.txt` * `let f = "a*c.txt"; rm $f` will remove one file: `a*c.txt`. ~* `let f = "a*c.txt"; rm --glob $f` will remove `a*c.txt` and `abc.txt`~ * `let f: glob = "a*c.txt"; rm $f` will remove `a*c.txt` and `abc.txt` ## Rules about globbing with *variable* Given two files: `a*c.txt`, `abc.txt` | Cmd Type | example | Result | | ----- | ------------------ | ------ | | builtin | let f = "a*c.txt"; rm $f | remove `a*c.txt` | | builtin | let f: glob = "a*c.txt"; rm $f | remove `a*c.txt` and `abc.txt` | builtin | let f = "a*c.txt"; rm ($f \| into glob) | remove `a*c.txt` and `abc.txt` | custom | def crm [f: glob] { rm $f }; let f = "a*c.txt"; crm $f | remove `a*c.txt` and `abc.txt` | custom | def crm [f: glob] { rm ($f \| into string) }; let f = "a*c.txt"; crm $f | remove `a*c.txt` | custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm $f | remove `a*c.txt` | custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm ($f \| into glob) | remove `a*c.txt` and `abc.txt` In general, if a variable is annotated with `glob` type, nushell will expand glob pattern. Or else, we need to use `into | glob` to expand glob pattern # Tests + Formatting Done # After Submitting I think `str glob-escape` command will be no-longer required. We can remove it.
…ariables or string interpolation (nushell#11886) # Description This is a follow up to nushell#11621 (comment) Also Fixes: nushell#11838 ## About the code change It applys the same logic when we pass variables to external commands: https://github.com/nushell/nushell/blob/0487e9ffcbc57c2d5feca606e10c3f8221ff5e00/crates/nu-command/src/system/run_external.rs#L162-L170 That is: if user input dynamic things(like variables, sub-expression, or string interpolation), it returns a quoted `NuPath`, then user input won't be globbed # User-Facing Changes Given two input files: `a*c.txt`, `abc.txt` * `let f = "a*c.txt"; rm $f` will remove one file: `a*c.txt`. ~* `let f = "a*c.txt"; rm --glob $f` will remove `a*c.txt` and `abc.txt`~ * `let f: glob = "a*c.txt"; rm $f` will remove `a*c.txt` and `abc.txt` ## Rules about globbing with *variable* Given two files: `a*c.txt`, `abc.txt` | Cmd Type | example | Result | | ----- | ------------------ | ------ | | builtin | let f = "a*c.txt"; rm $f | remove `a*c.txt` | | builtin | let f: glob = "a*c.txt"; rm $f | remove `a*c.txt` and `abc.txt` | builtin | let f = "a*c.txt"; rm ($f \| into glob) | remove `a*c.txt` and `abc.txt` | custom | def crm [f: glob] { rm $f }; let f = "a*c.txt"; crm $f | remove `a*c.txt` and `abc.txt` | custom | def crm [f: glob] { rm ($f \| into string) }; let f = "a*c.txt"; crm $f | remove `a*c.txt` | custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm $f | remove `a*c.txt` | custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm ($f \| into glob) | remove `a*c.txt` and `abc.txt` In general, if a variable is annotated with `glob` type, nushell will expand glob pattern. Or else, we need to use `into | glob` to expand glob pattern # Tests + Formatting Done # After Submitting I think `str glob-escape` command will be no-longer required. We can remove it.
Description
This pr is a follow up to #11569
And sorry @bobhy for reverting some of your changes.
This pr is going to unify glob behavior on the given commands:
So they have the same behavior to
ls, which is:If given parameter is quoted by single quote(
') or double quote("), don't auto-expand the glob pattern. If not quoted, auto-expand the glob pattern.Fixes: #9558 Fixes: #10211 Fixes: #9310 Fixes: #10364
TODO
But there is one thing remains: if we give a variable to the command, it will always auto-expand the glob pattern, e.g:
I don't think it's expected. But I also think user might want to auto-expand the glob pattern in variables.
So I'll introduce a new command called
glob escape, then if user doesn't want to auto-expand the glob pattern, he can just do this:rm ($path | glob escape)User-Facing Changes
Tests + Formatting
Done
After Submitting
NOTE
This pr changes the semantic of
GlobPattern, before this pr, it willexpand pathafter evaluated, this makesnu_engine::glob_fromhave no chance to glob things right if a path contains glob pattern.e.g: #9310 #10211
I think changing the semantic is fine, because it makes glob works if path contains something like '*'.
It maybe a breaking change if a custom command's argument are annotated by
: glob.