Skip to content

Unify glob behavior on open, rm, cp-old, mv, umv, cp and du commands#11621

Merged
WindSoilder merged 11 commits intonushell:mainfrom
WindSoilder:arg_globs
Jan 26, 2024
Merged

Unify glob behavior on open, rm, cp-old, mv, umv, cp and du commands#11621
WindSoilder merged 11 commits intonushell:mainfrom
WindSoilder:arg_globs

Conversation

@WindSoilder
Copy link
Copy Markdown
Contributor

@WindSoilder WindSoilder commented Jan 23, 2024

Description

This pr is a follow up to #11569

Revert the logic in #10694 and apply the logic in this pr to mv, cp, rv will require a larger change, I need to think how to achieve the bahavior

And sorry @bobhy for reverting some of your changes.

This pr is going to unify glob behavior on the given commands:

  • open
  • rm
  • cp-old
  • mv
  • umv
  • cp
  • du

So they have the same behavior to ls, which is:
If given parameter is quoted by single quote(') or double quote("), don't auto-expand the glob pattern. If not quoted, auto-expand the glob pattern.

Fixes: #9558 Fixes: #10211 Fixes: #9310 Fixes: #10364

TODO

But there is one thing remains: if we give a variable to the command, it will always auto-expand the glob pattern, e.g:

let path = "a[123]b"
rm $path

I don't think it's expected. But I also think user might want to auto-expand the glob pattern in variables.

So I'll introduce a new command called glob escape, then if user doesn't want to auto-expand the glob pattern, he can just do this: rm ($path | glob escape)

User-Facing Changes

Tests + Formatting

Done

After Submitting

NOTE

This pr changes the semantic of GlobPattern, before this pr, it will expand path after evaluated, this makes nu_engine::glob_from have no chance to glob things right if a path contains glob pattern.

e.g: #9310 #10211

I think changing the semantic is fine, because it makes glob works if path contains something like '*'.

It maybe a breaking change if a custom command's argument are annotated by : glob.

@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Jan 23, 2024

This is awesome! For me, the exact semantics are not as important as the fact that the behavior is unified, meaning that if we need to reconsider the globbing behavior at some point, we'd change it in one place instead of 10.

glob escape sounds good as well as a way to escape globbing. We could even land just the unification and leave glob escape to a separate PR.

@bobhy
Copy link
Copy Markdown
Contributor

bobhy commented Jan 23, 2024 via email

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 23, 2024

I thought double/single quoting was the standard way of telling any shell "this is a string. don't mess with it"?

@WindSoilder WindSoilder force-pushed the arg_globs branch 2 times, most recently from 3c9b193 to 46141cd Compare January 24, 2024 05:36
@WindSoilder
Copy link
Copy Markdown
Contributor Author

WindSoilder commented Jan 24, 2024

The Nu cp command insists on at least 2 arguments and just knows that the last argument is a single path, not a glob.

Actually it remains the same, the last argument is still a single path.

I think something like: cp able noglob://bak*er charlie required user typing too much things, cp able "bak*er" charlie is more readable. Maybe it's more comfortable to users from other shells?
After reading relative issues, I think quoting something to say "this is just a string, no need to escape it" is natural.

@WindSoilder WindSoilder marked this pull request as ready for review January 24, 2024 10:24
@WindSoilder WindSoilder marked this pull request as draft January 24, 2024 13:57
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 24, 2024

After reading relative issues, I think quoting something to say "this is just a string, no need to escape it" is natural.

I agree with this too. However, double quotes still have to escape things to be json compliant I believe.

@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Jan 24, 2024

As I mentioned, the main value of this PR is the unification of globbing across commands. If we need to change how globbing works, this PR makes it easier to do by changing it only in one place. I'd vote for landing this.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 24, 2024

I'm fine with landing it too,

  • main needs to be merged, since we removed cp-old.
  • touch "[a] b.c" works, rm `[a] b.c` also works (maybe add a test) (windows)
  • rm '[a] b.c' and rm "[a] b.c" work fine (windows)
  • perhaps touch should be added (could be another pr)
  • perhaps ls should be added (could be another pr)
  • touch "\u{1b}[31mnushell\u{1b}[0m" works but then you see a red nushell that you don't know how to remove. (real use case someone had) (ubuntu)
  • rm "\u{1b}[31mnushell\u{1b}[0m" and rm '\u{1b}[31mnushell\u{1b}[0m' doesn't work (ubuntu)
  • i can only see the odd file in zsh as ''$'\033''[31mnushell'$'\033''[0m' and rm $'\033'\[31mnushell$'\033'\[0m got rid of it
  • string interpolation seems to work. here's my test
touch 1.txt
let a = '1'
rm $"($a).txt"

@WindSoilder WindSoilder marked this pull request as ready for review January 25, 2024 04:53
@WindSoilder
Copy link
Copy Markdown
Contributor Author

WindSoilder commented Jan 25, 2024

touch "[a] b.c" works, but rm [a] b.c does not work (maybe fix and add a test) (windows)

It shouldn't work, because we treated backtick quoted as a bare word.

touch "\u{1b}[31mnushell\u{1b}[0m" works but then you see a red nushell that you don't know how to remove. (real use case someone had) (ubuntu)

For such file, it's because we auto strip ansi string, maybe we need to find a way to fix such issue

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 25, 2024

It shouldn't work, because we treated backtick quoted as a bare word.

I am thinking it probably should because backticks are what we use with paths with spaces in them. I think that's why backticks work in this situation, because there is a space in the filename.

For such file, it's because we auto strip ansi string, maybe we need to find a way to fix such issue

I think this is a special issue that could be handled in a separate PR but it's interesting to me that other shells do not just create a folder literally, the encode the escape from \u{1b} to '$'\033'' before trying to create a file with "bad" characters.

@WindSoilder
Copy link
Copy Markdown
Contributor Author

I am thinking it probably should because backticks are what we use with paths with spaces in them. I think that's why backticks work in this situation, because there is a space in the filename.

Maybe we can discuss it in next meeting :-)

other shells do not just create a folder literally, the encode the escape from \u{1b} to '$'\033'' before trying to create a file with "bad" characters.

That's interesting, maybe we can fix it in that direction, like modify mkdir command. And keep in mind on touch command while we are adding it to nushell.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jan 25, 2024

That's interesting, maybe we can fix it in that direction, like modify mkdir command. And keep in mind on touch command while we are adding it to nushell.

ya, i think the fix needs to be like what you're saying. we intercept escapes and encode them somehow before trying to create a file.

I just texted this again and rm with backticks works like i thought it should. maybe i did something different yesterday?

 touch "[a] b.c"  
 rm `[a] b.c`

@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Jan 26, 2024

Yeah, I agree, we can iterate on the glob semantic in separate PRs. Now it's at least uniform.

IMO, backticked word is bare word and thus should be glob-expanded. That makes sense to me.

fdncred pushed a commit that referenced this pull request Jan 28, 2024
# Description
Fix a breaking change which is introduced by #11621

`rm -f /tmp/aaa` shouldn't return error if `/tmp/aaa/` doesn't exist.

# User-Facing Changes
NaN

# Tests + Formatting
Done
WindSoilder added a commit that referenced this pull request Jan 29, 2024
# Description
This pr is a follow up to #11621, it introduces a `str escape-glob`
command as a workaround for the case:

```nushell
let f = "a[123]b"
ls $f
```

It will glob `a[123]b`, we can get rid of the behavior through `str
escape-glob` command:

```nushll
let f = "a[123]b"
ls ($f | str escape-glob)
```

It's more useful in the `each` context:
`ls | get name | str escape-glob | each {|it| ls $it}`

# User-Facing Changes
NaN

# Tests + Formatting
Done

# After Submitting

---------

Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
@hustcer hustcer added this to the v0.90.0 milestone Feb 3, 2024
@matklad
Copy link
Copy Markdown

matklad commented Feb 11, 2024

Drive by comment, but I was reading the last changelog, and

So I'll introduce a new command called glob escape, then if user doesn't want to auto-expand the glob pattern, he can just do this: rm ($path | glob escape)

Feels very surprising.

It seems to me that this behavior is very error prone? rm $file looks like it removes a single file. What’s more, it works as if it removes only a single file most of the time, so the user won’t learn about mistakes in their mental model until the day the file is named *.

To put it in other way, I posit that with the current semantics:

  • most arguments should be using | str glob-escape
  • almost none will be

Is there any way to make it work the opposite way? Such that rm $file is never a glob, and rm ($file | str glob) is?

Note: although I feel very strongly about the isssue, my level of confidence here is low, I might be misunderstanding something about nu which makes this pattern not as problematic as it seems.

@bobhy
Copy link
Copy Markdown
Contributor

bobhy commented Feb 11, 2024 via email

@matklad
Copy link
Copy Markdown

matklad commented Feb 11, 2024

In supporting globbin of rm arguments, nushell is just following the lead of traditional shells

Indeed, and my understanding is that this behavior is a confirmed footgun in those shells.

For example shellcheck lints against unquoted variables:

https://www.shellcheck.net/wiki/SC2086

So, for a shell language which isn’t trying to be compatible with POSIX shell, I’d expect a different, safer semantics by default.

@71
Copy link
Copy Markdown

71 commented Feb 12, 2024

IMHO, this change breaks the principle of least astonishment. Considering the three following snippets:

  1. let a = "a?c"
    rm $a
  2. rm a?c
  3. rm "a?c"

The first two are equivalent, but the last one isn't. I would expect (looking strictly at quotes) that the 1st and 3rd statement are equivalent, but instead they behave differently. Given a variable a, now the way to achieve the same behavior as 3. is:

  1. let a = "a?c"
    rm ($a | str glob-escape)
    

Which looks very differently from 3.


Nu seems to pride itself in approaching things differently. Notably, given a string with spaces like

let a = "-rm ."

Then this command is safe to execute and will delete a file named -rm ., since $a is a single string:

rm $a

However, this assumption that using a variable to execute a command is safe is now broken following this change; additional arguments/flags will not be given to rm (a strength of Nushell), but $a will still be expanded.


Perhaps nushell rm should integrate with the "trash" mechanism on the underlying platform?

IMO this doesn't solve it. mv and other such commands will still move (and potentially overwrite) items, which doesn't involve the system trash.

@bobhy
Copy link
Copy Markdown
Contributor

bobhy commented Feb 12, 2024

@matklad How should the safer semantics work in practice?
If Nu does not glob rm arguments, how should the knowledgeable and careful user actually delete multiple files?

> rm a.txt b.txt c.txt                            # works, but is typo-prone and tedious

> ls *.txt | rm                                   # nope, rm doesn't accept pipeline input
Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[entry #9:1:1]
 1 │ ls *.txt | rm
   ╰────
  help: Usage: rm {flags} <filename> ...(rest) . Use `--help` for more information.

> ls *.txt | each  {|x| rm $x.name}              # works, is nushell - normative, but intuitive?  
╭────────────╮                                   # And what is this 'empty list' result?
│ empty list │
╰────────────╯

The rest of the world seems to have decided that the best balance between "convenience" and "safety" here is to allow globbing but to enable wastebasket or similarly named mechanism to recover from mistakes.

Building on this, I just discovered that rm --trash is a thing, and actually does work on linux. Perhaps it should be the default?

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Feb 12, 2024

We've been asked before to make --trash the default and I think there was a PR for it but IIRC, there was something that stopped it from landing because every platform we support didn't support the trash functionality. That was years ago so that could've changed. There is already a $env.config.rm.always_trash config setting too (defaults to false).

@matklad
Copy link
Copy Markdown

matklad commented Feb 12, 2024

@bobhy I personally find the behavior of fish shell quite reasonable:

15:52:45|~/tmp
λ touch '*' 'a.txt' 'b.txt' 'c.txt' && rm * && eza

15:52:56|~/tmp
λ touch '*' 'a.txt' 'b.txt' 'c.txt' && rm '*' && eza
a.txt  b.txt  c.txt

15:52:59|~/tmp
λ touch '*' 'a.txt' 'b.txt' 'c.txt' && FILE='*' rm $FILE && eza
a.txt  b.txt  c.txt

15:53:06|~/tmp
λ touch '*' 'a.txt' 'b.txt' 'c.txt' && FILE='*' eval rm $FILE && eza

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Feb 12, 2024

Numbering and labeling these to reference them. Let me know if I interpreted them wrong.

  1. globbing *, removes everything
  2. non-globbing, looking for literal '*', removes nothing but literal '*' file
  3. non-globbing, since it's a variable, I guess? removes nothing
  4. still non-globbing, but uses eval to turn it into a glob, I guess? removes everything

@matklad
Copy link
Copy Markdown

matklad commented Feb 12, 2024

Yup, that's correct, just "removes nothing" -> "removes only the file whose name is literally '*'":

λ touch '*' 'a.txt' 'b.txt' 'c.txt'  && eza
*  a.txt  b.txt  c.txt

@CAD97
Copy link
Copy Markdown
Contributor

CAD97 commented Feb 12, 2024

If Nu does not glob rm arguments, how should the knowledgeable and careful user actually delete multiple files?

The spread operator could be made to work, e.g. glob *.txt | rm ...$in. It doesn't currently because rm has a required positional argument and the spread operator only spread into rest arguments (or other list contexts).

the behavior of fish shell

A principled interpretation of the behavior would be to consistently distinguish between "bareword string" and "quoted string" shapes (the any shape preserving the distinction), treating the content of "quoted strings" as final, applying the same rules to all shapes that impact semantics (at least glob and path).

The most directly equivalent constructs written in Nu would be (checked means expands glob):

  • ls *
  • ls '*' (0.90 feature
  • with-env [FILE *] { ls $env.FILE }
  • with-env [FILE *] { ^$nu.current-exe -c $'ls ($env.FILE)' } (little a eval, as a treat)

Nu-specific fun:

  • ls ('*')
  • '*' | ls $in
  • def f [file] { ls $file }; f '*'
  • def f [file: glob] { ls $file }; f '*' (syntax shape changes semantics)
  • def f [file: glob] { ls $file }; f ('*') (but not through other constructs)

@WindSoilder
Copy link
Copy Markdown
Contributor Author

Thank you for all feedback!

I can try to adjust the behavior when we meet variables

@CAD97
Copy link
Copy Markdown
Contributor

CAD97 commented Feb 14, 2024

Maybe look at how path is handled: a syntax shape of path will expand barewords but not strings (quoted, expression, or variable). It would mean flipping from "use str escape-glob to prevent glob expansion" to "use glob to get glob expansion from pipelines" though.

My crazy-concept idea would be for a ...pattern: glob shape to expand bareword globs directly, with everything else interacting with it as just ...pattern: path. That is, writing ls *.nu would be translated by the syntax shape into the equivalent of ls ...(glob '*.nu'). (It would probably want lazy/deferred expansion support as an optimization, though.)

dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
…`du` commands (nushell#11621)

# Description
This pr is a follow up to
[nushell#11569](nushell#11569 (comment))
> Revert the logic in nushell#10694 and
apply the logic in this pr to mv, cp, rv will require a larger change, I
need to think how to achieve the bahavior

And sorry @bobhy for reverting some of your changes.

This pr is going to unify glob behavior on the given commands:
* open
* rm
* cp-old
* mv
* umv
* cp
* du

So they have the same behavior to `ls`, which is:
If given parameter is quoted by single quote(`'`) or double quote(`"`),
don't auto-expand the glob pattern. If not quoted, auto-expand the glob
pattern.

Fixes: nushell#9558  Fixes: nushell#10211 Fixes: nushell#9310 Fixes: nushell#10364 

# TODO
But there is one thing remains: if we give a variable to the command, it
will always auto-expand the glob pattern, e.g:
```nushell
let path = "a[123]b"
rm $path
```
I don't think it's expected. But I also think user might want to
auto-expand the glob pattern in variables.

So I'll introduce a new command called `glob escape`, then if user
doesn't want to auto-expand the glob pattern, he can just do this: `rm
($path | glob escape)`

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
Done

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->

## NOTE
This pr changes the semantic of `GlobPattern`, before this pr, it will
`expand path` after evaluated, this makes `nu_engine::glob_from` have no
chance to glob things right if a path contains glob pattern.

e.g: [nushell#9310
](nushell#9310 (comment))
nushell#10211

I think changing the semantic is fine, because it makes glob works if
path contains something like '*'.

It maybe a breaking change if a custom command's argument are annotated
by `: glob`.
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
…l#11656)

# Description
Fix a breaking change which is introduced by nushell#11621

`rm -f /tmp/aaa` shouldn't return error if `/tmp/aaa/` doesn't exist.

# User-Facing Changes
NaN

# Tests + Formatting
Done
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
This pr is a follow up to nushell#11621, it introduces a `str escape-glob`
command as a workaround for the case:

```nushell
let f = "a[123]b"
ls $f
```

It will glob `a[123]b`, we can get rid of the behavior through `str
escape-glob` command:

```nushll
let f = "a[123]b"
ls ($f | str escape-glob)
```

It's more useful in the `each` context:
`ls | get name | str escape-glob | each {|it| ls $it}`

# User-Facing Changes
NaN

# Tests + Formatting
Done

# After Submitting

---------

Co-authored-by: Darren Schroeder <343840+fdncred@users.noreply.github.com>
WindSoilder added a commit that referenced this pull request Feb 23, 2024
…ariables or string interpolation (#11886)

# Description
This is a follow up to
#11621 (comment)

Also Fixes: #11838 

## About the code change
It applys the same logic when we pass variables to external commands:


https://github.com/nushell/nushell/blob/0487e9ffcbc57c2d5feca606e10c3f8221ff5e00/crates/nu-command/src/system/run_external.rs#L162-L170

That is: if user input dynamic things(like variables, sub-expression, or
string interpolation), it returns a quoted `NuPath`, then user input
won't be globbed
 
# User-Facing Changes
Given two input files: `a*c.txt`, `abc.txt`

* `let f = "a*c.txt"; rm $f` will remove one file: `a*c.txt`. 
~* `let f = "a*c.txt"; rm --glob $f` will remove `a*c.txt` and
`abc.txt`~
* `let f: glob = "a*c.txt"; rm $f` will remove `a*c.txt` and `abc.txt`

## Rules about globbing with *variable*
Given two files: `a*c.txt`, `abc.txt`
| Cmd Type | example | Result |
| ----- | ------------------ | ------ |
| builtin | let f = "a*c.txt"; rm $f | remove `a*c.txt` |
| builtin | let f: glob = "a*c.txt"; rm $f | remove `a*c.txt` and
`abc.txt`
| builtin | let f = "a*c.txt"; rm ($f \| into glob) | remove `a*c.txt`
and `abc.txt`
| custom | def crm [f: glob] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt` and `abc.txt`
| custom | def crm [f: glob] { rm ($f \| into string) }; let f =
"a*c.txt"; crm $f | remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm ($f \|
into glob) | remove `a*c.txt` and `abc.txt`

In general, if a variable is annotated with `glob` type, nushell will
expand glob pattern. Or else, we need to use `into | glob` to expand
glob pattern

# Tests + Formatting
Done

# After Submitting
I think `str glob-escape` command will be no-longer required. We can
remove it.
@WindSoilder WindSoilder deleted the arg_globs branch February 27, 2024 05:49
@fdncred fdncred added the A:glob Behavior around file-system globbing with regular commands or `glob`. See also A:quoting/expansion label Feb 27, 2024
kik4444 pushed a commit to kik4444/nushell-fork that referenced this pull request Feb 28, 2024
…ariables or string interpolation (nushell#11886)

# Description
This is a follow up to
nushell#11621 (comment)

Also Fixes: nushell#11838 

## About the code change
It applys the same logic when we pass variables to external commands:


https://github.com/nushell/nushell/blob/0487e9ffcbc57c2d5feca606e10c3f8221ff5e00/crates/nu-command/src/system/run_external.rs#L162-L170

That is: if user input dynamic things(like variables, sub-expression, or
string interpolation), it returns a quoted `NuPath`, then user input
won't be globbed
 
# User-Facing Changes
Given two input files: `a*c.txt`, `abc.txt`

* `let f = "a*c.txt"; rm $f` will remove one file: `a*c.txt`. 
~* `let f = "a*c.txt"; rm --glob $f` will remove `a*c.txt` and
`abc.txt`~
* `let f: glob = "a*c.txt"; rm $f` will remove `a*c.txt` and `abc.txt`

## Rules about globbing with *variable*
Given two files: `a*c.txt`, `abc.txt`
| Cmd Type | example | Result |
| ----- | ------------------ | ------ |
| builtin | let f = "a*c.txt"; rm $f | remove `a*c.txt` |
| builtin | let f: glob = "a*c.txt"; rm $f | remove `a*c.txt` and
`abc.txt`
| builtin | let f = "a*c.txt"; rm ($f \| into glob) | remove `a*c.txt`
and `abc.txt`
| custom | def crm [f: glob] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt` and `abc.txt`
| custom | def crm [f: glob] { rm ($f \| into string) }; let f =
"a*c.txt"; crm $f | remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm $f |
remove `a*c.txt`
| custom | def crm [f: string] { rm $f }; let f = "a*c.txt"; crm ($f \|
into glob) | remove `a*c.txt` and `abc.txt`

In general, if a variable is annotated with `glob` type, nushell will
expand glob pattern. Or else, we need to use `into | glob` to expand
glob pattern

# Tests + Formatting
Done

# After Submitting
I think `str glob-escape` command will be no-longer required. We can
remove it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:glob Behavior around file-system globbing with regular commands or `glob`. See also A:quoting/expansion

Projects

None yet

8 participants