Rework sorting and add cell path and closure comparators to sort-by#13154
Rework sorting and add cell path and closure comparators to sort-by#13154IanManske merged 37 commits intonushell:mainfrom
sort-by#13154Conversation
|
Thanks for this. So far, so good! I'd like to see the closure parameter built into |
|
Good idea! Maybe |
|
Exactly what I came here (from Discord) to suggest. |
|
Thanks for the examples, they go a long way to helping us to understand. Looks great!
For now, I'd like to have just |
|
Just realized this doesn't entirely close #8322 as it doesn't cover group-by. |
|
Thanks for picking this up! |
…hell#13160) # Description Fixes nushell#13143 by returning an empty list when there are no results found by `std help --find/-f` # User-Facing Changes In addition, prints a message to stderr. # Tests + Formatting - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. -->
sort-by
|
Had to do some major refactoring of I saw the note at the top of I'm not sure if these |
|
I still want to be able to sort-by columns too. Does that still work? |
|
For interactive use, you can do |
|
ok, sounds good so far. thanks. |
This reverts commit cab953b.
|
Do you have time to get the ci green? |
|
The test failures are intentional since I was doing some test-driven development, so I'll get those passing the next time I can take a look. I'll make sure to address the spellcheck and clippy as well. |
|
Finally cleared out enough of my massive backlog of tasks, so I should be able to wrap this up finally! I just added a good number of unit tests to define behavior for edge cases and prevent regressions to behavior currently observable on
|
|
Thank you for the feedback! I'll take a look through it in the next couple days. Regarding detecting number of closure parameters to determine key closure or closure closure, I proposed the idea in the Discord and @devyn advised against it:
I'm still not sure how I feel about it. It would be better ergonomics, but it's a bit magic-y. It would also require removing is equivalent to with the drawback that |
|
Great points! I would also assume that custom comparators would be the rarer case, so having it behind a flag doesn't seem too bad. So, disregard my previous comment, the current behavior/interface looks pretty good 👍 . |
|
Are we ready to land this then? |
|
I've got to go through Ian's review still, hopefully sometime today. Other than that, I'm good as long as no one else has any concerns before then. |
|
ok, i'm marking it draft until you're done. then we can try and get it landed. |
Co-authored-by: Ian Manske <ian.manske@pm.me>
|
I think we're good to go on this @fdncred |
|
ok, just need @IanManske's sign-off since he had a bunch of comments. |
IanManske
left a comment
There was a problem hiding this comment.
Thanks for the fixes, this looks good 🚀
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> <type>`. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table<a: int>`, whereas the type returned by `Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list<any> and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> * Play whack-a-mole with the commands and scripts this will inevitably break
Description
Closes #12535
Implements sort-by functionality of #8322
Fixes sort-by part of #8667
This PR does two main things: add a new cell path and closure parameter to
sort-by, and attempt to make Nushell's sorting behavior well-defined.sort-byfeaturesThe
columnsparameter is replaced with acomparatorparameter, which can be a cell path or a closure. Examples are from docs PR.Cell paths
The basic interactive usage of
sort-byis the same. For example,ls | sort-by modifiedstill works the same as before. It is not quite a drop-in replacement, see behavior changes.Here's an example of how the cell path comparator might be useful:
Key closures
You can supply a closure which will transform each value into a sorting key (without changing the underlying data). Here's an example of a key closure, where we want to sort a list of assignments by their average grade:
Custom sort closure
The
--custom, or-c, flag will tellsort-byto interpret closures as custom sort closures. A custom sort closure has two parameters, and returns a boolean. The closure should returntrueif the first parameter comes before the second parameter in the sort order.For a simple example, we could rewrite a cell path sort as a custom sort (see here for a more complex example):
Making sort more consistent
I think it's important for something as essential as
sortto have well-defined semantics. This PR contains some changes to try to make the behavior ofsortandsort-byconsistent. In addition, after working with the internals of sorting code, I have a much deeper understanding of all of the edge cases. Here is my attempt to try to better define some of the semantics of sorting (if you are just interested in changes, skip to "User-Facing changes")sort,sort -v, andsort-bynow all work the same. Each individual sort implementation has been refactored into two functions insort_utils.rs:sort, andsort_by. These can also be used in other parts of Nushell where values need to be sorted.sortandsort-byused to handle-iand-ndifferently.sort -nwould consider all values which can't be coerced into a string to be equalsort-by -iandsort-by -nwould only work if all values were stringssortandsort-bysupport sorting mixed types. There was a lot of discussion about potentially makingsortandsort-byonly work on lists of homogeneous types, but the general consensus was thatsortshould not error just because its input contains incompatible types.nullvalues easier, I changed the PartialOrd order to sortNothingvalues to the end of a list, regardless of what other types the list contains. Before,nullwould be sorted beforeBinary,CellPath, andCustomvalues.[0x[1] (date now) "a" ("yesterday" | into datetime) "b" 0x[0]]will be sorted as["a", "b", a day ago, now, [0], [1]], where sorted strings appear first, then sorted datetimes, etc.Ints andFloats, which will intermix,StringsandGlobs, which will intermix, andNoneas described above. Additionally, natural sort will intermix strings with ints and floats (see below).User-Facing Changes
New features
columnsstring parameter ofsort-bywith a cell path or a closure.closureparameter acts as a "key sort"; that is, each element is transformed by the closure into a sorting key--custom(-c) parameter, you can define a comparison function for completely custom sorting order.Behavior changes
sort -vdoes not coerce record values to stringsThis was a bit of a surprising behavior, and is now unified with the behavior of
sortandsort-by. Here's an example where you can observe the values being implicitly coerced into strings for sorting, as they are sorted like strings rather than numbers:Old behavior:
$ {foo: 9 bar: 10} | sort -v ╭─────┬────╮ │ bar │ 10 │ │ foo │ 9 │ ╰─────┴────╯New behavior:
$ {foo: 9 bar: 10} | sort -v ╭─────┬────╮ │ foo │ 9 │ │ bar │ 10 │ ╰─────┴────╯Changed
sort-byparameters fromstringtocell-pathorclosure. Typical interactive usage is the same as before, but if passing a variable tosort-byit must be a cell path (or closure), not a stringOld behavior:
New behavior:
Insensitve and natural sorting behavior reworked
Previously, the
-iand-nworked differently forsortandsort-by(see "Making sort more consistent"). Here are examples of how these options result in different sorts now:sort -nsort-by -isort-by -nSorting a list of non-record values with a non-existent column/path now errors instead of sorting the values directly (
sortshould be used for this, notsort-by)Old behavior:
New behavior:
sortandsort-byoutputListinstead ofListStreamThis isn't a meaningful change (unless I misunderstand the purpose of ListStream), since
sortandsort-byboth need to collect in order to do the sorting anyway, but is user observable.Old behavior:
New behavior:
sortnow errors when nothing is piped in (sort-byalready did this)Tests + Formatting
I added lots of unit tests on the new sort implementation to enforce new sort behaviors and prevent regressions.
After Submitting
See docs PR, which is ~2/3 finished.