Deprecate split-by command #14019
|
It's probably niche but it's one of @andrasio's commands that I'm guessing he used in his covid analysis long ago. Out of respect, I'd rather get his opinion, if he has any, before doing anything to it. I'm not really sure how to use it on anything but a record of tables. |
|
Ah, this should be an example too, showing that you need to group-by first.
❯ [[first_name, last_name, rusty_at, type];
[Andrés, Robalino, "10/11/2013", A],
[JT, Turner, "10/12/2013", B],
[Yehuda, Katz, "10/11/2013", A]] | group-by rusty_at | split-by type | table -e
╭───┬─────────────────────────────────────────────────────────────────────╮
│   │ ╭────────────┬────────────────────────────────────────────────────╮ │
│ A │ │            │ ╭─#─┬─first_name─┬─last_name─┬──rusty_at──┬─type─╮ │ │
│   │ │ 10/11/2013 │ │ 0 │ Andrés     │ Robalino  │ 10/11/2013 │ A    │ │ │
│   │ │            │ │ 1 │ Yehuda     │ Katz      │ 10/11/2013 │ A    │ │ │
│   │ │            │ ╰───┴────────────┴───────────┴────────────┴──────╯ │ │
│   │ ╰────────────┴────────────────────────────────────────────────────╯ │
│   │ ╭────────────┬────────────────────────────────────────────────────╮ │
│ B │ │            │ ╭─#─┬─first_name─┬─last_name─┬──rusty_at──┬─type─╮ │ │
│   │ │ 10/12/2013 │ │ 0 │ JT         │ Turner    │ 10/12/2013 │ B    │ │ │
│   │ │            │ ╰───┴────────────┴───────────┴────────────┴──────╯ │ │
│   │ ╰────────────┴────────────────────────────────────────────────────╯ │
╰───┴─────────────────────────────────────────────────────────────────────╯
|
I kind of like how this groups and splits by two different things.
ls | update modified {format date %D} | group-by type | split-by modified | table -e
╭──────────┬────────────────────────────────────────────────────────────────────╮
│          │ ╭──────┬─────────────────────────────────────────────────────────╮ │
│ 09/14/24 │ │      │ ╭─#─┬────────name─────────┬─type─┬──size───┬─modified─╮ │ │
│          │ │ file │ │ 0 │ CITATION.cff        │ file │   812 B │ 09/14/24 │ │ │
│          │ │      │ │ 1 │ CODE_OF_CONDUCT.md  │ file │  3.4 KB │ 09/14/24 │ │ │
│          │ │      │ │ 2 │ CONTRIBUTING.md     │ file │ 11.2 KB │ 09/14/24 │ │ │
│          │ │      │ │ 3 │ Cross.toml          │ file │   666 B │ 09/14/24 │ │ │
│          │ │      │ │ 4 │ LICENSE             │ file │  1.1 KB │ 09/14/24 │ │ │
│          │ │      │ │ 5 │ README.md           │ file │ 12.3 KB │ 09/14/24 │ │ │
│          │ │      │ │ 6 │ SECURITY.md         │ file │  2.7 KB │ 09/14/24 │ │ │
│          │ │      │ │ 7 │ rust-toolchain.toml │ file │  1.1 KB │ 09/14/24 │ │ │
│          │ │      │ │ 8 │ typos.toml          │ file │   513 B │ 09/14/24 │ │ │
│          │ │      │ ╰─#─┴────────name─────────┴─type─┴──size───┴─modified─╯ │ │
│          │ │      │ ╭─#─┬──name───┬─type─┬──size──┬─modified─╮              │ │
│          │ │ dir  │ │ 0 │ assets  │ dir  │  160 B │ 09/14/24 │              │ │
│          │ │      │ │ 1 │ crates  │ dir  │ 1.3 KB │ 09/14/24 │              │ │
│          │ │      │ │ 2 │ devdocs │ dir  │  224 B │ 09/14/24 │              │ │
│          │ │      │ │ 3 │ scripts │ dir  │  416 B │ 09/14/24 │              │ │
│          │ │      │ │ 4 │ tests   │ dir  │  544 B │ 09/14/24 │              │ │
│          │ │      │ │ 5 │ wix     │ dir  │  160 B │ 09/14/24 │              │ │
│          │ │      │ ╰─#─┴──name───┴─type─┴──size──┴─modified─╯              │ │
│          │ ╰──────┴─────────────────────────────────────────────────────────╯ │
│          │ ╭──────┬─────────────────────────────────────────────────╮         │
│ 10/06/24 │ │      │ ╭─#─┬────name────┬─type─┬───size───┬─modified─╮ │         │
│          │ │ file │ │ 0 │ Cargo.lock │ file │ 179.6 KB │ 10/06/24 │ │         │
│          │ │      │ │ 1 │ Cargo.toml │ file │   9.2 KB │ 10/06/24 │ │         │
│          │ │      │ ╰───┴────────────┴──────┴──────────┴──────────╯ │         │
│          │ ╰──────┴─────────────────────────────────────────────────╯         │
│          │ ╭──────┬────────────────────────────────────────────────╮          │
│ 10/02/24 │ │      │ ╭─#─┬────name────┬─type─┬──size───┬─modified─╮ │          │
│          │ │ file │ │ 0 │ toolkit.nu │ file │ 19.7 KB │ 10/02/24 │ │          │
│          │ │      │ ╰───┴────────────┴──────┴─────────┴──────────╯ │          │
│          │ ╰──────┴────────────────────────────────────────────────╯          │
│          │ ╭─────┬───────────────────────────────────────────╮                │
│ 09/15/24 │ │     │ ╭─#─┬──name───┬─type─┬─size──┬─modified─╮ │                │
│          │ │ dir │ │ 0 │ benches │ dir  │ 128 B │ 09/15/24 │ │                │
│          │ │     │ ╰───┴─────────┴──────┴───────┴──────────╯ │                │
│          │ ╰─────┴───────────────────────────────────────────╯                │
│          │ ╭─────┬──────────────────────────────────────────╮                 │
│ 10/04/24 │ │     │ ╭─#─┬──name──┬─type─┬─size──┬─modified─╮ │                 │
│          │ │ dir │ │ 0 │ docker │ dir  │  96 B │ 10/04/24 │ │                 │
│          │ │     │ │ 1 │ src    │ dir  │ 384 B │ 10/04/24 │ │                 │
│          │ │     │ ╰───┴────────┴──────┴───────┴──────────╯ │                 │
│          │ ╰─────┴──────────────────────────────────────────╯                 │
│          │ ╭─────┬──────────────────────────────────────────╮                 │
│ 10/03/24 │ │     │ ╭─#─┬──name──┬─type─┬─size──┬─modified─╮ │                 │
│          │ │ dir │ │ 0 │ target │ dir  │ 224 B │ 10/03/24 │ │                 │
│          │ │     │ ╰───┴────────┴──────┴───────┴──────────╯ │                 │
│          │ ╰─────┴──────────────────────────────────────────╯                 │
╰──────────┴────────────────────────────────────────────────────────────────────╯
|
I don't want to lose this command. Let's just keep it. |
|
Hmm, I still don't see the use case.
|
|
We can add a warning message for this command in the following release to tell the user this command will be removed in the future, and remove it after a few releases if there aren't too many objections |
That works for me 👍 |
|
Recreated as a custom-command in case anyone misses it:
def split-by [ splitter: string ]: record -> record {
  transpose -d record_split_keys values
  | flatten --all
  | group-by --to-table ([$splitter] | into cell-path)
  | update items {
    group-by --to-table record_split_keys
    | update items { reject record_split_keys }
    | transpose -dr
  }
  | transpose -dr
}
Tested with the It won't return identical results if the input happened to have a key named |
- closes #14330

Related:
- #2607
- #14019
- #14316

# Description

This PR changes `group-by` to support grouping by multiple `grouper` arguments.

# Changes

- No grouper: no change in behavior
- Single grouper
  - `--to-table=false`: no change in behavior
  - `--to-table=true`:
    - closure grouper: named group0
    - cell-path grouper: named after the cell-path
- Multiple groupers:
  - `--to-table=false`: nested groups
  - `--to-table=true`: one column for each grouper argument, followed by the `items` column
    - columns corresponding to cell-paths are named after them
    - columns corresponding to closure groupers are named `group{i}` where `i` is the index of the grouper argument

# Examples

```nushell
> [1 3 1 3 2 1 1] | group-by
╭───┬───────────╮
│   │ ╭───┬───╮ │
│ 1 │ │ 0 │ 1 │ │
│   │ │ 1 │ 1 │ │
│   │ │ 2 │ 1 │ │
│   │ │ 3 │ 1 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 3 │ │ 0 │ 3 │ │
│   │ │ 1 │ 3 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 2 │ │ 0 │ 2 │ │
│   │ ╰───┴───╯ │
╰───┴───────────╯

> [1 3 1 3 2 1 1] | group-by --to-table
╭─#─┬─group─┬───items───╮
│ 0 │ 1     │ ╭───┬───╮ │
│   │       │ │ 0 │ 1 │ │
│   │       │ │ 1 │ 1 │ │
│   │       │ │ 2 │ 1 │ │
│   │       │ │ 3 │ 1 │ │
│   │       │ ╰───┴───╯ │
│ 1 │ 3     │ ╭───┬───╮ │
│   │       │ │ 0 │ 3 │ │
│   │       │ │ 1 │ 3 │ │
│   │       │ ╰───┴───╯ │
│ 2 │ 2     │ ╭───┬───╮ │
│   │       │ │ 0 │ 2 │ │
│   │       │ ╰───┴───╯ │
╰─#─┴─group─┴───items───╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 }
╭───────┬───────────╮
│       │ ╭───┬───╮ │
│ false │ │ 0 │ 1 │ │
│       │ │ 1 │ 1 │ │
│       │ │ 2 │ 1 │ │
│       │ │ 3 │ 1 │ │
│       │ ╰───┴───╯ │
│       │ ╭───┬───╮ │
│ true  │ │ 0 │ 3 │ │
│       │ │ 1 │ 3 │ │
│       │ │ 2 │ 2 │ │
│       │ ╰───┴───╯ │
╰───────┴───────────╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 } --to-table
╭─#─┬─group0─┬───items───╮
│ 0 │ false  │ ╭───┬───╮ │
│   │        │ │ 0 │ 1 │ │
│   │        │ │ 1 │ 1 │ │
│   │        │ │ 2 │ 1 │ │
│   │        │ │ 3 │ 1 │ │
│   │        │ ╰───┴───╯ │
│ 1 │ true   │ ╭───┬───╮ │
│   │        │ │ 0 │ 3 │ │
│   │        │ │ 1 │ 3 │ │
│   │        │ │ 2 │ 2 │ │
│   │        │ ╰───┴───╯ │
╰─#─┴─group0─┴───items───╯
```

```nushell
let data = [
    [name, lang, year];
    [andres, rb, "2019"],
    [jt, rs, "2019"],
    [storm, rs, "2021"]
]
> $data
╭─#─┬──name──┬─lang─┬─year─╮
│ 0 │ andres │ rb   │ 2019 │
│ 1 │ jt     │ rs   │ 2019 │
│ 2 │ storm  │ rs   │ 2021 │
╰─#─┴──name──┴─lang─┴─year─╯
```

```nushell
> $data | group-by lang
╭────┬──────────────────────────────╮
│    │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ rb │ │ 0 │ andres │ rb   │ 2019 │ │
│    │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│    │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│ rs │ │ 0 │ jt    │ rs   │ 2019 │  │
│    │ │ 1 │ storm │ rs   │ 2021 │  │
│    │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰────┴──────────────────────────────╯
```

Group column is now named after the grouper, to allow multiple groupers.

```nushell
> $data | group-by lang --to-table # column names changed!
╭─#─┬─lang─┬────────────items─────────────╮
│ 0 │ rb   │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │ │ 0 │ jt    │ rs   │ 2019 │  │
│   │      │ │ 1 │ storm │ rs   │ 2021 │  │
│   │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴────────────items─────────────╯
```

Grouping by multiple columns makes finer grained aggregations possible.

```nushell
> $data | group-by lang year --to-table
╭─#─┬─lang─┬─year─┬────────────items─────────────╮
│ 0 │ rb   │ 2019 │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ 2019 │ ╭─#─┬─name─┬─lang─┬─year─╮   │
│   │      │      │ │ 0 │ jt   │ rs   │ 2019 │   │
│   │      │      │ ╰─#─┴─name─┴─lang─┴─year─╯   │
│ 2 │ rs   │ 2021 │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │      │ │ 0 │ storm │ rs   │ 2021 │  │
│   │      │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴─year─┴────────────items─────────────╯
```

Grouping by multiple columns, without `--to-table` returns a nested structure. This is equivalent to `$data | group-by year | split-by lang`, making `split-by` obsolete.

```nushell
> $data | group-by lang year
╭────┬─────────────────────────────────────────╮
│    │ ╭──────┬──────────────────────────────╮ │
│ rb │ │      │ ╭─#─┬──name──┬─lang─┬─year─╮ │ │
│    │ │ 2019 │ │ 0 │ andres │ rb   │ 2019 │ │ │
│    │ │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │ │
│    │ ╰──────┴──────────────────────────────╯ │
│    │ ╭──────┬─────────────────────────────╮  │
│ rs │ │      │ ╭─#─┬─name─┬─lang─┬─year─╮  │  │
│    │ │ 2019 │ │ 0 │ jt   │ rs   │ 2019 │  │  │
│    │ │      │ ╰─#─┴─name─┴─lang─┴─year─╯  │  │
│    │ │      │ ╭─#─┬─name──┬─lang─┬─year─╮ │  │
│    │ │ 2021 │ │ 0 │ storm │ rs   │ 2021 │ │  │
│    │ │      │ ╰─#─┴─name──┴─lang─┴─year─╯ │  │
│    │ ╰──────┴─────────────────────────────╯  │
╰────┴─────────────────────────────────────────╯
```

From #2607:

> Here's a couple more examples without much explanation. This one shows adding two grouping keys. I'm always wanting to add more columns when using group-by and it just-work:tm: `gb.exe -f movies-2.csv -k 3,2 -s 7 --skip_header`
>
> ```
> k:3                    | k:2       | count | sum:7
> -----------------------+-----------+-------+--------------------
> 20th Century Fox       | Drama     | 1     | 117.09
> 20th Century Fox       | Romance   | 1     | 39.66
> CBS                    | Comedy    | 1     | 77.09
> Disney                 | Animation | 4     | 1264.23
> Disney                 | Comedy    | 4     | 950.27
> Fox                    | Comedy    | 5     | 661.85
> Independent            | Comedy    | 7     | 399.07
> Independent            | Drama     | 4     | 69.75
> Independent            | Romance   | 7     | 1048.75
> Independent            | romance   | 1     | 29.37
> ...
> ```

This example can be achieved like this:

```nushell
> open movies-2.csv | group-by "Lead Studio" Genre --to-table | insert count {get items | length} | insert sum { get items."Worldwide Gross" | math sum} | reject items | sort-by "Lead Studio" Genre
╭─#──┬──────Lead Studio──────┬───Genre───┬─count─┬───sum───╮
│  0 │ 20th Century Fox      │ Drama     │     1 │  117.09 │
│  1 │ 20th Century Fox      │ Romance   │     1 │   39.66 │
│  2 │ CBS                   │ Comedy    │     1 │   77.09 │
│  3 │ Disney                │ Animation │     4 │ 1264.23 │
│  4 │ Disney                │ Comedy    │     4 │  950.27 │
│  5 │ Fox                   │ Comedy    │     5 │  661.85 │
│  6 │ Fox                   │ comedy    │     1 │   60.72 │
│  7 │ Independent           │ Comedy    │     7 │  399.07 │
│  8 │ Independent           │ Drama     │     4 │   69.75 │
│  9 │ Independent           │ Romance   │     7 │ 1048.75 │
│ 10 │ Independent           │ romance   │     1 │   29.37 │
...
```
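The claim that `group-by lang year` (without `--to-table`) matches `group-by year | split-by lang` can be modeled outside Nushell. The following is a minimal Rust sketch of the data shapes only, not Nushell's implementation: `Row`, `sample`, `group_by`, `group_by_two`, and `split_by` are hypothetical names, and sorted `BTreeMap`s stand in for Nushell records (which keep insertion order instead).

```rust
use std::collections::BTreeMap;

/// One row of the `$data` table from the examples (hypothetical model type).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: &'static str,
    lang: &'static str,
    year: &'static str,
}

type Groups = BTreeMap<String, Vec<Row>>; // key -> rows
type Nested = BTreeMap<String, Groups>; // outer key -> inner key -> rows

fn sample() -> Vec<Row> {
    vec![
        Row { name: "andres", lang: "rb", year: "2019" },
        Row { name: "jt", lang: "rs", year: "2019" },
        Row { name: "storm", lang: "rs", year: "2021" },
    ]
}

fn lang(r: &Row) -> &str {
    r.lang
}

fn year(r: &Row) -> &str {
    r.year
}

/// Models `group-by <column>`: one level of grouping.
fn group_by(rows: &[Row], key: fn(&Row) -> &str) -> Groups {
    let mut out = Groups::new();
    for row in rows {
        out.entry(key(row).to_string()).or_default().push(row.clone());
    }
    out
}

/// Models `group-by <outer> <inner>`: group by the first key, then regroup
/// each group by the second key.
fn group_by_two(rows: &[Row], outer: fn(&Row) -> &str, inner: fn(&Row) -> &str) -> Nested {
    let mut out = Nested::new();
    for (k, group) in group_by(rows, outer) {
        out.insert(k, group_by(&group, inner));
    }
    out
}

/// Models `split-by <splitter>`: takes already-grouped rows and adds a new
/// outer layer keyed by the splitter column.
fn split_by(groups: &Groups, splitter: fn(&Row) -> &str) -> Nested {
    let mut out = Nested::new();
    for (inner_key, rows) in groups {
        for row in rows {
            out.entry(splitter(row).to_string())
                .or_default()
                .entry(inner_key.clone())
                .or_default()
                .push(row.clone());
        }
    }
    out
}

fn main() {
    let data = sample();
    let nested = group_by_two(&data, lang, year);
    let via_split = split_by(&group_by(&data, year), lang);
    // Both routes produce lang -> year -> rows.
    assert_eq!(nested, via_split);
    println!("{:#?}", nested);
}
```

Note the argument order: the `split-by` column ends up as the outer key, which is why `group-by year | split-by lang` corresponds to `group-by lang year` rather than `group-by year lang`.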
|
It's probably safe to deprecate
|
|
I've tested the command above and changed it slightly to make it work.
ls | update modified {format date %D} | group-by modified type | table -e
I'm fine with deprecating |
},

#[error("Deprecated: {old_command}")]
#[diagnostic(help("for more info see {url}"))]
Did we ever satisfy that with useful URLs :P ?
Slightly worried about those half-sentence format strings: folks may treat the fields as free text and never check the format string. But deprecation warnings shouldn't live too long anyway.
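To make the concern about half-sentence format strings concrete, here is a rough std-only Rust sketch. It is not the code under review: the `Deprecated` struct, its `help` method, and the values passed in are hypothetical stand-ins that only reuse the two format strings from the hunk above, and the URL is a placeholder.

```rust
use std::fmt;

/// Hypothetical stand-in for the deprecation diagnostic in the reviewed hunk.
/// Field names follow the format strings; everything else is assumed.
struct Deprecated {
    old_command: String,
    url: String,
}

impl fmt::Display for Deprecated {
    // Mirrors `#[error("Deprecated: {old_command}")]`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Deprecated: {}", self.old_command)
    }
}

impl Deprecated {
    // Mirrors `#[diagnostic(help("for more info see {url}"))]`.
    fn help(&self) -> String {
        format!("for more info see {}", self.url)
    }
}

fn main() {
    // Intended use: a bare command name and a link (placeholder, not a real page).
    let intended = Deprecated {
        old_command: "split-by".to_string(),
        url: "https://example.invalid/split-by".to_string(),
    };
    println!("{}\n  help: {}", intended, intended.help());

    // Free-text use: the caller writes full sentences, so the fixed prefixes
    // in the format strings produce awkward output.
    let free_text = Deprecated {
        old_command: "The split-by command is deprecated.".to_string(),
        url: "Please use group-by instead.".to_string(),
    };
    println!("{}\n  help: {}", free_text, free_text.help());
}
```

The second call prints a help line reading "for more info see Please use group-by instead.", which is the kind of mismatch that goes unnoticed when callers never look at the format string.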
I'm not quite sure what the point of the `split-by` command is. The only example for the command seems to suggest it's an additional grouping command. I.e., a record that seems to be the output of the `group-by` command is passed to `split-by` which then adds an additional layer of grouping based on a different column. Breaking change, deprecated the command.
# Description

#14019 deprecated the `split-by` command. This sets its doc-category to "deprecated" so that it will display that way in the in-shell and online help.

# User-Facing Changes

`split-by` will now show as a deprecated command in Help. Will also be reported using:

```nushell
help commands | where category == deprecated
```

# Tests + Formatting

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting

N/A
# Description

Remove commands which were deprecated in 0.101:

* `split-by` (#14019)
* `date to-record` and `date to-table` (#14319)

# User-Facing Changes

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting

TODO: `grep` (`ag`) doc repo for any usage of these commands
Description
I'm not quite sure what the point of the `split-by` command is. The only example for the command seems to suggest it's an additional grouping command. I.e., a record that seems to be the output of the `group-by` command is passed to `split-by` which then adds an additional layer of grouping based on a different column.
User-Facing Changes
Breaking change, deprecated the command.