Deprecate split-by command #14019

Merged: sholderbach merged 3 commits into nushell:main from IanManske:remove-split-by on Nov 21, 2024

IanManske (Member) commented Oct 7, 2024

Description

I'm not quite sure what the point of the split-by command is. The only example for the command seems to suggest it's an additional grouping command. I.e., a record that seems to be the output of the group-by command is passed to split-by which then adds an additional layer of grouping based on a different column.

User-Facing Changes

Breaking change, deprecated the command.

IanManske added the notes:breaking-changes and deprecated:pr-commands labels Oct 7, 2024
fdncred (Contributor) commented Oct 7, 2024

It's probably niche but it's one of @andrasio's commands that I'm guessing he used in his covid analysis long ago. Out of respect, I'd rather get his opinion, if he has any, before doing anything to it.

I'm not really sure how to use it on anything but a record of tables.

fdncred (Contributor) commented Oct 7, 2024

Ah, this should be an example too, showing that you need to group-by first.

 [[first_name, last_name, rusty_at, type];
   [Andrés, Robalino, "10/11/2013", A],
   [JT, Turner, "10/12/2013", B],
   [Yehuda, Katz, "10/11/2013", A]] | group-by rusty_at | split-by type | table -e
╭───┬─────────────────────────────────────────────────────────────────────╮
│   │ ╭────────────┬────────────────────────────────────────────────────╮ │
│ A │ │            │ ╭─#─┬─first_name─┬─last_name─┬──rusty_at──┬─type─╮ │ │
│   │ │ 10/11/2013 │ │ 0 │ Andrés     │ Robalino  │ 10/11/2013 │ A    │ │ │
│   │ │            │ │ 1 │ Yehuda     │ Katz      │ 10/11/2013 │ A    │ │ │
│   │ │            │ ╰───┴────────────┴───────────┴────────────┴──────╯ │ │
│   │ ╰────────────┴────────────────────────────────────────────────────╯ │
│   │ ╭────────────┬────────────────────────────────────────────────────╮ │
│ B │ │            │ ╭─#─┬─first_name─┬─last_name─┬──rusty_at──┬─type─╮ │ │
│   │ │ 10/12/2013 │ │ 0 │ JT         │ Turner    │ 10/12/2013 │ B    │ │ │
│   │ │            │ ╰───┴────────────┴───────────┴────────────┴──────╯ │ │
│   │ ╰────────────┴────────────────────────────────────────────────────╯ │
╰───┴─────────────────────────────────────────────────────────────────────╯

fdncred (Contributor) commented Oct 7, 2024

I kind of like how this groups and splits by two different things.

ls | update modified {format date %D} | group-by type | split-by modified | table -e
╭──────────┬────────────────────────────────────────────────────────────────────╮
│          │ ╭──────┬─────────────────────────────────────────────────────────╮ │
│ 09/14/24 │ │      │ ╭─#─┬────────name─────────┬─type─┬──size───┬─modified─╮ │ │
│          │ │ file │ │ 0 │ CITATION.cff        │ file │   812 B │ 09/14/24 │ │ │
│          │ │      │ │ 1 │ CODE_OF_CONDUCT.md  │ file │  3.4 KB │ 09/14/24 │ │ │
│          │ │      │ │ 2 │ CONTRIBUTING.md     │ file │ 11.2 KB │ 09/14/24 │ │ │
│          │ │      │ │ 3 │ Cross.toml          │ file │   666 B │ 09/14/24 │ │ │
│          │ │      │ │ 4 │ LICENSE             │ file │  1.1 KB │ 09/14/24 │ │ │
│          │ │      │ │ 5 │ README.md           │ file │ 12.3 KB │ 09/14/24 │ │ │
│          │ │      │ │ 6 │ SECURITY.md         │ file │  2.7 KB │ 09/14/24 │ │ │
│          │ │      │ │ 7 │ rust-toolchain.toml │ file │  1.1 KB │ 09/14/24 │ │ │
│          │ │      │ │ 8 │ typos.toml          │ file │   513 B │ 09/14/24 │ │ │
│          │ │      │ ╰─#─┴────────name─────────┴─type─┴──size───┴─modified─╯ │ │
│          │ │      │ ╭─#─┬──name───┬─type─┬──size──┬─modified─╮              │ │
│          │ │ dir  │ │ 0 │ assets  │ dir  │  160 B │ 09/14/24 │              │ │
│          │ │      │ │ 1 │ crates  │ dir  │ 1.3 KB │ 09/14/24 │              │ │
│          │ │      │ │ 2 │ devdocs │ dir  │  224 B │ 09/14/24 │              │ │
│          │ │      │ │ 3 │ scripts │ dir  │  416 B │ 09/14/24 │              │ │
│          │ │      │ │ 4 │ tests   │ dir  │  544 B │ 09/14/24 │              │ │
│          │ │      │ │ 5 │ wix     │ dir  │  160 B │ 09/14/24 │              │ │
│          │ │      │ ╰─#─┴──name───┴─type─┴──size──┴─modified─╯              │ │
│          │ ╰──────┴─────────────────────────────────────────────────────────╯ │
│          │ ╭──────┬─────────────────────────────────────────────────╮         │
│ 10/06/24 │ │      │ ╭─#─┬────name────┬─type─┬───size───┬─modified─╮ │         │
│          │ │ file │ │ 0 │ Cargo.lock │ file │ 179.6 KB │ 10/06/24 │ │         │
│          │ │      │ │ 1 │ Cargo.toml │ file │   9.2 KB │ 10/06/24 │ │         │
│          │ │      │ ╰───┴────────────┴──────┴──────────┴──────────╯ │         │
│          │ ╰──────┴─────────────────────────────────────────────────╯         │
│          │ ╭──────┬────────────────────────────────────────────────╮          │
│ 10/02/24 │ │      │ ╭─#─┬────name────┬─type─┬──size───┬─modified─╮ │          │
│          │ │ file │ │ 0 │ toolkit.nu │ file │ 19.7 KB │ 10/02/24 │ │          │
│          │ │      │ ╰───┴────────────┴──────┴─────────┴──────────╯ │          │
│          │ ╰──────┴────────────────────────────────────────────────╯          │
│          │ ╭─────┬───────────────────────────────────────────╮                │
│ 09/15/24 │ │     │ ╭─#─┬──name───┬─type─┬─size──┬─modified─╮ │                │
│          │ │ dir │ │ 0 │ benches │ dir  │ 128 B │ 09/15/24 │ │                │
│          │ │     │ ╰───┴─────────┴──────┴───────┴──────────╯ │                │
│          │ ╰─────┴───────────────────────────────────────────╯                │
│          │ ╭─────┬──────────────────────────────────────────╮                 │
│ 10/04/24 │ │     │ ╭─#─┬──name──┬─type─┬─size──┬─modified─╮ │                 │
│          │ │ dir │ │ 0 │ docker │ dir  │  96 B │ 10/04/24 │ │                 │
│          │ │     │ │ 1 │ src    │ dir  │ 384 B │ 10/04/24 │ │                 │
│          │ │     │ ╰───┴────────┴──────┴───────┴──────────╯ │                 │
│          │ ╰─────┴──────────────────────────────────────────╯                 │
│          │ ╭─────┬──────────────────────────────────────────╮                 │
│ 10/03/24 │ │     │ ╭─#─┬──name──┬─type─┬─size──┬─modified─╮ │                 │
│          │ │ dir │ │ 0 │ target │ dir  │ 224 B │ 10/03/24 │ │                 │
│          │ │     │ ╰───┴────────┴──────┴───────┴──────────╯ │                 │
│          │ ╰─────┴──────────────────────────────────────────╯                 │
╰──────────┴────────────────────────────────────────────────────────────────────╯

fdncred (Contributor) commented Oct 8, 2024

I don't want to lose this command. Let's just keep it.

IanManske (Member, Author) commented Oct 8, 2024

Hmm, I still don't see the use case.

  • It only works on group-by output.
  • They cannot be chained one after another (e.g., split-by a | split-by b).
  • It's essentially another group-by command and probably does not accomplish anything group-by cannot already.
  • Anecdotally, I have seen no one use the command.

hustcer (Contributor) commented Oct 8, 2024

We can add a warning message for this command in the following release to tell the user this command will be removed in the future, and remove it after a few releases if there aren't too many objections

IanManske (Member, Author)

> We can add a warning message for this command in the following release to tell the user this command will be removed in the future, and remove it after a few releases if there aren't too many objections

That works for me 👍

NotTheDr01ds (Contributor) commented Oct 9, 2024

Recreated as a custom command in case anyone misses it:

def split-by [ splitter: string ]: record -> record {
  transpose -d record_split_keys values
  | flatten --all
  | group-by --to-table ([$splitter] | into cell-path)
  | update items {
      group-by --to-table record_split_keys
      | update items { reject record_split_keys }
      | transpose -dr
    }
  | transpose -dr
}

Tested with the help split-by example as well as both of @fdncred's scenarios above.

It won't return identical results if the input happened to have a key named record_split_keys, but that's why I didn't name it something simple like keys ;-). Otherwise, it should return the same results as the internal split-by.

IanManske changed the title from "Remove split-by command" to "Deprecate split-by command" Oct 11, 2024
IanManske marked this pull request as ready for review October 11, 2024 04:58
IanManske marked this pull request as draft October 11, 2024 05:02
fdncred pushed a commit that referenced this pull request Nov 15, 2024
- closes #14330 

Related:
- #2607 
- #14019
- #14316 

# Description
This PR changes `group-by` to support grouping by multiple `grouper`
arguments.

# Changes

- No grouper: no change in behavior 
- Single grouper
  - `--to-table=false`: no change in behavior
  - `--to-table=true`:
    - closure grouper: named group0
    - cell-path grouper: named after the cell-path
- Multiple groupers:
  - `--to-table=false`: nested groups
  - `--to-table=true`: one column for each grouper argument, followed by
    the `items` column
    - columns corresponding to cell-paths are named after them
    - columns corresponding to closure groupers are named `group{i}` where
      `i` is the index of the grouper argument
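
The column-naming rules above can be modelled outside Nushell. What follows is a minimal Python sketch, not the Rust implementation from this change: `to_table` is a hypothetical helper, string groupers stand in for cell-paths, and callables stand in for closures. The no-grouper case (a single `group` column) is left out, and group keys are kept as plain Python values.

```python
def to_table(rows, *groupers):
    """Model of `group-by <groupers...> --to-table` (hypothetical helper)."""
    # Cell-path groupers (strings here) keep their own name as the column name;
    # closure groupers (callables here) are named group{i}, i being the argument index.
    names = [g if isinstance(g, str) else f"group{i}" for i, g in enumerate(groupers)]
    buckets = {}
    for row in rows:
        # One key component per grouper, in argument order.
        key = tuple(row[g] if isinstance(g, str) else g(row) for g in groupers)
        buckets.setdefault(key, []).append(row)
    # One output row per distinct key: the grouper columns, then `items`.
    return [{**dict(zip(names, key)), "items": items} for key, items in buckets.items()]


data = [
    {"name": "andres", "lang": "rb", "year": "2019"},
    {"name": "jt", "lang": "rs", "year": "2019"},
    {"name": "storm", "lang": "rs", "year": "2021"},
]

by_lang_year = to_table(data, "lang", "year")
mixed = to_table(data, "lang", lambda row: row["year"] >= "2021")

print(list(by_lang_year[0]))  # ['lang', 'year', 'items']
print(list(mixed[0]))         # ['lang', 'group1', 'items']
```

The second call shows why the index matters: the closure is the second argument, so its column is `group1`, not `group0`.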

# Examples
```nushell
> [1 3 1 3 2 1 1] | group-by
╭───┬───────────╮
│   │ ╭───┬───╮ │
│ 1 │ │ 0 │ 1 │ │
│   │ │ 1 │ 1 │ │
│   │ │ 2 │ 1 │ │
│   │ │ 3 │ 1 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 3 │ │ 0 │ 3 │ │
│   │ │ 1 │ 3 │ │
│   │ ╰───┴───╯ │
│   │ ╭───┬───╮ │
│ 2 │ │ 0 │ 2 │ │
│   │ ╰───┴───╯ │
╰───┴───────────╯

> [1 3 1 3 2 1 1] | group-by --to-table
╭─#─┬─group─┬───items───╮
│ 0 │ 1     │ ╭───┬───╮ │
│   │       │ │ 0 │ 1 │ │
│   │       │ │ 1 │ 1 │ │
│   │       │ │ 2 │ 1 │ │
│   │       │ │ 3 │ 1 │ │
│   │       │ ╰───┴───╯ │
│ 1 │ 3     │ ╭───┬───╮ │
│   │       │ │ 0 │ 3 │ │
│   │       │ │ 1 │ 3 │ │
│   │       │ ╰───┴───╯ │
│ 2 │ 2     │ ╭───┬───╮ │
│   │       │ │ 0 │ 2 │ │
│   │       │ ╰───┴───╯ │
╰─#─┴─group─┴───items───╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 }
╭───────┬───────────╮
│       │ ╭───┬───╮ │
│ false │ │ 0 │ 1 │ │
│       │ │ 1 │ 1 │ │
│       │ │ 2 │ 1 │ │
│       │ │ 3 │ 1 │ │
│       │ ╰───┴───╯ │
│       │ ╭───┬───╮ │
│ true  │ │ 0 │ 3 │ │
│       │ │ 1 │ 3 │ │
│       │ │ 2 │ 2 │ │
│       │ ╰───┴───╯ │
╰───────┴───────────╯

> [1 3 1 3 2 1 1] | group-by { $in >= 2 } --to-table
╭─#─┬─group0─┬───items───╮
│ 0 │ false  │ ╭───┬───╮ │
│   │        │ │ 0 │ 1 │ │
│   │        │ │ 1 │ 1 │ │
│   │        │ │ 2 │ 1 │ │
│   │        │ │ 3 │ 1 │ │
│   │        │ ╰───┴───╯ │
│ 1 │ true   │ ╭───┬───╮ │
│   │        │ │ 0 │ 3 │ │
│   │        │ │ 1 │ 3 │ │
│   │        │ │ 2 │ 2 │ │
│   │        │ ╰───┴───╯ │
╰─#─┴─group0─┴───items───╯
```

```nushell
let data = [
    [name, lang, year];
    [andres, rb, "2019"],
    [jt, rs, "2019"],
    [storm, rs, "2021"]
]

> $data
╭─#─┬──name──┬─lang─┬─year─╮
│ 0 │ andres │ rb   │ 2019 │
│ 1 │ jt     │ rs   │ 2019 │
│ 2 │ storm  │ rs   │ 2021 │
╰─#─┴──name──┴─lang─┴─year─╯
```

```nushell
> $data | group-by lang
╭────┬──────────────────────────────╮
│    │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│ rb │ │ 0 │ andres │ rb   │ 2019 │ │
│    │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│    │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│ rs │ │ 0 │ jt    │ rs   │ 2019 │  │
│    │ │ 1 │ storm │ rs   │ 2021 │  │
│    │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰────┴──────────────────────────────╯
```

Group column is now named after the grouper, to allow multiple groupers.
```nushell
> $data | group-by lang --to-table  # column names changed!
╭─#─┬─lang─┬────────────items─────────────╮
│ 0 │ rb   │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │ │ 0 │ jt    │ rs   │ 2019 │  │
│   │      │ │ 1 │ storm │ rs   │ 2021 │  │
│   │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴────────────items─────────────╯
```

Grouping by multiple columns makes finer-grained aggregations possible.
```nushell
> $data | group-by lang year --to-table
╭─#─┬─lang─┬─year─┬────────────items─────────────╮
│ 0 │ rb   │ 2019 │ ╭─#─┬──name──┬─lang─┬─year─╮ │
│   │      │      │ │ 0 │ andres │ rb   │ 2019 │ │
│   │      │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │
│ 1 │ rs   │ 2019 │ ╭─#─┬─name─┬─lang─┬─year─╮   │
│   │      │      │ │ 0 │ jt   │ rs   │ 2019 │   │
│   │      │      │ ╰─#─┴─name─┴─lang─┴─year─╯   │
│ 2 │ rs   │ 2021 │ ╭─#─┬─name──┬─lang─┬─year─╮  │
│   │      │      │ │ 0 │ storm │ rs   │ 2021 │  │
│   │      │      │ ╰─#─┴─name──┴─lang─┴─year─╯  │
╰─#─┴─lang─┴─year─┴────────────items─────────────╯
```

Grouping by multiple columns without `--to-table` returns a nested
structure.
This is equivalent to `$data | group-by year | split-by lang`, making
`split-by` obsolete.
```nushell
> $data | group-by lang year
╭────┬─────────────────────────────────────────╮
│    │ ╭──────┬──────────────────────────────╮ │
│ rb │ │      │ ╭─#─┬──name──┬─lang─┬─year─╮ │ │
│    │ │ 2019 │ │ 0 │ andres │ rb   │ 2019 │ │ │
│    │ │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │ │
│    │ ╰──────┴──────────────────────────────╯ │
│    │ ╭──────┬─────────────────────────────╮  │
│ rs │ │      │ ╭─#─┬─name─┬─lang─┬─year─╮  │  │
│    │ │ 2019 │ │ 0 │ jt   │ rs   │ 2019 │  │  │
│    │ │      │ ╰─#─┴─name─┴─lang─┴─year─╯  │  │
│    │ │      │ ╭─#─┬─name──┬─lang─┬─year─╮ │  │
│    │ │ 2021 │ │ 0 │ storm │ rs   │ 2021 │ │  │
│    │ │      │ ╰─#─┴─name──┴─lang─┴─year─╯ │  │
│    │ ╰──────┴─────────────────────────────╯  │
╰────┴─────────────────────────────────────────╯
```
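
To make the equivalence concrete, here is a small Python model of both pipelines. It is a sketch under assumptions: `group_by` and `split_by` are hypothetical helpers that mimic the nesting behaviour described in this thread, with plain dicts standing in for records. They are not Nushell's implementation and ignore details such as error handling.

```python
def group_by(rows, *columns):
    """Nest rows by each column in turn; with no columns left, return the rows."""
    if not columns:
        return list(rows)
    first, rest = columns[0], columns[1:]
    groups = {}
    for row in rows:
        groups.setdefault(row[first], []).append(row)
    return {key: group_by(members, *rest) for key, members in groups.items()}


def split_by(grouped, column):
    """Model of split-by: regroup a record of tables under an outer `column` key."""
    out = {}
    for inner_key, rows in grouped.items():
        for row in rows:
            # The split column becomes the outer key, the original group the inner key.
            out.setdefault(row[column], {}).setdefault(inner_key, []).append(row)
    return out


data = [
    {"name": "andres", "lang": "rb", "year": "2019"},
    {"name": "jt", "lang": "rs", "year": "2019"},
    {"name": "storm", "lang": "rs", "year": "2021"},
]

old_style = split_by(group_by(data, "year"), "lang")  # group-by year | split-by lang
new_style = group_by(data, "lang", "year")            # group-by lang year

print(old_style == new_style)  # True
```

Note the argument order: the column that used to go to `split-by` comes first in the multi-grouper form, because it ends up as the outer key.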

From #2607:
> Here's a couple more examples without much explanation. This one shows
adding two grouping keys. I'm always wanting to add more columns when
using group-by and it just-work:tm: `gb.exe -f movies-2.csv -k 3,2 -s 7
--skip_header`
> 
> ```
>  k:3                   | k:2       | count | sum:7
> -----------------------+-----------+-------+--------------------
>  20th Century Fox      | Drama     | 1     | 117.09
>  20th Century Fox      | Romance   | 1     | 39.66
>  CBS                   | Comedy    | 1     | 77.09
>  Disney                | Animation | 4     | 1264.23
>  Disney                | Comedy    | 4     | 950.27
>  Fox                   | Comedy    | 5     | 661.85
>  Independent           | Comedy    | 7     | 399.07
>  Independent           | Drama     | 4     | 69.75
>  Independent           | Romance   | 7     | 1048.75
>  Independent           | romance   | 1     | 29.37
> ...
> ```

This example can be achieved like this:
```nushell
> open movies-2.csv
  | group-by "Lead Studio" Genre --to-table
  | insert count {get items | length}
  | insert sum { get items."Worldwide Gross" | math sum}
  | reject items
  | sort-by "Lead Studio" Genre
╭─#──┬──────Lead Studio──────┬───Genre───┬─count─┬───sum───╮
│ 0  │ 20th Century Fox      │ Drama     │     1 │  117.09 │
│ 1  │ 20th Century Fox      │ Romance   │     1 │   39.66 │
│ 2  │ CBS                   │ Comedy    │     1 │   77.09 │
│ 3  │ Disney                │ Animation │     4 │ 1264.23 │
│ 4  │ Disney                │ Comedy    │     4 │  950.27 │
│ 5  │ Fox                   │ Comedy    │     5 │  661.85 │
│ 6  │ Fox                   │ comedy    │     1 │   60.72 │
│ 7  │ Independent           │ Comedy    │     7 │  399.07 │
│ 8  │ Independent           │ Drama     │     4 │   69.75 │
│ 9  │ Independent           │ Romance   │     7 │ 1048.75 │
│ 10 │ Independent           │ romance   │     1 │   29.37 │
...
```
Bahex (Member) commented Nov 20, 2024

It's probably safe to deprecate split-by now, since group-by completely covers its use case.

#14337

Grouping by multiple columns, without --to-table returns a nested structure. This is equivalent to $data | group-by year | split-by lang, making split-by obsolete.

> $data | group-by lang year
╭────┬─────────────────────────────────────────╮
│    │ ╭──────┬──────────────────────────────╮ │
│ rb │ │      │ ╭─#─┬──name──┬─lang─┬─year─╮ │ │
│    │ │ 2019 │ │ 0 │ andres │ rb   │ 2019 │ │ │
│    │ │      │ ╰─#─┴──name──┴─lang─┴─year─╯ │ │
│    │ ╰──────┴──────────────────────────────╯ │
│    │ ╭──────┬─────────────────────────────╮  │
│ rs │ │      │ ╭─#─┬─name─┬─lang─┬─year─╮  │  │
│    │ │ 2019 │ │ 0 │ jt   │ rs   │ 2019 │  │  │
│    │ │      │ ╰─#─┴─name─┴─lang─┴─year─╯  │  │
│    │ │      │ ╭─#─┬─name──┬─lang─┬─year─╮ │  │
│    │ │ 2021 │ │ 0 │ storm │ rs   │ 2021 │ │  │
│    │ │      │ ╰─#─┴─name──┴─lang─┴─year─╯ │  │
│    │ ╰──────┴─────────────────────────────╯  │
╰────┴─────────────────────────────────────────╯

fdncred (Contributor) commented Nov 20, 2024

I've tested the command above and changed it slightly to make it work.

ls | update modified {format date %D} | group-by modified type | table -e

I'm fine with deprecating split-by now. Thanks!

IanManske marked this pull request as ready for review November 21, 2024 07:16
},

#[error("Deprecated: {old_command}")]
#[diagnostic(help("for more info see {url}"))]
Member

Did we ever fill that in with useful URLs? :P
I'm slightly worried about these half-sentence format strings: people may treat the field as free text and never check the format string. Then again, deprecation warnings shouldn't live too long.

sholderbach added the category:deprecation label Nov 21, 2024
sholderbach merged commit d8c2493 into nushell:main Nov 21, 2024
github-actions bot added this to the v0.101.0 milestone Nov 21, 2024
schrieveslaach pushed a commit to schrieveslaach/nushell that referenced this pull request Nov 21, 2024
fdncred pushed a commit that referenced this pull request Dec 19, 2024
# Description

#14019 deprecated the `split-by` command. This sets its doc-category to
"deprecated" so that it will display that way in the in-shell and online
help.

# User-Facing Changes

`split-by` will now show as a deprecated command in Help. It will also be
reported by:

```nushell
help commands | where category == deprecated
```

# Tests + Formatting

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting

N/A
WindSoilder pushed a commit that referenced this pull request Jan 6, 2025
# Description

Remove commands which were deprecated in 0.101:

* `split-by` (#14019)
* `date to-record` and `date to-table` (#14319)

# User-Facing Changes

- 🟢 `toolkit fmt`
- 🟢 `toolkit clippy`
- 🟢 `toolkit test`
- 🟢 `toolkit test stdlib`

# After Submitting

TODO: `grep` (`ag`) doc repo for any usage of these commands
IanManske deleted the remove-split-by branch March 9, 2025 21:51