Skip to content

allow group-by and split-by to work with other values#14086

Merged
fdncred merged 3 commits intonushell:mainfrom
fdncred:update_group_by_split_by
Oct 17, 2024
Merged

allow group-by and split-by to work with other values#14086
fdncred merged 3 commits intonushell:mainfrom
fdncred:update_group_by_split_by

Conversation

@fdncred
Copy link
Copy Markdown
Contributor

@fdncred fdncred commented Oct 14, 2024

Description

This PR updates group-by and split-by to allow other nushell Values to be used, namely bools.

the group-by command makes some assumptions:
- if the input data is not a string, the grouper will convert the key to string but the values will remain in their original format. e.g. with bools, "true" and true would be in the same group (see example).
- datetime is formatted based on your configuration setting. use format date to change the format.
- filesize is formatted based on your configuration setting. use format filesize to change the format.
- some nushell values are not supported, such as closures.

Before

 [false, false, true, false, true, false] | group-by | table -e
Error: nu::shell::cant_convert

  × Can't convert to string.
   ╭─[entry #1:1:2]
 1 │ [false, false, true, false, true, false] | group-by | table -e
   ·  ──┬──
   ·    ╰── can't convert bool to string
   ╰────

After

 [false, false, true, false, true, false] | group-by | table -e
╭───────┬───────────────╮
        ╭───┬───────╮ 
 false   0  false  
         1  false  
         2  false  
         3  false  
        ╰───┴───────╯ 
        ╭───┬──────╮  
 true    0  true   
         1  true   
        ╰───┴──────╯  
╰───────┴───────────────╯

User-Facing Changes

Tests + Formatting

After Submitting

let head = call.head;
let grouper: Option<Value> = call.opt(engine_state, stack, 0)?;
let to_table = call.has_flag(engine_state, stack, "to-table")?;
let config = engine_state.get_config();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the group-by results can change depending on the user config sounds like bad design.

If you have a string around that matches the one particular potential formatting based on the config, this could also change the group assignment, but at the very least the columns of the resulting record if a value that has configurable formatting rules.

In general we run into two limitations of the current state of the type system/internal representation.
One we don't easily change even from a user perspective is that our records have string keys and there is no equivalent Value -> Value map type we could use not requiring a string representation and having problems when multiple values collide in the string space.
The second one is also gnarly is defining a Hash/Eq implementation pair for Value so operations like grouping or uniq could happen on the Value level. There the challenge is partially driven by floats and choices like should equivalent floats and ints hash the same if they are numerically equal. Anything there will be a compromise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Discord discussion about this topic, I was complaining about the many functions we have to convert from value to string and not knowing when each one is appropriate when all I want is a string. @IanManske suggested using to_abbreviated_string() instead of adding more match arms to coerce_str/string, which is where the problem is right now with grouping by bools without this PR. The config is used because we allow datetime and filesize formats to be changed via config.

I also looked at just using a Value for grouping but quickly stumbled upon the Hash problem you mentioned above.

The other option is to have yet another value to string function that just uses a consistent format for datetime and filesize regardless of config, but I'm sure people would complain. I'd be ok with that as long as we get rid of some of the other value to string functions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current signature for group-by we will have to accept some level of weird. I think we should clearly state what weirdness we are OK with and the users have to be aware of.
This could mean:

  • some types are not supported and you need to prepare the values specifically yourself based on the semantics the user wants.
  • true and "true" will cluster into the same bucket or alternatively cause an error.
  • User config changes stuff for the formattable values or there is a canonical config with which things are string keyed. Then if you as a user want to retrieve based on one e.g. datetime value you need to convert that value using the same universal default formatting to a string to retrieve the group-by bucket.
  • (making sure that there is nothing in the string conversion that is dependent on factors outside the (static) config. i.e. relative time info on showing datetime, localization env vars.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I agree. This is less about coding and more about deciding what we want for nushell and then putting it in the extra_description()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some extra_description text. Feel free to tweak it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The text is great in my view. A lot more of our commands should have the semantics written on the packaging.

I am still not a fan of the config dependence on the other hand maybe we should make it available and explicitly elicit user feedback on that. I don't have a clear idea what the behavior should be for all the types, and maybe we gain more by collecting several specific complaints and after we get the feedback can make one change to a more static solution. (having python hash() flashbacks right now :D )

@fdncred fdncred merged commit 299d199 into nushell:main Oct 17, 2024
@fdncred fdncred deleted the update_group_by_split_by branch October 17, 2024 21:14
@github-actions github-actions bot added this to the v0.100.0 milestone Oct 17, 2024
@fdncred fdncred added notes:fixes Include the release notes summary in the "Bug fixes" section deprecated:pr-commands (deprecated: too vague) This PR changes our commands in some way labels Oct 18, 2024
@sholderbach
Copy link
Copy Markdown
Member

Next time please mention the expanded semantics in the commit message/PR description, the bool mentioned is kind of a red herrring with it defining its own semantics for all stringable values.

As it wasn't implemented before I disagree on the labelling as bug fix. this is new behavior.

@sholderbach sholderbach removed the notes:fixes Include the release notes summary in the "Bug fixes" section label Oct 18, 2024
sholderbach pushed a commit that referenced this pull request Oct 19, 2024
# Description

This PR updates `group-by` and `split-by` to allow other nushell Values
to be used, namely bools.

### Before
```nushell
❯ [false, false, true, false, true, false] | group-by | table -e
Error: nu::shell::cant_convert

  × Can't convert to string.
   ╭─[entry #1:1:2]
 1 │ [false, false, true, false, true, false] | group-by | table -e
   ·  ──┬──
   ·    ╰── can't convert bool to string
   ╰────
```
### After
```nushell
❯ [false, false, true, false, true, false] | group-by | table -e
╭───────┬───────────────╮
│       │ ╭───┬───────╮ │
│ false │ │ 0 │ false │ │
│       │ │ 1 │ false │ │
│       │ │ 2 │ false │ │
│       │ │ 3 │ false │ │
│       │ ╰───┴───────╯ │
│       │ ╭───┬──────╮  │
│ true  │ │ 0 │ true │  │
│       │ │ 1 │ true │  │
│       │ ╰───┴──────╯  │
╰───────┴───────────────╯
```

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
sholderbach pushed a commit that referenced this pull request Oct 20, 2024
# Description

This PR updates `group-by` and `split-by` to allow other nushell Values
to be used, namely bools.

### Before
```nushell
❯ [false, false, true, false, true, false] | group-by | table -e
Error: nu::shell::cant_convert

  × Can't convert to string.
   ╭─[entry #1:1:2]
 1 │ [false, false, true, false, true, false] | group-by | table -e
   ·  ──┬──
   ·    ╰── can't convert bool to string
   ╰────
```
### After
```nushell
❯ [false, false, true, false, true, false] | group-by | table -e
╭───────┬───────────────╮
│       │ ╭───┬───────╮ │
│ false │ │ 0 │ false │ │
│       │ │ 1 │ false │ │
│       │ │ 2 │ false │ │
│       │ │ 3 │ false │ │
│       │ ╰───┴───────╯ │
│       │ ╭───┬──────╮  │
│ true  │ │ 0 │ true │  │
│       │ │ 1 │ true │  │
│       │ ╰───┴──────╯  │
╰───────┴───────────────╯
```

# User-Facing Changes
<!-- List of all changes that impact the user experience here. This
helps us keep track of breaking changes. -->

# Tests + Formatting
<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu # or use an `env_change` hook to activate it
automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the
documentation](https://github.com/nushell/nushell.github.io) after the
PR is merged, if necessary. This will help us keep the docs up to date.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deprecated:pr-commands (deprecated: too vague) This PR changes our commands in some way

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants