allow group-by and split-by to work with other values#14086
allow group-by and split-by to work with other values#14086fdncred merged 3 commits intonushell:mainfrom
group-by and split-by to work with other values#14086Conversation
| let head = call.head; | ||
| let grouper: Option<Value> = call.opt(engine_state, stack, 0)?; | ||
| let to_table = call.has_flag(engine_state, stack, "to-table")?; | ||
| let config = engine_state.get_config(); |
There was a problem hiding this comment.
That the group-by results can change depending on the user config sounds like bad design.
If you have a string around that matches the one particular potential formatting based on the config, this could also change the group assignment, but at the very least the columns of the resulting record if a value that has configurable formatting rules.
In general we run into two limitations of the current state of the type system/internal representation.
One we don't easily change even from a user perspective is that our records have string keys and there is no equivalent Value -> Value map type we could use not requiring a string representation and having problems when multiple values collide in the string space.
The second one is also gnarly is defining a Hash/Eq implementation pair for Value so operations like grouping or uniq could happen on the Value level. There the challenge is partially driven by floats and choices like should equivalent floats and ints hash the same if they are numerically equal. Anything there will be a compromise.
There was a problem hiding this comment.
In the Discord discussion about this topic, I was complaining about the many functions we have to convert from value to string and not knowing when each one is appropriate when all I want is a string. @IanManske suggested using to_abbreviated_string() instead of adding more match arms to coerce_str/string, which is where the problem is right now with grouping by bools without this PR. The config is used because we allow datetime and filesize formats to be changed via config.
I also looked at just using a Value for grouping but quickly stumbled upon the Hash problem you mentioned above.
The other option is to have yet another value to string function that just uses a consistent format for datetime and filesize regardless of config, but I'm sure people would complain. I'd be ok with that as long as we get rid of some of the other value to string functions.
There was a problem hiding this comment.
In the current signature for group-by we will have to accept some level of weird. I think we should clearly state what weirdness we are OK with and the users have to be aware of.
This could mean:
- some types are not supported and you need to prepare the values specifically yourself based on the semantics the user wants.
trueand"true"will cluster into the same bucket or alternatively cause an error.- User config changes stuff for the formattable values or there is a canonical config with which things are string keyed. Then if you as a user want to retrieve based on one e.g.
datetimevalue you need to convert that value using the same universal default formatting to a string to retrieve the group-by bucket. - (making sure that there is nothing in the string conversion that is dependent on factors outside the (static) config. i.e. relative time info on showing datetime, localization env vars.)
There was a problem hiding this comment.
Ya, I agree. This is less about coding and more about deciding what we want for nushell and then putting it in the extra_description()
There was a problem hiding this comment.
I added some extra_description text. Feel free to tweak it.
There was a problem hiding this comment.
Thanks! The text is great in my view. A lot more of our commands should have the semantics written on the packaging.
I am still not a fan of the config dependence on the other hand maybe we should make it available and explicitly elicit user feedback on that. I don't have a clear idea what the behavior should be for all the types, and maybe we gain more by collecting several specific complaints and after we get the feedback can make one change to a more static solution. (having python hash() flashbacks right now :D )
|
Next time please mention the expanded semantics in the commit message/PR description, the bool mentioned is kind of a red herrring with it defining its own semantics for all stringable values. As it wasn't implemented before I disagree on the labelling as bug fix. this is new behavior. |
# Description This PR updates `group-by` and `split-by` to allow other nushell Values to be used, namely bools. ### Before ```nushell ❯ [false, false, true, false, true, false] | group-by | table -e Error: nu::shell::cant_convert × Can't convert to string. ╭─[entry #1:1:2] 1 │ [false, false, true, false, true, false] | group-by | table -e · ──┬── · ╰── can't convert bool to string ╰──── ``` ### After ```nushell ❯ [false, false, true, false, true, false] | group-by | table -e ╭───────┬───────────────╮ │ │ ╭───┬───────╮ │ │ false │ │ 0 │ false │ │ │ │ │ 1 │ false │ │ │ │ │ 2 │ false │ │ │ │ │ 3 │ false │ │ │ │ ╰───┴───────╯ │ │ │ ╭───┬──────╮ │ │ true │ │ 0 │ true │ │ │ │ │ 1 │ true │ │ │ │ ╰───┴──────╯ │ ╰───────┴───────────────╯ ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. -->
# Description This PR updates `group-by` and `split-by` to allow other nushell Values to be used, namely bools. ### Before ```nushell ❯ [false, false, true, false, true, false] | group-by | table -e Error: nu::shell::cant_convert × Can't convert to string. ╭─[entry #1:1:2] 1 │ [false, false, true, false, true, false] | group-by | table -e · ──┬── · ╰── can't convert bool to string ╰──── ``` ### After ```nushell ❯ [false, false, true, false, true, false] | group-by | table -e ╭───────┬───────────────╮ │ │ ╭───┬───────╮ │ │ false │ │ 0 │ false │ │ │ │ │ 1 │ false │ │ │ │ │ 2 │ false │ │ │ │ │ 3 │ false │ │ │ │ ╰───┴───────╯ │ │ │ ╭───┬──────╮ │ │ true │ │ 0 │ true │ │ │ │ │ 1 │ true │ │ │ │ ╰───┴──────╯ │ ╰───────┴───────────────╯ ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. -->
Description
This PR updates
group-byandsplit-byto allow other nushell Values to be used, namely bools.the group-by command makes some assumptions:
- if the input data is not a string, the grouper will convert the key to string but the values will remain in their original format. e.g. with bools, "true" and true would be in the same group (see example).
- datetime is formatted based on your configuration setting. use
format dateto change the format.- filesize is formatted based on your configuration setting. use
format filesizeto change the format.- some nushell values are not supported, such as closures.
Before
After
User-Facing Changes
Tests + Formatting
After Submitting