-
Notifications
You must be signed in to change notification settings - Fork 2.1k
sort-by and uniq-by handle missing values inconsistently #8667
Description
Describe the bug
sort-by and uniq-by both take a column name and perform some error checking that the column exists. Currently, the only check is that the first element of the list is a record with the specified column. When passed a heterogenous list (aka table with missing values), these commands have inconsistent behavior depending on whether or not the first row contains a missing value.
How to reproduce
[{a: 1} {}] | sort-by a # case 1
[{} {a: 1}] | sort-by b # case 2
[{a: 1} {}] | uniq-by a # case 3
[{} {a: 1}] | uniq-by b # case 4results:
Case 1:
╭───┬────╮
│ # │ a │
├───┼────┤
│ 0 │ 1 │
│ 1 │ ❎ │
╰───┴────╯
Case 2:
Error: nu::shell::column_not_found
× Cannot find column
╭─[entry #12:1:1]
1 │ [{} {a: 1}] | sort-by b # case 2
· ─┬ ───┬───
· │ ╰── cannot find column 'b'
· ╰── value originates here
╰────
Case 3:
╭───┬────╮
│ # │ a │
├───┼────┤
│ 0 │ 1 │
│ 1 │ ❎ │
╰───┴────╯
Case 4:
Error: nu::shell::column_not_found
× Cannot find column
╭─[entry #15:1:1]
1 │ [{} {a: 1}] | uniq-by b # case 4
· ─┬ ───┬───
· │ ╰── cannot find column 'b'
· ╰── value originates here
╰────
Expected behavior
Case 1 and 2 should produce the same output (or both should produce an error). Same for cases 3 and 4.
I'm not sure which choice makes more sense, but would lean towards both producing output. I think it's a little weird that we're giving comparison semantics to ❎, which isn't a real Value, but these semantics already exist in the current implementation.
Screenshots
No response
Configuration
| key | value |
|---|---|
| version | 0.77.0 |
| branch | |
| commit_hash | |
| build_os | macos-aarch64 |
| build_target | aarch64-apple-darwin |
| rust_version | rustc 1.67.1 (d5a82bbd2 2023-02-07) (built from a source tarball) |
| cargo_version | cargo 1.67.0 |
| build_time | 1980-01-01 00:00:00 +00:00 |
| build_rust_channel | release |
| features | default, zip |
| installed_plugins |
Additional context
The current comparison semantics for ❎ are also quite surprising.
[{a: 1} {} {a: 2} {a: null}] | sort-by a╭───┬────╮
│ # │ a │
├───┼────┤
│ 0 │ 1 │
│ 1 │ 2 │
│ 2 │ ❎ │
│ 3 │ │
╰───┴────╯
Looks like $x < ❎ < null for all $x != null. I would expect $x < ❎ for all $x or $x > ❎ for all $x.
It's possible that the change to fix this issue will eliminate these comparisons entirely, so I haven't created a separate issue for it yet.