Skip to content

sort-by and uniq-by handle missing values inconsistently #8667

@olivia-fl

Description

@olivia-fl

Describe the bug

sort-by and uniq-by both take a column name and perform some error checking that the column exists. Currently, the only check is that the first element of the list is a record with the specified column. When passed a heterogenous list (aka table with missing values), these commands have inconsistent behavior depending on whether or not the first row contains a missing value.

How to reproduce

[{a: 1} {}] | sort-by a # case 1
[{} {a: 1}] | sort-by b # case 2
[{a: 1} {}] | uniq-by a # case 3
[{} {a: 1}] | uniq-by b # case 4

results:

Case 1:

╭───┬────╮
│ # │ a  │
├───┼────┤
│ 0 │  1 │
│ 1 │ ❎ │
╰───┴────╯

Case 2:

Error: nu::shell::column_not_found

  × Cannot find column
   ╭─[entry #12:1:1]
 1 │ [{} {a: 1}] | sort-by b # case 2
   ·  ─┬           ───┬───
   ·   │              ╰── cannot find column 'b'
   ·   ╰── value originates here
   ╰────

Case 3:

╭───┬────╮
│ # │ a  │
├───┼────┤
│ 0 │  1 │
│ 1 │ ❎ │
╰───┴────╯

Case 4:

Error: nu::shell::column_not_found

  × Cannot find column
   ╭─[entry #15:1:1]
 1 │ [{} {a: 1}] | uniq-by b # case 4
   ·  ─┬           ───┬───
   ·   │              ╰── cannot find column 'b'
   ·   ╰── value originates here
   ╰────

Expected behavior

Case 1 and 2 should produce the same output (or both should produce an error). Same for cases 3 and 4.

I'm not sure which choice makes more sense, but would lean towards both producing output. I think it's a little weird that we're giving comparison semantics to ❎, which isn't a real Value, but these semantics already exist in the current implementation.

Screenshots

No response

Configuration

key value
version 0.77.0
branch
commit_hash
build_os macos-aarch64
build_target aarch64-apple-darwin
rust_version rustc 1.67.1 (d5a82bbd2 2023-02-07) (built from a source tarball)
cargo_version cargo 1.67.0
build_time 1980-01-01 00:00:00 +00:00
build_rust_channel release
features default, zip
installed_plugins

Additional context

The current comparison semantics for ❎ are also quite surprising.

[{a: 1} {} {a: 2} {a: null}] | sort-by a
╭───┬────╮
│ # │ a  │
├───┼────┤
│ 0 │  1 │
│ 1 │  2 │
│ 2 │ ❎ │
│ 3 │    │
╰───┴────╯

Looks like $x < ❎ < null for all $x != null. I would expect $x < ❎ for all $x or $x > ❎ for all $x.

It's possible that the change to fix this issue will eliminate these comparisons entirely, so I haven't created a separate issue for it yet.

Metadata

Metadata

Assignees

No one assigned

    Labels

    category:inconsistent-behaviorBehavior between different commands or types inconsistent/unexpectedsemanticsPlaces where we should define/clarify nushell's semantics

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions