Skip to content

Disallow duplicated columns in table literals#10875

Merged
sholderbach merged 6 commits intonushell:mainfrom
sholderbach:deny-table-column-duplication
Nov 1, 2023
Merged

Disallow duplicated columns in table literals#10875
sholderbach merged 6 commits intonushell:mainfrom
sholderbach:deny-table-column-duplication

Conversation

@sholderbach
Copy link
Copy Markdown
Member

@sholderbach sholderbach commented Oct 29, 2023

Description

Pretty much all operations/commands in Nushell assume that the column names/keys in a record and thus also in a table (which consists of a list of records) are unique.
Access through a string-like cell path should refer to a single column or key/value pair and our output through table will only show the last mention of a repeated column name.

[[a a]; [1 2]]
╭─#─┬─a─╮02 │
╰───┴───╯

While the record parsing already either errors with the ShellError::ColumnDefinedTwice or silently overwrites the first occurence with the second occurence, the table literal syntax [[header columns]; [val1 val2]] currently still allowed the creation of tables (and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how we can efficiently perform operations or in the past lead to outright bugs (e.g. #8431 fixed by #8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records with non-unique column names will be plugged.

Parts

  • Fix SE::ColumnDefinedTwice error code
  • Remove previous tests permitting duplicate columns
  • Deny duplicate column in table literal eval
  • Deny duplicate column in const eval
  • Deny duplicate column in from nuon

User-Facing Changes

[[a a]; [1 2]] will now return an error:

Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry #2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously contained tables with repeated column names.

Tests + Formatting

Added tests for each of the different evaluation paths that materialize tables.

@sholderbach sholderbach added the notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes label Oct 29, 2023
sholderbach added a commit that referenced this pull request Oct 30, 2023
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see #10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of #10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
@sholderbach sholderbach merged commit 0569a9c into nushell:main Nov 1, 2023
@sholderbach sholderbach deleted the deny-table-column-duplication branch November 1, 2023 20:25
sholderbach added a commit that referenced this pull request Nov 8, 2023
# Description

This is pretty complementary/orthogonal to @IanManske 's changes to
`Value` cellpath accessors in:
- #10925
- to a lesser extent #10926

## Steps
- Use `R.remove` in `Value.remove_data_at_cell_path`
- Pretty sound after #10875 (tests mentioned in commit message have been
removed by that)
- Update `did_you_mean` helper to use iterator
- Change `Value::columns` to return iterator
  - This is not a place of honor
- Use `Record::get` in `Value::get_data_by_key`
# User-Facing Changes
None intentional, potential edge cases on duplicated columns could
change (considered undefined behavior)

# Tests + Formatting
(-)
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Dec 2, 2023

I'm wondering if this change has degraded performance a tiny bit. It seems like nushell was pretty consistently starting in 50ms-70ms with my setup and now it's in the 90ms-100ms. Any ideas if it could? This is hard to judge because the startup-stats below could easily be influenced by crap on my SSD or what's going on while I'm starting up, not to mention, it's not a huge sampling size. Just wondering your thoughts @sholderbach.
image

@sholderbach
Copy link
Copy Markdown
Member Author

If you don't have a bunch of large table literals in the [[a b]; [c d]] form, no this PR shouldn't affect startup.

There could be some impact of the Record related PRs happening around the same time so that may be worth bisecting. But this looks like an outlier with a general jump happening at the same time and a slow slowdown of the whole thing.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Dec 2, 2023

I don't have a bunch of large table literals so maybe it is an impact of the Record changes. I may look into that a little more. Thanks!

@sholderbach
Copy link
Copy Markdown
Member Author

The only effect imaginable would be on from nuon with a ton of individual wide tables.

hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see nushell#10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of nushell#10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
Pretty much all operations/commands in Nushell assume that the column
names/keys in a record and thus also in a table (which consists of a
list of records) are unique.
Access through a string-like cell path should refer to a single column
or key/value pair and our output through `table` will only show the last
mention of a repeated column name.

```nu
[[a a]; [1 2]]
╭─#─┬─a─╮
│ 0 │ 2 │
╰───┴───╯
```

While the record parsing already either errors with the
`ShellError::ColumnDefinedTwice` or silently overwrites the first
occurence with the second occurence, the table literal syntax `[[header
columns]; [val1 val2]]` currently still allowed the creation of tables
(and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how
we can efficiently perform operations or in the past lead to outright
bugs (e.g. nushell#8431 fixed by nushell#8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records
with non-unique column names will be plugged.

## Parts
- Fix `SE::ColumnDefinedTwice` error code
- Remove previous tests permitting duplicate columns
- Deny duplicate column in table literal eval
- Deny duplicate column in const eval
- Deny duplicate column in `from nuon`

# User-Facing Changes
`[[a a]; [1 2]]` will now return an error:

```
Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry nushell#2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────
```

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously
contained tables with repeated column names.

# Tests + Formatting
Added tests for each of the different evaluation paths that materialize
tables.
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description

This is pretty complementary/orthogonal to @IanManske 's changes to
`Value` cellpath accessors in:
- nushell#10925
- to a lesser extent nushell#10926

## Steps
- Use `R.remove` in `Value.remove_data_at_cell_path`
- Pretty sound after nushell#10875 (tests mentioned in commit message have been
removed by that)
- Update `did_you_mean` helper to use iterator
- Change `Value::columns` to return iterator
  - This is not a place of honor
- Use `Record::get` in `Value::get_data_by_key`
# User-Facing Changes
None intentional, potential edge cases on duplicated columns could
change (considered undefined behavior)

# Tests + Formatting
(-)
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see nushell#10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of nushell#10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
Pretty much all operations/commands in Nushell assume that the column
names/keys in a record and thus also in a table (which consists of a
list of records) are unique.
Access through a string-like cell path should refer to a single column
or key/value pair and our output through `table` will only show the last
mention of a repeated column name.

```nu
[[a a]; [1 2]]
╭─#─┬─a─╮
│ 0 │ 2 │
╰───┴───╯
```

While the record parsing already either errors with the
`ShellError::ColumnDefinedTwice` or silently overwrites the first
occurence with the second occurence, the table literal syntax `[[header
columns]; [val1 val2]]` currently still allowed the creation of tables
(and internally records with more than one entry with the same name.

This is not only confusing, but also breaks some assumptions around how
we can efficiently perform operations or in the past lead to outright
bugs (e.g. nushell#8431 fixed by nushell#8446).

This PR proposes to make this an error.
After this change another hole which allowed the construction of records
with non-unique column names will be plugged.

## Parts
- Fix `SE::ColumnDefinedTwice` error code
- Remove previous tests permitting duplicate columns
- Deny duplicate column in table literal eval
- Deny duplicate column in const eval
- Deny duplicate column in `from nuon`

# User-Facing Changes
`[[a a]; [1 2]]` will now return an error:

```
Error: nu::shell::column_defined_twice

  × Record field or table column used twice
   ╭─[entry nushell#2:1:1]
 1 │ [[a a]; [1 2]]
   ·   ┬ ┬
   ·   │ ╰── field redefined here
   ·   ╰── field first defined here
   ╰────
```

this may under rare circumstances block code from evaluating.

Furthermore this makes some NUON files invalid if they previously
contained tables with repeated column names.

# Tests + Formatting
Added tests for each of the different evaluation paths that materialize
tables.
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description

This is pretty complementary/orthogonal to @IanManske 's changes to
`Value` cellpath accessors in:
- nushell#10925
- to a lesser extent nushell#10926

## Steps
- Use `R.remove` in `Value.remove_data_at_cell_path`
- Pretty sound after nushell#10875 (tests mentioned in commit message have been
removed by that)
- Update `did_you_mean` helper to use iterator
- Change `Value::columns` to return iterator
  - This is not a place of honor
- Use `Record::get` in `Value::get_data_by_key`
# User-Facing Changes
None intentional, potential edge cases on duplicated columns could
change (considered undefined behavior)

# Tests + Formatting
(-)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

notes:breaking-changes This PR implies a change affecting users and has to be noted in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants