Skip to content

Add common map-like API to nu_protocol::Record#10841

Merged
sholderbach merged 13 commits intonushell:mainfrom
sholderbach:record-convenience-api
Oct 28, 2023
Merged

Add common map-like API to nu_protocol::Record#10841
sholderbach merged 13 commits intonushell:mainfrom
sholderbach:record-convenience-api

Conversation

@sholderbach
Copy link
Copy Markdown
Member

Description

Our Record looks like a map, quacks like a map, so let's treat it with the API for a map

Implement common methods found on e.g. std::collections::HashMap or the insertion-ordered indexmap.

This allows contributors to not have to worry about how to get to the relevant items and not mess up the assumptions of a Nushell record.

Record assumptions

  • cols and vals are of equal length
  • for all practical purposes, keys/columns should be unique

End goal

The end goal of the upcoming series of PR's is to allow us to make cols and vals private.
Then it would be possible to exchange the backing datastructure to best fit the expected workload.
This could be statically (by finding the best balance) or dynamically by using an enum of potential representations.

Parts

  • Add validating explicit part constructor Record::from_raw_cols_vals()
  • Add Record.columns() iterator
  • Add Record.values() iterator
  • Add consuming Record.into_values() iterator
  • Add Record.contains() helper
  • Add Record.insert() that respects existing keys
  • Add key-based .get()/.get_mut() to Record
  • Add Record.get_index() for index-based access
  • Implement Extend for Record naively
  • Use checked constructor in record! macro

User-Facing Changes

None directly

Developer facing changes

You don't have to roll your own record handling and can use a familiar API

Tests + Formatting

No explicit unit tests yet. Wouldn't be too tricky to validate core properties directly.
Will be exercised by the following PRs using the new methods/traits/iterators.


Building up the `Record` from its `Vec`s directly is dangerous if the
parts don't have equal length.
`Record::from_raw_cols_vals` will assert equal length.

Using it may not be the best choice in the future if we choose to change
the underlying representation.

For now skip validating properties about key uniqueness
We have a few places where we read the `.cols` field. Let's turn them
into a `Columns` iterator.

End goal, make the internal representation of `Record` opaque.
Avoid exposing `record.vals` directly.
Compared to `Record.values()` gives you owned values.

Primarily to support the `values` command.
The existing `Record.push()` doesn't take into account if a key already
exists. The second insertion could become inaccessible and the data
meaningless. For most purposes `.insert()` is the better choice.

It returns `Some(Value)` with the old value if found.

As we limit ourselves to `String` keys, we can use a `Cow<str>` to only
clone the key if we need to.
These are the common accessors for maps.

Note: as we can't guarantee yet that `cols.len() == vals.len()` the
lookup in `vals` still needs a bounds check.
To enable some existing command implementations that perform indexing
themselves expose a way to get values with an index.
Short-circuits if (partially) out of bounds.
Warning: this `Extend` implementation is equivalent to `.push` and
doesn't deduplicate incoming keys.
In the future it should probably use `.insert` internally.
If we want to provide an escape hatch for performance on the
contiguous `Vec` backing-store we may want use a different API.
To be able to make the fields private we need to use the public checking
`Record::from_raw_cols_vals()` constructor.

This change will introduce a single length assert that can probably be
elided during optimization.
}

pub fn contains(&self, col: impl AsRef<str>) -> bool {
self.cols.iter().any(|k| k == col.as_ref())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making a variable let col_ref = col.as_ref(), instead of calling col.as_ref() inside closure everytime?

let col_ref = col.as_ref();
self.cols.iter().any(|k| k == col_ref)

It can be a performance improvement if we call the function many times.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that .as_ref() is pretty transparent to the optimizer but we could lift that out. Then there might be some optimizer luck necessary so it can short circuit that if cols is empty but we shouldn't worry too much without profiling.

I think there is an aspect that makes me dislike AsRef slightly, so I might look into Borrow as a replacement.

@IanManske
Copy link
Copy Markdown
Member

Looks good! Glad to see more work is being on Record.

One suggestion that I have is regarding insert. We can avoid taking a Cow as input if col is changed to have the type: col: impl AsRef<str> + Into<String>. This will also eliminate any branches that check which Cow case col is. Most string-like types (str, String, Cow<str>, and Box<str>) should satisfy AsRef<str> + Into<String>, so this shouldn't be an issue. Thoughts?

Should lead to more optimized code

h/t @IanManske
Some usages in the public API envisioned in conjunction with
`.get_index`

Also should simplify some internal implementations
Note: It appears that `impl AsRef<str>` has the downside of taking a
directly directly by value if not prefixed with `&`. We may want to look
into `Borrow` for those.
@sholderbach
Copy link
Copy Markdown
Member Author

One suggestion that I have is regarding insert. We can avoid taking a Cow as input if col is changed to have the type: col: impl AsRef<str> + Into<String>. This will also eliminate any branches that check which Cow case col is. Most string-like types (str, String, Cow<str>, and Box<str>) should satisfy AsRef<str> + Into<String>, so this shouldn't be an issue. Thoughts?

Good call, done! Yeah that should monormophize to simpler code. (and maybe make it also more likely to inline)

@amtoine
Copy link
Copy Markdown
Member

amtoine commented Oct 28, 2023

didn't go through the whole PR, but i support the overall idea of treating records like hash-maps 👍

@sholderbach sholderbach merged commit c87bac0 into nushell:main Oct 28, 2023
@sholderbach sholderbach deleted the record-convenience-api branch October 28, 2023 13:18
sholderbach added a commit that referenced this pull request Oct 30, 2023
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see #10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of #10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
sholderbach added a commit that referenced this pull request Nov 1, 2023
# Description
Consequences of #10841

This does not yet make the assumption that columns are always
duplicated. Follow the existing logic here

- Use saner record API in `nu-engine/src/eval.rs`
- Use checked record construction in `nu-engine/src/scope.rs`
- Use `values` iterator in `nu-engine/src/scope.rs`
- Use `columns` iterator in `nu_engine::get_columns()`
- Start using record API in `value/mod.rs`
- Use `.insert` in `eval_const.rs` Record code
- Record API for `eval_const.rs` table code

# User-Facing Changes
None

# Tests + Formatting
None
sholderbach added a commit that referenced this pull request Nov 8, 2023
# Description
Since #10841 the goal is to remove the implementation details of
`Record` outside of core operations.

To this end use Record iterators and map-like accessors in a bunch of
places. In this PR I try to collect the boring cases where I don't
expect any dramatic performance impacts or don't have doubts about the
correctness afterwards

- Use checked record construction in `nu_plugin_example`
- Use `Record::into_iter` in `columns`
- Use `Record` iterators in `headers` cmd
- Use explicit record iterators in `split-by`
- Use `Record::into_iter` in variable completions
- Use `Record::values` iterator in `into sqlite`
- Use `Record::iter_mut` for-loop in `default`
- Change `nu_engine::nonexistent_column` to use iterator
- Use `Record::columns` iter in `nu-cmd-base`
- Use `Record::get_index` in `nu-command/network/http`
- Use `Record.insert()` in `merge`
- Refactor `move` to use encapsulated record API
- Use `Record.insert()` in `explore`
- Use proper `Record` API in `explore`
- Remove defensiveness around record in `explore`
- Use encapsulated record API in more `nu-command`s

# User-Facing Changes
None intentional

# Tests + Formatting
(-)
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description

> Our `Record` looks like a map, quacks like a map, so let's treat it
with the API for a map

Implement common methods found on e.g. `std::collections::HashMap` or
the insertion-ordered [indexmap](https://docs.rs/indexmap).

This allows contributors to not have to worry about how to get to the
relevant items and not mess up the assumptions of a Nushell record.

## Record assumptions
- `cols` and `vals` are of equal length
- for all practical purposes, keys/columns should be unique

## End goal

The end goal of the upcoming series of PR's is to allow us to make
`cols` and `vals` private.
Then it would be possible to exchange the backing datastructure to best
fit the expected workload.
This could be statically (by finding the best balance) or dynamically by
using an `enum` of potential representations.

## Parts
- Add validating explicit part constructor
`Record::from_raw_cols_vals()`
- Add `Record.columns()` iterator
- Add `Record.values()` iterator
- Add consuming `Record.into_values()` iterator
- Add `Record.contains()` helper
- Add `Record.insert()` that respects existing keys
- Add key-based `.get()`/`.get_mut()` to `Record`
- Add `Record.get_index()` for index-based access
- Implement `Extend` for `Record` naively
- Use checked constructor in `record!` macro
- Add `Record.index_of()` to get index by key

# User-Facing Changes
None directly

# Developer facing changes
You don't have to roll your own record handling and can use a familiar
API

# Tests + Formatting
No explicit unit tests yet. Wouldn't be too tricky to validate core
properties directly.
Will be exercised by the following PRs using the new
methods/traits/iterators.
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see nushell#10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of nushell#10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
Consequences of nushell#10841

This does not yet make the assumption that columns are always
duplicated. Follow the existing logic here

- Use saner record API in `nu-engine/src/eval.rs`
- Use checked record construction in `nu-engine/src/scope.rs`
- Use `values` iterator in `nu-engine/src/scope.rs`
- Use `columns` iterator in `nu_engine::get_columns()`
- Start using record API in `value/mod.rs`
- Use `.insert` in `eval_const.rs` Record code
- Record API for `eval_const.rs` table code

# User-Facing Changes
None

# Tests + Formatting
None
hardfau1t pushed a commit to hardfau1t/nushell that referenced this pull request Dec 14, 2023
# Description
Since nushell#10841 the goal is to remove the implementation details of
`Record` outside of core operations.

To this end use Record iterators and map-like accessors in a bunch of
places. In this PR I try to collect the boring cases where I don't
expect any dramatic performance impacts or don't have doubts about the
correctness afterwards

- Use checked record construction in `nu_plugin_example`
- Use `Record::into_iter` in `columns`
- Use `Record` iterators in `headers` cmd
- Use explicit record iterators in `split-by`
- Use `Record::into_iter` in variable completions
- Use `Record::values` iterator in `into sqlite`
- Use `Record::iter_mut` for-loop in `default`
- Change `nu_engine::nonexistent_column` to use iterator
- Use `Record::columns` iter in `nu-cmd-base`
- Use `Record::get_index` in `nu-command/network/http`
- Use `Record.insert()` in `merge`
- Refactor `move` to use encapsulated record API
- Use `Record.insert()` in `explore`
- Use proper `Record` API in `explore`
- Remove defensiveness around record in `explore`
- Use encapsulated record API in more `nu-command`s

# User-Facing Changes
None intentional

# Tests + Formatting
(-)
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description

> Our `Record` looks like a map, quacks like a map, so let's treat it
with the API for a map

Implement common methods found on e.g. `std::collections::HashMap` or
the insertion-ordered [indexmap](https://docs.rs/indexmap).

This allows contributors to not have to worry about how to get to the
relevant items and not mess up the assumptions of a Nushell record.

## Record assumptions
- `cols` and `vals` are of equal length
- for all practical purposes, keys/columns should be unique

## End goal

The end goal of the upcoming series of PR's is to allow us to make
`cols` and `vals` private.
Then it would be possible to exchange the backing datastructure to best
fit the expected workload.
This could be statically (by finding the best balance) or dynamically by
using an `enum` of potential representations.

## Parts
- Add validating explicit part constructor
`Record::from_raw_cols_vals()`
- Add `Record.columns()` iterator
- Add `Record.values()` iterator
- Add consuming `Record.into_values()` iterator
- Add `Record.contains()` helper
- Add `Record.insert()` that respects existing keys
- Add key-based `.get()`/`.get_mut()` to `Record`
- Add `Record.get_index()` for index-based access
- Implement `Extend` for `Record` naively
- Use checked constructor in `record!` macro
- Add `Record.index_of()` to get index by key

# User-Facing Changes
None directly

# Developer facing changes
You don't have to roll your own record handling and can use a familiar
API

# Tests + Formatting
No explicit unit tests yet. Wouldn't be too tricky to validate core
properties directly.
Will be exercised by the following PRs using the new
methods/traits/iterators.
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
While we have now a few ways to add items or iterate over the
collection, we don't have a way to cleanly remove items from `Record`.

This PR fixes that:

- Add `Record.remove()` to remove by key
- makes the assumption that keys are unique, so can not be used
universally, yet (see nushell#10875 for an important example)
- Add naive `Record.retain()` for inplace removal
- This follows the two separate `retain`/`retain_mut` in the Rust std
library types, compared to the value-mutating `retain` in `indexmap`
- Add `Record.retain_mut()` for one-pass pruning

Continuation of nushell#10841 

# User-Facing Changes
None yet.

# Tests + Formatting
Doctests for the `retain`ing fun
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
Consequences of nushell#10841

This does not yet make the assumption that columns are always
duplicated. Follow the existing logic here

- Use saner record API in `nu-engine/src/eval.rs`
- Use checked record construction in `nu-engine/src/scope.rs`
- Use `values` iterator in `nu-engine/src/scope.rs`
- Use `columns` iterator in `nu_engine::get_columns()`
- Start using record API in `value/mod.rs`
- Use `.insert` in `eval_const.rs` Record code
- Record API for `eval_const.rs` table code

# User-Facing Changes
None

# Tests + Formatting
None
dmatos2012 pushed a commit to dmatos2012/nushell that referenced this pull request Feb 20, 2024
# Description
Since nushell#10841 the goal is to remove the implementation details of
`Record` outside of core operations.

To this end use Record iterators and map-like accessors in a bunch of
places. In this PR I try to collect the boring cases where I don't
expect any dramatic performance impacts or don't have doubts about the
correctness afterwards

- Use checked record construction in `nu_plugin_example`
- Use `Record::into_iter` in `columns`
- Use `Record` iterators in `headers` cmd
- Use explicit record iterators in `split-by`
- Use `Record::into_iter` in variable completions
- Use `Record::values` iterator in `into sqlite`
- Use `Record::iter_mut` for-loop in `default`
- Change `nu_engine::nonexistent_column` to use iterator
- Use `Record::columns` iter in `nu-cmd-base`
- Use `Record::get_index` in `nu-command/network/http`
- Use `Record.insert()` in `merge`
- Refactor `move` to use encapsulated record API
- Use `Record.insert()` in `explore`
- Use proper `Record` API in `explore`
- Remove defensiveness around record in `explore`
- Use encapsulated record API in more `nu-command`s

# User-Facing Changes
None intentional

# Tests + Formatting
(-)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants