Add common map-like API to `nu_protocol::Record` by sholderbach · Pull Request #10841 · nushell/nushell

sholderbach · 2023-10-25T22:47:34Z

Description

Our Record looks like a map, quacks like a map, so let's treat it with the API for a map

Implement common methods found on e.g. std::collections::HashMap or the insertion-ordered indexmap.

This allows contributors to not have to worry about how to get to the relevant items and not mess up the assumptions of a Nushell record.

Record assumptions

cols and vals are of equal length
for all practical purposes, keys/columns should be unique

End goal

The end goal of the upcoming series of PR's is to allow us to make cols and vals private.
Then it would be possible to exchange the backing datastructure to best fit the expected workload.
This could be statically (by finding the best balance) or dynamically by using an enum of potential representations.

Parts

Add validating explicit part constructor Record::from_raw_cols_vals()
Add Record.columns() iterator
Add Record.values() iterator
Add consuming Record.into_values() iterator
Add Record.contains() helper
Add Record.insert() that respects existing keys
Add key-based .get()/.get_mut() to Record
Add Record.get_index() for index-based access
Implement Extend for Record naively
Use checked constructor in record! macro

User-Facing Changes

None directly

Developer facing changes

You don't have to roll your own record handling and can use a familiar API

Tests + Formatting

No explicit unit tests yet. Wouldn't be too tricky to validate core properties directly.
Will be exercised by the following PRs using the new methods/traits/iterators.

This PR is an @IanManske appreciation PR (Create Record type #10103).
Related to Convert more examples and tests to record! macro #10840

Building up the `Record` from its `Vec`s directly is dangerous if the parts don't have equal length. `Record::from_raw_cols_vals` will assert equal length. Using it may not be the best choice in the future if we choose to change the underlying representation. For now skip validating properties about key uniqueness

We have a few places where we read the `.cols` field. Let's turn them into a `Columns` iterator. End goal, make the internal representation of `Record` opaque.

Avoid exposing `record.vals` directly.

Compared to `Record.values()` gives you owned values. Primarily to support the `values` command.

The existing `Record.push()` doesn't take into account if a key already exists. The second insertion could become inaccessible and the data meaningless. For most purposes `.insert()` is the better choice. It returns `Some(Value)` with the old value if found. As we limit ourselves to `String` keys, we can use a `Cow<str>` to only clone the key if we need to.

These are the common accessors for maps. Note: as we can't guarantee yet that `cols.len() == vals.len()` the lookup in `vals` still needs a bounds check.

To enable some existing command implementations that perform indexing themselves expose a way to get values with an index. Short-circuits if (partially) out of bounds.

Warning: this `Extend` implementation is equivalent to `.push` and doesn't deduplicate incoming keys. In the future it should probably use `.insert` internally. If we want to provide an escape hatch for performance on the contiguous `Vec` backing-store we may want use a different API.

To be able to make the fields private we need to use the public checking `Record::from_raw_cols_vals()` constructor. This change will introduce a single length assert that can probably be elided during optimization.

WindSoilder · 2023-10-25T23:51:40Z

crates/nu-protocol/src/value/record.rs

+    }
+
+    pub fn contains(&self, col: impl AsRef<str>) -> bool {
+        self.cols.iter().any(|k| k == col.as_ref())


What about making a variable let col_ref = col.as_ref(), instead of calling col.as_ref() inside closure everytime?

let col_ref = col.as_ref(); self.cols.iter().any(|k| k == col_ref)

It can be a performance improvement if we call the function many times.

I assumed that .as_ref() is pretty transparent to the optimizer but we could lift that out. Then there might be some optimizer luck necessary so it can short circuit that if cols is empty but we shouldn't worry too much without profiling.

I think there is an aspect that makes me dislike AsRef slightly, so I might look into Borrow as a replacement.

crates/nu-protocol/src/value/record.rs

IanManske · 2023-10-26T00:05:45Z

Looks good! Glad to see more work is being on Record.

One suggestion that I have is regarding insert. We can avoid taking a Cow as input if col is changed to have the type: col: impl AsRef<str> + Into<String>. This will also eliminate any branches that check which Cow case col is. Most string-like types (str, String, Cow<str>, and Box<str>) should satisfy AsRef<str> + Into<String>, so this shouldn't be an issue. Thoughts?

@IanManske

Should lead to more optimized code h/t @IanManske

Some usages in the public API envisioned in conjunction with `.get_index` Also should simplify some internal implementations

Note: It appears that `impl AsRef<str>` has the downside of taking a directly directly by value if not prefixed with `&`. We may want to look into `Borrow` for those.

sholderbach · 2023-10-26T08:42:44Z

One suggestion that I have is regarding insert. We can avoid taking a Cow as input if col is changed to have the type: col: impl AsRef<str> + Into<String>. This will also eliminate any branches that check which Cow case col is. Most string-like types (str, String, Cow<str>, and Box<str>) should satisfy AsRef<str> + Into<String>, so this shouldn't be an issue. Thoughts?

Good call, done! Yeah that should monormophize to simpler code. (and maybe make it also more likely to inline)

amtoine · 2023-10-28T11:51:06Z

didn't go through the whole PR, but i support the overall idea of treating records like hash-maps 👍

# Description While we have now a few ways to add items or iterate over the collection, we don't have a way to cleanly remove items from `Record`. This PR fixes that: - Add `Record.remove()` to remove by key - makes the assumption that keys are unique, so can not be used universally, yet (see #10875 for an important example) - Add naive `Record.retain()` for inplace removal - This follows the two separate `retain`/`retain_mut` in the Rust std library types, compared to the value-mutating `retain` in `indexmap` - Add `Record.retain_mut()` for one-pass pruning Continuation of #10841 # User-Facing Changes None yet. # Tests + Formatting Doctests for the `retain`ing fun

# Description Consequences of #10841 This does not yet make the assumption that columns are always duplicated. Follow the existing logic here - Use saner record API in `nu-engine/src/eval.rs` - Use checked record construction in `nu-engine/src/scope.rs` - Use `values` iterator in `nu-engine/src/scope.rs` - Use `columns` iterator in `nu_engine::get_columns()` - Start using record API in `value/mod.rs` - Use `.insert` in `eval_const.rs` Record code - Record API for `eval_const.rs` table code # User-Facing Changes None # Tests + Formatting None

# Description Since #10841 the goal is to remove the implementation details of `Record` outside of core operations. To this end use Record iterators and map-like accessors in a bunch of places. In this PR I try to collect the boring cases where I don't expect any dramatic performance impacts or don't have doubts about the correctness afterwards - Use checked record construction in `nu_plugin_example` - Use `Record::into_iter` in `columns` - Use `Record` iterators in `headers` cmd - Use explicit record iterators in `split-by` - Use `Record::into_iter` in variable completions - Use `Record::values` iterator in `into sqlite` - Use `Record::iter_mut` for-loop in `default` - Change `nu_engine::nonexistent_column` to use iterator - Use `Record::columns` iter in `nu-cmd-base` - Use `Record::get_index` in `nu-command/network/http` - Use `Record.insert()` in `merge` - Refactor `move` to use encapsulated record API - Use `Record.insert()` in `explore` - Use proper `Record` API in `explore` - Remove defensiveness around record in `explore` - Use encapsulated record API in more `nu-command`s # User-Facing Changes None intentional # Tests + Formatting (-)

# Description > Our `Record` looks like a map, quacks like a map, so let's treat it with the API for a map Implement common methods found on e.g. `std::collections::HashMap` or the insertion-ordered [indexmap](https://docs.rs/indexmap). This allows contributors to not have to worry about how to get to the relevant items and not mess up the assumptions of a Nushell record. ## Record assumptions - `cols` and `vals` are of equal length - for all practical purposes, keys/columns should be unique ## End goal The end goal of the upcoming series of PR's is to allow us to make `cols` and `vals` private. Then it would be possible to exchange the backing datastructure to best fit the expected workload. This could be statically (by finding the best balance) or dynamically by using an `enum` of potential representations. ## Parts - Add validating explicit part constructor `Record::from_raw_cols_vals()` - Add `Record.columns()` iterator - Add `Record.values()` iterator - Add consuming `Record.into_values()` iterator - Add `Record.contains()` helper - Add `Record.insert()` that respects existing keys - Add key-based `.get()`/`.get_mut()` to `Record` - Add `Record.get_index()` for index-based access - Implement `Extend` for `Record` naively - Use checked constructor in `record!` macro - Add `Record.index_of()` to get index by key # User-Facing Changes None directly # Developer facing changes You don't have to roll your own record handling and can use a familiar API # Tests + Formatting No explicit unit tests yet. Wouldn't be too tricky to validate core properties directly. Will be exercised by the following PRs using the new methods/traits/iterators.

# Description While we have now a few ways to add items or iterate over the collection, we don't have a way to cleanly remove items from `Record`. This PR fixes that: - Add `Record.remove()` to remove by key - makes the assumption that keys are unique, so can not be used universally, yet (see nushell#10875 for an important example) - Add naive `Record.retain()` for inplace removal - This follows the two separate `retain`/`retain_mut` in the Rust std library types, compared to the value-mutating `retain` in `indexmap` - Add `Record.retain_mut()` for one-pass pruning Continuation of nushell#10841 # User-Facing Changes None yet. # Tests + Formatting Doctests for the `retain`ing fun

# Description Consequences of nushell#10841 This does not yet make the assumption that columns are always duplicated. Follow the existing logic here - Use saner record API in `nu-engine/src/eval.rs` - Use checked record construction in `nu-engine/src/scope.rs` - Use `values` iterator in `nu-engine/src/scope.rs` - Use `columns` iterator in `nu_engine::get_columns()` - Start using record API in `value/mod.rs` - Use `.insert` in `eval_const.rs` Record code - Record API for `eval_const.rs` table code # User-Facing Changes None # Tests + Formatting None

# Description Since nushell#10841 the goal is to remove the implementation details of `Record` outside of core operations. To this end use Record iterators and map-like accessors in a bunch of places. In this PR I try to collect the boring cases where I don't expect any dramatic performance impacts or don't have doubts about the correctness afterwards - Use checked record construction in `nu_plugin_example` - Use `Record::into_iter` in `columns` - Use `Record` iterators in `headers` cmd - Use explicit record iterators in `split-by` - Use `Record::into_iter` in variable completions - Use `Record::values` iterator in `into sqlite` - Use `Record::iter_mut` for-loop in `default` - Change `nu_engine::nonexistent_column` to use iterator - Use `Record::columns` iter in `nu-cmd-base` - Use `Record::get_index` in `nu-command/network/http` - Use `Record.insert()` in `merge` - Refactor `move` to use encapsulated record API - Use `Record.insert()` in `explore` - Use proper `Record` API in `explore` - Remove defensiveness around record in `explore` - Use encapsulated record API in more `nu-command`s # User-Facing Changes None intentional # Tests + Formatting (-)

# Description > Our `Record` looks like a map, quacks like a map, so let's treat it with the API for a map Implement common methods found on e.g. `std::collections::HashMap` or the insertion-ordered [indexmap](https://docs.rs/indexmap). This allows contributors to not have to worry about how to get to the relevant items and not mess up the assumptions of a Nushell record. ## Record assumptions - `cols` and `vals` are of equal length - for all practical purposes, keys/columns should be unique ## End goal The end goal of the upcoming series of PR's is to allow us to make `cols` and `vals` private. Then it would be possible to exchange the backing datastructure to best fit the expected workload. This could be statically (by finding the best balance) or dynamically by using an `enum` of potential representations. ## Parts - Add validating explicit part constructor `Record::from_raw_cols_vals()` - Add `Record.columns()` iterator - Add `Record.values()` iterator - Add consuming `Record.into_values()` iterator - Add `Record.contains()` helper - Add `Record.insert()` that respects existing keys - Add key-based `.get()`/`.get_mut()` to `Record` - Add `Record.get_index()` for index-based access - Implement `Extend` for `Record` naively - Use checked constructor in `record!` macro - Add `Record.index_of()` to get index by key # User-Facing Changes None directly # Developer facing changes You don't have to roll your own record handling and can use a familiar API # Tests + Formatting No explicit unit tests yet. Wouldn't be too tricky to validate core properties directly. Will be exercised by the following PRs using the new methods/traits/iterators.

# Description While we have now a few ways to add items or iterate over the collection, we don't have a way to cleanly remove items from `Record`. This PR fixes that: - Add `Record.remove()` to remove by key - makes the assumption that keys are unique, so can not be used universally, yet (see nushell#10875 for an important example) - Add naive `Record.retain()` for inplace removal - This follows the two separate `retain`/`retain_mut` in the Rust std library types, compared to the value-mutating `retain` in `indexmap` - Add `Record.retain_mut()` for one-pass pruning Continuation of nushell#10841 # User-Facing Changes None yet. # Tests + Formatting Doctests for the `retain`ing fun

# Description Consequences of nushell#10841 This does not yet make the assumption that columns are always duplicated. Follow the existing logic here - Use saner record API in `nu-engine/src/eval.rs` - Use checked record construction in `nu-engine/src/scope.rs` - Use `values` iterator in `nu-engine/src/scope.rs` - Use `columns` iterator in `nu_engine::get_columns()` - Start using record API in `value/mod.rs` - Use `.insert` in `eval_const.rs` Record code - Record API for `eval_const.rs` table code # User-Facing Changes None # Tests + Formatting None

# Description Since nushell#10841 the goal is to remove the implementation details of `Record` outside of core operations. To this end use Record iterators and map-like accessors in a bunch of places. In this PR I try to collect the boring cases where I don't expect any dramatic performance impacts or don't have doubts about the correctness afterwards - Use checked record construction in `nu_plugin_example` - Use `Record::into_iter` in `columns` - Use `Record` iterators in `headers` cmd - Use explicit record iterators in `split-by` - Use `Record::into_iter` in variable completions - Use `Record::values` iterator in `into sqlite` - Use `Record::iter_mut` for-loop in `default` - Change `nu_engine::nonexistent_column` to use iterator - Use `Record::columns` iter in `nu-cmd-base` - Use `Record::get_index` in `nu-command/network/http` - Use `Record.insert()` in `merge` - Refactor `move` to use encapsulated record API - Use `Record.insert()` in `explore` - Use proper `Record` API in `explore` - Remove defensiveness around record in `explore` - Use encapsulated record API in more `nu-command`s # User-Facing Changes None intentional # Tests + Formatting (-)

sholderbach added 10 commits October 26, 2023 00:20

Add Record.columns() iterator

d7b6c44

We have a few places where we read the `.cols` field. Let's turn them into a `Columns` iterator. End goal, make the internal representation of `Record` opaque.

Add Record.values() iterator

72afe48

Avoid exposing `record.vals` directly.

Add consuming Record.into_values() iterator

f990b81

Compared to `Record.values()` gives you owned values. Primarily to support the `values` command.

Add Record.contains() helper

b1092c8

Add key-based .get()/.get_mut() to Record

86f874f

These are the common accessors for maps. Note: as we can't guarantee yet that `cols.len() == vals.len()` the lookup in `vals` still needs a bounds check.

Add Record.get_index() for index-based access

95e0e7d

To enable some existing command implementations that perform indexing themselves expose a way to get values with an index. Short-circuits if (partially) out of bounds.

Use checked constructor in record! macro

313d7f9

To be able to make the fields private we need to use the public checking `Record::from_raw_cols_vals()` constructor. This change will introduce a single length assert that can probably be elided during optimization.

WindSoilder reviewed Oct 25, 2023

View reviewed changes

sholderbach added 3 commits October 26, 2023 10:20

Make .insert() generic over key instead of Cow

bc8cdf7

Should lead to more optimized code h/t @IanManske

Add Record.index_of() to get index by key

b21d571

Some usages in the public API envisioned in conjunction with `.get_index` Also should simplify some internal implementations

Use .index_of() internally

7251b3c

Note: It appears that `impl AsRef<str>` has the downside of taking a directly directly by value if not prefixed with `&`. We may want to look into `Borrow` for those.

sholderbach merged commit c87bac0 into nushell:main Oct 28, 2023

sholderbach deleted the record-convenience-api branch October 28, 2023 13:18

sholderbach mentioned this pull request Oct 29, 2023

Add Record::remove/retain/retain_mut #10876

Merged

sholderbach mentioned this pull request Nov 1, 2023

Use Record APIs in nu-protocol/nu-engine #10917

Merged

sholderbach mentioned this pull request Nov 2, 2023

Use Record's public API in a bunch of places #10927

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add common map-like API to `nu_protocol::Record`#10841

Add common map-like API to `nu_protocol::Record`#10841
sholderbach merged 13 commits intonushell:mainfrom
sholderbach:record-convenience-api

sholderbach commented Oct 25, 2023

Uh oh!

WindSoilder Oct 25, 2023

Uh oh!

sholderbach Oct 26, 2023

Uh oh!

Uh oh!

Uh oh!

IanManske commented Oct 26, 2023

Uh oh!

sholderbach commented Oct 26, 2023

Uh oh!

amtoine commented Oct 28, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

sholderbach commented Oct 25, 2023

Description

Record assumptions

End goal

Parts

User-Facing Changes

Developer facing changes

Tests + Formatting

Uh oh!

WindSoilder Oct 25, 2023

Choose a reason for hiding this comment

Uh oh!

sholderbach Oct 26, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

IanManske commented Oct 26, 2023

Uh oh!

sholderbach commented Oct 26, 2023

Uh oh!

amtoine commented Oct 28, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants