Skip to content

Create Record type to reduce code duplication#9648

Closed
IanManske wants to merge 31 commits intonushell:mainfrom
IanManske:small-value
Closed

Create Record type to reduce code duplication#9648
IanManske wants to merge 31 commits intonushell:mainfrom
IanManske:small-value

Conversation

@IanManske
Copy link
Copy Markdown
Member

Description

This PR creates a new Record type to reduce duplicate code and possibly bugs as well. While the main purpose of this PR is to make the code cleaner and more robust, this change also makes a new optimization possible. That is, if we box both the Value::Record and Value::Closure cases, it will decrease the byte size of Value from 80 to 48. This seems to result in net performance gain, but this optimization is easy to change/undo if necessary.

Code

Since there is now a separate Record type, lots of duplicated code was able to be removed. E.g., no more cols.iter().zip(vals)! Similarly, an iterator can be collect-ed into a Record and vice versa, which helps with conversions to and from (hash)maps, etc.

Record has a push(col, val) function to help to insure that the number of columns is equal to the number of values. I caught one potential bug in the ls command thanks to this. Finally, I added a bonus record! macro that should help simplify record creation. It supports { ident => expr }, { expr => expr } , and { ident } forms:

record! {
  some_int => Value::int(1, span),
  some_str => Value::string("something", span),
}

record! {
  "some col" => Value::int(1, span),
  "other-col" => Value::string("other", span,
}

let field_name1 = Value::string("1", span);
let field_name2 = Value::string("2", span);
record! { field_name1, field_name2 }

Since record!{} calls the vec![] macro which knows the length of the Vec in advance, then the Vecs in the Record will be created without reallocating.

Performance

Nushell command code tends to be somewhat liberal with its usage of clone(). Since Value is now only 60% of its original size, sections of code whose running time is mostly clones can, in the best case, take only 60% of the time they did before. For example, take this slightly unrealistic benchmark based off #5236:

timeit { 0..100000 | each {|i| { foo: $i }} }

This now takes 68ms on my hardware, whereas it took 100ms before. Of course, the downside is that each Record incurs an extra memory allocation and each access to a Record now has an addition level of indirection. But this extra time and memory cost seems to be negliable compared to the other gains.

User-Facing Changes

No big user-facing changes besides maybe a little better performance.

⚠️ Of course, this PR changes the Value type and other public functions which is a breaking change for any dependents.

@IanManske
Copy link
Copy Markdown
Member Author

IanManske commented Jul 10, 2023

I am a naughty contributor and did not talk with any core team members beforehand. (I will in the future though!) So, it's fine by me if nothing substantial comes from this PR -- I had fun writing it anyways!

But yeah, this PR is pretty large, and I've broken it in into commits that can be viewed mostly in isolation.
The most relevant commit is e103a0b, which is where the Record type was added.
Most changes are simple conversions following a few rules:

Rules

(click to expand)
  • cols.iter().zip(vals) => record or record.iter() or val or val.iter().
    Similar transformation for into_iter().
  • from
    Value::Record { cols, vals, .. } => { f1(cols); f2(vals) }
    to
    Value::Record { val, .. } => { f1(val.cols); f2(val.vals); }
  • from
    let mut cols = vec![];
    let mut vals = vec![];
    
    cols.push("key");
    vals.push(value);
    //...
    
    Value::Record { cols, vals, span }
    to
    Value::record(
        record! {
            key => value,
            //...
        },
        span,
    )
  • from
    let mut cols = vec![];
    let mut vals = vec![];
    
    if let Some(val) = value {
        cols.push("key");
        vals.push(val);
    }
    
    //...
    
    Value::Record { cols, vals, span }
    to
    let mut record = Record::new();
    
    if let Some(val) = value {
        record.push("key", val);
    }
    
    //...
    
    Value::record(record, span)
  • from
    let mut cols = vec![];
    let mut vals = vec![];
    
    for (key, val) in some_iter {
      cols.push(key);
      vals.push(val);
    }
    
    Value::Record { cols, vals, span }
    to
    Value::record_from_iter(some_iter, span) // or some_iter.map().collect(), etc., if necessary
  • &Span => Span since Span is Copy

I also understand performance isn't super high on the agenda right now (e.g., #9464), so I can unbox the Record and Closure cases if necessary (and revisit at a later date). Regardless, here are the results for some preliminary benchmarks.
The old (v0.82.0) and new (this PR) release builds of nushell were run with the default config and with the CWD at the repository root. Then, test data was loaded:

let data = (ls crates/**) # 195 rows
let names = ($data | get name)

Next, each benchmark pipeline was run like so:

timeit { for x in 0..1000 { $pipeline } }

Results (old and new times are in ms):

Pipeline old new % change
$data | get size 174 149 -14%
$data | get name 207 180 -13%
$data | get name size 438 384 -12%
$data | reject size 88 91 +3%
$data | reject name 86 91 +6%
$data | reject name size 93 96 +3%
$data | each { |file| { name: $file.name } } 2216 2309 +4%
$data | each { |file| { name: $file.name, size: $file.size }} 2314 2281 -1%
$data | each { |file| { name: $file.name, size: $file.size, type: $file.type }} 2494 2415 -3%
$names | path parse 162 145 -10%
scope commands 5059 2981 -41%

get seems to be measuring the reduced time for copying memory (this command is clone heavy). Note that cloning Records can be slightly slower than before if the Record has zero or only a few columns.

This can be seen in the first each benchmark, which seems to be measuring the cost of the additional Box allocation. Note the decrease in % change as the number of fields in the record increases, since reading/writing non-record Values is faster.

reject seems to be measuring the cost of the additional pointer indirection (it simply accesses memory and mutates the Value in place).

scope commands I think is measuring the combination of performance increase from smaller Values as well as the benefit of using the record! macro (no Vec reallocations).

The most degenerate case I can think of is for nested records where each record has only one field with another record value (almost like a linked list). But overall, this looks like a net perf win. Also keep in mind that this PR should, at the very least, reduce the memory usage of nushell. Running only let data = (scope commands) shows 53MB of memory used in the old build and 43MB in the new build.

Of course, let me know what need changing and thanks for reading all of this!

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jul 10, 2023

I'm trying to think of the right words to say here. Wow comes to mind. Thanks for this. It'll take us a while to digest I'm sure, but clearly a lot of effort went into this PR. So, thanks for spending time on nushell. Stay tuned. :)

@sholderbach
Copy link
Copy Markdown
Member

sholderbach commented Jul 11, 2023

Thank you very much for pushing forward with this! This takes tremendous dedication to see through.
I absolutely love that you encapsulate most of the behavior in the Record type. This will certainly make future improvements or refactorings a more reasonable undertaking.

Let me summarize the observations of the @nushell/core-team chat, before we can discuss what steps we take for the detailed review or revisions we might want to do before landing it:

  • Having the sensible interfaces like .push() or impl IntoIterator and impl FromIterator is a godsend for future readability.
  • Bundling in the Boxing of Closure might make the overall review a bit harder and obscures individual performance oriented choices for Record. But the general intuition is that this is probably a good move on its own to improve the size of Value.
  • The one thing where we all had some push back is around the macro record![]:
    • General sentiment: more magic makes the code harder to read, rustfmt, refactor etc.
    • Allowing both $($key:literal => $val:expr),* and $($key:ident => $val:expr),* is a crime ;) The former suggests it would be an expression but the latter form stringify!es when you try to pass a variable as a key. Saving two " for convenience is certainly not worth it. Either we only allow the literal form or have it as $($key:expr => $val:expr),*. This doesn't seem to be that needed based on your existing code and might allow some extra shenanigans, but probably matches the expectations from "sane" Rust.
    • The $($varname:ident),* form is probably also a bit controversial based on personal taste. I can see the beauty of how compact you got the code in some places with that. Others might see it as a bit too magical leaving a very expression oriented design for "scope-magic". Maybe a compromise could be to have a separate macro with this written on the tin.
  • Lastly on performance:
    • Improving the performance by removing explicit bottlenecks and studying existing tradeoffs is absolutely on the agenda. The pushback is mostly on "magic wand" solutions while we haven't profiled out all the worst offenders or have comprehensive benchmarks to observe the tradeoffs.
    • To this end, we absolutely love your effort to have a broader list of benchmarks. We should really commit those to a benchmark suite. (The ones in the scripts repo are only covering a subset with a deep vertical but not much diversity. Having more data processing focussed ones would be great. As an alternative place we might consider the criterion benches/ for which @amtoine has tried to setup up some long term tracking)
    • I am personally not yet fully convinced that the Record should be boxed. They are absolutely core to the current data model with tables being a list of records. So the "occupancy rate" of a Value being filed with a large Value::Record is probably relatively high. So the indirection won't save that much on total memory (if Value would be smaller than the separate Record allocation). And going with two indirections for a hot datatype will certainly introduce a tradeoff. I think this was more clear cut for the Value::Error where having to go through the indirection is much more rare in typical use (Box ShellError in Value::Error #8375). One alternative way to shrink Value::Record could be to change from two Vecs to a single Vec<(String, Value)>. But that is just me guessing and we should resolve that through benchmarks. Having things encapsulated in Record should make trying out things significantly easier after this PR.
    • P.S.: Nice to see you polishing in some places by using the impl Copy for Span. Given available registers on the architecture or pressure from the signature, this should be a nice improvement.

sholderbach added a commit to sholderbach/nushell that referenced this pull request Jul 11, 2023
- A new one is the removal of unnecessary `#` in raw strings without `"`
inside.
  - https://rust-lang.github.io/rust-clippy/master/index.html#/needless_raw_string_hashes
- The automatically applied removal of `.into_iter()` touched several
places where nushell#9648 will change to the use of the record API.
If necessary I can remove them @IanManske to avoid churn with this PR.
- Manually applied `.try_fold` in two places
- Removed a dead `if`
@IanManske
Copy link
Copy Markdown
Member Author

@sholderbach (and core team), thanks for the response, glad to see the interest and appreciation for this PR!

To comment and generally agree with the observations mentioned:

  • I originally had only the { ident => expr } form, but then added { expr => expr } without thinking in order to handle keys that were not valid identifiers.
    • But I agree that this is probably not a good idea! { literal => expr } or { expr => expr } is indeed the way to go. indexmap!{} uses { expr => expr }, but I think either form should suit our needs. I'll remove { ident => expr } and keep { expr => expr } for now, but we can easily change this to { literal => expr } later.

    • This form of the macro does hinder rustfmt if the expression is inline. E.g.,

      record! {
        "key" => { some really large expression },
      }

      I intended this to be used for very simple expressions (i.e. { "key" => Value::case(value, span), }). That is, value is either declared somewhere above or it is also a very simple expression like x.field. Although, I might not have followed this rule in some places, which may say something about including this macro.

    • Then again, this is the one point I will disagree with, I think the macro is much more readable. The macro allows all the record column and values to be easily been seen in a central location. I.e., it's much easier to see the shape of the record from a quick glance. Also, I think the declarative { key => value } syntax makes it much easier to see each key and its corresponding value. Compare this to two separate Vecs in which the values are raw expressions and/or the values are far apart from their corresponding key (keybindings_listen.rs):

      // imagine if there were more columns...
      Value::Record {
          cols: vec![
              "char".into(),
              "code".into(),
              "modifier".into(),
              "flags".into(),
              "kind".into(),
              "state".into(),
          ],
          vals: vec![
              Value::string(format!("{c}"), Span::unknown()),
              Value::string(format!("{:#08x}", u32::from(c)), Span::unknown()),
              Value::string(format!("{modifiers:?}"), Span::unknown()),
              Value::string(format!("{modifiers:#08b}"), Span::unknown()),
              Value::string(format!("{kind:?}"), Span::unknown()),
              Value::string(format!("{state:?}"), Span::unknown()),
          ],
          span: Span::unknown(),
      };
      Compare this to (click to expand)
      record! {
        "char" => Value::string(format!("{c}"), Span::unknown()),
        "code" => Value::string(format!("{:#08x}", u32::from(c)), Span::unknown()),
        "modifier" => Value::string(format!("{modifiers:?}"), Span::unknown()),
        "flags" => Value::string(format!("{modifiers:#08b}"), Span::unknown()),
        "kind" => Value::string(format!("{kind:?}"), Span::unknown()),
        "state" => Value::string(format!("{state:?}"), Span::unknown()),
      };
    • Additionally, the macro should? make it harder to make mistakes when changing record construction. E.g., inserting a new column currently requires two edits which better be in the same index of each Vec! For the macro, simply inserting a new line will suffice. Alternatively, I guess a sequence of record.push() will also work, but that could reallocate the Vecs unless Record::with_capacity is used. However, the number of columns can change over time and the capacity would need to be updated. So, similar to how we don't use a bunch of vec.push()s, but rather the vec![] macro, I think we should be using the record!{} macro where possible instead of a bunch of record.push()s.

    • It should be noted that we were already using the indexmap!{} macro in certain places anyways and this isn't much different.

  • Regarding the { ident } form, I'm sort of indifferent to this (I just added it in case). But I realized this form helps if ident is already a Value and does not need to be wrapped like Value::case(ident, span). This way, redundant record!{ "key" => key } can be avoided. Of course, this simply saves some typing, so there's definitely a good argument for removing this form all together. I'd have to double check and see how applicable or commonly used it is/can be.
  • For performance, boxing only Closure and not Record decreases the size to 72 bytes, which is probably not worth it. So, I think the two should be boxed together or not at all. But you are right that this obscures the performance from only boxing Record. I assumed some things were blocks when they actually might be closures, which may be messing with the benchmarks above. Thankfully, the Box changes are easy to undo, so I'll experiment more by boxing none and then each in isolation and report back. I'll also check out the other benchmarks you mentioned. If nothing conclusive can be shown, I guess we should just unbox both? The change easy to undo and redo (I think I got rid of all raw constructions of Value::Record {} and Value::Closure {}), so we can revisit in the future when we have more benchmarks.
    Aside:

    Another alternative I considered was to use thin-vec which stores the length and capacity in the Vec's data buffer, so the size of ThinVec is just a pointer. But this would add another dependency and might be getting into the weeds too much.

Ok, enough writing XD

sholderbach added a commit that referenced this pull request Jul 11, 2023
# Description
- A new one is the removal of unnecessary `#` in raw strings without `"`
inside.
-
https://rust-lang.github.io/rust-clippy/master/index.html#/needless_raw_string_hashes
- The automatically applied removal of `.into_iter()` touched several
places where #9648 will change to the use of the record API. If
necessary I can remove them @IanManske to avoid churn with this PR.
- Manually applied `.try_fold` in two places
- Removed a dead `if`
- Manual: Combat rightward-drift with early return
@IanManske
Copy link
Copy Markdown
Member Author

IanManske commented Jul 13, 2023

I've done some additional benchmarks boxing only the closure case, only the record case, and boxing neither. All branches are present on my fork if needed.

Data table (click to expand)
Pipeline v0.82 no boxing box closure box record box both
$data | get size 162 139 141 159 138
$data | get name 204 184 183 191 181
$data | get name size 419 380 390 400 380
$data | reject name 81 77 86 97 81
$data | reject size 90 87 87 85 92
$data | reject name size 94 90 81 90 88
$data | each { |file| { name: $file.name } } 2007 2070 2060 2110 2021
$data | each { |file| { name: $file.name, size: $file.size }} 2100 2148 2151 2170 2105
$data | each { |file| { name: $file.name, size: $file.size, type: $file.type }} 2212 2249 2235 2317 2175
$names | path parse 163 156 155 159 127
scope commands 4458 2811 2791 3065 2743
'turtle' | wrap a | wrap b | wrap c | wrap d | wrap e | get e.d.c.b.a 10k iter 36 34 34 36 36
memory usage after let data = (scope commands) 53MB 45MB 44MB 45MB 43MB

My takeaways:

  • box closure: little to no difference in time compared to no boxing for these benchmarks.
  • box record: has slightly worse times compared to no boxing as a result of the extra record box/indirection.
  • box both: seems to be on par with no boxing even though box record was worse. It seems like gains from smaller Values are offsetting the cost of the record box?

So, besides v0.82 and box record, I think any of the other three should be fine? The times seem too close to call.


Regarding memory usage, given r record Values and v non-record Values, memory usage should be as follows:

  • memory(no boxing) = size(Value) * r + size(Value) * v = 80r + 80v
  • memory(box both) = (size(Value) + size(Record)) * r + size(Value) * v = (48 + 48)r + 48v = 96r + 48v

Thus, memory(no boxing) > memory(box both) if 2v > r. In other words:

  1. no boxing uses more memory if the number of non-record values is more than half the number of records.
  2. box both uses more memory if the number of records is more the twice the number of non-record values.

The only way 2. is possible is if most records have zero or one columns. To verify this, I made the following nushell script:

Script (click to expand)

num_fields is the branching factor that specifies the number of record children each outer record has.
initial_fields specifies the number of non-record values in the leaf records.

def no-boxing-uses-less-mem [num_fields, initial_fields?] {
  let initial_fields = ($initial_fields | default $num_fields)
  mut nest_lvl = 0
  mut no_box = (80 + $initial_fields * 80)
  mut box_both = (96 + $initial_fields * 48)

  while $no_box >= $box_both and $no_box < 64000000000 {
    # print $'($nest_lvl): ($no_box) >= ($box_both)'
    $nest_lvl += 1
    $no_box = ($no_box * $num_fields + 80)
    $box_both = ($box_both * $num_fields + 96)
  }

  {
    num_fields: $num_fields
    initial_fields: $initial_fields
    nest_lvl: $nest_lvl
    no_box: ($no_box | into filesize)
    box_both: ($box_both | into filesize)
  }
}

0..=10 | each { |num_fields|
0..=10 | each { |initial_fields|
  no-boxing-uses-less-mem $num_fields $initial_fields
}}
| flatten
| filter { |x|
  ($x.num_fields >= 2 and $x.initial_fields >= 1
  and $x.no_box <= 64GB
  and $x.no_box < $x.box_both)
}

The code above gives the empty list. That is, box both will use less memory if each record has at least two child records or at least one non-record value (assuming maximum of 64GB memory). You can experiment on your own and change some of the parameters and/or filters. For example, changing part of the filter to $x.num_fields >= 1 gives 10 entries, but each entry needs a level of record nesting equal to 2 * initial_fields before no boxing uses less memory.

So, I'm still in favor of box both on grounds of less memory usage. Then again, we may choose to prioritize running time, in which case no boxing is more understood and the effects of box both are not as clear. E.g., all the benchmarks were run on my hardware, which is obviously not a substitute for the range of hardware running nushell. (The decreased memory usage should be present on all platforms/hardware though?)

So, I would ask the core team if I should go forward with no boxing, box closure, or box both? Thanks!
(The macro decision can come later or at the same time, whatever works.)

@sophiajt
Copy link
Copy Markdown
Contributor

A few thoughts (sorry for the delay here, was fighting a bout of covid).

This PR feels like a few different PRs rolled up into a single one. For example, the closure changes should be spun off onto a separate PR. Likewise, you change the existing API in a few places, that should also be a separate PR. This helps with reviewing the code so we're not trying to understand unrelated changes, but instead can review them on their own.

I'd also avoid macros. I get that some Rust folks like them, but I think they really hurt readability. They add a kind of "magic" syntax but you have to understand that magic to read it. You'll notice we only rarely use macros in the codebase and tend towards removing them and replacing them with non-macro Rust code.

An alternate approach we use quite a bit is to use a builder pattern. Something like:

record().
  field("foo", Value::String("bar", ...).
  field("baz", Value::String("qux", ...).

And so on. The pattern is easier to maintain and folks will already be familiar with it.

I like the gist of the PR - to simplify the code where we run a bit loose with the cols/vals idea. I think we could definitely land a version of that idea.

@IanManske
Copy link
Copy Markdown
Member Author

IanManske commented Jul 14, 2023

Oh no, hope you are feeling better @jntrnr!

Yeah, I got a little carried away with mixing different kinds of changes. Should I open a separate PR for each of the following?

  • Reuse the existing Closure type (probably no boxing for now?)
  • Adding the functions for creating each Value case
  • Adding the functions for creating test data for each Value case
  • Adding the as functions for each Value case
    (maybe the above three can be a single PR?)
  • &Span => Span
  • Making columns() return a slice intsead of Vec
  • impl Display for CellPath
  • Did I miss anything else?

(Thankfully, most of the commits are relatively atomic.)

I agree that macros can hurt readability, especially ones that are too complex. But not everything's black and white. In this particular case, the only "magic" syntax introduced is key => value. I would argue not that it's not that foreign or hard to understand. (Maybe I could change it to (key, value)?) By this logic we should not be using format!("{value}") either, since this also adds "magic" syntax.

I didn't use the builder pattern in this case, because the builder pattern will have to call vec.push() under the hood. This will do unnecessary capacity checking, etc., but more importantly it will reallocate the Vecs -- potentially multiple times! Again, specifying the Record capacity beforehand will help with this, but a hardcoded capacity would be brittle. record!, on the other hand, will set the capacity automatically and will always be accurate for future changes.

But I should probably be talking with evidence. I can make another branch using the builder pattern instead and see if there is any noticeable difference. If not, great! Then we can just use that branch.

@sophiajt
Copy link
Copy Markdown
Contributor

sophiajt commented Jul 15, 2023

@IanManske - that list looks like a great list of refactorings. They'd work as separate PRs, or grouped by theme.

I'll be curious what you find out with the builder pattern. Some areas of the code can be sensitive to allocation (specifically cloning) but in theory the inliner should hopefully be able to simplify some of the work of the builder. We'll see if I'm right, though.

@IanManske
Copy link
Copy Markdown
Member Author

Ok, I made two more branches where all instances of record! were replaced with the builder pattern. One branch, record-builder, uses Record::new. Another branch, record-builder-capacity, uses Record::with_capacity instead. record-builder-capacity is one commit ahead of record-builder which is one commit ahead of no-box.

Like before, release builds were run in the repository root with the default config. A subset of the commands that used record! were benchmarked using the following script/command:

Script (click to expand)
do {
  let data = (ls crates/**) # 195 rows
  let name = ($data | get name)
  let size = ($data | get size)
  let modified = ($data | get modified)
  print (timeit { for x in 0..1000 { $name | enumerate }})
  print (timeit { for x in 0..1000 { $name | uniq -c }})
  print (timeit { for x in 0..1000 { $modified | date to-record }})
  print (timeit { for x in 0..1000 { $size | fmt } })
  print (timeit { for x in 0..1000 { char --list }})
  print (timeit { for x in 0..1000 { scope commands }})
  print (timeit { for x in 0..1000 { help commands }})
  print (timeit { for x in 0..1000 { keybindings list }})
  print (timeit { for x in 0..1000 { keybindings default }})
}

The no-box times were used as the baseline, and the percent change from this baseline is given for each of the record builder branches. All times are in milliseconds:

Command no-box record-builder % change record-builder-capacity % change
enumerate 33 45 +36% 43 +30%
uniq -c 132 147 +11% 145 +10%
date to-record 83 146 +76% 135 +63%
fmt 129 189 +47% 174 +35%
char --list 23 35 +52% 35 +52%
scope commands 3110 3508 +13% 3392 +9%
help commands 956 1069 +12% 1030 +8%
keybindings list 52 65 +25% 62 +19%
keybindings default 85 96 +13% 95 +12%

I was expecting record-builder to be slower, but I'm surprised record-builder-capacity is slower as well. I guess the reallocation cost isn't as big as I thought, but rather the overhead of Vec::push is adding up?

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Aug 14, 2023

Hey @IanManske 👋🏻

Thanks again for putting up this PR. We appreciate you trying to make nushell better and I especially enjoyed the attention to detail that this PR took with all the back-and-forth with testing and discussing alternatives. Good job!

We'd like to move forward with this PR by landing it with a few changes. Here's what we're thinking.

  1. We like the new Record type and would like to keep that.
  2. We like the new record!() macro and want to keep the flavor that looks like this
record! {
  "char" => Value::string(format!("{c}"), Span::unknown()),
  "code" => Value::string(format!("{:#08x}", u32::from(c)), Span::unknown()),
  "modifier" => Value::string(format!("{modifiers:?}"), Span::unknown()),
  "flags" => Value::string(format!("{modifiers:#08b}"), Span::unknown()),
  "kind" => Value::string(format!("{kind:?}"), Span::unknown()),
  "state" => Value::string(format!("{state:?}"), Span::unknown()),
};
  1. We'd like to leave the Boxing for another PR. Admittedly, there was a lot of stuff in this PR originally, LOL.

Hopefully you're willing to continue working with us on this PR. Please let us know what you think.

@IanManske
Copy link
Copy Markdown
Member Author

Hi @fdncred!

Yes, I would love to continue working on this. I just got a little distracted with other PRs 😅. Also, I just wanted to thank everyone for their feedback -- I think the PR is in a much better spot now because of it!

But yeah, the changes you mentioned were exactly what I was thinking as well. I'll go ahead and bring this PR up to date. Additionally, I'll fix a few spots to keep edits as straightforward as possible. So, all of this will probably entail a rebase.

@sholderbach
Copy link
Copy Markdown
Member

Awesome @IanManske! thanks for your work on this and the other PRs.

As we don't have any line-by-line review in this PR feel free to rebase or open as a new PR, whatever makes it more convenient to turn out.

@IanManske IanManske mentioned this pull request Aug 23, 2023
@IanManske
Copy link
Copy Markdown
Member Author

Ok, I opted to open a new PR (correct branch name, better squash message, etc.). Closed in favor of #10103.

@IanManske IanManske closed this Aug 23, 2023
sophiajt pushed a commit that referenced this pull request Aug 24, 2023
# Description
This PR creates a new `Record` type to reduce duplicate code and
possibly bugs as well. (This is an edited version of #9648.)
- `Record` implements `FromIterator` and `IntoIterator` and so can be
iterated over or collected into. For example, this helps with
conversions to and from (hash)maps. (Also, no more
`cols.iter().zip(vals)`!)
- `Record` has a `push(col, val)` function to help insure that the
number of columns is equal to the number of values. I caught a few
potential bugs thanks to this (e.g. in the `ls` command).
- Finally, this PR also adds a `record!` macro that helps simplify
record creation. It is used like so:
   ```rust
   record! {
       "key1" => some_value,
       "key2" => Value::string("text", span),
       "key3" => Value::int(optional_int.unwrap_or(0), span),
       "key4" => Value::bool(config.setting, span),
   }
   ```
Since macros hinder formatting, etc., the right hand side values should
be relatively short and sweet like the examples above.

Where possible, prefer `record!` or `.collect()` on an iterator instead
of multiple `Record::push`s, since the first two automatically set the
record capacity and do less work overall.

# User-Facing Changes
Besides the changes in `nu-protocol` the only other breaking changes are
to `nu-table::{ExpandedTable::build_map, JustTable::kv_table}`.
@dead10ck dead10ck mentioned this pull request Sep 29, 2023
@IanManske IanManske deleted the small-value branch October 30, 2023 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants