perf: reorder cell-path member accesses to avoid clones#15682
perf: reorder cell-path member accesses to avoid clones#15682cptpiepmatz merged 6 commits intonushell:mainfrom
Conversation
|
I take the example in the caveat section as a bug fix 🫠 Maybe even Update: there're some problems in the cell_path completion for lists like |
|
I know you've tried different inputs, but can we do some fuzzy testing on this? Maybe I don't fully understand how cell paths are evaluated. I'm just concerned we might break something in odd cases. |
7fa95a7 to
3efac3e
Compare
hmm, I don't think this is correct. you can already get |
|
My hunch is that reordering beyond moving one level (which would suffice for correct table semantics) would be overly permissive and could lead to surprises on input data that is not conforming to the schema that was initially expected. Again NO performance tag without at least a minimal attempt at showing a benchmark (doesn't have to be an exhaustive and fair suite but show with an example and don't claim) |
|
@Bahex where are we at on this? Should we land it? |
|
|
ok, we should put it on the agenda for a meeting so we can discuss it. I don't think you'll get much attention here. |
3efac3e to
d8a0482
Compare
|
After rebasing the branch on main, I ran some benchmarks comparing against main in both profiles, using both a small and a moderately big sample. let binaries = [
"./main-debug"
"./main-release"
"./follow-cellpath-reorder-debug"
"./follow-cellpath-reorder-release"
]
let benchmark_src = r#'
use std/bench
let small = seq 1 4 | wrap a
let big = seq 1 100 | each {|i| {e: {d: {c: {b: {a: $in}}}}} }
bench --rounds 1000 ...[
{ $big.e.d.c.b.a.42 }
{ $big.42.e.d.c.b.a }
{ $small.a.1 }
{ $small.1.a }
]
| update code { ansi strip }
| save -f ($nu.current-exe)-tests.nuon
'#
$binaries | each {|bin| ^$bin -n -c $benchmark_src} | ignore
follow-cellpath-reorder-debug-tests.nuon
follow-cellpath-reorder-release-tests.nuon
main-debug-tests.nuon
main-release-tests.nuon
|
|
As #16028 is merged now, we can add an option for the reordering. |
d8a0482 to
90493b5
Compare
cptpiepmatz
left a comment
There was a problem hiding this comment.
I tested this with your benchmark value and it works great. 👍🏻
Co-authored-by: Piepmatz <git+github@cptpiepmatz.de>
# Description In #16028 I also added a test to check that identifiers are valid to ensure that we have consistency there. But I only checked for alphanumeric strings as identifiers. It doesn't allow underscores or dashes. @Bahex used in his PR about #15682 a dash to separate words. So expanded the test to allow that. # User-Facing Changes None. # Tests + Formatting The `assert_identifiers_are_valid` now allows dashes. # After Submitting The tests in #15682 should work then.
# Description In nushell#16028 I also added a test to check that identifiers are valid to ensure that we have consistency there. But I only checked for alphanumeric strings as identifiers. It doesn't allow underscores or dashes. @Bahex used in his PR about nushell#15682 a dash to separate words. So expanded the test to allow that. # User-Facing Changes None. # Tests + Formatting The `assert_identifiers_are_valid` now allows dashes. # After Submitting The tests in nushell#15682 should work then.
Co-authored-by: Bahex <17417311+Bahex@users.noreply.github.com> Co-authored-by: Piepmatz <git+github@cptpiepmatz.de>
Description
While traversing a value with a cell-path, if the value is a list, all column accesses will essentially be a
mapoperation over its elements, and produce another list.This is wasteful if there is a row access after the column accesses, as all other rows are then discarded.
In comparison, if we reorder cell-path access to prioritize row accesses over column accesses:
This reordering works especially well with #15640, as accessing a row in a list or a field in a record can be done through references, avoiding clones.
Caveats
Whether this is a good thing or bad isn't really clear 🤷
User-Facing Changes
Tests + Formatting
After Submitting