Skip to content

perf: reorder cell-path member accesses to avoid clones#15682

Merged
cptpiepmatz merged 6 commits intonushell:mainfrom
Bahex:follow-cellpath-reorder
Jul 2, 2025
Merged

perf: reorder cell-path member accesses to avoid clones#15682
cptpiepmatz merged 6 commits intonushell:mainfrom
Bahex:follow-cellpath-reorder

Conversation

@Bahex
Copy link
Copy Markdown
Member

@Bahex Bahex commented May 2, 2025

Description

While traversing a value with a cell-path, if the value is a list, all column accesses will essentially be a map operation over its elements, and produce another list.

This is wasteful if there is a row access after the column accesses, as all other rows are then discarded.

let val = [[bar]; [{foo: [A, B]}], [{foo: [C, D]}]]

$val.bar
# => [[foo]; [[A, B]], [[C, D]]]
# this is equivalent to the following, because we're
# accessing the `bar` field of all list items 
$val | each { $in.bar }

$val.bar.foo
# => [[A, B], [C, D]]
# because cell-path access works on values and not streams,
# we're actually collecting in between these
$val | each { $in.bar } | collect | each { $in.foo }

$val.bar.foo.0
# => [A, B]
# row accesses are straight forward, it's just indexing into a list
# no mapping over items required
$val | each { $in.bar } | collect | each { $in.foo.0 }

$val.bar.foo.0.0
# => A
$val | each { $in.bar } | collect | each { $in.foo.0.0 }

In comparison, if we reorder cell-path access to prioritize row accesses over column accesses:

let val = [[bar]; [{foo: [A, B]}], [{foo: [C, D]}]]
# we have a list, so we can do a row access
# let's move the first row access to the front of the cell-path
# $.bar.foo.0.0 => $.0.bar.foo.0
$val.0
# => {bar: {foo: [A, B]}}
# just indexing into a list, no mapping or collecting

# $val.0 is not a list we can't move any other row access to the front
# but that's ok, accessing a field on a record is similarly cheap
$val.0.bar
# => {foo: [A, B]}

# another record field access 
$val.0.bar.foo
# => [A, B]

$val.0.bar.foo.0
# => A
# we've reached the same value!

This reordering works especially well with #15640, as accessing a row in a list or a field in a record can be done through references, avoiding clones.

Caveats

let val = [{foo: A}, {baz: B}]
$val
# => ╭───┬─────┬─────╮
# => │ # │ foo │ baz │
# => ├───┼─────┼─────┤
# => │ 0 │ A   │ ❎  │
# => │ 1 │ ❎  │ B   │
# => ╰───┴─────┴─────╯

$val.foo
# throws an error as expected

$val.foo.0
# before this PR, this also throws an error
# but due to reordering the access to `$val.0.foo`
# this works without issue and returns `A`

Whether this is a good thing or bad isn't really clear 🤷

User-Facing Changes

Tests + Formatting

  • 🟢 toolkit fmt
  • 🟢 toolkit clippy
  • 🟢 toolkit test
  • 🟢 toolkit test stdlib

After Submitting

@blindFS
Copy link
Copy Markdown
Contributor

blindFS commented May 3, 2025

I take the example in the caveat section as a bug fix 🫠

Maybe even $val.foo should return [A, null]?

Update: there're some problems in the cell_path completion for lists like [{foo: A}, {baz: B}]. I'll fix it up when we come to a conclusion of what $val.foo should behave like.

@Bahex Bahex added performance Work to make nushell quicker and use less resources A:cell-path-semantics All around the cell path type and its semantics for access labels May 4, 2025
@cptpiepmatz
Copy link
Copy Markdown
Member

I know you've tried different inputs, but can we do some fuzzy testing on this? Maybe I don't fully understand how cell paths are evaluated. I'm just concerned we might break something in odd cases.

@Bahex Bahex force-pushed the follow-cellpath-reorder branch from 7fa95a7 to 3efac3e Compare May 13, 2025 20:22
@132ikl
Copy link
Copy Markdown
Member

132ikl commented May 19, 2025

I take the example in the caveat section as a bug fix 🫠

Maybe even $val.foo should return [A, null]?

Update: there're some problems in the cell_path completion for lists like [{foo: A}, {baz: B}]. I'll fix it up when we come to a conclusion of what $val.foo should behave like.

hmm, I don't think this is correct. you can already get [A, null] with $val.foo?, which case $val.foo?.0 works. i think the current behavior is correct, and the behavior from this PR would be incorrect?

@sholderbach
Copy link
Copy Markdown
Member

My hunch is that reordering beyond moving one level (which would suffice for correct table semantics) would be overly permissive and could lead to surprises on input data that is not conforming to the schema that was initially expected.

Again NO performance tag without at least a minimal attempt at showing a benchmark (doesn't have to be an exhaustive and fair suite but show with an example and don't claim)

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jun 13, 2025

@Bahex where are we at on this? Should we land it?

@Bahex
Copy link
Copy Markdown
Member Author

Bahex commented Jun 13, 2025

@fdncred

  1. We need to reach consensus about whether this behavior is acceptable/desirable.

    let val = [{foo: A}, {baz: B}]
    $val
    # => ╭───┬─────┬─────╮
    # => │ # │ foo │ baz │
    # => ├───┼─────┼─────┤
    # => │ 0 │ A   │ ❎  │
    # => │ 1 │ ❎  │ B   │
    # => ╰───┴─────┴─────╯
    
    $val.foo
    # throws an error as expected
    
    $val.foo.0
    # before this PR, this also throws an error
    # but due to reordering the access to `$val.0.foo`
    # this works without issue and returns `A`

    Whether this is a good thing or bad isn't really clear 🤷

  2. Additional tests to catch any possible regressions (I would like some help with coming up with or generating test cases)

  3. Benchmarks

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Jun 13, 2025

ok, we should put it on the agenda for a meeting so we can discuss it. I don't think you'll get much attention here.

@Bahex Bahex force-pushed the follow-cellpath-reorder branch from 3efac3e to d8a0482 Compare June 22, 2025 08:57
@Bahex
Copy link
Copy Markdown
Member Author

Bahex commented Jun 22, 2025

After rebasing the branch on main, I ran some benchmarks comparing against main in both profiles, using both a small and a moderately big sample.

let binaries = [
	"./main-debug"
	"./main-release"
	"./follow-cellpath-reorder-debug"
	"./follow-cellpath-reorder-release"
]

let benchmark_src = r#'
	use std/bench

	let small = seq 1 4 | wrap a
	let big = seq 1 100 | each {|i| {e: {d: {c: {b: {a: $in}}}}} }

	bench --rounds 1000 ...[
		{ $big.e.d.c.b.a.42 }
		{ $big.42.e.d.c.b.a }
		{ $small.a.1 }
		{ $small.1.a }
	]
	| update code { ansi strip }
	| save -f ($nu.current-exe)-tests.nuon
'#

$binaries | each {|bin| ^$bin -n -c $benchmark_src} | ignore

  • total_ratio is ratio among all the results
  • each benchmark contains a manually reordered cell-path access for contrast

follow-cellpath-reorder-debug-tests.nuon

code mean min max std ratio total_ratio
$big.e.d.c.b.a.42 52µs 72ns 50µs 652ns 75µs 566ns 1µs 846ns 1.6395982241254448 14.684715172024816
$big.42.e.d.c.b.a 51µs 328ns 50µs 17ns 75µs 278ns 1µs 799ns 1.6161717938222235 14.474901297236322
$small.a.1 31µs 915ns 30µs 829ns 42µs 501ns 1µs 287ns 1.0049119934506754 9.000282007896221
$small.1.a 31µs 759ns 30µs 757ns 39µs 992ns 1µs 180ns 1 8.95628877608573

follow-cellpath-reorder-release-tests.nuon

code mean min max std ratio total_ratio
$big.e.d.c.b.a.42 11µs 116ns 10µs 500ns 38µs 658ns 1µs 497ns 2.9399629727585297 3.134799774393683
$big.42.e.d.c.b.a 11µs 175ns 10µs 890ns 25µs 442ns 803ns 2.9555673102353874 3.1514382402707275
$small.a.1 3µs 781ns 3µs 435ns 21µs 497ns 1µs 49ns 1 1.066271855611957
$small.1.a 3µs 861ns 3µs 519ns 34µs 522ns 1µs 142ns 1.0211584236974345 1.0888324873096447

main-debug-tests.nuon

code mean min max std ratio total_ratio
$big.e.d.c.b.a.42 254µs 796ns 217µs 970ns 535µs 8ns 47µs 841ns 8.660344651779342 71.85448392554991
$big.42.e.d.c.b.a 51µs 828ns 47µs 131ns 137µs 84ns 7µs 708ns 1.761598857958601 14.61590524534687
$small.a.1 33µs 428ns 31µs 602ns 72µs 972ns 4µs 303ns 1.1361952346963053 9.426959954878736
$small.1.a 29µs 421ns 28µs 36ns 65µs 599ns 3µs 447ns 1 8.296954314720812

main-release-tests.nuon

code mean min max std ratio total_ratio
$big.e.d.c.b.a.42 35µs 227ns 34µs 263ns 68µs 109ns 1µs 941ns 9.934292160180485 9.934292160180485
$big.42.e.d.c.b.a 10µs 830ns 10µs 525ns 30µs 937ns 856ns 3.05414551607445 3.05414551607445
$small.a.1 3µs 842ns 3µs 595ns 18µs 131ns 716ns 1.083474337281444 1.083474337281444
$small.1.a 3µs 546ns 3µs 436ns 8µs 596ns 296ns 1 1

@cptpiepmatz
Copy link
Copy Markdown
Member

As #16028 is merged now, we can add an option for the reordering.

@Bahex Bahex force-pushed the follow-cellpath-reorder branch from d8a0482 to 90493b5 Compare July 2, 2025 00:35
@cptpiepmatz cptpiepmatz self-requested a review July 2, 2025 09:47
Copy link
Copy Markdown
Member

@cptpiepmatz cptpiepmatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with your benchmark value and it works great. 👍🏻

Co-authored-by: Piepmatz <git+github@cptpiepmatz.de>
Bahex pushed a commit that referenced this pull request Jul 2, 2025
# Description

In #16028 I also added a test to check that identifiers are valid to
ensure that we have consistency there. But I only checked for
alphanumeric strings as identifiers. It doesn't allow underscores or
dashes. @Bahex used in his PR about #15682 a dash to separate words. So
expanded the test to allow that.

# User-Facing Changes

None.

# Tests + Formatting

The `assert_identifiers_are_valid` now allows dashes.

# After Submitting

The tests in #15682 should work then.
@cptpiepmatz cptpiepmatz merged commit c95c1e8 into nushell:main Jul 2, 2025
17 checks passed
@github-actions github-actions bot added this to the v0.106.0 milestone Jul 2, 2025
mkatychev pushed a commit to mkatychev/nushell that referenced this pull request Jul 2, 2025
# Description

In nushell#16028 I also added a test to check that identifiers are valid to
ensure that we have consistency there. But I only checked for
alphanumeric strings as identifiers. It doesn't allow underscores or
dashes. @Bahex used in his PR about nushell#15682 a dash to separate words. So
expanded the test to allow that.

# User-Facing Changes

None.

# Tests + Formatting

The `assert_identifiers_are_valid` now allows dashes.

# After Submitting

The tests in nushell#15682 should work then.
mkatychev pushed a commit to mkatychev/nushell that referenced this pull request Jul 2, 2025
Co-authored-by: Bahex <17417311+Bahex@users.noreply.github.com>
Co-authored-by: Piepmatz <git+github@cptpiepmatz.de>
@Bahex Bahex deleted the follow-cellpath-reorder branch March 22, 2026 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A:cell-path-semantics All around the cell path type and its semantics for access performance Work to make nushell quicker and use less resources

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants