Add packed QueryOrigin#1103
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
Merging this PR will improve performance by 6.28%
Performance Changes
Tip Curious why this is faster? Comment Comparing |
|
Hmm, the performance results are a bit surprising |
dc66f2c to
345bb67
Compare
|
Okay, this looks pretty reasonable |
|
CC: @Veykril |
|
This version is perf-neutral on ty astral-sh/ruff#25545 |
the codspeed report looks good, but there are still some big timing changes reported by ecosystem-analyzer FYI: astral-sh/ruff#25545 (comment) (full report at https://61f6224b.ty-ecosystem-ext.pages.dev/timing) |
I saw that. I'm a bit skeptical on the results. I have to do a closer review but I don't think there's anything in here that justifies a 2x cost |
|
Unlike what I thought, this was a real regression in the ecosystem check but only on x86. |
| // Store the key parts directly rather than as a `DatabaseKeyIndex`. Packed origins | ||
| // unpack many transient `QueryEdge`s while flattening cycles; keeping the fields | ||
| // split lets `kind`, `Hash`, and `Eq` operate on the decoded words directly. |
There was a problem hiding this comment.
Curious but this makes it sound like #[repr(transparent)] could have the same effect no?
There was a problem hiding this comment.
The issue is that going from packed to DatabaseKeyIndex requires
// Clear the tag to restore the original index.
DatabaseKeyIndex::new(self.ingredient.with_tag(false), self.id())
and Id is a NonZero, also requiring some computations
which turned out to be slow on x86 when called in a hot Hash or Eq?
|
I plan to clean this up |
Summary
This PR introduces a new packed
QueryOriginthat uses 8 bytes to represent a query edge instead of 12 for as long as all edgesingredient index <= 4095generation <= 1,048,575Which should be true for most edges. The implementation uses the same 12 byte representation as we use today if any edge has an ingredient index or a generation larger than those limits.
This PR is entirely written by codex and i mainly want to get some initial feedback on if this is an optimization we want or if we'd prefer https://github.com/salsa-rs/salsa/compare/master...MichaReiser:salsa:micha/query-edge-compact-boxed?expand=1 which implements a similar optimization but where the packing happens at the
QueryEdgelevel (edges with larger ingredient index or generations are boxed).I'm leaning towards this approach because the characteristics between the packed and non packed remain mostly the same. That is, it doesn't introduce extra allocations. It's just that the allocated query edges is somewhat bigger.
This reduces ty's memory usage by about 10%