Skip to content

Add packed QueryOrigin#1103

Merged
MichaReiser merged 12 commits into
salsa-rs:masterfrom
MichaReiser:micha/query-edge-packed-or-wide
Jun 4, 2026
Merged

Add packed QueryOrigin#1103
MichaReiser merged 12 commits into
salsa-rs:masterfrom
MichaReiser:micha/query-edge-packed-or-wide

Conversation

@MichaReiser

@MichaReiser MichaReiser commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR introduces a new packed QueryOrigin that uses 8 bytes to represent a query edge instead of 12 for as long as all edges

  • ingredient index <= 4095
  • generation <= 1,048,575
  • it's an output (only used by specify)

Which should be true for most edges. The implementation uses the same 12 byte representation as we use today if any edge has an ingredient index or a generation larger than those limits.

This PR is entirely written by codex and i mainly want to get some initial feedback on if this is an optimization we want or if we'd prefer https://github.com/salsa-rs/salsa/compare/master...MichaReiser:salsa:micha/query-edge-compact-boxed?expand=1 which implements a similar optimization but where the packing happens at the QueryEdge level (edges with larger ingredient index or generations are boxed).

I'm leaning towards this approach because the characteristics between the packed and non packed remain mostly the same. That is, it doesn't introduce extra allocations. It's just that the allocated query edges is somewhat bigger.

This reduces ty's memory usage by about 10%

@netlify

netlify Bot commented Jun 2, 2026

Copy link
Copy Markdown

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 9588802
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/6a21b59921bd6b0008f2fd3c

@codspeed-hq

codspeed-hq Bot commented Jun 2, 2026

Copy link
Copy Markdown

Merging this PR will improve performance by 6.28%

⚡ 5 improved benchmarks
✅ 21 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Memory converge_diverge 9.8 KB 8.9 KB +10.89%
Simulation amortized[InternedInput] 2.2 µs 2 µs +5.73%
Memory converge_diverge_nested 12.7 KB 12 KB +5.49%
Memory new[SupertypeInput] 165 B 157 B +5.1%
Memory new[Input] 97 B 93 B +4.3%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing MichaReiser:micha/query-edge-packed-or-wide (9588802) with master (0f7e31d)

Open in CodSpeed

@MichaReiser

Copy link
Copy Markdown
Contributor Author

Hmm, the performance results are a bit surprising

@MichaReiser MichaReiser force-pushed the micha/query-edge-packed-or-wide branch from dc66f2c to 345bb67 Compare June 2, 2026 11:20
@MichaReiser

Copy link
Copy Markdown
Contributor Author

Okay, this looks pretty reasonable

@MichaReiser

Copy link
Copy Markdown
Contributor Author

CC: @Veykril

MichaReiser added a commit to astral-sh/ruff that referenced this pull request Jun 2, 2026
@MichaReiser

Copy link
Copy Markdown
Contributor Author

This version is perf-neutral on ty astral-sh/ruff#25545

@AlexWaygood

AlexWaygood commented Jun 2, 2026

Copy link
Copy Markdown

This version is perf-neutral on ty astral-sh/ruff#25545

the codspeed report looks good, but there are still some big timing changes reported by ecosystem-analyzer FYI: astral-sh/ruff#25545 (comment) (full report at https://61f6224b.ty-ecosystem-ext.pages.dev/timing)

@MichaReiser

Copy link
Copy Markdown
Contributor Author

This version is perf-neutral on ty astral-sh/ruff#25545

the codspeed report looks good, but there are still some big timing changes reported by ecosystem-analyzer FYI: astral-sh/ruff#25545 (comment) (full report at 61f6224b.ty-ecosystem-ext.pages.dev/timing)

I saw that. I'm a bit skeptical on the results. I have to do a closer review but I don't think there's anything in here that justifies a 2x cost

@MichaReiser

Copy link
Copy Markdown
Contributor Author

Unlike what I thought, this was a real regression in the ecosystem check but only on x86.

Comment thread src/zalsa_local.rs Outdated
Comment thread src/zalsa_local.rs Outdated
Comment on lines +1249 to +1251
// Store the key parts directly rather than as a `DatabaseKeyIndex`. Packed origins
// unpack many transient `QueryEdge`s while flattening cycles; keeping the fields
// split lets `kind`, `Hash`, and `Eq` operate on the decoded words directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious but this makes it sound like #[repr(transparent)] could have the same effect no?

@MichaReiser MichaReiser Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that going from packed to DatabaseKeyIndex requires

// Clear the tag to restore the original index.
DatabaseKeyIndex::new(self.ingredient.with_tag(false), self.id())

and Id is a NonZero, also requiring some computations

which turned out to be slow on x86 when called in a hot Hash or Eq?

@MichaReiser MichaReiser marked this pull request as draft June 4, 2026 06:42
@MichaReiser

Copy link
Copy Markdown
Contributor Author

I plan to clean this up

MichaReiser added a commit to astral-sh/ruff that referenced this pull request Jun 4, 2026
MichaReiser added a commit to astral-sh/ruff that referenced this pull request Jun 4, 2026
@MichaReiser MichaReiser marked this pull request as ready for review June 4, 2026 17:33
@MichaReiser MichaReiser added this pull request to the merge queue Jun 4, 2026
Merged via the queue into salsa-rs:master with commit f8162cf Jun 4, 2026
12 checks passed
@MichaReiser MichaReiser deleted the micha/query-edge-packed-or-wide branch June 4, 2026 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants