Parallel copperlists #942

Merged
gbin merged 23 commits into master from gbin/checkpoints
Mar 21, 2026
@gbin gbin commented Mar 20, 2026

Collaborator

Summary

This feature has been at the concept stage for a long while: push Copper's bandwidth up by processing copperlists in parallel while preserving the overall serial shape of the execution in terms of causality.

Related issues

  • Closes #

Changes

Testing

  • `just fmt`
  • `just lint`
  • `just test`
  • optional full `just std-ci` (if std/runtime paths are impacted)
  • optional full `just nostd-ci` (if embedded/no_std paths are impacted)
  • Other (please specify):

pro-tip: `just` with no parameters in the root defaults to `just fmt`, `just lint`, and `just test`.

Checklist

  • I have updated docs or examples where needed
  • I have added or updated tests where needed
  • I have considered platform impact (Linux/macOS/Windows/embedded)
  • I have considered config/logging changes (if applicable)
  • This change is not a breaking change (or I documented it below)

Breaking changes (if any)

Additional context

@gbin gbin added the "enhancement" and "include in changelog" labels Mar 20, 2026
@gbin gbin marked this pull request as draft March 20, 2026 20:09
@gbin gbin marked this pull request as ready for review March 20, 2026 20:09
@gbin gbin force-pushed the gbin/checkpoints branch from 91daf41 to 259b42d Compare March 20, 2026 21:06
@gbin gbin changed the title WIP: parallel copperlists Parallel copperlists Mar 21, 2026

gbin commented Mar 21, 2026

Collaborator Author

We tested 2 architectures in this PR:

  • stage pipeline: a worker pool executes, stage by stage, whatever workload is unblocked.
    - serial: 131.474s
    - parallel: 10.016s
  • checkpoints: a thread works through 1 copperlist (CL) and, if it is going too fast, waits at a checkpoint rather than overtaking the CL ahead of it.
    - serial: 131.013s
    - parallel: 12.902s
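
To make the checkpoint idea concrete, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: `Checkpoint`, `run_checkpointed`, and the round-robin assignment of CLs to workers are hypothetical names and choices made for illustration. Each stage has a gate holding the id of the next CL allowed in, so several CLs can be in flight at once while every stage still sees them in serial order.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical per-stage gate: stores the id of the next CL allowed to enter.
struct Checkpoint {
    next: Mutex<u64>,
    cv: Condvar,
}

impl Checkpoint {
    fn new() -> Self {
        Self { next: Mutex::new(0), cv: Condvar::new() }
    }

    /// Block until every earlier CL has left this stage.
    fn enter(&self, cl_id: u64) {
        let mut next = self.next.lock().unwrap();
        while *next != cl_id {
            next = self.cv.wait(next).unwrap();
        }
    }

    /// Open the gate for the following CL and wake the waiters.
    fn leave(&self, cl_id: u64) {
        *self.next.lock().unwrap() = cl_id + 1;
        self.cv.notify_all();
    }
}

/// Runs `num_cls` CLs through `num_stages` stages on `workers` threads and
/// returns, for each stage, the order in which CL ids went through it.
fn run_checkpointed(num_cls: u64, num_stages: usize, workers: u64) -> Vec<Vec<u64>> {
    let gates: Arc<Vec<Checkpoint>> =
        Arc::new((0..num_stages).map(|_| Checkpoint::new()).collect());
    let order: Arc<Vec<Mutex<Vec<u64>>>> =
        Arc::new((0..num_stages).map(|_| Mutex::new(Vec::new())).collect());

    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let gates = Arc::clone(&gates);
            let order = Arc::clone(&order);
            thread::spawn(move || {
                // Assumption: CLs are handed out round-robin, worker w takes
                // w, w + workers, w + 2 * workers, ...
                let mut cl_id = w;
                while cl_id < num_cls {
                    for s in 0..num_stages {
                        gates[s].enter(cl_id); // wait if we are going too fast
                        order[s].lock().unwrap().push(cl_id); // stand-in for the stage's work
                        gates[s].leave(cl_id);
                    }
                    cl_id += workers;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    order.iter().map(|m| m.lock().unwrap().clone()).collect()
}
```

Calling `run_checkpointed(16, 3, 4)` should return three vectors, each listing the CL ids in increasing order, which is the causality property the checkpoints are meant to protect. In this sketch a stage is held exclusively while a CL is inside it, which suits stateful tasks; a real runtime may choose a finer granularity.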

gbin commented Mar 21, 2026

Collaborator Author

Added a couple of experiments, and having an SPSC queue between the stages is slightly better:
Current Config

| Branch | Runs (s) | Avg (s) | Range (s) |
|---|---|---|---|
| gbin/checkpoints | 7.801, 13.386, 9.404 | 10.197 | 5.585 |
| gbin/checkpoints_spsc | 7.805, 11.228, 11.217 | 10.083 | 3.423 |
| gbin/checkpoints_spsc_custom | 7.962, 15.499, 10.284 | 11.248 | 7.537 |

No Logging
I temporarily disabled global task logging plus the frames task logging in temp worktrees only, ran the same 3x round-robin, then restored those worktrees to a clean state.

| Branch | Runs (s) | Avg (s) | Range (s) |
|---|---|---|---|
| gbin/checkpoints | 7.606, 7.809, 7.685 | 7.700 | 0.203 |
| gbin/checkpoints_spsc | 7.641, 7.653, 7.651 | 7.648 | 0.012 |
| gbin/checkpoints_spsc_custom | 7.674, 7.725, 7.767 | 7.722 | 0.093 |
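
For readers unfamiliar with the "SPSC between the stages" layout, the following is a rough sketch of the general shape, not the implementation benchmarked above. It assumes one thread per stage and uses the standard library's bounded `sync_channel` with a single sender and a single receiver per link as a stand-in for a dedicated SPSC queue; `Stage` and `run_pipeline` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Hypothetical stage type: a plain function applied to each item.
type Stage = fn(u64) -> u64;

/// Chains `stages` with one bounded queue per link and returns the outputs
/// in the order they leave the last stage.
fn run_pipeline(inputs: Vec<u64>, stages: Vec<Stage>, capacity: usize) -> Vec<u64> {
    let (first_tx, mut rx): (SyncSender<u64>, Receiver<u64>) = sync_channel(capacity);
    let mut handles = Vec::new();

    for stage in stages {
        let (tx, next_rx) = sync_channel(capacity);
        let stage_rx = rx;
        handles.push(thread::spawn(move || {
            // Each link has exactly one producer and one consumer, so FIFO
            // order is preserved from stage to stage.
            for item in stage_rx {
                if tx.send(stage(item)).is_err() {
                    break; // downstream hung up
                }
            }
        }));
        rx = next_rx;
    }

    // Feed from a separate thread so the bounded queues cannot stall the caller.
    let feeder = thread::spawn(move || {
        for item in inputs {
            if first_tx.send(item).is_err() {
                break;
            }
        }
    });

    let out: Vec<u64> = rx.iter().collect();
    feeder.join().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    out
}
```

Because every queue is FIFO with a single writer, items leave the pipeline in the order they entered, while different items occupy different stages at the same time. The bounded capacity provides backpressure: a fast stage blocks on `send` instead of running arbitrarily far ahead.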

gbin commented Mar 21, 2026

Collaborator Author

OK, with the fsync plus a small bug fix on the mmap, we cut the stdev between runs by about 40% (1.309s to 0.784s), though the mean went up:

  • baseline-clean: mean 10.399s, stdev 1.309s, range 3.390s
  • fsync-clean: mean 11.847s, stdev 0.784s, range 2.124s
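
For anyone reproducing these summaries from raw wall-clock times, a small helper along these lines computes the same three figures. The thread does not say whether the stdev is the sample or the population one, so the sample form (dividing by n - 1) is an assumption here; `RunStats` and `run_stats` are hypothetical names.

```rust
/// Summary of a set of benchmark runs, in the same unit as the inputs.
struct RunStats {
    mean: f64,
    stdev: f64,
    range: f64,
}

/// Returns None when fewer than two runs are given, since the sample
/// standard deviation is undefined in that case.
fn run_stats(runs: &[f64]) -> Option<RunStats> {
    if runs.len() < 2 {
        return None;
    }
    let n = runs.len() as f64;
    let mean = runs.iter().sum::<f64>() / n;
    // Assumption: sample variance (n - 1 in the denominator).
    let var = runs.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let min = runs.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = runs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    Some(RunStats { mean, stdev: var.sqrt(), range: max - min })
}
```

Applied to the three gbin/checkpoints runs from the earlier table (7.801, 13.386, 9.404), the mean and range come out at about 10.197 and 5.585, matching the Avg and Range columns.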

# Conflicts:
#	core/cu29/Cargo.toml
#	core/cu29_runtime/Cargo.toml

gbin commented Mar 21, 2026

Collaborator Author

Urgh, as a last pass before merging, I checked whether we had added any overhead on the base case... we did:

  • master run 1: 248 ns mean, 6,822,755 samples
  • master run 2: 250 ns mean, 6,754,493 samples
  • gbin/checkpoints run 1: 375 ns mean, 5,747,314 samples
  • gbin/checkpoints run 2: 369 ns mean, 5,889,504 samples

@gbin gbin merged commit 234feda into master Mar 21, 2026
23 checks passed
@gbin gbin deleted the gbin/checkpoints branch March 21, 2026 17:53