perf: Use faster malloc #188
|
Can you share your local benchmark results? |
|
Can you try profiling from your PC? |
Force-pushed from 9886f5e to e001251.
Do you have any information that backs up that "best" claim for reference? Presently it looks like You use You may want to evaluate Keep in mind that the most appropriate allocator choice really depends on the workload context.
|
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files: coverage diff, main vs. #188

| Metric | main | #188 | +/- |
|---|---|---|---|
| Coverage | 89.38% | 89.30% | -0.08% |
| Files | 61 | 61 | |
| Lines | 5313 | 5313 | |
| Hits | 4749 | 4745 | -4 |
| Misses | 564 | 568 | +4 |
|
|
Feel free to inline it if you want |
|
As @kdy1 does not seem interested in responding to the raised concerns, it is probably not worthwhile merging the PR.
|
|
Sorry, I'm too lazy to explain it to you. |
|
If you want description, see the git history of the crate. It's recently renamed, so you may need to use github UI or you should have a bit of git skills. |
I did, the original crate introduced the allocator here with both Like with other commits and PRs related to the allocator history, there is very little information beyond some bench result snippets shared in PRs. These provide minimal context and instill no confidence in addressing the concerns I raised about merging your PR here.
Being too lazy to respond to these questions is ok 👍 But not answering them should discourage @KSXGitHub from approving this PR.
You are promoting adoption of something unreliable, with very little evidence to support it. @KSXGitHub would be better off explicitly managing the global allocator themselves (should they see value in switching the allocator wholesale, or configuring multiple based on target). |
|
#[cfg.target(xxx).dependencies]in One more problem is that, over time the build breaks. That's why I simply use About musl, those are excluded because musl builds are very slow in GitHub action. Sorry for being rude. I had too many tasks at that time. |
Pacquet's workload is allocation-dense: every install fans out thousands of short-lived `Vec<u8>` tar-entry buffers, CAFS path strings, snapshot-id `String`s, and `HashMap` entries per package, and the system allocator on macOS + glibc-Linux is noticeably slower than modern general-purpose allocators on that shape.

Route the global allocator through `swc_malloc`, which picks mimalloc on macOS / Windows and jemalloc on Linux at compile time. Activating the crate via `extern crate` is enough: the `#[global_allocator]` declaration lives inside `swc_malloc` itself.

Original author: @kdy1 on pnpm#188. That PR has been open since 2023-11 and the Cargo.toml workspace has shifted substantially underneath it, so rather than rebase the 2+ year old commit I cherry-picked the essence (the three-file wiring) fresh against today's `main` (71bf423) and bumped `swc_malloc` from the original pin to the current `1.2.5`.

Co-authored-by: 강동윤 (Donny) <kdy1997.dev@gmail.com>
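For readers unfamiliar with how a single `#[global_allocator]` item reroutes every heap allocation in the final binary, here is a minimal sketch using only the standard library. `CountingAlloc`, `ALLOC_CALLS`, and `allocations_during` are hypothetical names made up for illustration; the wrapper forwards to the system allocator and is not what `swc_malloc` does internally. It only shows the mechanism a crate relies on when it declares the allocator on the caller's behalf.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to the system allocator
/// and counts how many times `alloc` was called.
pub struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Exactly one item in the whole dependency graph may carry this attribute.
// A crate that declares it (as the commit message says `swc_malloc` does)
// therefore decides the allocator for whatever binary links it.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocation calls happened while `work` ran.
/// Other threads allocating concurrently are counted as well.
pub fn allocations_during<F: FnOnce()>(work: F) -> usize {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    work();
    ALLOC_CALLS.load(Ordering::Relaxed) - before
}
```

Calling `allocations_during` around a closure that builds a `Vec` or `String` reports at least one allocation, which confirms that ordinary collection code goes through the registered allocator without any change at the call site.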
|
Rebased this onto current The original 2023-11 commit ( Substantively identical to the original change:
Local |
Pull request overview
This PR switches the CLI to use swc_malloc as the global allocator to improve performance on allocation-heavy workloads.
Changes:
- Enable `swc_malloc` in `pacquet-cli` via a crate-level side-effect import.
- Add `swc_malloc` to the CLI crate dependencies.
- Add `swc_malloc` to workspace dependencies and lockfile.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/cli/src/lib.rs | Imports swc_malloc to activate its global allocator selection. |
| crates/cli/Cargo.toml | Adds swc_malloc as a workspace dependency for the CLI crate. |
| Cargo.toml | Pins swc_malloc version in workspace dependencies. |
| Cargo.lock | Locks swc_malloc and its allocator backends (mimalloc/jemalloc). |
| // workload. Activating the crate via `extern crate` is enough — | ||
| // `swc_malloc` embeds the `#[global_allocator]` declaration | ||
| // itself and picks the per-target backend at compile time. | ||
| extern crate swc_malloc; |
In Rust 2021, a plain `extern crate swc_malloc;` item will typically trigger the `unused_extern_crates` warning (and CI runs clippy with `-D warnings`). To keep this as an intentional side-effect import, alias it to `_` (e.g. `extern crate swc_malloc as _;`) or switch to `use swc_malloc as _;`.
| extern crate swc_malloc; | |
| extern crate swc_malloc as _; |
| // Swap the default system allocator for `swc_malloc`, which pulls | ||
| // in mimalloc on macOS / Windows and jemalloc on Linux. A package | ||
| // manager fan-outs thousands of short-lived `Vec<u8>` / `String` / | ||
| // `HashMap` allocations per install (tar entry buffers, CAFS | ||
| // paths, snapshot IDs, …); the system allocators on macOS and | ||
| // glibc are noticeably slower than mimalloc / jemalloc on that | ||
| // workload. Activating the crate via `extern crate` is enough — | ||
| // `swc_malloc` embeds the `#[global_allocator]` declaration | ||
| // itself and picks the per-target backend at compile time. |
Placing the allocator-selection dependency in `src/lib.rs` makes it apply to any binary/test harness that links `pacquet-cli` as a library, and can create conflicts if another top-level crate wants to set a different global allocator. Consider moving this side-effect import into the actual binary target (`src/bin/main.rs`) or gating it behind a feature so only the `pacquet` executable opts into it.
| // Swap the default system allocator for `swc_malloc`, which pulls | |
| // in mimalloc on macOS / Windows and jemalloc on Linux. A package | |
| // manager fan-outs thousands of short-lived `Vec<u8>` / `String` / | |
| // `HashMap` allocations per install (tar entry buffers, CAFS | |
| // paths, snapshot IDs, …); the system allocators on macOS and | |
| // glibc are noticeably slower than mimalloc / jemalloc on that | |
| // workload. Activating the crate via `extern crate` is enough — | |
| // `swc_malloc` embeds the `#[global_allocator]` declaration | |
| // itself and picks the per-target backend at compile time. | |
| // Optionally swap the default system allocator for `swc_malloc`, | |
| // which pulls in mimalloc on macOS / Windows and jemalloc on Linux. | |
| // A package manager fan-outs thousands of short-lived `Vec<u8>` / | |
| // `String` / `HashMap` allocations per install (tar entry buffers, | |
| // CAFS paths, snapshot IDs, …); the system allocators on macOS and | |
| // glibc are noticeably slower than mimalloc / jemalloc on that | |
| // workload. Keep this behind a feature so linking `pacquet-cli` as a | |
| // library does not unconditionally install a process-wide global | |
| // allocator for every binary or test harness that depends on it. | |
| #[cfg(feature = "global-allocator")] |
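The feature-gating idea in the suggestion above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the `global-allocator` feature name is taken from the suggested change and would still have to be declared under `[features]` in the crate's `Cargo.toml`, `System` stands in for the third-party allocator, and `custom_allocator_enabled` is a hypothetical helper.

```rust
use std::alloc::System;

// Only compiled when the (assumed) `global-allocator` feature is enabled.
// `System` is a stand-in here; the real change would register the
// third-party allocator in this position instead.
#[cfg(feature = "global-allocator")]
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: reports at runtime whether this build opted in.
pub fn custom_allocator_enabled() -> bool {
    cfg!(feature = "global-allocator")
}
```

With this shape, a crate that depends on the library without enabling the feature keeps whatever global allocator it already has, and only the executable that turns the feature on gets the swap.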
The previous commit on this branch picked `swc_malloc`, a meta-crate that selects mimalloc on macOS / Windows and jemalloc on Linux at compile time. The first CI integrated-benchmark run (pnpm#188 at 395a9d2) came back with `pacquet@HEAD` 2% slower than `pacquet@main`: a 57 ms gap inside noise bands (ratio 1.02 ± 0.04), which is consistent with the 1-2% "second-command wins" ordering bias we measured separately on pnpm#278. That reading tells us jemalloc on this Linux runner (glibc 2.35+) gives us nothing measurable over the system allocator.

Before closing pnpm#188, run one more comparison with mimalloc instead of jemalloc. mimalloc has a different allocator philosophy (per-thread free lists + small-object fast path) and is a stronger fit for the "thousands of short-lived small allocations" shape a package manager produces.

Diff is minimal:

* `swc_malloc` → `mimalloc` in workspace + `pacquet-cli` deps.
* `extern crate swc_malloc;` → explicit `#[global_allocator] static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;` in `crates/cli/src/lib.rs`.
* Comment rewritten to describe what this allocator actually is rather than the per-target selection swc_malloc was doing.

If the next CI bench shows mimalloc also within noise of the system allocator, this PR closes: the allocator swap doesn't deliver on the Linux runner we measure against. If mimalloc wins, we land the direct-mimalloc form.

Co-authored-by: 강동윤 (Donny) <kdy1997.dev@gmail.com>
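The "within noise" reading in the commit message can be made explicit with a small sketch. It assumes the reported ratio is HEAD time divided by main time (so values above 1 mean HEAD is slower) and that the ± figure is a symmetric uncertainty around that ratio; `BenchRatio`, `Verdict`, and `classify` are hypothetical names, and the benchmark tool itself may define its interval differently.

```rust
/// Candidate-to-baseline time ratio with a symmetric uncertainty (assumed shape).
pub struct BenchRatio {
    pub ratio: f64,
    pub uncertainty: f64,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// The interval contains 1.0, so no difference can be claimed.
    WithinNoise,
    /// The whole interval is below 1.0: the candidate is faster.
    Faster,
    /// The whole interval is above 1.0: the candidate is slower.
    Slower,
}

pub fn classify(r: &BenchRatio) -> Verdict {
    let lo = r.ratio - r.uncertainty;
    let hi = r.ratio + r.uncertainty;
    if lo <= 1.0 && 1.0 <= hi {
        Verdict::WithinNoise
    } else if hi < 1.0 {
        Verdict::Faster
    } else {
        Verdict::Slower
    }
}
```

Applied to the figures quoted in the commit message, `classify(&BenchRatio { ratio: 1.02, uncertainty: 0.04 })` yields `Verdict::WithinNoise`, because the interval from 0.98 to 1.06 contains 1.0. Under the stated decision rule, a second `WithinNoise` result for mimalloc would close the PR.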
|
First CI bench came back with The likely reason: Pushed Decision rule: if the next bench also shows this PR within noise of main, close it — the allocator swap doesn't deliver on the runner we measure against. If mimalloc wins, land the direct-mimalloc form. |
|
Allocators rarely affect performance. And I don't think this is the performance bottleneck either. Modern |

Hi. I'm the creator of SWC, and I'm working to improve performance.
`swc_malloc` is a utility crate that configures global malloc with the best one for each platform. I made it mainly for benchmarks, but I found it useful so I'm also using it for the official node bindings (including extra bindings - CSS/HTML/XML).

If you want, you can store it inline, but with this crate you don't need to maintain it.