Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

perf: Use faster malloc#188

Closed
kdy1 wants to merge 2 commits into
pnpm:mainfrom
kdy1:perf-1-malloc
Closed

perf: Use faster malloc#188
kdy1 wants to merge 2 commits into
pnpm:mainfrom
kdy1:perf-1-malloc

Conversation

@kdy1

@kdy1 kdy1 commented Nov 11, 2023

Copy link
Copy Markdown

Hi. I'm the creator of SWC, and I'm working to improve performance.

swc_malloc is a utility crate that configures global malloc with the best one for each platform. I made it mainly for benchmarks, but I found it useful so I'm also using it for the official node bindings (including extra bindings - CSS/HTML/XML)

If you want, you can store it inline, but with this crate you don't need to maintain it.

@anonrig

anonrig commented Nov 11, 2023

Copy link
Copy Markdown
Member

Can you share your local benchmark results?

@kdy1

kdy1 commented Nov 11, 2023

Copy link
Copy Markdown
Author

I didn't post it because the benchmark was too flaky, but there you go.

image

@kdy1

kdy1 commented Nov 13, 2023

Copy link
Copy Markdown
Author

Can you try profiling from your PC?

Comment thread Cargo.toml Outdated
Comment thread crates/cli/src/lib.rs Outdated
@kdy1 kdy1 force-pushed the perf-1-malloc branch 4 times, most recently from 9886f5e to e001251 Compare November 22, 2023 05:48
@polarathene

polarathene commented Nov 23, 2023

Copy link
Copy Markdown

swc_malloc is a utility crate that configures global malloc with the best one for each platform.

Do you have any information that backs up that "best" claim for reference?

Presently it looks like swc_malloc only has two allocators that'd be selected? mimalloc and jemalloc (Cargo.toml + conditionals for each in lib.rs).

You use mimalloc for anything not on linux, and jemalloc for linux (but only x86_64 + aarch64 archs on -gnu target), anything else will fall back to the default? So that will not be the best for -musl targets which tend to perform poorly (especially on multi-threaded contexts) with the default allocator.


You may want to evaluate snmalloc which may be worthwhile.

Keep in mind that the most appropriate allocator choice really depends on the workload context.

  • So pushing swc_malloc elsewhere may not make the "best" allocator choice for other projects, the effectiveness could vary on which one is actually best for a projects needs?
  • Given the prefix, would it be safe to say that the choice of allocator is subject to change, will it be a breaking semver change or implicitly forced upon downstreams where regressions may occur?

@codecov

codecov Bot commented Nov 24, 2023

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.30%. Comparing base (71bf423) to head (395a9d2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
- Coverage   89.38%   89.30%   -0.08%     
==========================================
  Files          61       61              
  Lines        5313     5313              
==========================================
- Hits         4749     4745       -4     
- Misses        564      568       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kdy1

kdy1 commented Nov 24, 2023

Copy link
Copy Markdown
Author

Feel free to inline it if you want

@polarathene

polarathene commented Nov 24, 2023

Copy link
Copy Markdown

As @kdy1 does not seem interested in responding to the raised concerns, it is probably not worthwhile merging the PR.

  • Claims of allocator selection being best lack evidence / context and may not be applicable to other projects. No clear advantage demonstrated for paquet?
  • Allocator selection excludes either for -musl targets, using the default which performs poorly in multi-threading.
  • Allocator selection may be subject to change implicitly, neither is explicit selection control available.

@kdy1

kdy1 commented Nov 24, 2023

Copy link
Copy Markdown
Author

Sorry, I'm too lazy to explain it to you.

@kdy1

kdy1 commented Nov 24, 2023

Copy link
Copy Markdown
Author

If you want description, see the git history of the crate.

It's recently renamed, so you may need to use github UI or you should have a bit of git skills.

@polarathene

polarathene commented Nov 24, 2023

Copy link
Copy Markdown

If you want description, see the git history of the crate.

I did, the original crate introduced the allocator here with both jemalloc and mimalloc (specific commit from PR branch).

Like with other commits and PRs related to the allocator history, there is very little information beyond some bench result snippets shared in PRs. These provide minimal context and instill no confidence in addressing the concerns I raised about merging your PR here.


I'm too lazy to explain it to you.

Being lazy to respond to these questions is ok 👍 But by not doing it should discourage @KSXGitHub from approving this PR.

  • You're aware that just because you had an improvement in your bench, it does not mean it will be "best" for all projects? All you have to support your choice is a few lines of benches related to your project?
  • You're aware that your crates config patterns is non-exhaustive? The common musl targets are excluded intentionally (without any reasoning?)
  • You're aware that you may change the allocators in future, or adjust which targets qualify for a specific allocator, yet won't express if that'd be treated as a breaking change semver bump, or if you'd provide any explicit control of allocator selection?

You are promoting adoption of something unreliable, with very little evidence to support it. @KSXGitHub would be better off explicitly managing the global allocator themselves (should they see value in switching the allocator wholesale, or configuring multiple based on target).

@kdy1

kdy1 commented Dec 2, 2023

Copy link
Copy Markdown
Author

xxx_malloc crate is important for multi-crate projects because you can do extern crate xxx_malloc from benchmark files. It does not replace the default malloc if there's no extern crate. If you try to inline it into a final binary crate, configuring malloc for benchmarking becomes very cumbersome.

#[cfg.target(xxx).dependencies]

in Cargo.toml of each crate is required.


One more problem is that, over time the build breaks. That's why I simply use swc_malloc (formerly swc_node_base) from my other private projects. I'll maintain SWC anyway - so I don't need to update my side project just because of the build environment change.


About musl, those are excluded because musl builds are very slow in GitHub action.


Sorry for being rude. I had too many tasks at that time.

Pacquet's workload is allocation-dense: every install fans out
thousands of short-lived `Vec<u8>` tar-entry buffers, CAFS path
strings, snapshot-id `String`s, and `HashMap` entries per
package, and the system allocator on macOS + glibc-Linux is
noticeably slower than modern general-purpose allocators on that
shape. Route the global allocator through `swc_malloc`, which
picks mimalloc on macOS / Windows and jemalloc on Linux at
compile time. Activating the crate via `extern crate` is enough —
the `#[global_allocator]` declaration lives inside `swc_malloc`
itself.

Original author: @kdy1 on pnpm#188. That PR has been open since
2023-11 and the Cargo.toml workspace has shifted substantially
underneath it, so rather than rebase the 2+ year old commit I
cherry-picked the essence (the three-file wiring) fresh against
today's `main` (71bf423) and bumped `swc_malloc` from the
original pin to the current `1.2.5`.

Co-authored-by: 강동윤 (Donny) <kdy1997.dev@gmail.com>
@zkochan

zkochan commented Apr 24, 2026

Copy link
Copy Markdown
Member

Rebased this onto current main (71bf423) so CI + benchmarks can actually run against today's code.

The original 2023-11 commit (e001251) was two years off from the current workspace — Cargo.toml had shifted substantially and the rebase hit conflicts on both Cargo.toml and Cargo.lock. Rather than resolve conflicts line-by-line for a three-file diff, I cherry-picked the essence onto a fresh commit (395a9d2) authored by me with Co-authored-by: @kdy1 for attribution, bumped swc_malloc from the original pin to the current 1.2.5, and fleshed out the rationale with an inline comment at the extern crate site.

Substantively identical to the original change:

  • swc_malloc = "1.2.5" added to workspace deps
  • Crate listed as a dep of pacquet-cli
  • extern crate swc_malloc; at the top of crates/cli/src/lib.rs (activates the crate's embedded #[global_allocator] — mimalloc on macOS/Windows, jemalloc on Linux)

Local just ready passes (205/205). Waiting on integrated-benchmark to measure the payoff against main.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR switches the CLI to use swc_malloc as the global allocator to improve performance on allocation-heavy workloads.

Changes:

  • Enable swc_malloc in pacquet-cli via a crate-level side-effect import.
  • Add swc_malloc to the CLI crate dependencies.
  • Add swc_malloc to workspace dependencies and lockfile.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File Description
crates/cli/src/lib.rs Imports swc_malloc to activate its global allocator selection.
crates/cli/Cargo.toml Adds swc_malloc as a workspace dependency for the CLI crate.
Cargo.toml Pins swc_malloc version in workspace dependencies.
Cargo.lock Locks swc_malloc and its allocator backends (mimalloc/jemalloc).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/cli/src/lib.rs Outdated
// workload. Activating the crate via `extern crate` is enough —
// `swc_malloc` embeds the `#[global_allocator]` declaration
// itself and picks the per-target backend at compile time.
extern crate swc_malloc;

Copilot AI Apr 24, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Rust 2021, a plain extern crate swc_malloc; item will typically trigger the unused_extern_crates warning (and CI runs clippy with -D warnings). To keep this as an intentional side-effect import, alias it to _ (e.g. extern crate swc_malloc as _;) or switch to use swc_malloc as _;.

Suggested change
extern crate swc_malloc;
extern crate swc_malloc as _;

Copilot uses AI. Check for mistakes.
Comment thread crates/cli/src/lib.rs Outdated
Comment on lines +1 to +9
// Swap the default system allocator for `swc_malloc`, which pulls
// in mimalloc on macOS / Windows and jemalloc on Linux. A package
// manager fan-outs thousands of short-lived `Vec<u8>` / `String` /
// `HashMap` allocations per install (tar entry buffers, CAFS
// paths, snapshot IDs, …); the system allocators on macOS and
// glibc are noticeably slower than mimalloc / jemalloc on that
// workload. Activating the crate via `extern crate` is enough —
// `swc_malloc` embeds the `#[global_allocator]` declaration
// itself and picks the per-target backend at compile time.

Copilot AI Apr 24, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing the allocator-selection dependency in src/lib.rs makes it apply to any binary/test harness that links pacquet-cli as a library, and can create conflicts if another top-level crate wants to set a different global allocator. Consider moving this side-effect import into the actual binary target (src/bin/main.rs) or gating it behind a feature so only the pacquet executable opts into it.

Suggested change
// Swap the default system allocator for `swc_malloc`, which pulls
// in mimalloc on macOS / Windows and jemalloc on Linux. A package
// manager fan-outs thousands of short-lived `Vec<u8>` / `String` /
// `HashMap` allocations per install (tar entry buffers, CAFS
// paths, snapshot IDs, …); the system allocators on macOS and
// glibc are noticeably slower than mimalloc / jemalloc on that
// workload. Activating the crate via `extern crate` is enough —
// `swc_malloc` embeds the `#[global_allocator]` declaration
// itself and picks the per-target backend at compile time.
// Optionally swap the default system allocator for `swc_malloc`,
// which pulls in mimalloc on macOS / Windows and jemalloc on Linux.
// A package manager fan-outs thousands of short-lived `Vec<u8>` /
// `String` / `HashMap` allocations per install (tar entry buffers,
// CAFS paths, snapshot IDs, …); the system allocators on macOS and
// glibc are noticeably slower than mimalloc / jemalloc on that
// workload. Keep this behind a feature so linking `pacquet-cli` as a
// library does not unconditionally install a process-wide global
// allocator for every binary or test harness that depends on it.
#[cfg(feature = "global-allocator")]

Copilot uses AI. Check for mistakes.
The previous commit on this branch picked `swc_malloc`, a
meta-crate that selects mimalloc on macOS / Windows and jemalloc
on Linux at compile time. The first CI integrated-benchmark run
(pnpm#188 at 395a9d2) came back with `pacquet@HEAD` 2% slower than
`pacquet@main` — a 57 ms gap inside noise bands (ratio 1.02 ±
0.04), which is consistent with the 1-2% "second-command wins"
ordering bias we measured separately on pnpm#278.

That reading tells us jemalloc on this Linux runner (glibc 2.35+)
gives us nothing measurable over the system allocator. Before
closing pnpm#188, run one more comparison with mimalloc instead of
jemalloc — mimalloc has a different allocator philosophy
(per-thread free lists + small-object fast path) and is a
stronger fit for the "thousands of short-lived small
allocations" shape a package manager produces.

Diff is minimal:
  * `swc_malloc` → `mimalloc` in workspace + `pacquet-cli` deps.
  * `extern crate swc_malloc;` → explicit
    `#[global_allocator] static GLOBAL: mimalloc::MiMalloc =
    mimalloc::MiMalloc;` in `crates/cli/src/lib.rs`.
  * Comment rewritten to describe what this allocator actually is
    rather than the per-target selection swc_malloc was doing.

If the next CI bench shows mimalloc also within noise of the
system allocator, this PR closes — the allocator swap doesn't
deliver on the Linux runner we measure against. If mimalloc
wins, we land the direct-mimalloc form.

Co-authored-by: 강동윤 (Donny) <kdy1997.dev@gmail.com>
@zkochan

zkochan commented Apr 24, 2026

Copy link
Copy Markdown
Member

First CI bench came back with swc_malloc 2% slower than main on the Linux runner — 57 ms gap, inside noise bands (ratio 1.02 ± 0.04), and consistent with the 1-2% "second-listed command wins" ordering bias we measured separately on #278. Real allocator difference: statistically indistinguishable from zero.

The likely reason: swc_malloc picks jemalloc on Linux, and modern glibc (Ubuntu 24.04 runners ship glibc 2.39) has narrowed the gap to jemalloc substantially. The old "jemalloc is 10-20% faster than glibc" benchmarks are from glibc 2.31-era. pacquet's workload (short-lived buffers + geometric Vec growth + bounded concurrency) doesn't hit jemalloc's classic win cases (contended arenas, long-lived large objects).

Pushed f8388d7 — swap swc_malloc for mimalloc directly, with an explicit #[global_allocator]. Tests mimalloc-vs-glibc instead of jemalloc-vs-glibc. Different allocator philosophy (per-thread free lists + small-object fast path) so it's worth one CI cycle.

Decision rule: if the next bench also shows this PR within noise of main, close it — the allocator swap doesn't deliver on the runner we measure against. If mimalloc wins, land the direct-mimalloc form.

@KSXGitHub

Copy link
Copy Markdown
Contributor

Allocators rarely affect performance. And I don't think this is the performance bottleneck either.

Modern glibc is so good now. Improve upon it is hard. Proving its advantage is not easy.

@zkochan zkochan closed this Apr 24, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants