Skip to content

perf(ext/node): move most fs.cp logic to rust#32580

Merged
Tango992 merged 28 commits intodenoland:mainfrom
Tango992:perf-node-fs-cp
Mar 10, 2026
Merged

perf(ext/node): move most fs.cp logic to rust#32580
Tango992 merged 28 commits intodenoland:mainfrom
Tango992:perf-node-fs-cp

Conversation

@Tango992
Copy link
Copy Markdown
Contributor

@Tango992 Tango992 commented Mar 8, 2026

The idea is to move most of the logic from the JS side to Rust without changing the behavior. Not all functions are moved, for example since users can pass a function to filter which files get copied, it's much easier to just await it on the JS side. I also grouped some sync syscalls together and execute them in a tokio blocking thread, that in theory should be more efficient than scheduling many small async syscalls.

Performance insights based on my benchmark:

Linux

Deno 2.7.4

➜ deno --v8-flags="--expose-gc" -A cp.js
clk: ~3.76 GHz
cpu: AMD Ryzen 7 3700X 8-Core Processor
runtime: deno 2.7.4 (x86_64-unknown-linux-gnu)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 478.23 ms/iter 490.91 ms   █   █              
                                          (440.06 ms … 560.40 ms) 528.58 ms ███▁▁██▁█▁▁█▁▁█▁▁▁▁▁█
                                        gc(  2.38 ms …   3.81 ms)  18.35 mb ( 17.91 mb… 18.93 mb)
                                         1.02 ipc ( 55.78% cache)   5.12M branch misses
                                1.01G cycles   1.02G instructions 179.14M c-refs  79.22M c-misses

Deno (This PR)

➜ rdeno --v8-flags="--expose-gc" -A cp.js
clk: ~3.72 GHz
cpu: AMD Ryzen 7 3700X 8-Core Processor
runtime: deno 2.7.4 (x86_64-unknown-linux-gnu)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 173.34 ms/iter 177.11 ms          █   █ █     
                                          (154.45 ms … 186.29 ms) 184.71 ms █▁▁▁▁▁▁▁███▁▁█▁█▁▁▁██
                                        gc(  2.27 ms …   3.04 ms)  15.00 mb ( 14.35 mb… 16.16 mb)
                                         1.10 ipc ( 62.68% cache) 956.05k branch misses
                              204.33M cycles 225.42M instructions  42.98M c-refs  16.04M c-misses
  • ~2.8× faster
  • ~80% fewer CPU cycles
  • ~78% fewer instructions
  • ~81% fewer branch mispredictions
  • ~80% fewer cache misses

Node.js 25.8.0

➜ node --expose-gc cp.js
clk: ~3.77 GHz
cpu: AMD Ryzen 7 3700X 8-Core Processor
runtime: node 25.8.0 (x64-linux)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 431.88 ms/iter 437.63 ms █  █     █  █        
                                          (410.46 ms … 480.62 ms) 447.80 ms █▁▁█▁▁▁▁▁█▁▁█▁▁█▁█▁▁█
                                        gc(  2.01 ms …   4.48 ms)   3.53 mb (  3.06 mb…  4.22 mb)
                                         0.94 ipc ( 60.27% cache)   3.60M branch misses
                              662.98M cycles 620.04M instructions 135.25M c-refs  53.73M c-misses

Bun 1.3.10

➜ bun cp.js
clk: ~1.94 GHz
cpu: AMD Ryzen 7 3700X 8-Core Processor
runtime: bun 1.3.10 (x64-linux)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 293.75 ms/iter 307.60 ms ███   ███   █ ██  █ █
                                          (264.81 ms … 326.46 ms) 320.32 ms ███▁▁▁███▁▁▁█▁██▁▁█▁█
                                        gc(821.35 µs …   2.04 ms) 451.00 kb (  0.00  b…  1.41 mb)
                                         1.06 ipc ( 68.17% cache)   1.03M branch misses
                              171.40M cycles 181.01M instructions  36.52M c-refs  11.63M c-misses
macOS

Deno 2.7.4

➜ sudo deno --v8-flags="--expose-gc" -A cp.js                                                 
clk: ~3.11 GHz
cpu: Apple M1 Pro
runtime: deno 2.7.4 (aarch64-apple-darwin)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 502.34 ms/iter 519.81 ms  █▃                  
                                          (474.27 ms … 549.76 ms) 547.60 ms ▆██▁▁▁▁▆▁▁▆▁▆▁▆▁▁▁▁▁▆
                                        gc(  1.77 ms …   2.28 ms)  18.19 mb ( 17.61 mb… 18.91 mb)
                                        2.16 ipc (  1.45% stalls)    NaN% L1 data cache
                              611.62M cycles   1.32G instructions   0.00% retired LD/ST (   0.00)

Deno (This PR)

➜ sudo rdeno --v8-flags="--expose-gc" -A cp.js
clk: ~3.10 GHz
cpu: Apple M1 Pro
runtime: deno 2.7.4 (aarch64-apple-darwin)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 317.60 ms/iter 314.81 ms ▃▃█                  
                                          (307.87 ms … 346.25 ms) 336.79 ms ███▁▆▆▁▁▁▁▁▁▁▁▁▁▁▁▆▁▆
                                        gc(  1.74 ms …   2.35 ms)  14.94 mb ( 14.67 mb… 15.44 mb)
                                        2.32 ipc (  1.43% stalls)    NaN% L1 data cache
                              149.54M cycles 347.46M instructions   0.00% retired LD/ST (   0.00)
  • ~36% faster
  • ~75% fewer CPU cycles
  • ~73% fewer instructions
  • ~17.9% fewer memory collected during GC

Node.js 25.8.0

➜ sudo node --expose-gc cp.js                                                                 
clk: ~3.18 GHz
cpu: Apple M1 Pro
runtime: node 25.8.0 (arm64-darwin)

benchmark                                         avg (min … max) p75 / p99    (min … top 1%)
----------------------------------------------------------------- -------------------------------
fs.promises.cp recursive with deep tree + symlinks 468.14 ms/iter 465.26 ms █  ███               
                                          (457.23 ms … 505.19 ms) 487.65 ms ██▁███▁▁▁█▁▁▁▁▁▁▁▁▁▁█
                                        gc(  1.48 ms …   2.25 ms)   3.38 mb (  3.02 mb…  4.10 mb)
                                        2.18 ipc (  1.52% stalls)    NaN% L1 data cache
                              421.12M cycles 917.95M instructions   0.00% retired LD/ST (   0.00)

@Tango992 Tango992 marked this pull request as ready for review March 8, 2026 12:56
@Tango992 Tango992 added the ci-draft Run the CI on draft PRs. label Mar 8, 2026
@Tango992 Tango992 assigned Tango992 and unassigned Tango992 Mar 8, 2026
@Tango992 Tango992 marked this pull request as draft March 8, 2026 14:26
@Tango992 Tango992 marked this pull request as ready for review March 9, 2026 01:27
@Tango992 Tango992 requested a review from bartlomieju March 9, 2026 05:36
Copy link
Copy Markdown
Contributor

@kajukitli kajukitli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impressive perf improvements — 2x faster on linux, 4x on macos is significant

a few observations:

the good:

  • batching sync syscalls in spawn_blocking is the right approach
  • error type mapping via CpError enum is clean
  • keeping the filter callback in JS (since it's user-provided) makes sense

questions/concerns:

  1. todo!() in JoinError handling — line ~62 in fs.rs:
if err.is_cancelled() {
  todo!("async tasks must not be cancelled")
}

this will panic if a task is cancelled. is that intentional? might want a proper error type instead.

  1. unwrap_or_default() on path strings — several places like:
resolved_src_buf.to_str().unwrap_or_default().to_string()

if a path contains invalid UTF-8, this silently becomes empty string. might be worth logging or erroring instead.

  1. symlink race condition — in cp_create_symlink, if dest exists it's removed then symlink is created. between these two ops another process could create something at that path. the original JS had the same issue, so not a regression, just noting it.

  2. #[property] on errors — the NodeFsError struct uses #[property] attributes which is cool. is that a deno_error macro? might want to document it since it's not standard thiserror.

  3. missing cp_sync optimization — this PR only optimizes the async path. the sync version in cp_sync.ts still uses the old JS implementation. intentional? might be worth a follow-up.

minor:

  • the CpError enum's kind(), message(), and path() methods all match on every variant — could use a macro to reduce repetition

overall this looks good, the benchmark numbers speak for themselves.

@Tango992 Tango992 marked this pull request as draft March 9, 2026 14:13
@Tango992 Tango992 marked this pull request as ready for review March 10, 2026 01:17
@Tango992
Copy link
Copy Markdown
Contributor Author

I also realized that running cargo test -p deno_node now fails because we're using spawn_blocking and the FileSystem has a type of Lrc<dyn FileSystem> during test. The fix is simple, that is to enable sync_fs: cargo test -p deno_node --features sync_fs.

Is it okay if I adjust the Cargo.toml to enable the sync_fs as default a flag on deno_node?

deno/ext/node/Cargo.toml

Lines 16 to 17 in 532b307

[features]
sync_fs = ["deno_fs/sync_fs", "deno_package_json/sync", "node_resolver/sync"]

@Tango992 Tango992 requested a review from bartlomieju March 10, 2026 01:25
We've parsed the options before using the `validateCpOptions`
Copy link
Copy Markdown
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try this

@Tango992 Tango992 merged commit f33915b into denoland:main Mar 10, 2026
222 of 224 checks passed
@Tango992 Tango992 deleted the perf-node-fs-cp branch March 10, 2026 11:43
bartlomieju added a commit that referenced this pull request Mar 24, 2026
…ons (#32927)

## Summary

- Fix `node:fs.cp` failing with `ENOENT` when copying between allowed
directories while read permissions outside those directories are ignored
  (`"ignore": true` in deno.json permissions config)
- The `check_parent_paths` walk-up now handles `NotFound` from
permission checks (ignored denials) the same way it handles `NotFound` from `stat`
by stopping the ancestor traversal

**Root cause**: When `fs.cp` was moved from JS to Rust (#32580, #32687),
the parent path validation started propagating ignored-permission `ENOENT`
as a hard error instead of treating it as a stop signal. The old JS code's
`safeStatFn` caught `NotFound` and returned `undefined`, gracefully
stopping the walk. The Rust code's `check_cp_path()?` propagated it before
reaching the stat call's own `NotFound` handler.

Closes #32923

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-draft Run the CI on draft PRs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants