[turbopack] add a new hasher implementation to eliminate allocations#89059

Merged
lukesandberg merged 1 commit intocanaryfrom
persistent_hash
Mar 4, 2026

@lukesandberg
Contributor

@lukesandberg lukesandberg commented Jan 26, 2026

We only store hashes of TaskTypes in the TaskCache keyspace, so this introduces an allocation-free path to compute them.

For writing out the TaskCache this won't make much of a difference, since we were already reusing a scratch buffer, but it should eliminate a source of allocations on the read path.

This does, of course, cause a small binary size regression: 157M (160,952K) -> 158M (161,800K) = +1M (+848K)
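
A minimal sketch of the idea, not the PR's actual code: instead of encoding into a byte buffer and hashing the buffer afterwards, the encoder can write into a sink that forwards every chunk straight to a hasher. The names `Fnv1a`, `HashingWriter`, and `encode_task`, and the length-prefixed encoding itself, are assumptions made for illustration; FNV-1a stands in for whatever hash function Turbopack really uses.

```rust
use std::hash::Hasher;
use std::io::{self, Write};

/// FNV-1a, used here only because it is tiny and consumes input strictly
/// byte by byte. It is a stand-in, not the hasher used by the PR.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical adapter: an `io::Write` sink that hashes instead of storing.
struct HashingWriter<H: Hasher>(H);

impl<H: Hasher> Write for HashingWriter<H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        Hasher::write(&mut self.0, buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical encoder: a length-prefixed name followed by one argument.
fn encode_task<W: Write>(out: &mut W, name: &str, arg: u64) -> io::Result<()> {
    out.write_all(&(name.len() as u64).to_le_bytes())?;
    out.write_all(name.as_bytes())?;
    out.write_all(&arg.to_le_bytes())
}

/// Old shape: encode into a heap buffer, then hash the bytes.
fn hash_via_buffer(name: &str, arg: u64) -> u64 {
    let mut buf = Vec::new();
    encode_task(&mut buf, name, arg).expect("writing to a Vec cannot fail");
    let mut hasher = Fnv1a::new();
    Hasher::write(&mut hasher, &buf);
    hasher.finish()
}

/// New shape: the same encoder feeds the hasher directly, with no buffer.
fn hash_streaming(name: &str, arg: u64) -> u64 {
    let mut sink = HashingWriter(Fnv1a::new());
    encode_task(&mut sink, name, arg).expect("the hashing sink cannot fail");
    sink.0.finish()
}
```

Both functions produce the same value here because FNV-1a only depends on the concatenated byte stream. The `Hasher` trait does not promise that in general, so a real implementation would have to pick a hasher with that property.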

@nextjs-bot nextjs-bot added created-by: Turbopack team PRs by the Turbopack team. Turbopack Related to Turbopack with Next.js. labels Jan 26, 2026
@codspeed-hq

codspeed-hq bot commented Jan 26, 2026

Merging this PR will not alter performance

✅ 17 untouched benchmarks
⏩ 3 skipped benchmarks [1]


Comparing persistent_hash (952fbad) with canary (1f45f98) [2]

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on canary (34fdcbb) during the generation of this report, so 1f45f98 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@nextjs-bot
Collaborator

nextjs-bot commented Jan 26, 2026

Stats from current PR

✅ No significant changes detected

📊 All Metrics
📖 Metrics Glossary

Dev Server Metrics:

  • Listen = TCP port starts accepting connections
  • First Request = HTTP server returns successful response
  • Cold = Fresh build (no cache)
  • Warm = With cached build artifacts

Build Metrics:

  • Fresh = Clean build (no .next directory)
  • Cached = With existing .next directory

Change Thresholds:

  • Time: Changes < 50ms AND < 10%, OR < 2% are insignificant
  • Size: Changes < 1KB AND < 1% are insignificant
  • All other changes are flagged to catch regressions

⚡ Dev Server

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Cold (Listen) | 455ms | 455ms | | ▇▁▁▁▁ |
| Cold (Ready in log) | 438ms | 438ms | | ▇▁▁▁▁ |
| Cold (First Request) | 1.304s | 1.241s | | ▇▁▁▂▂ |
| Warm (Listen) | 457ms | 456ms | | █▁▁▁▁ |
| Warm (Ready in log) | 441ms | 441ms | | █▁▁▁▁ |
| Warm (First Request) | 348ms | 347ms | | █▁▁▁▁ |

📦 Dev Server (Webpack) (Legacy)

📦 Dev Server (Webpack)

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Cold (Listen) | 455ms | 455ms | | ▁▁▁▁▁ |
| Cold (Ready in log) | 434ms | 434ms | | ▃▄▃▄▃ |
| Cold (First Request) | 1.944s | 1.906s | | ▂▂▂▂▂ |
| Warm (Listen) | 456ms | 455ms | | ▁▁▁▁▁ |
| Warm (Ready in log) | 434ms | 434ms | | ▃▃▃▃▃ |
| Warm (First Request) | 1.915s | 1.904s | | ▁▁▂▂▁ |

⚡ Production Builds

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Fresh Build | 3.903s | 3.861s | | ▆▁▁▁▁ |
| Cached Build | 3.863s | 3.863s | | ▇▁▁▁▁ |

📦 Production Builds (Webpack) (Legacy)

📦 Production Builds (Webpack)

| Metric | Canary | PR | Change | Trend |
| --- | --- | --- | --- | --- |
| Fresh Build | 13.908s | 13.911s | | ▁▁▁▁▁ |
| Cached Build | 13.998s | 13.999s | | ▁▁▁▁▁ |
| node_modules Size | 476 MB | 476 MB | | ▁▁▁▁▁ |

📦 Bundle Sizes

Bundle Sizes

⚡ Turbopack

Client

Main Bundles: **401 kB** → **401 kB** ✅ -21 B

80 files with content-based hashes (individual files not comparable between builds)

Server

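Middleware

| | Canary | PR | Change |
| --- | --- | --- | --- |
| middleware-b..fest.js gzip | 770 B | 765 B | |
| Total | 770 B | 765 B | ✅ -5 B |

Build Details

Build Manifests

| | Canary | PR | Change |
| --- | --- | --- | --- |
| _buildManifest.js gzip | 450 B | 452 B | |
| Total | 450 B | 452 B | ⚠️ +2 B |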

📦 Webpack

Client

Main Bundles

| | Canary | PR | Change |
| --- | --- | --- | --- |
| 5528-HASH.js gzip | 5.54 kB | N/A | - |
| 6280-HASH.js gzip | 58.7 kB | N/A | - |
| 6335.HASH.js gzip | 169 B | N/A | - |
| 912-HASH.js gzip | 4.59 kB | N/A | - |
| e8aec2e4-HASH.js gzip | 62.6 kB | N/A | - |
| framework-HASH.js gzip | 59.7 kB | 59.7 kB | |
| main-app-HASH.js gzip | 255 B | 254 B | |
| main-HASH.js gzip | 39.1 kB | 39.1 kB | |
| webpack-HASH.js gzip | 1.68 kB | 1.68 kB | |
| 262-HASH.js gzip | N/A | 4.59 kB | - |
| 2889.HASH.js gzip | N/A | 169 B | - |
| 5602-HASH.js gzip | N/A | 5.55 kB | - |
| 6948ada0-HASH.js gzip | N/A | 62.6 kB | - |
| 9544-HASH.js gzip | N/A | 59.5 kB | - |
| Total | 232 kB | 233 kB | ⚠️ +723 B |

Polyfills

| | Canary | PR | Change |
| --- | --- | --- | --- |
| polyfills-HASH.js gzip | 39.4 kB | 39.4 kB | |
| Total | 39.4 kB | 39.4 kB | |

Pages

| | Canary | PR | Change |
| --- | --- | --- | --- |
| _app-HASH.js gzip | 194 B | 194 B | |
| _error-HASH.js gzip | 183 B | 180 B | 🟢 3 B (-2%) |
| css-HASH.js gzip | 331 B | 330 B | |
| dynamic-HASH.js gzip | 1.81 kB | 1.81 kB | |
| edge-ssr-HASH.js gzip | 256 B | 256 B | |
| head-HASH.js gzip | 351 B | 352 B | |
| hooks-HASH.js gzip | 384 B | 383 B | |
| image-HASH.js gzip | 580 B | 581 B | |
| index-HASH.js gzip | 260 B | 260 B | |
| link-HASH.js gzip | 2.51 kB | 2.51 kB | |
| routerDirect..HASH.js gzip | 320 B | 319 B | |
| script-HASH.js gzip | 386 B | 386 B | |
| withRouter-HASH.js gzip | 315 B | 315 B | |
| 1afbb74e6ecf..834.css gzip | 106 B | 106 B | |
| Total | 7.98 kB | 7.98 kB | ✅ -1 B |

Server

Edge SSR

| | Canary | PR | Change |
| --- | --- | --- | --- |
| edge-ssr.js gzip | 125 kB | 125 kB | |
| page.js gzip | 254 kB | 255 kB | |
| Total | 379 kB | 380 kB | ⚠️ +875 B |

Middleware

| | Canary | PR | Change |
| --- | --- | --- | --- |
| middleware-b..fest.js gzip | 617 B | 614 B | |
| middleware-r..fest.js gzip | 156 B | 155 B | |
| middleware.js gzip | 43.8 kB | 43.9 kB | |
| edge-runtime..pack.js gzip | 842 B | 842 B | |
| Total | 45.4 kB | 45.5 kB | ⚠️ +75 B |

Build Details

Build Manifests

| | Canary | PR | Change |
| --- | --- | --- | --- |
| _buildManifest.js gzip | 715 B | 718 B | |
| Total | 715 B | 718 B | ⚠️ +3 B |

Build Cache

| | Canary | PR | Change |
| --- | --- | --- | --- |
| 0.pack gzip | 4.06 MB | 4.06 MB | |
| index.pack gzip | 103 kB | 103 kB | |
| index.pack.old gzip | 104 kB | 103 kB | 🟢 1.49 kB (-1%) |
| Total | 4.27 MB | 4.27 MB | ⚠️ +2.61 kB |

🔄 Shared (bundler-independent)

Runtimes

| | Canary | PR | Change |
| --- | --- | --- | --- |
| app-page-exp...dev.js gzip | 321 kB | 321 kB | |
| app-page-exp..prod.js gzip | 170 kB | 170 kB | |
| app-page-tur...dev.js gzip | 320 kB | 320 kB | |
| app-page-tur..prod.js gzip | 170 kB | 170 kB | |
| app-page-tur...dev.js gzip | 317 kB | 317 kB | |
| app-page-tur..prod.js gzip | 168 kB | 168 kB | |
| app-page.run...dev.js gzip | 317 kB | 317 kB | |
| app-page.run..prod.js gzip | 168 kB | 168 kB | |
| app-route-ex...dev.js gzip | 70.8 kB | 70.8 kB | |
| app-route-ex..prod.js gzip | 49.3 kB | 49.3 kB | |
| app-route-tu...dev.js gzip | 70.9 kB | 70.9 kB | |
| app-route-tu..prod.js gzip | 49.3 kB | 49.3 kB | |
| app-route-tu...dev.js gzip | 70.5 kB | 70.5 kB | |
| app-route-tu..prod.js gzip | 49 kB | 49 kB | |
| app-route.ru...dev.js gzip | 70.4 kB | 70.4 kB | |
| app-route.ru..prod.js gzip | 49 kB | 49 kB | |
| dist_client_...dev.js gzip | 324 B | 324 B | |
| dist_client_...dev.js gzip | 326 B | 326 B | |
| dist_client_...dev.js gzip | 318 B | 318 B | |
| dist_client_...dev.js gzip | 317 B | 317 B | |
| pages-api-tu...dev.js gzip | 43.2 kB | 43.2 kB | |
| pages-api-tu..prod.js gzip | 32.9 kB | 32.9 kB | |
| pages-api.ru...dev.js gzip | 43.2 kB | 43.2 kB | |
| pages-api.ru..prod.js gzip | 32.9 kB | 32.9 kB | |
| pages-turbo....dev.js gzip | 52.6 kB | 52.6 kB | |
| pages-turbo...prod.js gzip | 38.5 kB | 38.5 kB | |
| pages.runtim...dev.js gzip | 52.6 kB | 52.6 kB | |
| pages.runtim..prod.js gzip | 38.5 kB | 38.5 kB | |
| server.runti..prod.js gzip | 62 kB | 62 kB | |
| Total | 2.83 MB | 2.83 MB | ✅ -1 B |

📎 Tarball URL
https://vercel-packages.vercel.app/next/commits/952fbadd0f211530271f1a27b24bb2cc37bdd213/next

@lukesandberg lukesandberg force-pushed the family_size_configuration branch from 1750d5e to 83303ff Compare January 26, 2026 19:07
@lukesandberg lukesandberg force-pushed the persistent_hash branch 2 times, most recently from 1eb984c to d1287e0 Compare January 26, 2026 21:49
@lukesandberg lukesandberg force-pushed the family_size_configuration branch from 83303ff to a974ae3 Compare January 26, 2026 21:49
@nextjs-bot
Collaborator

nextjs-bot commented Jan 26, 2026

Tests Passed

@lukesandberg lukesandberg force-pushed the family_size_configuration branch from a974ae3 to 9dfbd35 Compare January 26, 2026 23:27
@lukesandberg lukesandberg changed the base branch from family_size_configuration to graphite-base/89059 January 27, 2026 00:39
@lukesandberg lukesandberg changed the base branch from graphite-base/89059 to remove_backend_field January 27, 2026 00:40
@lukesandberg lukesandberg changed the title add a new hasher implemntation to eliminate allocations [turbopack] add a new hasher implementation to eliminate allocations Jan 27, 2026
@lukesandberg lukesandberg force-pushed the remove_backend_field branch 2 times, most recently from 92efbfb to 5f9077c Compare January 27, 2026 01:06
Contributor Author

DeterministicHash is used for creating filenames, so it isn't appropriate for TaskInputs (hashing VCs is explicitly called out as verboten).

So I would need to introduce a new trait, 'GoodEnoughForTurboPersistenceHash', and then derive it for TaskInput. This approach seemed simpler, though I'll admit that the trait might be a bit nicer.

@lukesandberg lukesandberg marked this pull request as ready for review February 12, 2026 01:19
@lukesandberg lukesandberg force-pushed the direct_reverse_mapping branch from 1e6270a to 53a30a8 Compare March 1, 2026 23:49
@lukesandberg lukesandberg force-pushed the direct_reverse_mapping branch 2 times, most recently from f98c5f2 to 153f568 Compare March 2, 2026 03:36
@lukesandberg lukesandberg force-pushed the persistent_hash branch 3 times, most recently from 0b363db to d858438 Compare March 2, 2026 21:48
@lukesandberg lukesandberg force-pushed the direct_reverse_mapping branch from ab8eb1e to 4df1745 Compare March 2, 2026 21:48
@lukesandberg lukesandberg force-pushed the direct_reverse_mapping branch 2 times, most recently from 3c9b033 to 9603703 Compare March 3, 2026 18:17
@lukesandberg lukesandberg changed the base branch from direct_reverse_mapping to graphite-base/89059 March 3, 2026 21:39
lukesandberg added a commit that referenced this pull request Mar 3, 2026
Save space in the persistent store by recording each TaskType only once instead of twice.

## What?
Instead of using an encoded TaskType struct as a key in persistent storage, just use a 64-bit hash. Lookups now need to do a cascading read:

* read task ids that match the hash
* restore all the tasks until we find one that matches our CachedTaskType
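
The two steps above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not Turbopack's storage code: `Store`, `TaskType`, `TaskId`, and `weak_hash` are hypothetical names, two `HashMap`s stand in for the `TaskCache` and `TaskData` keyspaces, and the hash is deliberately weak so that collisions are easy to trigger.

```rust
use std::collections::HashMap;

type TaskId = u32;

/// Hypothetical stand-in for a cached task type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct TaskType {
    name: String,
    arg: u64,
}

/// Deliberately poor hash (name length only) so that collisions are easy
/// to provoke when experimenting. A real store would use a good 64-bit hash.
fn weak_hash(task: &TaskType) -> u64 {
    task.name.len() as u64
}

struct Store {
    /// Stand-in for the TaskCache keyspace: hash -> candidate task ids.
    task_cache: HashMap<u64, Vec<TaskId>>,
    /// Stand-in for the TaskData keyspace: task id -> full task type.
    task_data: HashMap<TaskId, TaskType>,
    hash_fn: fn(&TaskType) -> u64,
}

impl Store {
    fn new(hash_fn: fn(&TaskType) -> u64) -> Self {
        Store {
            task_cache: HashMap::new(),
            task_data: HashMap::new(),
            hash_fn,
        }
    }

    fn insert(&mut self, id: TaskId, task: TaskType) {
        let hash = (self.hash_fn)(&task);
        self.task_cache.entry(hash).or_default().push(id);
        self.task_data.insert(id, task);
    }

    /// Cascading read: first fetch the ids stored under the hash, then
    /// restore each candidate until one equals the task being looked up.
    fn lookup(&self, wanted: &TaskType) -> Option<TaskId> {
        let hash = (self.hash_fn)(wanted);
        let candidates = self.task_cache.get(&hash)?;
        candidates
            .iter()
            .copied()
            .find(|id| self.task_data.get(id) == Some(wanted))
    }
}
```

With `weak_hash`, two tasks whose names have equal length land under the same key, and `lookup` still tells them apart because the final comparison is against the restored task, not the hash.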

Full cache misses perform the same, and so do cache hits, since we always end up reading the TaskData anyway (in the `connect_child` operation that immediately follows). This only slightly changes _when_ that second read happens, so we shouldn't expect it to slow anything down.

In the case of a hash collision we do end up doing strictly more work (reading and restoring TaskData entries for the wrong task), but this work is cached, and collisions should be _extremely_ rare assuming a good hash function.

From measuring vercel-site, this saves ~231M of data in the persistent cache (the cache goes from 3846M -> 3615M, i.e. -231M or about -6%).

Finally, this also addresses a narrow race condition where two racing calls to `get_persistent_task_id` for the same task could result in two entries being pushed to the `persistent_task_log`.
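
The thread does not show how the race was closed. As a generic illustration only, one common way to avoid a double push is to make the "check for an existing id" and "assign and log a new id" steps a single critical section. `IdAllocator`, `Inner`, `get_or_assign`, and `log_len` below are hypothetical names, not Turbopack APIs.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Default)]
struct Inner {
    ids: HashMap<String, u32>,
    /// Stand-in for a persistent task log: one entry per newly assigned id.
    log: Vec<(String, u32)>,
    next_id: u32,
}

/// Hypothetical allocator; not the real `get_persistent_task_id`.
#[derive(Default)]
struct IdAllocator {
    inner: Mutex<Inner>,
}

impl IdAllocator {
    /// The lookup and the insert-plus-log happen under one lock, so two
    /// racing callers for the same key cannot both append to the log.
    fn get_or_assign(&self, key: &str) -> u32 {
        let mut guard = self.inner.lock().unwrap();
        if let Some(&id) = guard.ids.get(key) {
            return id;
        }
        let id = guard.next_id;
        guard.next_id += 1;
        guard.ids.insert(key.to_owned(), id);
        guard.log.push((key.to_owned(), id));
        id
    }

    fn log_len(&self) -> usize {
        self.inner.lock().unwrap().log.len()
    }
}
```

A single global lock is the simplest possible choice for a sketch; a real implementation would more likely use a sharded or concurrent map to keep contention down.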

## Why?

Currently we encode 2 copies of every `CachedTaskType` in the database. 
1. as the key of the `TaskType`->`TaskId` map (aka `TaskCache` keyspace)
2. as a part of the `TaskStorage` struct stored in the `TaskData` keyspace

This redundancy is wasteful.  Instead we can make the `TaskCache` map much smaller and add a bit of complexity to lookups.



## Future Work

### Better hash functions
Right now, to compute the hashes we just run `encode` and then hash the bytes. This is not optimal, but we do not have a hash function that is suitable for this use case. So we should create a new `PersistentHash` trait that TaskInputs implement in order to support this without encoding, or perhaps a custom _encoder_ that accumulates encoding data into a hasher. This will be addressed in #89059.
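
For comparison, the trait-based alternative mentioned above might look roughly like this. Only the trait name `PersistentHash` comes from the text; the method signature, the impls, `ExampleInput`, `hash_of`, and the FNV-1a hasher are assumptions for illustration, and the merged PR may well look different.

```rust
use std::hash::Hasher;

/// FNV-1a stand-in with a fixed, platform-independent definition.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Assumed shape of the trait: feed a stable byte representation of `self`
/// into the hasher without building an intermediate buffer.
trait PersistentHash {
    fn persistent_hash<H: Hasher>(&self, state: &mut H);
}

impl PersistentHash for u64 {
    fn persistent_hash<H: Hasher>(&self, state: &mut H) {
        // Fixed endianness keeps the hash identical across platforms.
        state.write(&self.to_le_bytes());
    }
}

impl PersistentHash for str {
    fn persistent_hash<H: Hasher>(&self, state: &mut H) {
        // Length prefix so adjacent fields cannot blur into each other.
        (self.len() as u64).persistent_hash(state);
        state.write(self.as_bytes());
    }
}

impl<T: PersistentHash> PersistentHash for [T] {
    fn persistent_hash<H: Hasher>(&self, state: &mut H) {
        (self.len() as u64).persistent_hash(state);
        for item in self {
            item.persistent_hash(state);
        }
    }
}

/// Hypothetical task input, hashed field by field.
struct ExampleInput {
    path: String,
    flags: Vec<u64>,
}

impl PersistentHash for ExampleInput {
    fn persistent_hash<H: Hasher>(&self, state: &mut H) {
        self.path.as_str().persistent_hash(state);
        self.flags.as_slice().persistent_hash(state);
    }
}

fn hash_of<T: PersistentHash + ?Sized>(value: &T) -> u64 {
    let mut hasher = Fnv1a::new();
    value.persistent_hash(&mut hasher);
    hasher.finish()
}
```

The trade-off against the encoder-into-hasher approach is one more trait to implement or derive for every input type, in exchange for not depending on the encoding format.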

### New SST file heuristics

Now that the TaskCache keyspace is smaller, our heuristics for the 'maximum number of keys in a file' need to be rethought, since we are now producing lots of 7MB files for the TaskCache. This will be addressed in #89058.

### New compaction semantics

Right now we tolerate duplicates in the database, but compaction will delete them. This is not too harmful for us, since it means that if there is a hash collision we will tend to lose one of the results over time.

A better option would be to change compaction semantics for this KeySpace to either tolerate duplicates, leverage values for comparison, or something wilder where we simply 'recompute' the TaskCache instead of compacting it.   This will be addressed in #89075
@lukesandberg lukesandberg force-pushed the graphite-base/89059 branch from 02b7412 to 34fdcbb Compare March 3, 2026 21:40
@graphite-app graphite-app bot changed the base branch from graphite-base/89059 to canary March 3, 2026 21:41
@lukesandberg lukesandberg merged commit 18b9463 into canary Mar 4, 2026
288 of 291 checks passed
@lukesandberg lukesandberg deleted the persistent_hash branch March 4, 2026 00:20
sokra pushed a commit that referenced this pull request Mar 6, 2026
…89059)

We only store hashes of TaskTypes in the TaskCache keyspace, so this introduces an allocation-free path to compute them.

For writing out the TaskCache this won't make much of a difference, since we were already reusing a scratch buffer, but it should eliminate a source of allocations on the read path.

This does, of course, cause a small binary size regression: `157M (160,952K) -> 158M (161,800K)  = +1M (+848K)`
Labels

created-by: Turbopack team PRs by the Turbopack team. Turbopack Related to Turbopack with Next.js.

4 participants