experimental.prefetchInlining: bundle segment prefetches into a single response #90555

Merged
acdlite merged 1 commit into vercel:canary from acdlite:acdlite/prefetch-inline-option on Mar 1, 2026

Conversation

acdlite (Contributor) commented Feb 26, 2026

Background

Next.js 16 introduced per-segment prefetching through the Client Segment Cache. Rather than fetching all data for a route in a single request, the client issues individual requests for each segment in the route tree. This design improves cache efficiency: shared layouts between sibling routes (e.g., /dashboard/settings and /dashboard/profile sharing a /dashboard layout) are fetched once and reused from the client cache, avoiding redundant data transfer.

The trade-off is request volume. A route with N segments now produces N prefetch requests instead of one. Users upgrading from older versions notice significantly more network activity in their devtools, even though the total bytes transferred may be similar or lower due to deduplication.

Per-segment fetching is still a reasonable default for many sites. These prefetch requests are served from cache, they're fast, and they run in parallel. The main scenario where the trade-off breaks down is deployment environments that charge per-request. But even setting aside cost, there is a theoretical performance threshold where very small segments are better off inlined — the per-request overhead (connection setup, headers, scheduling) exceeds the cost of transferring duplicate bytes. This is analogous to JS bundlers, which inline small modules rather than creating separate chunks, because the overhead of an additional script tag or dynamic import outweighs the bytes saved.

What this change does

This adds experimental.prefetchInlining, a boolean option in next.config.js. When enabled, the server bundles all segment data for a route into a single /_inlined response rather than serving each segment individually. The tree prefetch (/_tree), which provides route structure metadata, remains a separate request — but optimistic routing (#88965) eliminates that request entirely by predicting the route structure client-side. With both features enabled, prefetching is effectively one request per link.

The fundamental trade-off is straightforward: inlining reduces request count at the cost of deduplication. Each inlined response includes its own copy of any shared layout data, so two sibling routes will each transfer the shared layout rather than sharing a single cached copy. This is the same trade-off that compilers and bundlers face when deciding whether to inline a function: inlining eliminates the overhead of indirection (here, extra HTTP requests) but increases total size when the same data appears in multiple call sites.
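
A minimal `next.config.js` sketch enabling the flag (the option name comes from this PR; the surrounding config shape is standard Next.js):

```javascript
// next.config.js: sketch enabling the experimental flag from this PR.
// `prefetchInlining` defaults to false; everything else is ordinary config.
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // Bundle all segment prefetches for a route into one /_inlined response.
    prefetchInlining: true,
  },
}

module.exports = nextConfig
```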

Future direction

The boolean flag is a stepping stone. The observation that there is a natural size threshold below which inlining is strictly better — where per-request overhead dominates the cost of any duplicate bytes — points toward a size-based heuristic, analogous to how compilers choose an inlining threshold. Small segments would be inlined automatically; segments exceeding a byte threshold would be "outlined" into separate requests where deduplication can take effect. For most applications, this would require no configuration. For applications with specific latency or bandwidth constraints, an option to adjust the threshold would let developers tune their position on the requests-vs-bytes curve. Adaptive heuristics based on network conditions are also possible, though further out.

acdlite force-pushed the acdlite/prefetch-inline-option branch from 2ecf667 to a98945e on February 26, 2026 02:37
nextjs-bot (Collaborator) commented Feb 26, 2026

Tests Passed

acdlite force-pushed the acdlite/prefetch-inline-option branch from a98945e to 5e24338 on February 26, 2026 03:36
acdlite marked this pull request as ready for review on February 26, 2026 03:57
acdlite force-pushed the acdlite/prefetch-inline-option branch from 5e24338 to 66aeca4 on February 26, 2026 03:58
acdlite force-pushed the acdlite/prefetch-inline-option branch from 66aeca4 to 00dabef on February 26, 2026 04:17
acdlite force-pushed the acdlite/prefetch-inline-option branch from 00dabef to 52c84e8 on February 26, 2026 04:31
acdlite merged commit 9ce8d13 into vercel:canary on Mar 1, 2026
268 of 270 checks passed
acdlite added a commit to acdlite/next.js that referenced this pull request Mar 4, 2026
Add a build-time measurement pass that decides which route segments
should have their prefetch data bundled into a descendant's response
vs fetched as separate HTTP requests.

This PR only computes and outputs the hints — no client-side behavior
changes yet. The hints are embedded in the route tree prefetch
response (/_tree), but the client router does not act on them.
Subsequent PRs will wire up the client to use InlinedIntoChild and
ParentInlinedIntoSelf when scheduling prefetch requests.

## Background

Per-segment prefetching in the Client Segment Cache improves cache
efficiency by letting shared layouts be fetched once and reused across
sibling routes. The trade-off is request volume: a route with N
segments produces N prefetch requests. Below some compressed size,
the per-request overhead (HTTP headers, stream framing, CDN cache
lookup latency) dominates the deduplication benefit of a separate
cacheable response, and bundling is strictly better. This is the same
trade-off that JS bundlers face when choosing an inlining threshold.

This change implements the size-based heuristic described as a future
direction in vercel#90555. Rather than an all-or-nothing boolean flag, each
segment is individually evaluated: small segments are inlined into a
descendant's response, large segments are "outlined" into their own
cacheable responses.

## Algorithm

At build time, a measurement pass renders each segment's prefetch
response and measures its gzip size. A parent-first traversal then
decides which segments to inline:

- A segment whose gzip size exceeds a per-segment threshold (2KB) is
  never inlined — it always gets its own response where CDN caching
  and cross-route deduplication can take effect.
- A segment below the threshold offers its size to its children. The
  deepest accepting descendant bundles the ancestor data into its own
  response, eliminating separate fetches for the inlined ancestors.
- A cumulative budget (10KB) caps the total ancestor bytes inlined
  into any single response, preventing long chains of small segments
  from producing oversized responses.
- Parents are inlined into children (not the reverse) because parent
  segments are more likely shared across routes. Keeping the parent's
  standalone response clean preserves its deduplication value.
- In parallel routes, the parent is inlined into only one child slot
  to avoid duplicating data across sibling responses.
- Multi-level chains are supported: if root, layout, and page are all
  small, all three collapse into a single fetch for the page segment.
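
The traversal above can be sketched in plain JavaScript. This is an illustrative sketch only: the tree shape (`{ key, gzipSize, children }`), the hint strings, and the simplification that the first chain-terminating descendant accepts the pending ancestors are all assumptions; the real pass lives in `collectPrefetchHintsImpl`.

```javascript
// Illustrative sketch of the parent-first inlining pass described above.
// Tree shape and hint names are assumptions, not the Next.js internals.
const MAX_SIZE = 2 * 1024;         // per-segment inline threshold (2KB)
const MAX_BUNDLE_SIZE = 10 * 1024; // cumulative ancestor budget (10KB)

function assignHints(node, carry, hints = new Map()) {
  const isLeaf = node.children.length === 0;
  const canInline =
    !isLeaf &&
    node.gzipSize <= MAX_SIZE &&
    carry + node.gzipSize <= MAX_BUNDLE_SIZE;
  if (canInline) {
    // Small non-leaf segment: defer its bytes to a descendant's response.
    hints.set(node.key, 'InlinedIntoChild');
    const next = carry + node.gzipSize;
    // In parallel routes, inline into only one child slot.
    node.children.forEach((child, i) =>
      assignHints(child, i === 0 ? next : 0, hints)
    );
  } else {
    // This segment keeps its own response; it accepts any pending ancestors.
    if (carry > 0) hints.set(node.key, 'ParentInlinedIntoSelf');
    node.children.forEach((child) => assignHints(child, 0, hints));
  }
  return hints;
}
```

With a small root, layout, and page, all three collapse onto the page segment; a layout over the 2KB threshold breaks the chain and accepts its small parent instead.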

The hints are persisted to a build manifest (prefetch-hints.json) and
loaded at server startup. They're embedded into the route tree
prefetch response (/_tree) so the client knows which segments to skip.

## Implementation

The main algorithm lives in collectPrefetchHintsImpl in
collect-segment-data.tsx. It produces two per-segment flags:

- InlinedIntoChild: this segment's data lives in a descendant's
  response; the client should not fetch it separately.
- ParentInlinedIntoSelf: this segment's response includes ancestor
  data; the client can expect a larger-than-usual response.

The hints flow through two paths:

1. Build time: collectSegmentData unions the freshly computed hints
   into the TreePrefetch nodes (the FlightRouterState in the buffer
   doesn't have them yet because the measurement pass runs after the
   initial pre-render).
2. Runtime: the hints manifest is loaded at server startup and passed
   to createFlightRouterStateFromLoaderTree, which unions them into
   the FlightRouterState for dynamic /_tree responses.

Tests cover the key scenarios: small chains that fully collapse,
large segments that break the chain, parallel routes where only one
slot accepts, and deep multi-level inlining. Test output uses ASCII
tree snapshots that visualize the inlining decisions at a glance.
acdlite added a commit to acdlite/next.js that referenced this pull request Mar 6, 2026
The Client Segment Cache splits page prefetches into per-segment
requests so that shared layouts can be cached independently and
reused across sibling routes. A page with N segments produces N
separate prefetch requests, each individually cacheable at the CDN
edge. This is a significant improvement over the previous approach
of fetching all page data in a single request — navigating between
sibling routes that share a layout no longer re-downloads the
layout's data, because it's already cached from a previous
navigation.

The trade-off is request volume. Below some compressed size, the
per-request overhead (HTTP headers, connection setup, CDN cache
lookup latency) exceeds the deduplication benefit of a standalone
cacheable response. This is the same trade-off that JS bundlers
face when deciding whether to inline a module into a chunk vs
keeping it as a separate file: below a threshold, inlining is
strictly better.

The previous commit (vercel#90555) introduced a build-time measurement
pass that computes per-segment gzip sizes and decides which segments
should be bundled together. This commit wires up both the server and
client to act on those decisions at request time.

## Static cacheability

This bundling optimization applies specifically to static prefetch
responses — the ones served from the CDN. These responses must be
statically cacheable, so the bundling decisions can't vary per
request. This is why the hints are computed once at build time and
remain fixed for the lifetime of the deployment. They're persisted
to a manifest and embedded into the route tree prefetch response,
so the client always sees the same bundling shape for a given build.

Hints are not recomputed during ISR/revalidation either. If
revalidation could change the hints, the client would need to
re-fetch the route tree after every revalidation to learn about
the new bundling shape, defeating the purpose of caching the tree
independently. By keeping hints stable across revalidations, the
tree response and the segment responses can all be cached at the
CDN edge without coordination.

Runtime prefetches (for dynamic segments that can't be
pre-rendered) don't need this — they already bundle all segments
into a single dynamic response, the same way navigation requests
do. The bundling optimization here is about closing the gap between
the request efficiency of dynamic responses and the cacheability of
static ones.

## Architecture

The core abstraction is a "segment bundle" — a linked list of
segment cache entries (client) or segment RSC data (server) that
maps 1:1 to a single HTTP response. A standalone segment is a
bundle of length 1. A segment with inlined parents is a longer
bundle. The same data structure and fetch/fulfill logic handles both
cases, so there are no separate code paths for inlined vs
non-inlined segments.

On the server, when rendering a segment whose hint says its parent
was inlined into it, the response contains a SegmentPrefetch[] array
instead of a single SegmentPrefetch. The array contains the terminal
segment followed by each inlined ancestor, innermost to outermost.

On the client, the prefetch tree walk accumulates a SegmentBundle
linked list as it traverses segments marked InlinedIntoChild. When
it reaches a segment that isn't inlined, the accumulated bundle is
finalized and a single fetch is spawned. The response is then walked
in parallel with the bundle, fulfilling each cache entry from its
corresponding array element.
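
The fulfillment step can be sketched as a parallel walk over the linked list and the response array. The node shape and `fulfill` callback here are assumptions for illustration, not the actual client internals; the key point is that the bundle and the server's `SegmentPrefetch[]` share one ordering (terminal segment first, then inlined ancestors innermost to outermost):

```javascript
// Illustrative sketch: fulfill each cache entry in a segment bundle from
// its corresponding element of the single bundled response.
function fulfillBundle(bundle, responseArray) {
  // `bundle` is a linked list of { key, fulfill, next }, ordered to match
  // the server's array: terminal segment, then ancestors innermost first.
  let node = bundle;
  const fulfilled = [];
  for (const data of responseArray) {
    if (node === null) break; // response longer than the bundle: ignore extras
    node.fulfill(data);       // hand this element to its cache entry
    fulfilled.push(node.key);
    node = node.next;
  }
  return fulfilled;
}
```

A bundle of length 1 (a standalone segment) falls out of the same loop, which is why no separate code path is needed for the non-inlined case.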

The head (metadata/viewport) is also bundled. During the hint
computation pass, the head's gzip size is measured alongside the
segments. At each page leaf, the algorithm checks whether the head
fits within the remaining maxBundleSize. If it does, the head is
appended to that page's bundle and the standalone head fetch is
skipped. If no page has room, a HeadOutlined flag on the root tells
the client to fetch the head separately, preserving the existing
behavior.

## Config

experimental.prefetchInlining is a boolean (default false). When
true, size-based inlining is enabled using default thresholds
(maxSize: 2KB, maxBundleSize: 10KB). Eventually this will become
the default behavior for all apps; the flag exists only for
incremental rollout. Threshold overrides (maxSize, maxBundleSize)
are not yet exposed in config but are straightforward to add.
acdlite added a commit to acdlite/next.js that referenced this pull request Mar 6, 2026
The Client Segment Cache splits page prefetches into per-segment
requests so that shared layouts can be cached independently and
reused across sibling routes. A page with N segments produces N
separate prefetch requests, each individually cacheable at the CDN
edge. This is a significant improvement over the previous approach
of fetching all page data in a single request — navigating between
sibling routes that share a layout no longer re-downloads the
layout's data, because it's already cached from a previous
navigation.

The trade-off is request volume. Below some compressed size, the
per-request overhead (HTTP headers, connection setup, CDN cache
lookup latency) exceeds the deduplication benefit of a standalone
cacheable response. This is the same trade-off that JS bundlers
face when deciding whether to inline a module into a chunk vs
keeping it as a separate file: below a threshold, inlining is
strictly better.

The previous commit (vercel#90555) introduced a build-time measurement
pass that computes per-segment gzip sizes and decides which segments
should be bundled together. This commit wires up both the server and
client to act on those decisions at request time.

## Static cacheability

This bundling optimization applies specifically to static prefetch
responses — the ones served from the CDN. These responses must be
statically cacheable, so the bundling decisions can't vary per
request. This is why the hints are computed once at build time and
remain fixed for the lifetime of the deployment. They're persisted
to a manifest and embedded into the route tree prefetch response,
so the client always sees the same bundling shape for a given build.

Hints are not recomputed during ISR/revalidation either. If
revalidation could change the hints, the client would need to
re-fetch the route tree after every revalidation to learn about
the new bundling shape, defeating the purpose of caching the tree
independently. By keeping hints stable across revalidations, the
tree response and the segment responses can all be cached at the
CDN edge without coordination.

Runtime prefetches (for dynamic segments that can't be
pre-rendered) don't need this — they already bundle all segments
into a single dynamic response, the same way navigation requests
do. The bundling optimization here is about closing the gap between
the request efficiency of dynamic responses and the cacheability of
static ones.

## Architecture

The core abstraction is a "segment bundle" — a linked list of
segment cache entries (client) or segment RSC data (server) that
maps 1:1 to a single HTTP response. A standalone segment is a
bundle of length 1. A segment with inlined parents is a longer
bundle. The same data structure and fetch/fulfill logic handles both
cases, so there are no separate code paths for inlined vs
non-inlined segments.

On the server, when rendering a segment whose hint says its parent
was inlined into it, the response contains a SegmentPrefetch[] array
instead of a single SegmentPrefetch. The array contains the terminal
segment followed by each inlined ancestor, innermost to outermost.

On the client, the prefetch tree walk accumulates a SegmentBundle
linked list as it traverses segments marked InlinedIntoChild. When
it reaches a segment that isn't inlined, the accumulated bundle is
finalized and a single fetch is spawned. The response is then walked
in parallel with the bundle, fulfilling each cache entry from its
corresponding array element.

The head (metadata/viewport) is also bundled. During the hint
computation pass, the head's gzip size is measured alongside the
segments. At each page leaf, the algorithm checks whether the head
fits within the remaining maxBundleSize. If it does, the head is
appended to that page's bundle and the standalone head fetch is
skipped. If no page has room, a HeadOutlined flag on the root tells
the client to fetch the head separately, preserving the existing
behavior.
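A sketch of one reading of that head-placement check, using the default 10KB budget mentioned below (all names here are illustrative, not the actual hint-computation code):

```typescript
// Default bundle budget from this PR's config defaults.
const MAX_BUNDLE_SIZE = 10 * 1024;

type PageBundle = { page: string; gzipSize: number; includesHead: boolean };

// For each page leaf, inline the head into its bundle if the head's
// gzip size fits within the remaining budget. Returns true if the
// head was placed somewhere; false models the HeadOutlined case,
// where the client falls back to a standalone head fetch.
function placeHead(pages: PageBundle[], headGzipSize: number): boolean {
  let placed = false;
  for (const p of pages) {
    if (p.gzipSize + headGzipSize <= MAX_BUNDLE_SIZE) {
      p.includesHead = true;
      p.gzipSize += headGzipSize;
      placed = true;
    }
  }
  return placed;
}
```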

## Config

experimental.prefetchInlining is a boolean (default false). When
true, size-based inlining is enabled using default thresholds
(maxSize: 2KB, maxBundleSize: 10KB). Eventually this will become
the default behavior for all apps; the flag exists only for
incremental rollout. Threshold overrides (maxSize, maxBundleSize)
are not yet exposed in config but are straightforward to add.
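Enabling the flag looks like this (the thresholds themselves use the built-in defaults and are not yet configurable):

```typescript
// next.config.ts
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  experimental: {
    // Bundle small segment prefetches into a single response.
    prefetchInlining: true,
  },
};

export default nextConfig;
```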
acdlite added a commit to acdlite/next.js that referenced this pull request Mar 6, 2026