Source map production is unbelievably memory intensive #20961

@BPapp-MS

Have you used AI?

Yes

Bug Description

SourceMapDevToolPlugin consumes excessive memory on large production builds because its two-phase architecture holds all source map task data in memory at once.

In Phase 1, getTaskForFile() calls asset.sourceAndMap() for every matching file and stores the full source map object and source string in a tasks[] array. All tasks must exist before Phase 2 begins because of the module name resolution step between phases (conflict detection across all source names). Phase 2 then iterates the tasks to serialize and emit .map files.

For builds with many locale-variant chunks sharing the same modules, this means thousands of tasks — each holding multi-megabyte source map objects as V8 heap data — coexist in memory. Additionally, webpack-sources CachedSource objects retain their composed map results internally, so shared module sources accumulate duplicate cached map data across chunks.

In our production monorepo, a single large project (5,275 unique modules, 52 locale-variant chunks, ~14,000 source map tasks) spikes from ~6 GB RSS at the end of minification to 25.5 GB RSS after SourceMapDevToolPlugin completes — an increase of nearly 20 GB attributable entirely to this plugin. This causes OOM failures in standard CI environments.

Phase-by-phase profiling confirms the minification phase stays well under budget. The memory spike occurs exclusively within the processAssets hook at PROCESS_ASSETS_STAGE_DEV_TOOLING.

Link to Minimal Reproduction and step to reproduce

A minimal reproduction is difficult because the issue manifests at scale — it requires thousands of source map tasks with shared modules across many chunks. Below is a detailed description of the conditions that trigger it.

Project structure:

  • 5,275 unique modules bundled by webpack5-module-minifier-plugin (RushStack)
  • 52 locale-variant chunks, each containing ~2,060 source references
  • Each chunk produces a source map of ~9.1 MB as JSON
  • Total: ~14,000 source map tasks processed by SourceMapDevToolPlugin
  • Production mode with SourceMapDevToolPlugin configured for separate .map files (filename set, append: false, module: true, noSources: false)

Steps to reproduce:

  1. Build a large project with many locale variants (or any configuration producing thousands of chunks with heavily overlapping module sets)
  2. Enable production source maps via SourceMapDevToolPlugin with external .map files
  3. Monitor peak RSS during the build (e.g., via /usr/bin/time -v or process.memoryUsage.rss())
  4. Observe that RSS spikes dramatically during the processAssets stage, not during minification
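For step 3, in-process sampling could look roughly like the sketch below. `startRssSampler` is a hypothetical helper name, not part of webpack or Node. Note that an interval timer cannot fire while the event loop is blocked by synchronous work, so this may under-report short spikes; the "Maximum resident set size" line from /usr/bin/time -v remains the more reliable figure.

```javascript
// Minimal peak-RSS sampler (hypothetical helper, not a webpack or Node API).
function startRssSampler(intervalMs = 250) {
  let peakBytes = process.memoryUsage.rss();
  const timer = setInterval(() => {
    peakBytes = Math.max(peakBytes, process.memoryUsage.rss());
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling

  return {
    // Stop sampling and return the highest RSS observed, in MB.
    stop() {
      clearInterval(timer);
      peakBytes = Math.max(peakBytes, process.memoryUsage.rss());
      return Math.round(peakBytes / (1024 * 1024));
    },
  };
}

const sampler = startRssSampler();
// ... run the build here ...
const peakMb = sampler.stop();
console.log(`peak RSS: ${peakMb} MB`);
```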

Metrics collected (single-process, --expose-gc, GNU time):

Scenario                                     Peak RSS
-------------------------------------------  ---------
Production build, no source maps             6,523 MB
Production build + SourceMapDevToolPlugin    25,532 MB
After minification phase (before SMDTP)      5,593 MB
After SMDTP processAssets completes          21,140 MB

The ~19 GB delta between "no source maps" and "with source maps" is entirely within SourceMapDevToolPlugin.
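For reference, the deltas implied by these figures can be worked out as follows. This is plain arithmetic on the numbers reported in this issue; the variable names are only illustrative.

```javascript
// RSS figures from the metrics in this issue, in MB.
const rssMb = {
  noSourceMaps: 6523,
  withSourceMaps: 25532,
  afterMinification: 5593,
  afterSmdtp: 21140,
};

// Peak-to-peak cost of enabling source maps.
const peakDeltaMb = rssMb.withSourceMaps - rssMb.noSourceMaps;

// Growth between the two in-build snapshots around the processAssets hook.
const inHookDeltaMb = rssMb.afterSmdtp - rssMb.afterMinification;

// How many times larger the peak is with source maps enabled.
const peakRatio = (rssMb.withSourceMaps / rssMb.noSourceMaps).toFixed(1);

console.log({ peakDeltaMb, inHookDeltaMb, peakRatio });
```

The snapshot taken after processAssets completes is lower than the overall peak, which would be consistent with a transient spike inside the hook.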

Expected Behavior

SourceMapDevToolPlugin should be able to produce source maps for large builds without holding all task data in memory simultaneously. The module name resolution step (conflict detection) only requires module identifiers and source names — not the full composed maps or source strings. Memory-proportional-to-one-task-at-a-time processing should be achievable.
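A rough sketch of that idea, using toy data structures rather than webpack's real internals: the first pass keeps only module identifiers per asset and resolves name conflicts, and the second pass builds, emits, and drops one map at a time. `resolveSourceNames`, `emitMaps`, `loadMap`, and the task shape are all hypothetical, and the sketch assumes the module identifiers for an asset can be obtained without retaining its composed map, which this issue does not establish.

```javascript
// Sketch only: toy structures, not SourceMapDevToolPlugin's actual code.

// Pass 1: resolve a unique source name per module from identifiers alone.
function resolveSourceNames(lightTasks, nameOf, fallbackNameOf) {
  const nameToModule = new Map();
  const moduleToName = new Map();
  for (const task of lightTasks) {
    for (const id of task.moduleIds) {
      if (moduleToName.has(id)) continue; // already named via another chunk
      let name = nameOf(id);
      if (nameToModule.has(name)) name = fallbackNameOf(id); // conflict: use fallback
      while (nameToModule.has(name)) name += "*"; // still taken: disambiguate
      nameToModule.set(name, id);
      moduleToName.set(id, name);
    }
  }
  return moduleToName;
}

// Pass 2: the heavy map object lives only for one loop iteration.
function* emitMaps(lightTasks, moduleToName, loadMap) {
  for (const task of lightTasks) {
    const map = loadMap(task.file);
    map.sources = task.moduleIds.map((id) => moduleToName.get(id));
    yield { file: `${task.file}.map`, content: Buffer.from(JSON.stringify(map)) };
  }
}

// Usage with two locale variants sharing the same modules.
const tasks = [
  { file: "a.en.js", moduleIds: ["./src/x.js", "./lib/x.js"] },
  { file: "a.fr.js", moduleIds: ["./src/x.js", "./lib/x.js"] },
];
const names = resolveSourceNames(
  tasks,
  (id) => `webpack://app/${id.split("/").pop()}`, // short name, may collide
  (id) => `webpack://app/${id}` // fallback: full identifier
);
const loadMap = (file) => ({ version: 3, file, sources: [], names: [], mappings: "" });

const emitted = [];
for (const out of emitMaps(tasks, names, loadMap)) emitted.push(out.file);
console.log([...names.entries()], emitted);
```

With this shape, peak memory for the second pass scales with the largest single map rather than with the sum of all maps.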

For our build, we expect peak RSS during source map generation to stay under 14 GB — roughly double the minification baseline of ~6 GB, accounting for the inherent cost of one chunk's map composition at a time plus serialization overhead.

Experimental patches applied to SourceMapDevToolPlugin confirm this is feasible:

  • Serializing maps to Buffers (1 byte/char for ASCII content) instead of holding V8 string objects (up to 2 bytes/char) reduced peak RSS by ~5 GB (20%)
  • Adding recursive CachedSource cache clearing after each chunk's sourceAndMap() call reduced peak RSS by ~8.2 GB (32%), bringing the total from 25.5 GB down to 17.3 GB

With these combined, a target of ~14 GB for our scenario appears achievable.
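As an illustration of the first patch, a finished map can be held as UTF-8 bytes rather than as a live object or string. This is a minimal sketch with a made-up map, not the actual patch; Buffer memory is reported under `arrayBuffers` in process.memoryUsage() rather than `heapUsed`, though it still counts toward RSS.

```javascript
// Sketch: keep a serialized source map as a Buffer until it is emitted.
// The map contents below are made up for illustration.
const map = {
  version: 3,
  file: "chunk.en.js",
  sources: ["webpack://app/./src/index.js"],
  sourcesContent: ["console.log('hello');"],
  names: [],
  mappings: "AAAA",
};

const json = JSON.stringify(map);
const asBuffer = Buffer.from(json, "utf8"); // bytes live outside the V8 heap

// For ASCII-only JSON the byte length equals the character count.
console.log(json.length, asBuffer.length);

// The map can be recovered on demand if a later step needs the object form.
const roundTripped = JSON.parse(asBuffer.toString("utf8"));
console.log(roundTripped.file);
```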

Actual Behavior

SourceMapDevToolPlugin accumulates all source map task data (composed maps + source strings) in memory during Phase 1 before processing any of it in Phase 2. For our build, this causes peak RSS to reach 25,532 MB — a 19 GB increase over the same build without source maps (6,523 MB).

The breakdown:

  • Phase 1 collects ~14,000 tasks. Each task stores the full sourceMap object (~9.1 MB as JSON for a large chunk) and source string (~1.8 MB) as V8 heap objects. V8 stores any string containing characters outside Latin-1 with 2 bytes/char (UTF-16), so a 9.1 MB JSON string can occupy up to ~18 MB of V8 heap.
  • CachedSource objects shared across locale-variant chunks retain composed map results internally, adding further GBs of redundant cached data.
  • By the time Phase 2 begins, RSS has already climbed to ~16–21 GB. Phase 2 then re-serializes each map, adding another transient spike.

This reliably causes OOM failures in CI environments with 16 GB memory limits, preventing source map generation for large projects.

Environment

System:
  OS: Linux 6.8 (Ubuntu 22.04 - GitHub Codespaces)
  CPU: (4) x64 AMD EPYC (Zen 4)
  Memory: 32 GB

Binaries:
  Node: 22.x
  npm: 10.x
  pnpm: 9.x (via Rush)

Packages:
  webpack: 5.105.4
  webpack-sources: 3.3.4
  neo-async: 2.6.2

Is this a regression?

No

Last Working Version

No response

Additional Context

No response
