Add configurable threshold to avoid power-of-two rounding for large pinned memory allocations#171662
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/171662
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit a00c135 with merge base 08268aa. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
```cpp
pool.free_list_[index].list_.push_back(block);
// Check if block is too large to cache
// See https://github.com/pytorch/pytorch/issues/150517
size_t maxCachedSize = pinned_max_cached_size();
```
Must max_round_size and max_cache_size be equal? If not, there might be a problem.
For example, suppose max_round_size is 128 MB and max_cache_size is 256 MB. When a 129 MB block is freed, it will be cached in the free list for 256 MB blocks. The next time a 200 MB block is allocated, it seems the previous 129 MB block would be returned.
Please correct me if I am wrong.
Sounds like you're right. I consolidated the configs into one and added some tests.
Force-pushed from ac90d12 to 76bc9e9
Force-pushed from 0013bc0 to bee1cc6
AI assistance on PRs must be disclosed. Claude or codex?
```cpp
// See https://github.com/pytorch/pytorch/issues/150517
size_t maxPower2Size = pinned_max_power2_size();
if (maxPower2Size > 0 && block->size_ > maxPower2Size) {
  // Block too large to cache, free it immediately
```
Conflating "I don't want to round the allocation up" and "I don't want to cache the allocation" is confusing.
```cpp
// See https://github.com/pytorch/pytorch/issues/150517
size_t roundSize = size;
size_t maxPower2Size = pinned_max_power2_size();
if (maxPower2Size == 0 || size <= maxPower2Size) {
```
The naming of the variable is confusing, and returning 0 (instead of, say, `numeric_limits::max`) to indicate that it's disabled is also confusing.
Made a change so that the default value is -1 to indicate the config isn't set.
Claude Opus 4.5
Force-pushed from bee1cc6 to e2a7cd7
ngimel left a comment:
This still doesn't disentangle max size for caching and max size for rounding up.
```cpp
// -1 means disabled (all allocations use power-of-two and caching).
// 0 means no caching (all allocations use exact size).
// Positive values set the threshold in MB.
m_pinned_max_cachesize_mb = val;
```
You can directly set it to `numeric_limits::max` here to avoid the -1 logic later.
```cpp
// See https://github.com/pytorch/pytorch/issues/150517
size_t maxCachesize = pinned_max_cachesize();
if (block->size_ > maxCachesize) {
  // Block too large to cache, free it immediately
```
You have fairly large code blocks that delete blocks in two places now; factor them out into a function.
Force-pushed from e2a7cd7 to 3abdf3a
Honestly, I wish we would stop adding new capabilities to CachingHostAllocator and instead allow for a different implementation (in this case, one that just cuts up segments into blocks like CUDACachingAllocator), but I understand that that is an unreasonable ask given all of the interfaces involved at this point.
What is the status of this PR? The problem this solves leads to OOM errors for vLLM with UMA (360 GB of pinned unified memory OOMs because much more RAM is spent due to this issue).
The default value of these two configs is `std::numeric_limits<size_t>::max()`. `pinned_max_round_threshold_mb` sets the maximum allocation size that will be rounded up to the nearest power of two; the exact requested size is used if the allocation is larger than this threshold. `pinned_max_cached_size_mb` sets the maximum block size that will be cached; blocks larger than this threshold will be freed immediately when no longer in use, rather than being cached.

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Force-pushed from 3abdf3a to 2533df0
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
See under "Correction" for the up-to-date description.
This pull request introduces a configurable threshold for the maximum allocation size that will be cached by the CUDA pinned memory (host) allocator. Allocations larger than this threshold will no longer be rounded up to the next power of two or cached, which helps avoid memory waste for large allocations that are just above a power-of-two boundary. The threshold is controlled via a new `pinned_max_cachesize_mb` option in `PYTORCH_CUDA_ALLOC_CONF`. The default behavior remains unchanged unless this option is set. Documentation and configuration parsing have been updated accordingly.

Pinned memory allocator improvements:

- Updated `CachingHostAllocatorImpl` and `CUDACachingHostAllocatorImpl` to skip power-of-two rounding and caching for allocations larger than a configurable threshold, freeing them immediately instead. The threshold is set by the new `pinned_max_cachesize_mb` option. [1] [2] [3] [4]

Configuration and API changes:

- Added the `pinned_max_cachesize_mb` option to `CUDAAllocatorConfig`, including parsing, storage, and an API for retrieving the value. This option is now recognized and handled in the allocator configuration. [1] [2] [3] [4] [5] [6] [7] [8]

Documentation:

- Documented the `pinned_max_cachesize_mb` option, its usage, and its default behavior.

Miscellaneous:

- Included `<limits>` where needed to support the new logic. [1] [2]

Rel: #150517
Used Claude Opus 4.5
Correction
This PR adds two new `PYTORCH_CUDA_ALLOC_CONF` options for the pinned memory (host) caching allocator to address memory waste from power-of-2 rounding and caching of large allocations:

- `pinned_max_round_threshold_mb`: allocations larger than this threshold keep their exact requested size instead of being rounded up to the next power of two.
- `pinned_max_cached_size_mb`: blocks larger than this threshold are freed immediately when no longer in use instead of being cached. Blocks that exceed this limit are also never rounded up, since rounding is only useful for cached blocks.

Both options default to unlimited (disabled), preserving existing behavior. A warning is emitted if `pinned_max_round_threshold_mb` is explicitly set larger than `pinned_max_cached_size_mb`.
The block caching/freeing logic is refactored into a shared maybe_cache_block method used by both the direct free path and the
event-processing path. The new config values are also exposed in the memory snapshot allocator settings.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo