### 🐛 Describe the bug
This issue appears to be related to #95823, but also occurs on smaller tensors. Although #95823 is closed, the underlying problem still exists.

When pinning a tensor that is slightly larger than 128 MB, PyTorch appears to round the allocation up to the next power of two (256 MB), nearly doubling the expected memory usage.
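If pinned allocations are indeed rounded up to the next power of two, the arithmetic works out as sketched below (`next_pow2` is a helper written for this illustration, not a PyTorch API; the shape matches the example that follows):

```python
# Sketch of the suspected rounding, assuming pinned allocations are rounded
# up to the next power of two. next_pow2 is a helper for this illustration,
# not a PyTorch API.
def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

nbytes = 18944 * 3584 * 2        # qwen2.5 7b up_proj in float16: ~129.5 MiB
rounded = next_pow2(nbytes)      # 268435456 bytes = 256 MiB
print(f"requested {nbytes / 2**20:.1f} MiB, "
      f"rounded to {rounded / 2**20:.1f} MiB "
      f"(+{(rounded - nbytes) / 2**20:.1f} MiB overhead)")
```

For the 129.5 MiB tensor this predicts a 256 MiB allocation, i.e. ~126.5 MiB of overhead, which matches the measurement below.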
### Minimal Example
```python
import torch

def get_free():
    """Report used/shared system memory in MB, parsed from `free -m`."""
    import subprocess
    r = subprocess.run(["free", "-m"], capture_output=True)
    d = r.stdout.decode('utf-8')
    # Fields of the "Mem:" row: total, used, free, shared, buff/cache, available.
    s = d.split(':')[1].split()
    return f"[used={s[1]:7}, shared={s[3]:7}] "

model_weight = torch.randn(18944, 3584, dtype=torch.float16, device='cpu')  # 129.5MB (qwen2.5 7b, up_proj)
# model_weight = torch.randn(14336, 4096, dtype=torch.float16, device='cpu')  # 112.0MB (llama3.1 8b, up_proj)
print("weight memory usage:", model_weight.element_size() * model_weight.nelement() / (1024 ** 2), "MB")

# Pinning memory
print(get_free() + "Before pin")
model_weight = model_weight.pin_memory()
print(get_free() + "After pin")
```
### Observed Behavior
Pinning qwen2.5 7b's up_proj allocates almost double the expected memory (expected: 129.5 MB, actually used: 264 MB, visible in the `shared` column):

```
weight memory usage: 129.5 MB
[used=9306   , shared=108    ] Before pin
[used=9334   , shared=372    ] After pin
```
Pinning llama3.1 8b's up_proj (expected: 112.0 MB, actually used: 136 MB):

```
weight memory usage: 112.0 MB
[used=9280   , shared=108    ] Before pin
[used=9321   , shared=244    ] After pin
```
Although the extra memory is less noticeable when pinning a single tensor, the overhead scales with the number of pinned tensors and can significantly inflate DRAM usage. For instance, it results in approximately 12 GB of extra memory overhead when pinning the weights of the Qwen2.5 7B model.
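As a back-of-the-envelope check, here is how the per-tensor overhead accumulates. The 28-layer count and the assumption that gate_proj, up_proj, and down_proj each hold 18944 × 3584 float16 elements are taken from Qwen2.5 7B's published config, not measured here:

```python
# Rough estimate of the aggregate rounding overhead for Qwen2.5 7B's MLP
# weights alone. Assumptions (from the published config, not measured):
# 28 decoder layers, each with gate_proj, up_proj, and down_proj of
# 18944 * 3584 float16 elements.
def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

nbytes = 18944 * 3584 * 2                  # ~129.5 MiB per projection
per_tensor = next_pow2(nbytes) - nbytes    # ~126.5 MiB wasted per tensor
total = 28 * 3 * per_tensor                # 84 MLP projections in total
print(f"~{total / 2**30:.1f} GiB of estimated overhead from MLP weights alone")
```

The MLP projections alone account for roughly 10 GiB of overhead; the remaining weight tensors plausibly make up the rest of the ~12 GB observed.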
### Versions
PyTorch version: 2.6.0+cu126
cc @ptrblck @msaroufim @eqy