Importing torch_tensorrt causes warning for implicitly cleaned up file #147744

@ivan94fi

🐛 Describe the bug

A temporary directory is created by the following line in torch.distributed.nn.jit.instantiator, and it is never explicitly cleaned up:

_TEMP_DIR = tempfile.TemporaryDirectory()

A warning is generated by tempfile itself when the program exits:

WARNING  py.warnings  /usr/lib/python3.12/tempfile.py:1075: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpxy_e0smt'>  warnings.py:110
  _warnings.warn(warn_message, ResourceWarning)
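The same warning can be reproduced without torch. The following is a minimal sketch, not code from PyTorch: the helper names (leak_temp_dir, clean_temp_dir, count_implicit_cleanups) are hypothetical, and it relies only on the standard tempfile, warnings and gc modules. It drops a TemporaryDirectory object without calling cleanup() and counts the resulting ResourceWarning.

```python
import gc
import tempfile
import warnings


def leak_temp_dir():
    # Hypothetical helper: create a TemporaryDirectory and drop the object
    # without calling cleanup(), as happens to a module-level instance at exit.
    tmp = tempfile.TemporaryDirectory()
    return tmp.name


def clean_temp_dir():
    # Same as above, but the directory is removed explicitly.
    tmp = tempfile.TemporaryDirectory()
    name = tmp.name
    tmp.cleanup()
    return name


def count_implicit_cleanups(fn):
    # Record every warning raised while fn runs, then count the
    # "Implicitly cleaning up" ResourceWarnings emitted by tempfile.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
        gc.collect()  # for interpreters without immediate refcount cleanup
    return sum(
        1
        for w in caught
        if issubclass(w.category, ResourceWarning)
        and "Implicitly cleaning up" in str(w.message)
    )


if __name__ == "__main__":
    print("leaked:", count_implicit_cleanups(leak_temp_dir))
    print("cleaned:", count_implicit_cleanups(clean_temp_dir))
```

On CPython the leaked case should report one warning and the cleaned case none, which matches the behavior described above: the warning comes from tempfile's own finalizer, not from torch.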

The generated file is _remote_module_non_scriptable.py.

For me, the warning is triggered by importing torch_tensorrt. The import chain that reaches instantiator.py is:

-> import torch_tensorrt
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/__init__.py(125)<module>()
-> from torch_tensorrt.runtime import *  # noqa: F403
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/runtime/__init__.py(1)<module>()
-> from torch_tensorrt.dynamo.runtime import (  # noqa: F401
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/__init__.py(10)<module>()
-> from ._compiler import compile, convert_exported_program_to_serialized_trt_engine
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/_compiler.py(14)<module>()
-> from torch_tensorrt.dynamo import _defaults, partitioning
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/partitioning/__init__.py(1)<module>()
-> from ._adjacency_partitioner import partition as fast_partition
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py(20)<module>()
-> from torch_tensorrt.dynamo.conversion._ConverterRegistry import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/conversion/__init__.py(1)<module>()
-> from . import aten_ops_converters, ops_evaluators, prims_ops_converters
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/conversion/aten_ops_converters.py(12)<module>()
-> from torch_tensorrt.dynamo.conversion import impl
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/dynamo/conversion/impl/__init__.py(1)<module>()
-> from torch_tensorrt.fx.converters.impl import convolution
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/__init__.py(1)<module>()
-> from .converters import *  # noqa: F403 F401
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/converters/__init__.py(5)<module>()
-> from .adaptive_avgpool import *  # noqa: F401 F403
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/converters/adaptive_avgpool.py(7)<module>()
-> from .converter_utils import extend_mod_attr_to_tuple, mark_as_int8_layer
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/converters/converter_utils.py(23)<module>()
-> from ..utils import Frameworks, unified_dtype_converter
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/utils.py(12)<module>()
-> from torch_tensorrt.fx.passes.lower_basic_pass import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/passes/lower_basic_pass.py(14)<module>()
-> from ..tracer.acc_tracer import acc_ops
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch_tensorrt/fx/tracer/acc_tracer/acc_ops.py(891)<module>()
-> from torchvision.ops import stochastic_depth
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/__init__.py(10)<module>()
-> from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/models/__init__.py(2)<module>()
-> from .convnext import *
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/models/convnext.py(8)<module>()
-> from ..ops.misc import Conv2dNormActivation, Permute
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/ops/__init__.py(23)<module>()
-> from .poolers import MultiScaleRoIAlign
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/ops/poolers.py(10)<module>()
-> from .roi_align import roi_align
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torchvision/ops/roi_align.py(7)<module>()
-> from torch._dynamo.utils import is_compile_supported
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/__init__.py(3)<module>()
-> from . import convert_frame, eval_frame, resume_execution
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py(53)<module>()
-> from . import config, exc, trace_rules
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/trace_rules.py(46)<module>()
-> from .variables import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/variables/__init__.py(2)<module>()
-> from .builtin import BuiltinVariable
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py(47)<module>()
-> from .ctx_manager import EventVariable, StreamVariable
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/variables/ctx_manager.py(22)<module>()
-> from .functions import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py(31)<module>()
-> from torch.distributed._composable.fsdp import _fsdp_param_group
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_composable/__init__.py(3)<module>()
-> from .fully_shard import fully_shard
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_composable/fully_shard.py(10)<module>()
-> from torch.distributed.fsdp._common_utils import _FSDPState
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/fsdp/__init__.py(1)<module>()
-> from ._flat_param import FlatParameter as FlatParameter
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/fsdp/_flat_param.py(47)<module>()
-> from ._fsdp_extensions import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/fsdp/_fsdp_extensions.py(6)<module>()
-> from torch.distributed._shard.sharded_tensor.api import ShardedTensor
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_shard/__init__.py(1)<module>()
-> from .api import _shard_tensor, load_with_process_group, shard_module, shard_parameter
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_shard/api.py(9)<module>()
-> from torch.distributed._shard.sharded_tensor import ShardedTensor
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py(8)<module>()
-> from .api import (
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_shard/sharded_tensor/api.py(31)<module>()
-> from .reshard import reshard_local_shard, reshuffle_local_shard
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/_shard/sharded_tensor/reshard.py(14)<module>()
-> from torch.distributed.nn.functional import all_to_all, all_to_all_single
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1310)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/nn/__init__.py(7)<module>()
-> from .api.remote_module import RemoteModule
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/nn/api/remote_module.py(26)<module>()
-> from torch.distributed.nn.jit import instantiator
  <frozen importlib._bootstrap>(1415)_handle_fromlist()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
  <frozen importlib._bootstrap>(1360)_find_and_load()
  <frozen importlib._bootstrap>(1331)_find_and_load_unlocked()
  <frozen importlib._bootstrap>(935)_load_unlocked()
  <frozen importlib._bootstrap_external>(995)exec_module()
  <frozen importlib._bootstrap>(488)_call_with_frames_removed()
> /opt/uv/venv/lib/python3.12/site-packages/torch/distributed/nn/jit/instantiator.py(17)<module>()
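As a stopgap on the caller's side, the specific warning could be filtered before importing torch_tensorrt. This is a sketch of a workaround, not a fix for the underlying issue; silence_implicit_tempdir_cleanup is a hypothetical name, and the message pattern is copied from the warning text shown earlier.

```python
import warnings


def silence_implicit_tempdir_cleanup():
    # Hypothetical helper: ignore only the ResourceWarning that tempfile
    # emits when a TemporaryDirectory is cleaned up implicitly. The message
    # argument is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"Implicitly cleaning up <TemporaryDirectory",
        category=ResourceWarning,
    )


if __name__ == "__main__":
    silence_implicit_tempdir_cleanup()
    # import torch_tensorrt  # the import would go here, after the filter
```

The filter is deliberately narrow, so other ResourceWarning messages (for example unclosed files) stay visible.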

Versions

PyTorch version: 2.5.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.2
Libc version: glibc-2.39

Python version: 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 550.120
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               24
On-line CPU(s) list:                  0-23
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 9 7900X 12-Core Processor
CPU family:                           25
Model:                                97
Thread(s) per core:                   2
Core(s) per socket:                   12
Socket(s):                            1
Stepping:                             2
CPU(s) scaling MHz:                   65%
CPU max MHz:                          5733.0000
CPU min MHz:                          400.0000
BogoMIPS:                             9399.26
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization:                       AMD-V
L1d cache:                            384 KiB (12 instances)
L1i cache:                            384 KiB (12 instances)
L2 cache:                             12 MiB (12 instances)
L3 cache:                             64 MiB (2 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-23
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnx==1.17.0
[pip3] onnx-graphsurgeon==0.5.2
[pip3] onnxconverter-common==1.14.0
[pip3] onnxmltools==1.12.0
[pip3] onnxruntime==1.18.1
[pip3] onnxruntime-gpu==1.18.1
[pip3] onnxscript==0.1.0.dev20241212
[pip3] torch==2.5.0+cu124
[pip3] torch_tensorrt==2.5.0+cu124
[pip3] torchinfo==1.8.0
[pip3] torchprofile==0.0.4
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.20.0+cu124
[pip3] triton==3.1.0
[conda] Could not collect

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Labels: oncall: distributed