Conversation
    tensorrt_fused_nccl_all_gather_op,
    tensorrt_fused_nccl_reduce_scatter_op,
)
if load_tensorrt_llm_for_nccl():
I would like to use the enabled features system for this rather than a standalone function.
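To make the suggestion concrete, here is a hypothetical sketch (not the actual torch_tensorrt._features API; all names are illustrative) of hanging the TRT-LLM NCCL availability check off a queryable feature-flag object instead of a standalone loader function:

```python
# Hypothetical sketch: a feature-flag object evaluated once at import time.
# FeatureSet, _check_trtllm_for_nccl, and ENABLED_FEATURES are illustrative
# names, not the real torch_tensorrt._features implementation.
from dataclasses import dataclass


def _check_trtllm_for_nccl() -> bool:
    # Placeholder for the real platform/CUDA probing logic from the PR.
    return False


@dataclass(frozen=True)
class FeatureSet:
    trtllm_for_nccl: bool


ENABLED_FEATURES = FeatureSet(trtllm_for_nccl=_check_trtllm_for_nccl())

# Other parts of the library query the flag rather than re-running the probe:
if ENABLED_FEATURES.trtllm_for_nccl:
    print("TRT-LLM NCCL plugins available")
else:
    print("TRT-LLM NCCL plugins unavailable; converter will be unsupported")
```

Registering the result once on a shared feature object means converters and decorators can check the same flag without repeating the platform probing.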
py/torch_tensorrt/dynamo/utils.py
Outdated
Unsupported:
- Windows platforms
- Jetson/Orin/Xavier (aarch64 architecture + 'tegra' in platform release)
Thor is also not supported by TRT-LLM, right?
Yeah, Thor and SBSA should support NCCL, but I am not sure about TRT-LLM. Will include Thor in the list of unsupported platforms.
A follow-up question: what about SBSA? I see on the TRT-LLM page that Blackwell is supported, but that does not imply SBSA support, right? (It can be supported on B200, which is non-SBSA, vs. GB200, which is SBSA.)
py/torch_tensorrt/dynamo/utils.py
Outdated
if machine == "aarch64" and "tegra" in release:
    logger.info(
        "TensorRT-LLM plugins for NCCL backend are not supported on Jetson/Orin/Xavier (Tegra) devices."
Edit the error message here to include Thor.
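The requested change could look like the following sketch. The function name mirrors `is_platform_supported_for_trtllm` from the PR, but the body here is a simplified assumption for illustration, not the merged code:

```python
# Illustrative platform gate with the log message extended to cover Thor.
# The body is an assumption based on the snippet quoted in this thread.
import logging
import platform

logger = logging.getLogger(__name__)


def is_platform_supported_for_trtllm() -> bool:
    machine = platform.machine()
    release = platform.release()
    if platform.system() == "Windows":
        logger.info("TRT-LLM plugins for NCCL backend are not supported on Windows.")
        return False
    if machine == "aarch64" and "tegra" in release:
        logger.info(
            "TensorRT-LLM plugins for NCCL backend are not supported on "
            "Jetson/Orin/Xavier/Thor (Tegra) devices."
        )
        return False
    return True
```

On Tegra-based boards (including Thor, whose kernel release also reports "tegra"), the aarch64 + "tegra" probe would cover all of Jetson/Orin/Xavier/Thor with a single check.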
py/torch_tensorrt/dynamo/utils.py
Outdated
try:
    cuda_version = torch.version.cuda  # e.g., "12.4" or "13.0"
    if cuda_version is None:
        logger.warning("No CUDA runtime detected — TRT-LLM plugins unavailable.")
This is somewhat misleading, because the actual error is that the PyTorch install does not support CUDA.
Also, if that is the case, would this be an error? What invokes this function? Should the user continue to be able to run? Would they be under the assumption that TRT-LLM plugins would be available?
Yes, I will change the error message.
In that case the CUDA runtime is not available, but I assume we would hit an error before reaching this point. Within this function we won't be able to verify whether CUDA is 12.x or 13.x. Should I remove this check altogether?
It's fine to have redundant checks as long as they are clear.
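A redundant-but-clear version of the probe could read as follows. Note that `torch.version.cuda` is `None` on CPU-only PyTorch builds, so the warning should blame the PyTorch install rather than a missing "CUDA runtime". The function name is an assumption for illustration:

```python
# Sketch: report the CUDA version PyTorch was built against, with a warning
# that correctly describes a CPU-only build. torch_cuda_version is an
# illustrative name, not the PR's actual helper.
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def torch_cuda_version() -> str | None:
    try:
        import torch
    except ImportError:
        logger.warning("PyTorch is not installed; cannot determine its CUDA version.")
        return None
    cuda_version = torch.version.cuda  # e.g. "12.4"; None on CPU-only builds
    if cuda_version is None:
        logger.warning(
            "This PyTorch build was compiled without CUDA support; "
            "TRT-LLM plugins are unavailable."
        )
    return cuda_version
```

The caller can treat `None` the same as an unsupported version and simply return `False`, so the process keeps running with the converter marked unsupported.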
py/torch_tensorrt/dynamo/utils.py
Outdated
major, minor = map(int, cuda_version.split("."))
if major != 12:
    logger.warning("CUDA 13 is not supported for TRT-LLM plugins.")
"not currently supported". Same comment as above, though: this seems to me to be at least a logged error, but the question is whether we should kill the process. If the program will not run as intended, we should; otherwise it is still an error, but we can continue.
Will change the error message to add "currently".
It's more that this function will then return False: load_tensorrt_llm_for_nccl() calls is_platform_supported_for_trtllm(), which will return False, and the converter will be unsupported.
py/torch_tensorrt/dynamo/utils.py
Outdated
    return False


def load_tensorrt_llm_for_nccl() -> bool:
This function should be in the enabled features system, and should register the feature for other parts of the library to query against.
Will make this change
Yes, you should be able to run it on GB200; I think there is just no Thor distribution of TRT-LLM for now.
…ing. Pending- check support on Thor and sbsa
Force-pushed 8dd657c to cee5c7a
Force-pushed 6dfb740 to f4bbba4
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_features.py 2025-10-02 20:53:24.201799+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/_features.py 2025-10-02 20:53:59.151475+00:00
@@ -165,10 +165,11 @@
def needs_trtllm_for_nccl(f: Callable[..., Any]) -> Callable[..., Any]:
def wrapper(*args: List[Any], **kwargs: Dict[str, Any]) -> Any:
if ENABLED_FEATURES.trtllm_for_nccl:
return f(*args, **kwargs)
else:
+
def not_implemented(*args: List[Any], **kwargs: Dict[str, Any]) -> Any:
raise NotImplementedError(
"Refit feature is currently not available in Python 3.13 or higher"
)
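Applying the formatter's suggestion (the added blank line before the nested def), the decorator could look like the sketch below. Note the `NotImplementedError` message in the diff appears to be copied from the refit feature; here it is replaced with one that matches this feature. `ENABLED_FEATURES` is stubbed so the sketch runs standalone:

```python
# Sketch of needs_trtllm_for_nccl with the lint fix applied. The stubbed
# ENABLED_FEATURES and the reworded error message are assumptions for
# illustration, not the merged torch_tensorrt._features code.
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

ENABLED_FEATURES = SimpleNamespace(trtllm_for_nccl=False)  # stub for illustration


def needs_trtllm_for_nccl(f: Callable[..., Any]) -> Callable[..., Any]:
    def wrapper(*args: List[Any], **kwargs: Dict[str, Any]) -> Any:
        if ENABLED_FEATURES.trtllm_for_nccl:
            return f(*args, **kwargs)
        else:

            def not_implemented(*args: List[Any], **kwargs: Dict[str, Any]) -> Any:
                raise NotImplementedError(
                    "TRT-LLM plugins for NCCL are not available on this platform"
                )

            return not_implemented(*args, **kwargs)

    return wrapper
```

With the feature disabled, any decorated function raises `NotImplementedError` when called, which matches the "converter will be unsupported" behavior discussed earlier in the thread.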
Force-pushed f4bbba4 to 7046c6d
Force-pushed 045722a to a028601
Force-pushed fb2e683 to b96b9ee
Force-pushed b96b9ee to 2f2cd31
python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/l1_ts_models_tests_results.xml -n auto models/
popd

L1-dynamo-distributed-tests:
Can we make this L2 for now?
Force-pushed 44d9b60 to 24264e5
Across runs the wheel is removed, while the .so file is retained.