NVIDIA TensorRT

NVIDIA® TensorRT™ is a high-performance deep learning inference SDK for deploying AI models on NVIDIA GPUs. It takes trained neural networks from frameworks such as TensorFlow, PyTorch, and MXNet and produces optimized inference engines that run efficiently on NVIDIA hardware.

Key Capabilities

- Automated Optimization: Applies layer fusion, precision calibration, and kernel auto-tuning
- Multi-Precision Support: Supports FP32, FP16, INT8, and INT4 quantization
- Dynamic Tensors: Handles variable batch sizes, image resolutions, and sequence lengths without rebuilding engines
- Platform Coverage: Supports data center GPUs, embedded systems, and automotive platforms
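The capabilities above come together when you build an engine. Below is a minimal sketch using the TensorRT Python builder API (TensorRT 8.x-era calls): it parses an ONNX model, enables FP16 precision, and registers an optimization profile so one engine can serve a range of batch sizes. The file name `model.onnx` and the input tensor name `input` are placeholder assumptions, and running this requires the `tensorrt` package and an NVIDIA GPU:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit-batch network definition (required for ONNX models).
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder path for your exported model.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where beneficial

# Dynamic batch size via an optimization profile; "input" is an assumed
# tensor name -- use the actual input name from your network.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)

# Serialize the optimized engine to disk for later deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```

At runtime, the serialized plan is deserialized with `trt.Runtime` and executed through an execution context; because the profile covers batch sizes 1 through 32, the same engine handles any batch in that range without rebuilding.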

TensorRT transforms your trained models into optimized runtime engines that deliver low-latency inference. Use TensorRT for image classification, computer vision, large language models, and other AI applications.

This documentation describes the current NVIDIA TensorRT release.