Model Export with Ultralytics YOLO

Ultralytics YOLO ecosystem and integrations

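Introduction

The ultimate goal of training a model is to deploy it in real-world applications. Export mode in Ultralytics YOLO26 offers a versatile range of options for exporting your trained model to different formats, making it deployable across various platforms and devices. This comprehensive guide walks you through the details of model export and shows how to achieve maximum compatibility and performance.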

Watch: How to Export Ultralytics YOLO26 in different formats for Deployment | ONNX, TensorRT, CoreML 🚀

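Why Choose YOLO26's Export Mode?

Versatility: Export to multiple formats, including ONNX, TensorRT, CoreML, and more.
Performance: Gain up to a 5x GPU speedup with TensorRT and a 3x CPU speedup with ONNX or OpenVINO.
Compatibility: Make your model universally deployable across numerous hardware and software environments.
Ease of Use: A simple CLI and Python API for quick and straightforward model export.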

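Key Features of Export Mode

Here are some of the standout features:

One-Click Export: Simple commands for exporting to different formats.
Batch Export: Export models capable of batched inference.
Optimized Inference: Exported models are optimized for faster inference times.
Tutorial Videos: In-depth guides and tutorials for a smooth export experience.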

Tip

Export to ONNX or OpenVINO for up to a 3x CPU speedup.
Export to TensorRT for up to a 5x GPU speedup.

Usage Examples

Export a YOLO26n model to a different format, such as ONNX or TensorRT. See the Arguments section below for a full list of export arguments.

Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n.pt")  # load an official model
# model = YOLO("path/to/best.pt")  # or load a custom-trained model instead

# Export the model
model.export(format="onnx")

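Arguments

This table details the configurations and options available for exporting YOLO models to different formats. These settings are critical for optimizing the exported model's performance, size, and compatibility across various platforms and environments. Proper configuration ensures that the model is ready for deployment in the intended application with optimal efficiency.

Argument	Type	Default	Description
`format`	`str`	`'torchscript'`	Target format for the exported model, such as `'onnx'`, `'torchscript'`, `'engine'` (TensorRT), and others. Each format enables compatibility with different deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images (e.g., `640` for 640×640) or a tuple `(height, width)` for specific dimensions.
`keras`	`bool`	`False`	Enables export to Keras format for TensorFlow SavedModel, providing compatibility with TensorFlow serving and APIs.
`optimize`	`bool`	`False`	Applies optimization for mobile devices when exporting to TorchScript, potentially reducing model size and improving inference performance. Not compatible with the NCNN format or CUDA devices. For DEEPX, enables a higher compiler optimization level, which reduces inference latency at the cost of longer compilation time.
`half`	`bool`	`False`	Enables FP16 (half-precision) quantization, reducing model size and potentially speeding up inference on supported hardware. Not compatible with INT8 quantization or CPU-only exports. Only available for certain formats, e.g., ONNX (see below).
`int8`	`bool`	`False`	Activates INT8 quantization, further compressing the model and speeding up inference with minimal accuracy loss, primarily for edge devices. TensorRT 11+ uses ModelOpt explicit Q/DQ quantization; TensorRT 7-10 uses PTQ with a calibrator.
`dynamic`	`bool`	`False`	Allows dynamic input sizes for TorchScript, ONNX, OpenVINO, TensorRT, and CoreML exports, increasing flexibility when handling varying image dimensions.
`simplify`	`bool`	`True`	Simplifies the model graph for ONNX exports with `onnxslim`, potentially improving performance and compatibility with inference engines.
`opset`	`int`	`None`	Specifies the ONNX opset version for compatibility with different ONNX parsers and runtimes. If not set, the latest supported version is used.
`workspace`	`float` or `None`	`None`	Sets the maximum workspace size in GiB for TensorRT optimizations, balancing memory usage and performance. Use `None` to let TensorRT allocate automatically, up to the device maximum.
`nms`	`bool`	`False`	Adds Non-Maximum Suppression (NMS) to the exported model when supported (see Export Formats), improving detection post-processing efficiency. Not available for end2end models.
`batch`	`int`	`1`	Specifies the batch inference size of the exported model, i.e., the maximum number of images the exported model will process concurrently in `predict` mode. For Edge TPU exports, this is automatically set to 1.
`device`	`str`	`None`	Specifies the device for exporting: GPU (`device=0`), CPU (`device=cpu`), MPS for Apple silicon (`device=mps`), Huawei Ascend NPU (`device=npu` or `device=npu:0`), or DLA for NVIDIA Jetson (`device=dla:0` or `device=dla:1`). TensorRT exports automatically use the GPU, but TensorRT 11.0 does not support DLA.
`data`	`str`	`None`	Path to the dataset configuration file, essential for INT8 quantization calibration. If not specified with INT8 enabled, Ultralytics selects a task-specific calibration dataset where needed, or falls back to the default dataset for the model's task.
`fraction`	`float`	`1.0`	Specifies the fraction of the dataset to use for INT8 quantization calibration. Allows calibrating on a subset of the full dataset, which is useful for experiments or when resources are limited. If not specified with INT8 enabled, the full dataset is used.
`end2end`	`bool`	`None`	Overrides end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you export these models for compatibility with traditional NMS-based post-processing pipelines. See the End-to-End Detection guide for details.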

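Adjusting these parameters lets you customize the export process to fit specific requirements, such as the deployment environment, hardware constraints, and performance targets. Selecting the appropriate format and settings is essential for achieving the best balance between model size, speed, and accuracy.

Export Formats

Available YOLO26 export formats are listed in the table below. You can export to any format using the format argument, e.g., format='onnx' or format='engine'. You can predict or validate directly on exported models, e.g., yolo predict model=yolo26n.onnx. After export completes, usage examples for your model are displayed. Models can also be exported directly from the Ultralytics Platform in your browser, with no local setup required.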

Format	`format` Argument	Model	Metadata	Arguments
PyTorch	-	`yolo26n.pt`	✅	-
TorchScript	`torchscript`	`yolo26n.torchscript`	✅	`imgsz`, `half`, `dynamic`, `optimize`, `nms`, `batch`, `device`
ONNX	`onnx`	`yolo26n.onnx`	✅	`imgsz`, `half`, `int8`, `dynamic`, `simplify`, `opset`, `nms`, `batch`, `data`, `fraction`, `device`
OpenVINO	`openvino`	`yolo26n_openvino_model/`	✅	`imgsz`, `half`, `dynamic`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
TensorRT	`engine`	`yolo26n.engine`	✅	`imgsz`, `half`, `dynamic`, `simplify`, `workspace`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
CoreML	`coreml`	`yolo26n.mlpackage`	✅	`imgsz`, `dynamic`, `half`, `int8`, `nms`, `batch`, `device`
TF SavedModel	`saved_model`	`yolo26n_saved_model/`	✅	`imgsz`, `keras`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
TF GraphDef	`pb`	`yolo26n.pb`	❌	`imgsz`, `batch`, `device`
TF Lite	`tflite`	`yolo26n.tflite`	✅	`imgsz`, `half`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
TF Edge TPU	`edgetpu`	`yolo26n_edgetpu.tflite`	✅	`imgsz`, `int8`, `data`, `fraction`, `device`
TF.js	`tfjs`	`yolo26n_web_model/`	✅	`imgsz`, `half`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
PaddlePaddle	`paddle`	`yolo26n_paddle_model/`	✅	`imgsz`, `batch`, `device`
MNN	`mnn`	`yolo26n.mnn`	✅	`imgsz`, `batch`, `int8`, `half`, `device`
NCNN	`ncnn`	`yolo26n_ncnn_model/`	✅	`imgsz`, `half`, `batch`, `device`
IMX500	`imx`	`yolo26n_imx_model/`	✅	`imgsz`, `int8`, `data`, `fraction`, `nms`, `device`
RKNN	`rknn`	`yolo26n_rknn_model/`	✅	`imgsz`, `batch`, `name`, `half`, `int8`, `data`, `fraction`, `device`
ExecuTorch	`executorch`	`yolo26n_executorch_model/`	✅	`imgsz`, `batch`, `device`
Axelera	`axelera`	`yolo26n_axelera_model/`	✅	`imgsz`, `batch`, `int8`, `data`, `fraction`, `device`
DEEPX	`deepx`	`yolo26n_deepx_model/`	✅	`imgsz`, `int8`, `data`, `optimize`, `device`
Qualcomm QNN	`qnn`	`yolo26n_qnn.onnx`	✅	`imgsz`, `batch`, `name`, `int8`, `data`, `fraction`, `device`
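The format table above can double as a pre-flight check before a long export. The sketch below copies the supported-argument lists for a few formats from that table into a hypothetical `SUPPORTED_ARGS` dictionary and reports any requested argument the chosen format does not list; it also flags the `half`/`int8` combination, which the Arguments table describes as incompatible. The helper is illustrative only and is not part of the Ultralytics API, which performs its own validation inside `model.export()`.

```python
# Hypothetical pre-flight check built from the export-format table above.
# Not part of Ultralytics; the argument sets are copied from the table.
SUPPORTED_ARGS = {
    "torchscript": {"imgsz", "half", "dynamic", "optimize", "nms", "batch", "device"},
    "onnx": {"imgsz", "half", "int8", "dynamic", "simplify", "opset", "nms",
             "batch", "data", "fraction", "device"},
    "engine": {"imgsz", "half", "dynamic", "simplify", "workspace", "int8", "nms",
               "batch", "data", "fraction", "device"},
    "edgetpu": {"imgsz", "int8", "data", "fraction", "device"},
}


def check_export_args(fmt, **kwargs):
    """Return a list of human-readable problems with the requested export arguments."""
    if fmt not in SUPPORTED_ARGS:
        return [f"format {fmt!r} is not covered by this sketch"]
    problems = [
        f"{name!r} is not listed for format {fmt!r}"
        for name in sorted(kwargs)
        if name not in SUPPORTED_ARGS[fmt]
    ]
    # The Arguments table states that FP16 (half) is not compatible with INT8.
    if kwargs.get("half") and kwargs.get("int8"):
        problems.append("half=True and int8=True are mutually exclusive")
    return problems


print(check_export_args("onnx", half=True, workspace=4))
# ["'workspace' is not listed for format 'onnx'"]
```

An empty list means the combination matches the table; extending the dictionary to the remaining rows is mechanical.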

FAQ

How do I export a YOLO26 model to ONNX format?

Exporting a YOLO26 model to ONNX format is straightforward with Ultralytics, which provides both Python and CLI methods for exporting models.

Example

from ultralytics import YOLO

# Load a model
model = YOLO("yolo26n.pt")  # load an official model
# model = YOLO("path/to/best.pt")  # or load a custom-trained model instead

# Export the model
model.export(format="onnx")

For more details on the process, including advanced options such as handling different input sizes, refer to the ONNX integration guide.

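What are the benefits of using TensorRT for model export?

Using TensorRT for model export offers significant performance improvements. YOLO26 models exported to TensorRT can achieve up to a 5x GPU speedup, making them ideal for real-time inference applications.

Versatility: Optimize models for specific hardware configurations.
Speed: Achieve faster inference through advanced optimizations.
Compatibility: Integrate seamlessly with NVIDIA hardware.

To learn more about integrating TensorRT, see the TensorRT integration guide.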

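How do I enable INT8 quantization when exporting my YOLO26 model?

INT8 quantization is an excellent way to compress the model and speed up inference, especially on edge devices. Here's how you can enable INT8 quantization:

Example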

from ultralytics import YOLO

model = YOLO("yolo26n.pt")  # Load a model
model.export(format="onnx", int8=True, data="coco8.yaml")

INT8 quantization can be applied to various formats, such as ONNX, TensorRT, OpenVINO, CoreML, and Rockchip RKNN. For optimal quantization results, provide a representative dataset using the data parameter.
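To see why a representative dataset matters, the sketch below implements generic symmetric per-tensor INT8 quantization in plain Python: a scale is derived from the largest absolute value seen in calibration samples, and values are then mapped to integers in [-127, 127]. This is a textbook illustration under that assumption, not the calibration algorithm used by Ultralytics, TensorRT, or any other backend, and the function names are hypothetical.

```python
def calibrate_scale(samples):
    """Symmetric per-tensor scale from calibration data: max |x| maps to 127."""
    max_abs = max(abs(x) for batch in samples for x in batch)
    return max_abs / 127 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to INT8 codes, rounding to nearest and clamping to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map INT8 codes back to approximate floats."""
    return [q * scale for q in codes]


# Stand-in for activations collected on a representative calibration dataset
calibration = [[0.1, -0.8, 2.54], [1.2, -2.0, 0.0]]
scale = calibrate_scale(calibration)  # approximately 0.02 (2.54 / 127)
codes = quantize([2.54, -1.0, 0.013], scale)
restored = dequantize(codes, scale)
```

Values outside the calibrated range are clipped to ±127, which is why calibration data should resemble real deployment inputs; the fraction argument trades some of that coverage for a faster export.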

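Why is dynamic input size important when exporting models?

Dynamic input size allows the exported model to handle varying image dimensions, providing flexibility and optimizing processing efficiency for different use cases. When exporting to formats like ONNX or TensorRT, enabling dynamic input size ensures that the model can adapt seamlessly to different input shapes.

To enable this feature, use the dynamic=True flag during export:

Example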

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
model.export(format="onnx", dynamic=True)

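Dynamic input sizing is particularly useful for applications where input dimensions may vary, such as video processing or handling images from different sources.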
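Without dynamic=True, every input must match the fixed export shape. A common way to get there without distorting the image is letterboxing: scale the image to fit the target while keeping its aspect ratio, then pad the remainder. The hypothetical helper below only computes the resize and padding parameters; it is a simplified sketch, not the exact preprocessing Ultralytics performs internally.

```python
def letterbox_params(src_hw, dst_hw):
    """Resize size and padding to fit a (h, w) image into a fixed (h, w) input."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # Largest scale that fits both dimensions, preserving aspect ratio
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    # Split padding between both sides; an odd extra pixel goes bottom/right
    top, left = pad_h // 2, pad_w // 2
    return {
        "scale": scale,
        "resized_hw": (new_h, new_w),
        "pad": (top, pad_h - top, left, pad_w - left),  # (top, bottom, left, right)
    }


# A 1080x1920 video frame fed to a model exported with imgsz=640
params = letterbox_params((1080, 1920), (640, 640))
```

Boxes predicted on the padded image are mapped back to the original frame by subtracting the left/top padding and dividing by scale.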

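What are the key export parameters to consider for optimizing model performance?

Understanding and configuring export parameters is crucial for optimizing model performance:

format: The target format for the exported model (e.g., onnx, torchscript, saved_model).
imgsz: Desired image size for the model input (e.g., 640 or (height, width)).
half: Enables FP16 quantization, reducing model size and potentially speeding up inference.
optimize: Applies specific optimizations for mobile or constrained environments.
int8: Enables INT8 quantization, which is highly beneficial for edge AI deployments.

For deployment on specific hardware platforms, consider specialized export formats such as TensorRT for NVIDIA GPUs, CoreML for Apple devices, or Edge TPU for Google Coral devices.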
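One way to keep these hardware-specific choices in one place is a small lookup table of export keyword arguments. The presets below are hypothetical: the argument names come from the export-format table, but the values are illustrative rather than official recommendations. The returned dictionary is meant to be passed as model.export(**kwargs).

```python
# Hypothetical export presets per target platform. Argument names come from the
# export-format table; the chosen values are illustrative, not recommendations.
EXPORT_PRESETS = {
    "nvidia_gpu": {"format": "engine", "half": True},
    "apple": {"format": "coreml", "half": True},
    "google_coral": {"format": "edgetpu", "int8": True},
}


def export_kwargs(target, **overrides):
    """Return a copy of the preset for `target`, with caller overrides applied."""
    if target not in EXPORT_PRESETS:
        raise ValueError(f"unknown target {target!r}; choose from {sorted(EXPORT_PRESETS)}")
    # Merge into a new dict so the shared preset is never mutated
    return {**EXPORT_PRESETS[target], **overrides}


# Usage with Ultralytics (requires `from ultralytics import YOLO`):
# model = YOLO("yolo26n.pt")
# model.export(**export_kwargs("nvidia_gpu", imgsz=640))
```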

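What do the output tensors represent in exported YOLO models?

When you export a YOLO model to formats like ONNX or TensorRT, the output tensor structure depends on the model task. Understanding these outputs is important for custom inference implementations.

For YOLO26 detection models (e.g., yolo26n.pt), end-to-end export is enabled by default in supported formats, so the output shape is (batch_size, max_detections, 6), containing [x1, y1, x2, y2, confidence, class_id] values. With the default max_det=300 setting, this is typically (batch_size, 300, 6). In some restricted formats that do not support the end-to-end operators, export automatically falls back to the traditional output layout.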
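For the end-to-end layout, post-processing reduces to thresholding rows. The sketch below assumes one image's output has already been converted to a list of [x1, y1, x2, y2, confidence, class_id] rows (for example, from an ONNX Runtime result); the 0.25 threshold and the function name are illustrative assumptions.

```python
def filter_end2end(rows, conf_threshold=0.25):
    """Keep detections above a confidence threshold from a (max_detections, 6) output.

    Each row is [x1, y1, x2, y2, confidence, class_id]. The output always holds
    max_det rows, so slots beyond the real detections typically carry low
    confidence and are removed by the threshold.
    """
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf >= conf_threshold:
            # class_id is stored as a float in the tensor; convert for indexing
            detections.append({"box": (x1, y1, x2, y2), "conf": conf, "class_id": int(cls)})
    return detections


# One image's output, truncated to three of its max_det rows for brevity
output = [
    [48.0, 30.5, 210.0, 400.0, 0.91, 0.0],
    [300.0, 120.0, 360.0, 190.0, 0.40, 2.0],
    [0.0, 0.0, 0.0, 0.0, 0.01, 0.0],
]
dets = filter_end2end(output)
```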

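For non-end-to-end detection models, or YOLO26 models exported with end2end=False, the output is typically a single tensor of shape (batch_size, 4 + num_classes, num_predictions), where the channels represent bounding box coordinates plus per-class scores, and num_predictions depends on the export input resolution (and can be dynamic).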
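In this layout, num_predictions is the number of anchor points across the detection head's feature maps. Assuming the common YOLO head strides of 8, 16, and 32 (check your model's actual strides), a 640×640 input gives 80² + 40² + 20² = 6400 + 1600 + 400 = 8400 predictions. The sketch also decodes a single prediction column, assuming the four box channels are (center x, center y, width, height) in input-pixel coordinates, as in earlier Ultralytics detection exports; both function names are hypothetical.

```python
def num_predictions(imgsz, strides=(8, 16, 32)):
    """Anchor points summed over all feature maps; imgsz is an int or (height, width)."""
    h, w = (imgsz, imgsz) if isinstance(imgsz, int) else imgsz
    return sum((h // s) * (w // s) for s in strides)


def decode_column(column):
    """Decode one prediction from a (4 + num_classes)-channel column.

    Assumes channels are [cx, cy, w, h, score_0, ..., score_{n-1}].
    Returns ((x1, y1, x2, y2), best_score, best_class_index).
    """
    cx, cy, bw, bh = column[:4]
    scores = column[4:]
    class_id = max(range(len(scores)), key=scores.__getitem__)
    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
    return box, scores[class_id], class_id


print(num_predictions(640))         # 8400
print(num_predictions((480, 640)))  # 6300
```

With the output held as nested lists, column i of the first image is [channel[i] for channel in output[0]].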

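For segmentation models (e.g., yolo26n-seg.pt), you typically get two outputs: a first tensor of shape (batch_size, 4 + num_classes + mask_dim, num_predictions) containing boxes, class scores, and mask coefficients, and a second tensor of shape (batch_size, mask_dim, proto_h, proto_w) containing mask prototypes, which are combined with the coefficients to produce instance masks. The sizes depend on the export input resolution (and can be dynamic).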
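The coefficient/prototype split means each instance mask is a linear combination of the shared prototypes followed by a sigmoid, in the general style of YOLACT. The pure-Python sketch below shows that step for one detection at prototype resolution, assuming the nested-list shapes described in its docstring; it omits cropping to the box and upsampling to the input size, which real pipelines typically add, and details may differ from Ultralytics' own implementation.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def instance_mask(coeffs, protos, threshold=0.5):
    """Combine mask coefficients with prototypes into a binary mask.

    coeffs: list of mask_dim floats for one detection.
    protos: nested list shaped (mask_dim, proto_h, proto_w) for one image.
    Pixel (y, x) is sigmoid(sum_k coeffs[k] * protos[k][y][x]) > threshold.
    """
    mask_dim = len(coeffs)
    proto_h, proto_w = len(protos[0]), len(protos[0][0])
    mask = []
    for y in range(proto_h):
        row = []
        for x in range(proto_w):
            logit = sum(coeffs[k] * protos[k][y][x] for k in range(mask_dim))
            row.append(_sigmoid(logit) > threshold)
        mask.append(row)
    return mask


# Two 2x2 prototypes and one detection's coefficients
protos = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
mask = instance_mask([2.0, -2.0], protos)
```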

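For pose models (e.g., yolo26n-pose.pt), the output tensor is typically shaped (batch_size, 4 + num_classes + keypoint_dims, num_predictions), where keypoint_dims depends on the pose specification (e.g., the number of keypoints and whether a confidence value is included per keypoint), and num_predictions depends on the export input resolution (and can be dynamic).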
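As a worked example, assuming a single-class person model with 17 COCO-style keypoints, each stored as (x, y, visibility), keypoint_dims is 17 × 3 = 51 and each prediction column has 4 + 1 + 51 = 56 channels. The hypothetical helper below splits one column under those assumptions.

```python
def decode_pose_column(column, num_classes=1, num_keypoints=17, kpt_values=3):
    """Split one pose prediction column into box, class scores, and keypoints.

    Assumes channels are [box (4), class scores (num_classes), keypoints...],
    with each keypoint stored as `kpt_values` numbers, e.g. (x, y, visibility).
    """
    expected = 4 + num_classes + num_keypoints * kpt_values
    if len(column) != expected:
        raise ValueError(f"expected {expected} channels, got {len(column)}")
    box = column[:4]
    scores = column[4:4 + num_classes]
    flat = column[4 + num_classes:]
    # Group the flat keypoint channels into per-keypoint tuples
    keypoints = [tuple(flat[i:i + kpt_values]) for i in range(0, len(flat), kpt_values)]
    return box, scores, keypoints


column = [1.0, 2.0, 3.0, 4.0, 0.9] + [float(i) for i in range(51)]
box, scores, keypoints = decode_pose_column(column)
```

If a model stores only (x, y) per keypoint, pass kpt_values=2 and the expected channel count drops to 4 + 1 + 34 = 39.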

The ONNX inference examples show how to handle these outputs for each model type.

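Why is `output0` FP32 when exporting with `half=True` and `end2end=True`?

When exporting with half=True (or int8=True), most tensors are converted to lower precision to reduce model size and improve performance. However, when end2end=True is enabled, post-processing (including class indices) is embedded directly in the exported graph.

The output0 tensor contains class indices, which are represented internally as floating-point values. Because FP16 has limited mantissa precision, it cannot reliably represent integers above 2048. To avoid potential precision loss or incorrect class IDs, output0 is intentionally kept in FP32.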
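The 2048 limit follows from FP16's 10 explicit mantissa bits: with the implicit leading bit there are 11 bits of significand, so every integer up to 2^11 = 2048 is exact, but between 2048 and 4096 representable values are 2 apart. Python's standard struct module supports the IEEE 754 half-precision format ('e'), which makes the effect easy to check:

```python
import struct


def fp16_roundtrip(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Every integer from 0 to 2048 survives the round trip...
assert all(fp16_roundtrip(float(i)) == i for i in range(2049))

# ...but above 2048 the spacing between representable values is 2, so odd
# integers are rounded (ties go to the even significand).
print(fp16_roundtrip(2049.0))  # 2048.0
print(fp16_roundtrip(2051.0))  # 2052.0
```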

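This behavior is expected and also applies to low-precision or quantized exports where class index fidelity must be preserved.

If you need fully FP16 outputs, export with end2end=False and perform post-processing externally.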
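With end2end=False, the exported model emits candidate boxes and your own code must suppress duplicates. Below is a minimal greedy, class-aware NMS in plain Python, assuming boxes are already decoded to (x1, y1, x2, y2); the 0.45 IoU threshold is an illustrative default, and production pipelines typically use a vectorized implementation such as torchvision.ops.nms instead.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, class_ids, iou_threshold=0.45):
    """Greedy per-class NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep i only if no already-kept box of the same class overlaps it too much
        if all(
            class_ids[k] != class_ids[i] or iou(boxes[i], boxes[k]) <= iou_threshold
            for k in keep
        ):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.6]
class_ids = [0, 0, 0, 1]
kept = nms(boxes, scores, class_ids)
```

Here box 1 overlaps box 0 with IoU 0.81 and is suppressed, while box 3 survives because it belongs to a different class. Candidates are usually filtered by a confidence threshold before NMS to keep the loop short.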

Contributors

glenn-jocher (27), Burhan-Q (4), raimbekovm (2), RizwanMunawar (2), ambitious-octopus (2), Kayzwer (2), Shreyas-S-809 (1), pderrenger (1), Y-T-G (1), jk4e (1), MatthewNoyce (1)

Created November 12, 2023, updated 3 days ago
