[PoC] HF exporters #41992

Draft
IlyasMoutawwakil wants to merge 146 commits into main from hf-exporters

Conversation

@IlyasMoutawwakil (Member) commented Nov 3, 2025

What does this PR do?

Edit: some PRs were opened taking pieces of this one, like #42697 and #42317, so this one is now mostly about HfExporters 🤗

This is an attempt at standardizing native transformers support for an export backend (dynamo, onnx).
Motivation:

  • The dynamo backend is cool and fast but also much stricter than torchscript; for example, torchscript simply traces through data-dependent if statements with a warning, whereas dynamo tries to guard the control flow and fails a fair amount of the time (see all the if not torch.compiler.is_exporting() in this PR). This means that if we were to transition optimum-onnx/optimum-intel to dynamo export, we would have to rewrite entire modules to avoid these errors. This PR suggests adding a native component in Transformers that handles mostly monolithic export and is fully tested against all models, so that these modeling problems are caught early on. It also gives users a friendly API for experimenting with exporting freshly added models that are not yet supported in optimum-onnx. optimum-onnx will build on top of this API and remain the place for seamless, easy end-to-end export, handling all the extra steps: generating the inputs and dynamic axes, splitting models (encoder-decoder, vlms), handling inference, etc.
  • PyTorch has moved TorchScript and its onnx-based export to maintenance mode.
  • Extra: AOT inductor support in transformers 🤗 (portability)

I started with the simplest models (encoders), then decoders (with pkv inputs/outputs), and now the integration works with almost all transformers models (including encoder-decoders and vlms), except a select few.

  • The pkv generation step can be done using the model's config, but for simplicity I'm running a forward pass and retrieving the pkv from the outputs.
  • Dynamic shapes can be passed by the user or generated automatically by creating a dict with Dim.AUTO and letting torch infer which axes are dynamic (this simplifies dynamic export testing).

I only added the onnx exporter because it seemed very simple once the dynamo exporter was in place 😅
Here are some examples:

Encoder example (BERT masked LM):

```python
import torch

from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter


model_id = "hf-internal-testing/tiny-random-BertForMaskedLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))
bert = AutoModelForMaskedLM.from_pretrained(model_id)
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
onnx_bert = exporter.export(model=bert, sample_inputs=sample_inputs)

# testing with different sized inputs
new_input = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
onnx_outputs = onnx_bert.call_reference(**new_input)  # uses numpy under the hood
ort_outputs = onnx_bert(**new_input)  # uses onnxruntime under the hood
torch.testing.assert_close(onnx_outputs[0], ort_outputs[0], rtol=1e-04, atol=1e-04)
```
Decoder example (Llama causal LM, with pkv):

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter
from transformers.exporters.utils import prepare_inputs_for_export


model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama = AutoModelForCausalLM.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
onnx_llama = exporter.export(model=llama, sample_inputs=sample_inputs)
onnx_llama.save("onnx_llama_dynamic.onnx", external_data=True)

# testing with different sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
_, new_inputs = prepare_inputs_for_export(llama, new_inputs)  # to add pkv and process related inputs
onnx_outputs = onnx_llama.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = onnx_llama(**new_inputs)  # uses onnxruntime under the hood
torch.testing.assert_close(onnx_outputs[0], ort_outputs[0], rtol=1e-04, atol=1e-04)
```

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft November 3, 2025 14:29
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil (Member, Author) commented Nov 5, 2025

Currently all models (except a select few) are tested and pass successfully!

389 passed, 87 skipped, 413 warnings in 143.73s (0:02:23)

Skipped tests are either:

  • explicitly skipped with test_torch_exportable = False, for custom-cache models and some MoEs (15);
  • erroring with an informative torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode (67);
  • erroring with a cryptic Expected cond to be True, but got False. (16).

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bigbird_pegasus, deepseek_vl, deepseek_vl_hybrid, dia, flava, glm_moe_dsa, idefics, mamba2, nemotron, perceiver, splinter, swin2sr, zoedepth

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41992&sha=a16df2


7 participants