[PoC] HF exporters #41992

Draft
IlyasMoutawwakil wants to merge 146 commits into main from hf-exporters

Conversation

@IlyasMoutawwakil (Member) commented Nov 3, 2025

What does this PR do?

Edit: some PRs were opened taking pieces of this one, like #42697 and #42317, so this one is now mostly about HfExporters 🤗

This is an attempt at standardizing native transformers support for an export backend (dynamo, onnx).
Motivation:

  • The dynamo backend is cool and fast but also much stricter than torchscript; for example, torchscript simply traces through data-dependent if statements with a warning, whereas dynamo tries to guard the control flow and fails a fair amount of the time (see all the if not torch.compiler.is_exporting() in this PR). This means that if we were to transition optimum-onnx/optimum-intel to dynamo export, we would have to rewrite entire modules to avoid these errors. This PR suggests adding a native component in Transformers that handles mostly monolithic export and is fully tested against all models, so that these modeling problems are caught early on. It also gives users a friendly API for experimenting with exporting freshly added models that are not yet supported in optimum-onnx. optimum-onnx will build on top of this API and remain the place for seamless, easy end-to-end export, handling all the extra steps: generating the inputs and dynamic axes, splitting models (encoder-decoder, vlms), handling inference, etc.
  • PyTorch has moved TorchScript and its onnx-based export to maintenance mode.
  • Extra: AOT inductor support in transformers 🤗 (portability)

I started with the simplest models (encoders), then decoders (with pkv inputs/outputs), and now the integration works with almost all transformers models (including encoder-decoders and vlms), except a select few.

  • The pkv generation step can be done using the model's config, but for simplicity I'm running a forward pass and retrieving the pkv from the outputs.
  • Dynamic shapes can be passed by the user or generated automatically by creating a dict with Dim.AUTO and letting torch infer which axes are dynamic (this simplifies dynamic export testing).

I only added the onnx exporter because it seemed very simple once the dynamo exporter was in place 😅
Here are some examples:

Encoder example (BERT masked LM):

```python
import torch

from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter


model_id = "hf-internal-testing/tiny-random-BertForMaskedLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))
bert = AutoModelForMaskedLM.from_pretrained(model_id)
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
onnx_bert = exporter.export(model=bert, sample_inputs=sample_inputs)

# testing with different sized inputs
new_input = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
onnx_outputs = onnx_bert.call_reference(**new_input)  # uses numpy under the hood
ort_outputs = onnx_bert(**new_input)  # uses onnxruntime under the hood
torch.testing.assert_close(onnx_outputs[0], ort_outputs[0], rtol=1e-04, atol=1e-04)
```
Decoder example (Llama causal LM, with pkv):

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter
from transformers.exporters.utils import prepare_inputs_for_export


model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama = AutoModelForCausalLM.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
onnx_llama = exporter.export(model=llama, sample_inputs=sample_inputs)
onnx_llama.save("onnx_llama_dynamic.onnx", external_data=True)

# testing with different sized inputs
new_inputs = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
_, new_inputs = prepare_inputs_for_export(llama, new_inputs)  # to add pkv and process related inputs
onnx_outputs = onnx_llama.call_reference(**new_inputs)  # uses numpy under the hood
ort_outputs = onnx_llama(**new_inputs)  # uses onnxruntime under the hood
torch.testing.assert_close(onnx_outputs[0], ort_outputs[0], rtol=1e-04, atol=1e-04)
```

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft November 3, 2025 14:29
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil (Member, Author) commented Nov 5, 2025

Currently all models (except a select few) are tested and pass successfully!

389 passed, 87 skipped, 413 warnings in 143.73s (0:02:23)

Skipped tests are either:

  • explicitly skipped with test_torch_exportable = False, for custom-cache models and some MoEs (15);
  • erroring with an informative torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode (67);
  • erroring with a cryptic Expected cond to be True, but got False. (16).

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bigbird_pegasus, deepseek_vl, deepseek_vl_hybrid, dia, flava, glm_moe_dsa, idefics, mamba2, nemotron, perceiver, splinter, swin2sr, zoedepth

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41992&sha=a16df2


7 participants