
Onnx enable tasks for supported models (part 2) #14700

Merged
michaelbenayoun merged 11 commits into huggingface:master from michaelbenayoun:onnx_enable_tasks_for_supported_models_part_2
Dec 22, 2021

Conversation

@michaelbenayoun
Member

@michaelbenayoun michaelbenayoun commented Dec 9, 2021

What does this PR do?

This PR reapplies the reverted PR #14358, and solves the issues that caused the revert.


This PR adds support for almost all the features available for already supported models.

Main contributions:

  • OnnxSeq2SeqConfigWithPast: a new class inheriting from OnnxConfigWithPast, designed specifically for seq2seq models; this should make it easier for the community to contribute support for new models.
  • Tests refactoring and parameterization: every (model, feature) export pair is now tested, each as a standalone test (previously everything ran as one big test).
  • Many new features requested by the community (a feature is a task, plus the choice of whether to use past_key_values); see the list of supported features below.

Features now supported:

  • For BERT like models: default, sequence-classification, token-classification and question-answering (multiple-choice will be added later).
  • For causal language models (GPT-2 and GPT-Neo): default, default-with-past, causal-lm, causal-lm-with-past, sequence-classification and token-classification (GPT-2 only).
  • For Seq2Seq models (T5, BART, mBART):
    • T5, BART, mBART: default, default-with-past, seq2seq-lm, seq2seq-lm-with-past
    • BART, mBART: causal-lm, causal-lm-with-past, sequence-classification, question-answering
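As described above, a feature is just a task name, optionally suffixed with `-with-past` when past_key_values are exported. A minimal sketch of that naming convention (`feature_name` is a hypothetical helper for illustration, not part of the transformers API):

```python
def feature_name(task: str, with_past: bool = False) -> str:
    """Compose a feature string: the task name, plus an optional
    "-with-past" suffix when past_key_values should be exported.
    Hypothetical helper for illustration only."""
    return f"{task}-with-past" if with_past else task

# The causal-lm task exported with key/value caching:
print(feature_name("causal-lm", with_past=True))  # causal-lm-with-past
```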

Member

@LysandreJik LysandreJik left a comment

Hey! Thanks for this, this is impressive work.

I wonder if it would be possible to upstream some of the content written in generate_dummy_inputs in the raw OnnxConfig object? It seems like a lot of the code can be reused among other models.

If it cannot be done, could you mention the current blockers so that we can study what needs to be done? For example, a clear separation between which models are encoder-decoders, which need past key value handling, etc.

Overall I'd argue that having very self-contained methods, that don't hop between different files, is a big plus in terms of readability. Having those be very explicit in the parent ONNX configuration and called with explicit method names in the downstream model-specific ONNX configuration would be a huge plus in terms of readability, in my opinion.
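The separation the review asks for could look something like the following template-method pattern; class and method names here are illustrative sketches, not the actual OnnxConfig API:

```python
class BaseOnnxConfigSketch:
    """Shared dummy-input logic lives in the parent class (sketch only)."""

    def generate_dummy_inputs(self, batch_size: int = 2, seq_length: int = 8) -> dict:
        # Inputs common to every model family.
        inputs = {
            "input_ids": [[0] * seq_length] * batch_size,
            "attention_mask": [[1] * seq_length] * batch_size,
        }
        # Explicitly named hook: model-family specifics are added here,
        # so the control flow stays visible in the parent class.
        inputs.update(self.extra_dummy_inputs(batch_size, seq_length))
        return inputs

    def extra_dummy_inputs(self, batch_size: int, seq_length: int) -> dict:
        return {}


class Seq2SeqOnnxConfigSketch(BaseOnnxConfigSketch):
    """Encoder-decoder configs override only the explicit hook."""

    def extra_dummy_inputs(self, batch_size: int, seq_length: int) -> dict:
        return {"decoder_input_ids": [[0] * seq_length] * batch_size}
```

The point of the pattern is that a reader of the parent class sees exactly where subclass behavior plugs in, without hopping between files.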

@lewtun
Member

lewtun commented Dec 17, 2021

Gently pinging @LysandreJik for his blessing on the latest round of changes :)

@sorenmc

sorenmc commented Dec 21, 2021

Does this ONNX conversion support beam search automatically for BART based summarizers?

@lewtun
Member

lewtun commented Dec 21, 2021

> Does this ONNX conversion support beam search automatically for BART based summarizers?

Hi @sorenmc, no, you'll have to implement your own .generate() method for the ONNX models. There is a related feature request in the optimum library here. In the meantime, you might be interested in the BART summarization example here.
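For readers wondering what "implement your own .generate()" could look like, here is a minimal greedy-decoding sketch over an exported seq2seq-lm model. It is an assumption-laden sketch, not transformers code: it assumes an onnxruntime InferenceSession whose graph takes input_ids/attention_mask/decoder_input_ids/decoder_attention_mask and returns logits, no past-key-value caching, and decoding that starts from the eos token as BART does:

```python
import numpy as np

def greedy_decode(session, input_ids, attention_mask, eos_token_id, max_length=64):
    """Greedy loop over an exported seq2seq-lm ONNX model (sketch).
    session: an onnxruntime.InferenceSession-like object (only .run is used)."""
    # BART conventionally starts decoding from the eos token.
    decoder_input_ids = np.array([[eos_token_id]], dtype=np.int64)
    for _ in range(max_length):
        logits = session.run(
            ["logits"],
            {
                "input_ids": input_ids,
                "attention_mask": attention_mask,
                "decoder_input_ids": decoder_input_ids,
                "decoder_attention_mask": np.ones_like(decoder_input_ids),
            },
        )[0]
        # Pick the most likely next token from the last decoder position.
        next_token = logits[:, -1, :].argmax(axis=-1).reshape(1, 1)
        decoder_input_ids = np.concatenate([decoder_input_ids, next_token], axis=1)
        if next_token.item() == eos_token_id:
            break
    return decoder_input_ids
```

Beam search replaces the argmax with per-beam score bookkeeping; the optimum feature request linked above tracks a proper implementation.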

@michaelbenayoun michaelbenayoun force-pushed the onnx_enable_tasks_for_supported_models_part_2 branch from c8cc572 to 600e5f2 Compare December 21, 2021 17:52
Member

@LysandreJik LysandreJik left a comment


Thanks for the update! I'm starting to think that even these ONNX changes could potentially live in optimum, and that we could have a requirement on optimum if we wanted to use ONNX anywhere in the library, but I understand this might be a bit complex to maintain in the long run.

I think this adds a lot of complexity, but I understand why it's needed for good ONNX support. OK to merge it like this, but I would like to revisit this at some point in the near future to discuss the separation of ONNX features between optimum and transformers.

@Avi-avidan

Avi-avidan commented May 15, 2022

Hi,
Thanks, HF team, for your great support on this.
I am trying to export a summarization BART model.
transformers.__version__ == 4.19.0.dev0
onnxruntime.__version__ == 1.11.1

```python
from transformers import pipeline

model_name = "lidiya/bart-base-samsum"
summarizer = pipeline("summarization", model=model_name, tokenizer=model_name)
```

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM
from transformers.models.bart import BartOnnxConfig

config = AutoConfig.from_pretrained(model_name)
onnx_config = BartOnnxConfig(config, task="default")
print(onnx_config.outputs)
```

OrderedDict([('last_hidden_state', {0: 'batch', 1: 'decoder_sequence'})])

I have tried a few export options, and none of them gives me the output from the decoder.

option 1:

```shell
/Users/aavidan/envs/py39/bin/python3.9 -m transformers.onnx --model=lidiya/bart-base-samsum --feature=seq2seq-lm --atol=5e-5 onnx
```

output 1:

```
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using framework PyTorch: 1.10.2
Overriding 1 configuration item(s)
- use_cache -> False
/Users/aavidan/envs/py39/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/Users/aavidan/envs/py39/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py:236: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/Users/aavidan/envs/py39/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/Users/aavidan/envs/py39/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py:907: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
Validating ONNX model...
  -[✓] ONNX model output names match reference model ({'logits'})
  - Validating ONNX Model output "logits":
    -[✓] (2, 8, 50265) matches (2, 8, 50265)
    -[✓] all values close (atol: 5e-05)
All good, model saved at: onnx/model.onnx
```

```python
import numpy as np
from onnxruntime import InferenceSession, SessionOptions, GraphOptimizationLevel

options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

session = InferenceSession(
    "onnx/model.onnx",
    sess_options=options, providers=["CPUExecutionProvider"]
)

session.disable_fallback()

outputs = [i.name for i in session.get_outputs()]

feed_dict = summarizer.tokenizer(text)  # text: the article to summarize
feed_dict["decoder_input_ids"] = feed_dict["input_ids"]
feed_dict["decoder_attention_mask"] = feed_dict["attention_mask"]
feed_dict = {k: np.array([v]) for k, v in feed_dict.items()}
pred = session.run(None, feed_dict)

for i, p in enumerate(pred):
    print(i, outputs[i], p.shape)
```

printout -

```
0 logits (1, 228, 50265)
1 1209 (1, 228, 768)
```

```python
summarizer.tokenizer.decode(pred[0][0].argmax(axis=-1), skip_special_tokens=True)
```

what I get -

This gives me back the input text, which basically means the logits simply mirror the input_ids, and I am guessing from its shape that output 1209 holds the encoder's hidden states for every input token. If that is the case, how do I export the base_model.decoder?

option 2:

```python
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

convert(framework="pt", model=summarizer.model, output=Path("onnx/lidiya_bart1.onnx"),
        opset=11, tokenizer=summarizer.tokenizer, pipeline_name="summarization")
```

this results in the following error -

```
using framework PyTorch: 1.10.2
found input input_ids with shape: {0: 'batch', 1: 'sequence'}
found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
found output output_0 with shape: {0: 'batch', 1: 'sequence'}
found output output_1 with shape: {0: 'batch', 2: 'sequence'}
found output output_1 with shape: {0: 'batch', 2: 'sequence'}
found output output_1 with shape: {0: 'batch', 2: 'sequence'}
found output output_1 with shape: {0: 'batch', 2: 'sequence'}
found output output_2 with shape: {0: 'batch', 2: 'sequence'}
found output output_2 with shape: {0: 'batch', 2: 'sequence'}
found output output_2 with shape: {0: 'batch', 2: 'sequence'}
found output output_2 with shape: {0: 'batch', 2: 'sequence'}
found output output_3 with shape: {0: 'batch', 2: 'sequence'}
found output output_3 with shape: {0: 'batch', 2: 'sequence'}
found output output_3 with shape: {0: 'batch', 2: 'sequence'}
found output output_3 with shape: {0: 'batch', 2: 'sequence'}
found output output_4 with shape: {0: 'batch', 2: 'sequence'}
found output output_4 with shape: {0: 'batch', 2: 'sequence'}
found output output_4 with shape: {0: 'batch', 2: 'sequence'}
found output output_4 with shape: {0: 'batch', 2: 'sequence'}
found output output_5 with shape: {0: 'batch', 2: 'sequence'}
found output output_5 with shape: {0: 'batch', 2: 'sequence'}
found output output_5 with shape: {0: 'batch', 2: 'sequence'}
found output output_5 with shape: {0: 'batch', 2: 'sequence'}
found output output_6 with shape: {0: 'batch', 2: 'sequence'}
found output output_6 with shape: {0: 'batch', 2: 'sequence'}
found output output_6 with shape: {0: 'batch', 2: 'sequence'}
found output output_6 with shape: {0: 'batch', 2: 'sequence'}
found output output_7 with shape: {0: 'batch', 1: 'sequence'}
ensuring inputs are in correct order
decoder_input_ids is not present in the generated input list.
generated inputs order: ['input_ids', 'attention_mask']
```

```
ValueError                                Traceback (most recent call last)
Input In [10], in <cell line: 6>()
      3 from transformers.convert_graph_to_onnx import convert
      5
----> 6 convert(framework="pt", model=summarizer.model, output=Path(f"onnx/lidiya_bart1.onnx"),
      7         opset=11, tokenizer=summarizer.tokenizer, pipeline_name="summarization")

File ~/envs/py39/lib/python3.9/site-packages/transformers/convert_graph_to_onnx.py:395, in convert(framework, model, output, opset, tokenizer, use_external_format, pipeline_name, **model_kwargs)
    393 # Export the graph
    394 if framework == "pt":
--> 395     convert_pytorch(nlp, opset, output, use_external_format)
    396 else:
    397     convert_tensorflow(nlp, opset, output)

File ~/envs/py39/lib/python3.9/site-packages/transformers/convert_graph_to_onnx.py:285, in convert_pytorch(nlp, opset, output, use_external_format)
    282 # PyTorch deprecated the enable_onnx_checker and use_external_data_format arguments in v1.11,
    283 # so we check the torch version for backwards compatibility
    284 if parse(torch.__version__) <= parse("1.10.99"):
--> 285     export(
    286         nlp.model,
    287         model_args,
    288         f=output.as_posix(),
    289         input_names=ordered_input_names,
    290         output_names=output_names,
    291         dynamic_axes=dynamic_axes,
    292         do_constant_folding=True,
    293         use_external_data_format=use_external_format,
    294         enable_onnx_checker=True,
    295         opset_version=opset,
    296     )
    297 else:
    298     export(
    299         nlp.model,
    300         model_args,
    (...)
    306         opset_version=opset,
    307     )

File ~/envs/py39/lib/python3.9/site-packages/torch/onnx/__init__.py:316, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
     38 r"""
     39 Exports a model into ONNX format. If model is not a
     40 :class:`torch.jit.ScriptModule` nor a :class:`torch.jit.ScriptFunction`, this runs
    (...)
    312 model to the file f even if this is raised.
    313 """
    315 from torch.onnx import utils
--> 316 return utils.export(model, args, f, export_params, verbose, training,
    317                     input_names, output_names, operator_export_type, opset_version,
    318                     _retain_param_name, do_constant_folding, example_outputs,
    319                     strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
    320                     custom_opsets, enable_onnx_checker, use_external_data_format)

File ~/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py:109, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
    104 if use_external_data_format is not None:
    105     warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next "
    106                   "PyTorch release. The code will work as it is False if models are not larger than 2GB, "
    107                   "Otherwise set to False because of size limits imposed by Protocol Buffers.")
--> 109 _export(model, args, f, export_params, verbose, training, input_names, output_names,
    110         operator_export_type=operator_export_type, opset_version=opset_version,
    111         do_constant_folding=do_constant_folding, example_outputs=example_outputs,
    112         dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
    113         custom_opsets=custom_opsets, use_external_data_format=use_external_data_format)

File ~/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py:728, in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, use_external_data_format, onnx_shape_inference)
    726 if dynamic_axes is None:
    727     dynamic_axes = {}
--> 728 _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
    730 graph, params_dict, torch_out = \
    731     _model_to_graph(model, args, verbose, input_names,
    732                     output_names, operator_export_type,
    (...)
    735                     training=training,
    736                     dynamic_axes=dynamic_axes)
    738 # TODO: Don't allocate a in-memory string for the protobuf

File ~/envs/py39/lib/python3.9/site-packages/torch/onnx/utils.py:1314, in _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1312 for i, x in enumerate(value):
   1313     if not isinstance(x, int):
-> 1314         raise ValueError("The type of axis index is expected to be an integer")
   1315     if x in value_dict:
   1316         warnings.warn("Duplicate dynamic axis index {} was provided for input {}."
   1317                       .format(x, key))

ValueError: The type of axis index is expected to be an integer
```

btw, I get the same error when trying to export only the decoder using:

```python
convert(framework="pt", model=summarizer.model.base_model.decoder, output=Path("onnx/lidiya_dec.onnx"),
        opset=11, tokenizer=summarizer.tokenizer, pipeline_name="summarization")
```

option 3:

```python
torch.onnx.export(
    summarizer.model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "onnx/lidiya_torch_onnx_exp.onnx",
    opset_version=11,
)
```

what I get -

Like option 1, this successfully exports the encoder (judging by the shapes of the exported layers), but I still cannot export the decoder.

btw, I saw a bunch of references on how to implement beam search, but all the links given are broken or unreachable, so could you please re-post a working link as well?

thanks a lot!

