Conversation

@mklimenk

@mklimenk mklimenk commented Jul 11, 2025

This PR adds functionality to convert bfloat16 models to float16 models. It reuses much of the machinery from the recently introduced QDQ scales fix; a sketch of the core numeric conversion follows after the task list below.

To be added:

  • Tests
  • (Potentially) refactoring to better separate QDQ stripping from bfloat16 conversion

https://jira.devtools.intel.com/browse/CVS-170592
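For context, the numeric core of such a conversion is small: a bfloat16 value widens to float32 by a 16-bit shift, and a float32 narrows to float16 with round-to-nearest-even. Below is a minimal self-contained sketch of that step; it is illustrative only, not code from this PR, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// bfloat16 is the upper 16 bits of an IEEE-754 float32, so widening is a shift.
inline float BFloat16ToFloat32(uint16_t v) {
  uint32_t bits = static_cast<uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Narrow a float32 to IEEE-754 binary16 with round-to-nearest-even.
inline uint16_t Float32ToFloat16(float f) {
  uint32_t x;
  std::memcpy(&x, &f, sizeof(x));
  const uint16_t sign = static_cast<uint16_t>((x >> 16) & 0x8000u);
  x &= 0x7FFFFFFFu;  // work on the absolute value

  if (x >= 0x47800000u)  // too large for float16: +/-inf, or NaN
    return sign | static_cast<uint16_t>(x > 0x7F800000u ? 0x7E00u : 0x7C00u);

  if (x < 0x38800000u) {  // maps to a float16 subnormal (or zero)
    const uint32_t shift = 126u - (x >> 23);
    if (shift > 24u) return sign;  // underflows to signed zero
    const uint32_t m = (x & 0x7FFFFFu) | 0x800000u;  // restore implicit leading 1
    uint16_t h = static_cast<uint16_t>(m >> shift);
    const uint32_t rem = m & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (h & 1u))) ++h;  // round to nearest even
    return sign | h;
  }

  // Normal range: rebias the exponent (127 -> 15) and keep 10 mantissa bits.
  uint16_t h = static_cast<uint16_t>((x >> 13) - (112u << 10));
  const uint32_t rem = x & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;  // carry into exponent is correct
  return sign | h;
}

// A bfloat16 tensor element then converts as:
//   uint16_t fp16 = Float32ToFloat16(BFloat16ToFloat32(bf16));
```

Note that float16 has a far smaller exponent range than bfloat16 (max ~65504 vs. ~3.4e38), so large-magnitude values saturate to infinity, which is presumably why the conversion shares machinery with the QDQ scales fix mentioned above.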

```diff
 const std::unordered_set<std::string> valid_provider_keys = {"device_type", "device_id", "device_luid", "cache_dir", "precision",
     "load_config", "context", "num_of_threads", "model_priority", "num_streams", "enable_opencl_throttling", "enable_qdq_optimizer",
-    "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
+    "enable_bfloat16_optimizer", "enable_causallm", "disable_dynamic_shapes", "reshape_input"};
```


Why is a separate provider option needed? Is it possible to detect that the model has the bfloat16 datatype and enable the optimization intrinsically?
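For reference, intrinsic detection against the ONNX protobuf could look roughly like the sketch below; `GraphUsesBFloat16` is a hypothetical helper, not code from this PR, and recursion into If/Loop subgraphs is omitted.

```cpp
#include "onnx/onnx_pb.h"

// Returns true if any initializer or typed tensor in the main graph uses
// the bfloat16 element type.
static bool GraphUsesBFloat16(const onnx::GraphProto& graph) {
  for (const auto& init : graph.initializer()) {
    if (init.data_type() == onnx::TensorProto::BFLOAT16) return true;
  }
  auto is_bf16 = [](const onnx::ValueInfoProto& vi) {
    return vi.type().has_tensor_type() &&
           vi.type().tensor_type().elem_type() == onnx::TensorProto::BFLOAT16;
  };
  for (const auto& vi : graph.input())      if (is_bf16(vi)) return true;
  for (const auto& vi : graph.output())     if (is_bf16(vi)) return true;
  for (const auto& vi : graph.value_info()) if (is_bf16(vi)) return true;
  return false;
}
```

With a check like this at session initialization, the EP could enable the conversion automatically instead of exposing a new public key.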

Author

@sfatimar, because at some point OpenVINO might enable native execution of bfloat16 models; this option is a workaround until that functionality is available. Let's discuss it with Mayuresh and act accordingly.
Regarding the graph-optimizations link you shared: strictly speaking, this is not the same kind of optimization as those in that list. It's more like the QDQ scales fix we implemented earlier.


We cannot add an external provider option as a workaround, because it impacts external users and apps and would need a deprecation notice two releases in advance. I would prefer this to be handled internally.


@sfatimar sfatimar left a comment


Please check with Mayuresh whether the provider option can be avoided by intrinsically detecting bfloat16, or by adding an EP-specific optimization pass: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#extended-graph-optimizations

@sfatimar sfatimar requested a review from MayureshV1 July 14, 2025 04:46
@mklimenk
Author

Tests are prepared and pushed to a separate branch until we have confirmation on whether we want them in onnxruntime or in the internal testing repo.

@mklimenk
Author

Closing in favor of #741

@mklimenk mklimenk closed this Jul 29, 2025
@mklimenk mklimenk deleted the private/mklimenk/bfloat16_fix branch September 10, 2025 11:06