Skip to content

fix bugs in external data helpers and add add size thresholds for ser…#3171

Merged
askhade merged 9 commits intoonnx:masterfrom
askhade:update_external_data_helper
Dec 16, 2020
Merged

fix bugs in external data helpers and add add size thresholds for ser…#3171
askhade merged 9 commits intoonnx:masterfrom
askhade:update_external_data_helper

Conversation

@askhade
Copy link
Copy Markdown
Contributor

@askhade askhade commented Dec 16, 2020

This PR addresses issues mentioned in #2332 and a few more.

  1. Current code marks all tensors to be serialized as external data. Fixing this to only mark the ones which have raw data.
  2. Add a threshold for tensor data size. Only tensor's with size >= threshold will be serialized to external data. This feature was designed to work with model >2GB and with current approach it tries to serialize every single tensor to external file. Therefore adding a threshold so that callers can choose which tensors to serialize
  3. Add checks for filename correctness. Current code uses tensor names as filenames. This fails when the names contains illegal characters.

Signed-off-by: Ashwini Khade askhade@microsoft.com

…ializing tensors

Signed-off-by: Ashwini Khade <askhade@microsoft.com>
@askhade askhade requested a review from a team as a code owner December 16, 2020 00:26
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
@yufenglee
Copy link
Copy Markdown
Contributor

Are there any test cases that need to be updated?

Signed-off-by: Ashwini Khade <askhade@microsoft.com>
@askhade
Copy link
Copy Markdown
Contributor Author

askhade commented Dec 16, 2020

Are there any test cases that need to be updated?

Done

Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Comment thread onnx/external_data_helper.py Outdated


def remove_external_data_field(tensor, field_key): # type: (TensorProto, Text) -> None
def _remove_external_data_field(tensor, field_key): # type: (TensorProto, Text) -> None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't see it has been used. Not sure why it is here.

if not tensor.HasField("raw_data"):
raise ValueError("Tensor " + tensor.name + "does not have raw_data field. Cannot set external data for this tensor.")

del tensor.external_data[:]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should clean tensor.raw_data as well because it's being set to external

Copy link
Copy Markdown
Contributor Author

@askhade askhade Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now converting a model to external data is done in 2 steps. In the first step tensors are simply marked to use external data and in the second step the data is actually serialized. This method is called in both the steps. So we cant simply clean the raw_data here... Moreover convert_model_from_external_data also checks for raw_data filed which I am not sure why... I am planning to address this in another PR along with adding a method to convert model to external data in 1 step.

for tensor in _get_all_tensors(model):
set_external_data(tensor, file_name)
if tensor.HasField("raw_data") and sys.getsizeof(tensor.raw_data) >= size_threshold:
set_external_data(tensor, file_name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't multiple tensors be stored at different offsets within the same file? I don't see any offsets here. How does this work? (Alternatively, if that feature is not being used, may be we can remove it for now until we can add a proper version.)

Copy link
Copy Markdown
Contributor Author

@askhade askhade Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method simply marks the tensors to use external data. The offsets are properly added when the data is actually written to the file. See save_external_data line 175

I tested writing all tensors to 1 file and it works.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one reason to expose a single method that does both things. Otherwise, we have a fragile dependence between the two parts, with the model being in a weird non-standard representation in between the two steps ... if the user ends up doing other things in between the two steps, the potential for bugs increases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I agree... we should have a single method to do this... I will cover it as part of a new PR

"""
call to set all tensors as external data. save_model saves all the tensors data as external data after calling this function.
Call to set all tensors with raw data as external data. This call should preceed 'save_model'.
'save_model' saves all the tensors data as external data after calling this function.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just call save_model as the last step of this function and rename it "save_model_with_external_data" or something like that? What is the value in splitting it into two different calls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this was split in 2 calls... I am planning to add another method which will do all this in a single call. For now let's keep this method because there are partners who have a dependency on this and I don't want to change this before prior notification.

Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants