fix bugs in external data helpers and add add size thresholds for ser…#3171
fix bugs in external data helpers and add add size thresholds for ser…#3171askhade merged 9 commits intoonnx:masterfrom
Conversation
…ializing tensors Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
|
Are there any test cases that need to be updated? |
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Done |
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
|
|
||
|
|
||
| def remove_external_data_field(tensor, field_key): # type: (TensorProto, Text) -> None | ||
| def _remove_external_data_field(tensor, field_key): # type: (TensorProto, Text) -> None |
There was a problem hiding this comment.
Actually I don't see it has been used. Not sure why it is here.
| if not tensor.HasField("raw_data"): | ||
| raise ValueError("Tensor " + tensor.name + "does not have raw_data field. Cannot set external data for this tensor.") | ||
|
|
||
| del tensor.external_data[:] |
There was a problem hiding this comment.
Maybe we should clean tensor.raw_data as well because it's being set to external
There was a problem hiding this comment.
right now converting a model to external data is done in 2 steps. In the first step tensors are simply marked to use external data and in the second step the data is actually serialized. This method is called in both the steps. So we cant simply clean the raw_data here... Moreover convert_model_from_external_data also checks for raw_data filed which I am not sure why... I am planning to address this in another PR along with adding a method to convert model to external data in 1 step.
| for tensor in _get_all_tensors(model): | ||
| set_external_data(tensor, file_name) | ||
| if tensor.HasField("raw_data") and sys.getsizeof(tensor.raw_data) >= size_threshold: | ||
| set_external_data(tensor, file_name) |
There was a problem hiding this comment.
Shouldn't multiple tensors be stored at different offsets within the same file? I don't see any offsets here. How does this work? (Alternatively, if that feature is not being used, may be we can remove it for now until we can add a proper version.)
There was a problem hiding this comment.
This method simply marks the tensors to use external data. The offsets are properly added when the data is actually written to the file. See save_external_data line 175
I tested writing all tensors to 1 file and it works.
There was a problem hiding this comment.
This is one reason to expose a single method that does both things. Otherwise, we have a fragile dependence between the two parts, with the model being in a weird non-standard representation in between the two steps ... if the user ends up doing other things in between the two steps, the potential for bugs increases.
There was a problem hiding this comment.
yes I agree... we should have a single method to do this... I will cover it as part of a new PR
| """ | ||
| call to set all tensors as external data. save_model saves all the tensors data as external data after calling this function. | ||
| Call to set all tensors with raw data as external data. This call should preceed 'save_model'. | ||
| 'save_model' saves all the tensors data as external data after calling this function. |
There was a problem hiding this comment.
Could we just call save_model as the last step of this function and rename it "save_model_with_external_data" or something like that? What is the value in splitting it into two different calls?
There was a problem hiding this comment.
I don't know why this was split in 2 calls... I am planning to add another method which will do all this in a single call. For now let's keep this method because there are partners who have a dependency on this and I don't want to change this before prior notification.
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
Signed-off-by: Ashwini Khade <askhade@microsoft.com>
This PR addresses issues mentioned in #2332 and a few more.
Signed-off-by: Ashwini Khade askhade@microsoft.com