
TF port of Convnextv2 #23155

Closed
IMvision12 wants to merge 7 commits into huggingface:main from IMvision12:convnextv2

Conversation

@IMvision12
Contributor

What does this PR do?

TF port of convnextv2

@amyeroberts

@IMvision12
Contributor Author

While converting the PyTorch weights to TensorFlow I am getting this error. How can I solve it?

All PyTorch model weights were used when initializing TFConvNextV2ForImageClassification.

All the weights of TFConvNextV2ForImageClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFConvNextV2ForImageClassification for predictions without further training.
Traceback (most recent call last):
  File "/usr/local/bin/transformers-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/transformers/commands/transformers_cli.py", line 55, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/transformers/commands/pt_to_tf.py", line 344, in run
    raise ValueError(
ValueError: The cross-loaded TensorFlow model has different outputs, something went wrong!

List of maximum output differences above the threshold (5e-05):
logits: 3.871e+00

List of maximum hidden layer differences above the threshold (5e-05):
hidden_states[1]: 3.463e-01
hidden_states[2]: 1.682e+00
hidden_states[3]: 2.259e+01
hidden_states[4]: 6.839e-01

Code used:

!transformers-cli pt-to-tf --model-name facebook/convnextv2-nano-1k-224 --no-pr --local-dir /content/convnextv2-nano-1k-224

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 4, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@amyeroberts left a comment


Thanks for adding this! Overall the PR looks good, just some small nits here and there.

Regarding the hidden layer differences, the way to solve it is to find the line(s) of code in the TF model contributing to the difference. The best thing to do is to bisect through the layers and their output activations when the equivalent PT and TF model are fed the same input.

In the output of the conversion script, we can see that the difference between the PyTorch and TensorFlow hidden states for the first block is already ~0.3, which is large. Since a large difference already appears in the first stage, I would load a small model with just one stage and start comparing the PT and TF models from there, e.g.:

from transformers import TFAutoModel, AutoModel

checkpoint = "facebook/convnextv2-tiny-1k-224"
pt_model = AutoModel.from_pretrained(checkpoint, num_stages=1)
tf_model = TFAutoModel.from_pretrained(checkpoint, from_pt=True, num_stages=1)
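Once both one-stage models are loaded, the bisection described above amounts to feeding the same input to each model and computing the maximum absolute difference between corresponding hidden states, exactly as the conversion script reports. A minimal sketch of that comparison step (using plain NumPy arrays to stand in for the converted PT/TF tensors; `find_first_divergence` is a hypothetical helper name, not part of `transformers`):

```python
import numpy as np

def max_abs_diff(pt_array, tf_array):
    """Maximum elementwise absolute difference between two activations."""
    return float(np.max(np.abs(pt_array - tf_array)))

def find_first_divergence(pt_hidden_states, tf_hidden_states, threshold=5e-5):
    """Report per-layer differences and return the index of the first
    hidden state exceeding the threshold, or None if they all match."""
    for i, (pt_h, tf_h) in enumerate(zip(pt_hidden_states, tf_hidden_states)):
        diff = max_abs_diff(pt_h, tf_h)
        print(f"hidden_states[{i}]: {diff:.3e}")
        if diff > threshold:
            return i
    return None
```

In practice you would run both models with `output_hidden_states=True`, convert the PT tensors with `.detach().numpy()` and the TF tensors with `.numpy()`, and then drill into the sub-layers (depthwise conv, GRN, layer norm) of the first layer whose index this returns.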

@IMvision12 IMvision12 closed this May 17, 2023
@IMvision12 IMvision12 deleted the convnextv2 branch May 17, 2023 20:10
