@gusdlf93 hey, I used Claude to make some additional changes to fit timm norms a bit better; it did require remapping checkpoints, though. I verified that the 80.024% accuracy remains. Unfortunately the diff of the model got messed up (can't see what was changed) because your commit was a mix of CRLF and LF line endings, and cleaning it to LF only touched every line. An interesting model for higher resolution. |
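For readers wondering why normalizing line endings makes a diff unreadable, here is a minimal sketch in plain Python. It is not part of the PR; the helper names `count_line_endings` and `normalize_to_lf` and the sample bytes are hypothetical and exist only for illustration. It counts CRLF versus bare LF terminators in a byte string and rewrites everything to LF, which changes every line that previously ended in CRLF.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF and bare-LF line terminators in a byte string."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def normalize_to_lf(data: bytes) -> bytes:
    """Rewrite CRLF terminators as LF (lone CR characters are left alone)."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical file content with mixed endings, standing in for a model file.
sample = b"import torch\r\nimport timm\nx = 1\r\n"

counts = count_line_endings(sample)
cleaned = normalize_to_lf(sample)

print(counts)
print(count_line_endings(cleaned))
```

When reviewing such a change locally, `git diff --ignore-cr-at-eol` hides differences that consist only of a carriage return at the end of a line, which can make the real edits visible again.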
|
I may add a few more small things like gradient checkpointing, and then I guess I'll push a remapped checkpoint to the timm org that references the original.
…inal norm is 2d so we can disable pooling if desired. Still inconsistent line endings
|
Thanks a lot for taking over and polishing the implementation. For reproducibility and detailed training recipes, I’ve documented everything in the Hugging Face model card: |
|
@gusdlf93 okay, thanks. I'm probably not going to get a chance to merge this for a few more days. I feel it's in a good state, but I have a few days off and want to check a few more small things. |
…dynamic for other network shapes, allow drop path option for transformer blocks.
…able and define another model class to have another arch config tested.
|
@gusdlf93 ready to merge, just letting the final test run finish. I've pushed the weights to https://huggingface.co/timm/csatv2 and copied over your model card info. The timm implementation of the arch now supports changing model widths/depths via args, so other related models can be defined. |
|
Also, if you could clarify the license for the model card that'd be great ... is it Apache 2.0? |
|
The model is licensed under Apache 2.0, allowing for unrestricted commercial use. Also, I heard that you are currently on vacation. |
Continuation of work in #2624 by @gusdlf93