Releases: huggingface/pytorch-image-models
Release v1.0.26
March 23, 2026
- Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse.
- Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks.
- Fix class & register token uses with ViT and no pos embed enabled.
- Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
- Improve consistency of output projection / MLP dimensions for attention pooling layers.
- Hiera model F.SDPA optimization to allow Flash Attention kernel use.
- Caution added to SGDP optimizer.
- Release 1.0.26. First maintenance release since my departure from Hugging Face.
What's Changed
- fix: replace 5 bare except clauses with except Exception by @haosenwang1018 in #2672
- Add timmx model export tool to README by @Boulaouaney in #2673
- Enhance SGDP optimizer with caution parameter by @Yuan-Jinghui in #2675
- Fix CLS and Reg tokens usage when pos_embed is disabled by @sinahmr in #2676
- default weights_only=True for load fns by @rwightman in #2679
- Fix Hiera global attention to use 4D tensors for efficient SDPA dispatch by @Raiden129 in #2680
- Improve 2d and latent attention pool dimension handling. Fix #2682 by @rwightman in #2684
- Improve attention mask handling for vision_transformer and eva and related blocks by @rwightman in #2686
- Implement PRR as a pooling module. Alternative to #2678 by @rwightman in #2685
New Contributors
- @haosenwang1018 made their first contribution in #2672
- @Raiden129 made their first contribution in #2680
Full Changelog: v1.0.25...v1.0.26
Release v1.0.25
Feb 23, 2026
- Add token distillation training support to distillation task wrappers
- Remove some torch.jit usage in prep for official deprecation
- Caution added to AdamP optimizer
- Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
- Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
- Release 1.0.25
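The clamp change above can be illustrated on plain tensors: both spellings produce the same result, but only the keyword form dispatched correctly for DTensor under FSDP2 (a minimal sketch, without the distributed setup):

```python
import torch

t1 = torch.tensor([-1.0, 0.5, 2.0])
t2 = t1.clone()

t1.clamp_min_(0.0)  # method variant that failed to dispatch for DTensor
t2.clamp_(min=0.0)  # keyword variant used after the fix

# Identical on plain tensors: both clamp negatives up to 0.0
print(torch.equal(t1, t2))
```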
Jan 21, 2026
- Compat Break: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`). Does not impact any trained `timm` models but could impact downstream use.
What's Changed
- Token distill task & distill task refactoring by @rwightman in #2647
- Fix distilled head dropout using wrong token in PiT forward_head by @hassonofer in #2649
- Fix #2653, no models with weights impacted so just a clean fix by @rwightman in #2654
- Add the cautious optimizer to AdamP. by @Yuan-Jinghui in #2657
- Enhance the numerical stability of the Cautious Optimizer by @Yuan-Jinghui in #2658
- Some misc fixes for torch.jit deprecation and meta device init by @rwightman in #2664
- fix(optim): replace bare except with Exception in Lion optimizer by @llukito in #2666
- Change clamp_min_ to clamp_(min=) as former doesn't work with DTensor / FSDP2 by @rwightman in #2668
- Add DTensor compatible NS impl for Muon by @rwightman in #2669
New Contributors
- @Yuan-Jinghui made their first contribution in #2657
- @llukito made their first contribution in #2666
Full Changelog: v1.0.24...v1.0.25
Release v1.0.24
Jan 5 & 6, 2026
- Patch Release 1.0.24 (fix for 1.0.23)
- Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
- Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
- Release 1.0.23
Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
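The buffer re-init scheme can be sketched with a toy module. The `init_non_persistent_buffers()` method below is a stand-in modeled on the method named above, not timm's actual code; the point is that non-persistent buffers are excluded from the `state_dict`, so they need explicit re-initialization after materializing a meta-device model:

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):
    """Toy module with a non-persistent buffer (not saved in state_dict)."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.dim = dim
        self.fc = nn.Linear(dim, dim)
        self.register_buffer("scale", torch.ones(dim), persistent=False)

    def init_non_persistent_buffers(self):
        # Re-create buffer contents; callable after meta-device init,
        # mirroring the scheme timm factors out of __init__.
        self.scale = torch.ones(self.dim, device=self.fc.weight.device)


# Build on the meta device: no memory allocated, no real values.
with torch.device("meta"):
    m = WithBuffer()

# Materialize storage, then re-init the buffers a loaded state_dict
# won't cover.
m = m.to_empty(device="cpu")
m.init_non_persistent_buffers()
print(m.scale)
```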
Dec 12, 2025
- Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
- End of year PR cleanup, merge aspects of several long open PRs
  - Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
  - Add a few pooling modules, `LsePlus` and `SimPool`
  - Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
Dec 1, 2025
- Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
- Remove old APEX AMP support
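As background, logits distillation of the kind these tasks support typically reduces to a temperature-scaled KL divergence against the teacher's soft targets. A generic Hinton-style sketch, not timm's actual task implementation:

```python
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-target KL distillation loss (a generic sketch)."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (T * T)


student = torch.randn(8, 10)
loss = distill_loss(student, student)  # identical logits -> loss ~ 0
print(loss.item())
```

In practice this term is mixed with the hard-label cross-entropy via a weighting coefficient.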
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.24
Release v1.0.23
Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
Dec 12, 2025
- Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
- End of year PR cleanup, merge aspects of several long open PRs
  - Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
  - Add a few pooling modules, `LsePlus` and `SimPool`
  - Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
Dec 1, 2025
- Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
- Remove old APEX AMP support
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.23
Release v1.0.22
Patch release for priority LayerScale initialization regression in 1.0.21
What's Changed
- Add some weights for efficientnet_x / efficientnet_h models by @rwightman in #2602
- Update result csvs by @rwightman in #2603
- Fix LayerScale ignoring init_values by @Ilya-Fradlin in #2605
New Contributors
- @Ilya-Fradlin made their first contribution in #2605
Full Changelog: v1.0.21...v1.0.22
Release v1.0.21
Oct 16-20, 2025
- Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
  - extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  - small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  - by default uses AdamW (or NAdamW if `nesterov=True`) updates if muon not suitable for parameter shape (or excluded via param group flag)
  - like torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
  - select from several NS coefficient presets or specify your own via `ns_coefficients`
- First 2 steps of 'meta' device model initialization supported
- Fix several ops that were breaking creation under 'meta' device context
- Add device & dtype factory kwarg support to all models and modules (anything inheriting from `nn.Module`) in `timm`
- License fields added to pretrained cfgs in code
- Release 1.0.21
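For context, the NS iterations being optimized are Muon's quintic Newton-Schulz orthogonalization of the 2D momentum update. A plain-tensor sketch using the widely cited coefficients from Keller Jordan's reference impl; timm's customized version differs in details (conv handling, fallbacks, fused ops):

```python
import torch

torch.manual_seed(0)


def newton_schulz(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Quintic Newton-Schulz iteration: pushes singular values toward 1,
    approximately orthogonalizing a 2D update (a sketch, not timm's impl)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # reference-impl coefficients
    x = g / (g.norm() + 1e-7)  # normalize so spectral norm <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the short side for a smaller Gram matrix
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x


g = torch.randn(16, 32)
u = newton_schulz(g)
sv = torch.linalg.svdvals(u)
print(sv.min().item(), sv.max().item())  # singular values pushed toward 1
```

The `(b * A + c * A @ A) @ x` grouping is where the release's fused `(b)add(b)mm` speedup applies: the polynomial terms can be computed with fewer temporaries.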
What's Changed
- Add calculate_drop_path_rates helper by @rwightman in #2589
- Review `huggingface_hub` integration by @Wauplin in #2592
- Adding device/dtype factory_kwargs to modules and models by @rwightman in #2591
- Consistent license handling throughout timm by @alexanderdann in #2585
- Add impl of Muon optimizer. Fix #2580 by @rwightman in #2596
- Rename 'simple' flag for Muon to 'fallback' by @rwightman in #2599
New Contributors
- @alexanderdann made their first contribution in #2585
Full Changelog: v1.0.20...v1.0.21
Release v1.0.20
Sept 21, 2025
- Remap DINOv3 ViT weight tags from `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`)
- Release 1.0.20
Sept 17, 2025
- DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to the existing `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl
- MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
- MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
- SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
- Misc fixes and contributions
What's Changed
- Pass init_values at hieradet_sam2 by @hassonofer in #2559
- Add mobileclip2 encoder weights by @rwightman in #2560
- Add support for Gemma 3n MobileNetV5 encoder weight loading by @rwightman in #2561
- Fix #2562, add siglip2 naflex vit encoder weights by @rwightman in #2564
- fix: create results_dir if missing before saving results by @zhima771 in #2576
- feat(validate): add precision, recall, and F1 metrics by @ha405 in #2568
- Allow user to ask for features other than image and label in ImageDataset by @grodino in #2571
- Add MobileCLIP2 image encoders by @rwightman in #2578
- Add DINOv3 support by @rwightman in #2579
New Contributors
- @hassonofer made their first contribution in #2559
- @zhima771 made their first contribution in #2576
- @ha405 made their first contribution in #2568
Full Changelog: v1.0.19...v1.0.20
Release v1.0.19
Patch release for Python 3.9 compat break in 1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
- Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
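The xy/ij ordering pitfall is easy to reproduce with `torch.meshgrid` on a non-square grid; a generic illustration of the convention, not timm's ROPE code:

```python
import torch

h, w = 2, 3  # non-square grid, where the xy/ij distinction actually matters

# 'ij' takes (rows, cols) and indexes grids as [row, col].
yy_ij, xx_ij = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")

# 'xy' takes (x, y) -- i.e. (cols, rows) -- and swaps the output dims back
# to (rows, cols). Passing H and W in the wrong order under 'xy' silently
# transposes coordinates, the class of bug fixed in this release.
xx_xy, yy_xy = torch.meshgrid(torch.arange(w), torch.arange(h), indexing="xy")

print(yy_ij.shape, xx_xy.shape)  # both (2, 3) when argument order is correct
```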
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.18...v1.0.19
Release v1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.17...v1.0.18
Release v1.0.17
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add `RotaryEmbeddingMixed` module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
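The float32 norm variants mentioned above follow a simple upcast-compute-downcast pattern; a sketch of the idea, not timm's exact `Fp32` layer implementations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormFp32(nn.LayerNorm):
    """Sketch of a float32-computed LayerNorm: upcast input and affine
    params, normalize in fp32, cast the result back to the input dtype."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight.float() if self.weight is not None else None
        b = self.bias.float() if self.bias is not None else None
        out = F.layer_norm(x.float(), self.normalized_shape, w, b, self.eps)
        return out.to(x.dtype)


ln = LayerNormFp32(8)
x = torch.randn(2, 8, dtype=torch.bfloat16)
y = ln(x)
print(y.dtype)  # bfloat16 out, but the mean/variance reduction ran in fp32
```

Running the reduction in float32 avoids precision loss in the mean/variance statistics under low-precision autocast, at a small extra compute cost.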
What's Changed
- Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
- Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
- Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
- Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
- Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
- fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
- Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
- Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
- Fix 3 typos in README.md by @robin-ede in #2544
New Contributors
- @GuillaumeErhard made their first contribution in #2534
- @RyanMullins made their first contribution in #2533
- @robin-ede made their first contribution in #2544
Full Changelog: v1.0.16...v1.0.17