Releases: huggingface/pytorch-image-models
Release v1.0.26
March 23, 2026
- Improve pickle checkpoint handling security. Default all loading to `weights_only=True`, add safe_global for ArgParse.
- Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass `is_causal` through for SSL tasks.
- Fix class & register token uses with ViT and no pos embed enabled.
- Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
- Improve consistency of output projection / MLP dimensions for attention pooling layers.
- Hiera model F.SDPA optimization to allow Flash Attention kernel use.
- Caution added to SGDP optimizer.
- Release 1.0.26. First maintenance release since my departure from Hugging Face.
What's Changed
- fix: replace 5 bare except clauses with except Exception by @haosenwang1018 in #2672
- Add timmx model export tool to README by @Boulaouaney in #2673
- Enhance SGDP optimizer with caution parameter by @Yuan-Jinghui in #2675
- Fix CLS and Reg tokens usage when pos_embed is disabled by @sinahmr in #2676
- default weights_only=True for load fns by @rwightman in #2679
- Fix Hiera global attention to use 4D tensors for efficient SDPA dispatch by @Raiden129 in #2680
- Improve 2d and latent attention pool dimension handling. Fix #2682 by @rwightman in #2684
- Improve attention mask handling for vision_transformer and eva and related blocks by @rwightman in #2686
- Implement PRR as a pooling module. Alternative to #2678 by @rwightman in #2685
New Contributors
- @haosenwang1018 made their first contribution in #2672
- @Raiden129 made their first contribution in #2680
Full Changelog: v1.0.25...v1.0.26
Release v1.0.25
Feb 23, 2026
- Add token distillation training support to distillation task wrappers
- Remove some torch.jit usage in prep for official deprecation
- Caution added to AdamP optimizer
- Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
- Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
- Release 1.0.25
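The clamp change above can be illustrated on plain tensors: both spellings produce the same result, but only the keyword form dispatched correctly for DTensor under FSDP2 (a minimal sketch, without the distributed setup):

```python
import torch

t1 = torch.tensor([-1.0, 0.5, 2.0])
t2 = t1.clone()

t1.clamp_min_(0.0)  # method variant that failed to dispatch for DTensor
t2.clamp_(min=0.0)  # keyword variant used after the fix

# Identical on plain tensors: both clamp negatives up to 0.0
print(torch.equal(t1, t2))
```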
Jan 21, 2026
- Compat Break: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`). Does not impact any trained `timm` models but could impact downstream use.
What's Changed
- Token distill task & distill task refactoring by @rwightman in #2647
- Fix distilled head dropout using wrong token in PiT forward_head by @hassonofer in #2649
- Fix #2653, no models with weights impacted so just a clean fix by @rwightman in #2654
- Add the cautious optimizer to AdamP. by @Yuan-Jinghui in #2657
- Enhance the numerical stability of the Cautious Optimizer by @Yuan-Jinghui in #2658
- Some misc fixes for torch.jit deprecation and meta device init by @rwightman in #2664
- fix(optim): replace bare except with Exception in Lion optimizer by @llukito in #2666
- Change clamp_min_ to clamp_(min=) as former doesn't work with DTensor / FSDP2 by @rwightman in #2668
- Add DTensor compatible NS impl for Muon by @rwightman in #2669
New Contributors
- @Yuan-Jinghui made their first contribution in #2657
- @llukito made their first contribution in #2666
Full Changelog: v1.0.24...v1.0.25
Release v1.0.24
Jan 5 & 6, 2026
- Patch Release 1.0.24 (fix for 1.0.23)
- Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
- Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
- Release 1.0.23
Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
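The buffer re-init scheme can be sketched with a toy module. The `init_non_persistent_buffers()` method below is a stand-in modeled on the method named above, not timm's actual code; the point is that non-persistent buffers are excluded from the `state_dict`, so they need explicit re-initialization after materializing a meta-device model:

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):
    """Toy module with a non-persistent buffer (not saved in state_dict)."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.dim = dim
        self.fc = nn.Linear(dim, dim)
        self.register_buffer("scale", torch.ones(dim), persistent=False)

    def init_non_persistent_buffers(self):
        # Re-create buffer contents; callable after meta-device init,
        # mirroring the scheme timm factors out of __init__.
        self.scale = torch.ones(self.dim, device=self.fc.weight.device)


# Build on the meta device: no memory allocated, no real values.
with torch.device("meta"):
    m = WithBuffer()

# Materialize storage, then re-init the buffers a loaded state_dict
# won't cover.
m = m.to_empty(device="cpu")
m.init_non_persistent_buffers()
print(m.scale)
```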
Dec 12, 2025
- Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
- End of year PR cleanup, merge aspects of several long open PRs
  - Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
  - Add a few pooling modules, `LsePlus` and `SimPool`
  - Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
Dec 1, 2025
- Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
- Remove old APEX AMP support
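As background, logits distillation of the kind these tasks support typically reduces to a temperature-scaled KL divergence against the teacher's soft targets. A generic Hinton-style sketch, not timm's actual task implementation:

```python
import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-target KL distillation loss (a generic sketch)."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (T * T)


student = torch.randn(8, 10)
loss = distill_loss(student, student)  # identical logits -> loss ~ 0
print(loss.item())
```

In practice this term is mixed with the hard-label cross-entropy via a weighting coefficient.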
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.24
Release v1.0.23
Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
Dec 12, 2025
- Add CSATV2 model (thanks https://github.com/gusdlf93) -- a lightweight but high res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks.
- End of year PR cleanup, merge aspects of several long open PRs
  - Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee vits
  - Add a few pooling modules, `LsePlus` and `SimPool`
  - Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
Dec 1, 2025
- Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
- Remove old APEX AMP support
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.23
Release v1.0.22
Patch release for priority LayerScale initialization regression in 1.0.21
What's Changed
- Add some weights for efficientnet_x / efficientnet_h models by @rwightman in #2602
- Update result csvs by @rwightman in #2603
- Fix LayerScale ignoring init_values by @Ilya-Fradlin in #2605
New Contributors
- @Ilya-Fradlin made their first contribution in #2605
Full Changelog: v1.0.21...v1.0.22
Release v1.0.21
Oct 16-20, 2025
- Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
  - extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
  - small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
  - by default uses AdamW (or NAdamW if `nesterov=True`) updates if muon not suitable for parameter shape (or excluded via param group flag)
  - like torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
  - select from several NS coefficient presets or specify your own via `ns_coefficients`
- First 2 steps of 'meta' device model initialization supported
- Fix several ops that were breaking creation under 'meta' device context
- Add device & dtype factory kwarg support to all models and modules (anything inheriting from `nn.Module`) in `timm`
- License fields added to pretrained cfgs in code
- Release 1.0.21
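For context, the NS iterations being optimized are Muon's quintic Newton-Schulz orthogonalization of the 2D momentum update. A plain-tensor sketch using the widely cited coefficients from Keller Jordan's reference impl; timm's customized version differs in details (conv handling, fallbacks, fused ops):

```python
import torch

torch.manual_seed(0)


def newton_schulz(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Quintic Newton-Schulz iteration: pushes singular values toward 1,
    approximately orthogonalizing a 2D update (a sketch, not timm's impl)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # reference-impl coefficients
    x = g / (g.norm() + 1e-7)  # normalize so spectral norm <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the short side for a smaller Gram matrix
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x


g = torch.randn(16, 32)
u = newton_schulz(g)
sv = torch.linalg.svdvals(u)
print(sv.min().item(), sv.max().item())  # singular values pushed toward 1
```

The `(b * A + c * A @ A) @ x` grouping is where the release's fused `(b)add(b)mm` speedup applies: the polynomial terms can be computed with fewer temporaries.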
What's Changed
- Add calculate_drop_path_rates helper by @rwightman in #2589
- Review `huggingface_hub` integration by @Wauplin in #2592
- Adding device/dtype factory_kwargs to modules and models by @rwightman in #2591
- Consistent license handling throughout timm by @alexanderdann in #2585
- Add impl of Muon optimizer. Fix #2580 by @rwightman in #2596
- Rename 'simple' flag for Muon to 'fallback' by @rwightman in #2599
New Contributors
- @alexanderdann made their first contribution in #2585
Full Changelog: v1.0.20...v1.0.21
Release v1.0.20
Sept 21, 2025
- Remap DINOv3 ViT weight tags from `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`)
- Release 1.0.20
Sept 17, 2025
- DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to the existing `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl
- MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
- MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
- SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
- Misc fixes and contributions
What's Changed
- Pass init_values at hieradet_sam2 by @hassonofer in #2559
- Add mobileclip2 encoder weights by @rwightman in #2560
- Add support for Gemma 3n MobileNetV5 encoder weight loading by @rwightman in #2561
- Fix #2562, add siglip2 naflex vit encoder weights by @rwightman in #2564
- fix: create results_dir if missing before saving results by @zhima771 in #2576
- feat(validate): add precision, recall, and F1 metrics by @ha405 in #2568
- Allow user to ask for features other than image and label in ImageDataset by @grodino in #2571
- Add MobileCLIP2 image encoders by @rwightman in #2578
- Add DINOv3 support by @rwightman in #2579
New Contributors
- @hassonofer made their first contribution in #2559
- @zhima771 made their first contribution in #2576
- @ha405 made their first contribution in #2568
Full Changelog: v1.0.19...v1.0.20
Release v1.0.19
Patch release for Python 3.9 compat break in 1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
- Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
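The xy/ij ordering pitfall is easy to reproduce with `torch.meshgrid` on a non-square grid; a generic illustration of the convention, not timm's ROPE code:

```python
import torch

h, w = 2, 3  # non-square grid, where the xy/ij distinction actually matters

# 'ij' takes (rows, cols) and indexes grids as [row, col].
yy_ij, xx_ij = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")

# 'xy' takes (x, y) -- i.e. (cols, rows) -- and swaps the output dims back
# to (rows, cols). Passing H and W in the wrong order under 'xy' silently
# transposes coordinates, the class of bug fixed in this release.
xx_xy, yy_xy = torch.meshgrid(torch.arange(w), torch.arange(h), indexing="xy")

print(yy_ij.shape, xx_xy.shape)  # both (2, 3) when argument order is correct
```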
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.18...v1.0.19
Release v1.0.18
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
What's Changed
- Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
- Support set_input_size() in EVA models by @rwightman in #2554
Full Changelog: v1.0.17...v1.0.18
Release v1.0.17
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add `RotaryEmbeddingMixed` module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
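The float32 norm variants mentioned above follow a simple upcast-compute-downcast pattern; a sketch of the idea, not timm's exact `Fp32` layer implementations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormFp32(nn.LayerNorm):
    """Sketch of a float32-computed LayerNorm: upcast input and affine
    params, normalize in fp32, cast the result back to the input dtype."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight.float() if self.weight is not None else None
        b = self.bias.float() if self.bias is not None else None
        out = F.layer_norm(x.float(), self.normalized_shape, w, b, self.eps)
        return out.to(x.dtype)


ln = LayerNormFp32(8)
x = torch.randn(2, 8, dtype=torch.bfloat16)
y = ln(x)
print(y.dtype)  # bfloat16 out, but the mean/variance reduction ran in fp32
```

Running the reduction in float32 avoids precision loss in the mean/variance statistics under low-precision autocast, at a small extra compute cost.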
What's Changed
- Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
- Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
- Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
- Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
- Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
- fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
- Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
- Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
- Fix 3 typos in README.md by @robin-ede in #2544
New Contributors
- @GuillaumeErhard made their first contribution in #2534
- @RyanMullins made their first contribution in #2533
- @robin-ede made their first contribution in #2544
Full Changelog: v1.0.16...v1.0.17