Releases: huggingface/pytorch-image-models

Release v1.0.26

23 Mar 18:13

March 23, 2026

  • Improve pickle checkpoint handling security. Default all loading to weights_only=True, add safe globals handling for argparse.Namespace.
  • Improve attention mask handling for core ViT/EVA models & layers. Resolve bool masks, pass is_causal through for SSL tasks.
  • Fix class & register token use in ViT when no pos embed is enabled.
  • Add Patch Representation Refinement (PRR) as a pooling option in ViT. Thanks Sina (https://github.com/sinahmr).
  • Improve consistency of output projection / MLP dimensions for attention pooling layers.
  • Optimize Hiera's F.scaled_dot_product_attention usage to allow Flash Attention kernel use.
  • Add cautious update option to the SGDP optimizer.
  • Release 1.0.26. First maintenance release since my departure from Hugging Face.

Full Changelog: v1.0.25...v1.0.26

Release v1.0.25

23 Feb 17:22

Feb 23, 2026

  • Add token distillation training support to distillation task wrappers
  • Remove some torch.jit usage in prep for official deprecation
  • Add cautious update option to the AdamP optimizer
  • Call reset_parameters() even after meta-device init so that buffers get initialized w/ hacks like init_empty_weights
  • Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
  • Release 1.0.25
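
The clamp change in the Muon tweak above comes down to swapping an in-place op that DTensor lacks coverage for (clamp_min_) for its keyword form (clamp_ with min=), which produces identical results on plain tensors. A quick sketch of the equivalence:

```python
import torch

t = torch.tensor([-2.0, 0.5, 3.0])

# In-place clamp_min_ is the op DTensor/FSDP2 could not dispatch.
a = t.clone().clamp_min_(1e-3)

# clamp_(min=...) is mathematically identical and DTensor-compatible.
b = t.clone().clamp_(min=1e-3)
```

On a regular tensor the two are bitwise equal; only the dispatch path differs, which is why the swap is safe.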

Jan 21, 2026

  • Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock)
    • Does not impact any trained timm models but could impact downstream use.

What's Changed

  • Token distill task & distill task refactoring by @rwightman in #2647
  • Fix distilled head dropout using wrong token in PiT forward_head by @hassonofer in #2649
  • Fix #2653, no models with weights impacted so just a clean fix by @rwightman in #2654
  • Add the cautious optimizer to AdamP. by @Yuan-Jinghui in #2657
  • Enhance the numerical stability of the Cautious Optimizer by @Yuan-Jinghui in #2658
  • Some misc fixes for torch.jit deprecation and meta device init by @rwightman in #2664
  • fix(optim): replace bare except with Exception in Lion optimizer by @llukito in #2666
  • Change clamp_min_ to clamp_(min=) as former doesn't work with DTensor / FSDP2 by @rwightman in #2668
  • Add DTensor compatible NS impl for Muon by @rwightman in #2669

Full Changelog: v1.0.24...v1.0.25

Release v1.0.24

07 Jan 00:28

Jan 5 & 6, 2026

  • Patch Release 1.0.24 (fix for 1.0.23)
  • Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
  • Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
  • Release 1.0.23

Dec 1, 2025

  • Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
  • Remove old APEX AMP support
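
As a sketch of what a logits-distillation task wraps: the standard temperature-scaled KL objective between student and teacher logits. Function name and defaults here are illustrative, not timm's task API:

```python
import torch
import torch.nn.functional as F

def distill_kl_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then take the KL
    # divergence; the T*T factor keeps gradient magnitudes comparable
    # across temperatures (the usual Hinton et al. convention).
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

torch.manual_seed(0)
loss = distill_kl_loss(torch.randn(8, 10), torch.randn(8, 10))
```

Feature distillation swaps the KL term for a regression loss (e.g. MSE or cosine) between intermediate student and teacher features instead of logits.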

Full Changelog: v1.0.22...v1.0.24

Release v1.0.23

05 Jan 21:42

Dec 1, 2025

  • Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
  • Remove old APEX AMP support

Full Changelog: v1.0.22...v1.0.23

Release v1.0.22

05 Nov 04:08

Patch release for a priority LayerScale initialization regression in 1.0.21

Full Changelog: v1.0.21...v1.0.22

Release v1.0.21

24 Oct 22:39

Oct 16-20, 2025

  • Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
    • extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
    • small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
    • by default uses AdamW (or NAdamW if nesterov=True) updates when Muon is not suitable for a parameter's shape (or the parameter is excluded via a param group flag)
    • like the torch impl, select from several LR scale adjustment fns via adjust_lr_fn
    • select from several NS coefficient presets or specify your own via ns_coefficients
  • First 2 steps of 'meta' device model initialization supported
    • Fix several ops that were breaking creation under 'meta' device context
    • Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
  • License fields added to pretrained cfgs in code
  • Release 1.0.21
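
At the core of Muon is Newton–Schulz orthogonalization of the update matrix. A minimal sketch using the textbook cubic coefficients (3/2, -1/2) for clarity, rather than Muon's tuned quintic presets:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=40):
    # Iterate X <- 1.5*X - 0.5*(X X^T) X, which drives the singular
    # values of X toward 1 (i.e. orthogonalizes the matrix) without
    # computing an SVD. Muon uses a tuned quintic variant with
    # selectable coefficient presets; this is the classic cubic form.
    X = G / (np.linalg.norm(G) + 1e-7)  # spectral norm <= Frobenius norm < sqrt(3)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

rng = np.random.default_rng(0)
Q = newton_schulz_orthogonalize(rng.standard_normal((4, 4)))
```

The Frobenius-norm pre-scaling guarantees convergence; the allocation reduction and fused (b)add(b)mm ops mentioned above optimize exactly this inner loop in the real implementation.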

Full Changelog: v1.0.20...v1.0.21

Release v1.0.20

21 Sep 17:28

Sept 21, 2025

  • Remap DINOv3 ViT weight tags from lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
  • Release 1.0.20

Full Changelog: v1.0.19...v1.0.20

Release v1.0.19

24 Jul 03:06

Patch release for a Python 3.9 compat break in 1.0.18

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
  • Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
  • Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed with NaFlexViT and also w/ EVA models (regression introduced when adding Naver ROPE-ViT)
  • Fix XY order with grid_indexing='xy'; impacted non-square image use in 'xy' mode (only ROPE-ViT and PE affected).
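
A minimal sketch of the rotation at the heart of ROPE, in its basic 1D form (timm's axial/mixed 2D modules are more involved, and names here are illustrative):

```python
import torch

def apply_rope(x, theta=10000.0):
    # Rotate consecutive feature pairs by a position-dependent angle.
    # x: (num_tokens, dim) with dim even. Positions encode themselves
    # through the rotation, so no additive position embedding is needed.
    n, d = x.shape
    pos = torch.arange(n, dtype=torch.float32)[:, None]    # (n, 1)
    freqs = theta ** (-torch.arange(0, d, 2).float() / d)  # (d/2,)
    ang = pos * freqs                                      # (n, d/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

torch.manual_seed(0)
q = torch.randn(16, 64)
q_rot = apply_rope(q)
```

Because each pair is a pure rotation, token norms are preserved and position 0 is left unchanged; the 2D axial variants apply the same idea separately along the height and width grid axes.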

What's Changed

  • Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
  • Support set_input_size() in EVA models by @rwightman in #2554

Full Changelog: v1.0.18...v1.0.19

Release v1.0.18

23 Jul 20:03

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models.
  • Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed with NaFlexViT and also w/ EVA models (regression introduced when adding Naver ROPE-ViT)
  • Fix XY order with grid_indexing='xy'; impacted non-square image use in 'xy' mode (only ROPE-ViT and PE affected).

What's Changed

  • Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
  • Support set_input_size() in EVA models by @rwightman in #2554

Full Changelog: v1.0.17...v1.0.18

Release v1.0.17

10 Jul 16:04

July 7, 2025

  • MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
    • Add stem bias (zeroed in updated weights; compat break with old weights)
    • GELU -> GELU (tanh approx), a minor change to be closer to JAX
  • Add two arguments to layer-decay support: a min scale clamp and a 'no optimization' scale threshold
  • Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
  • Some typing, argument cleanup for norm, norm+act layers done with above
  • Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count |
|:--|--:|--:|--:|--:|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
  • Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
  • Preparing version 1.0.17 release
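
The two new layer-decay arguments can be sketched as follows. Names and the exact clamp/threshold interplay are illustrative assumptions, not timm's implementation:

```python
def layer_decay_scales(num_layers, decay=0.75, min_scale=0.01, no_opt_scale=0.02):
    # Per-layer LR multipliers: deepest layer gets 1.0, each earlier layer
    # gets an extra factor of `decay`. `min_scale` clamps the tiniest
    # multipliers instead of letting them vanish; `no_opt_scale` drops a
    # group from optimization entirely (represented here as a 0.0 scale).
    scales = []
    for i in range(num_layers + 1):  # +1 so layer 0 covers the embeddings
        s = decay ** (num_layers - i)
        s = max(s, min_scale)
        if s <= no_opt_scale:
            s = 0.0  # excluded: effectively frozen parameters
        scales.append(s)
    return scales

scales = layer_decay_scales(12, decay=0.5, min_scale=0.01, no_opt_scale=0.02)
```

With decay=0.5 the earliest layers fall below the threshold and are excluded, mid layers keep their decayed multipliers, and the final layer trains at the full learning rate.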

What's Changed

  • Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
  • Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
  • Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
  • Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
  • Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
  • fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
  • Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
  • Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
  • Fix 3 typos in README.md by @robin-ede in #2544

Full Changelog: v1.0.16...v1.0.17