Internalise the NomicBERT model by ed22699 · Pull Request #43067 · huggingface/transformers

ed22699 · 2025-12-29T18:31:22Z

What does this PR do?

This PR internalises the NomicBERT model, following the basic structure of https://huggingface.co/nomic-ai/nomic-bert-2048.

Fixes #42738

Problem

BERT-like models using RoPE are currently not internalized in our codebase, e.g. https://huggingface.co/nomic-ai/nomic-bert-2048

Solution

This PR creates a basic internalized version of nomic-bert-2048 with required modifications.

Modular file: modular_nomic_bert.py implemented and verified with python utils/modular_model_converter.py modular_nomic_bert.py
Conversion script: convert_nomic_bert_to_hf.py added with usage examples
Integration tests: End-to-end tests with exact output matching (text or logits)
Documentation: Model docs added/updated in docs/source/en/model_doc/
Pattern reuse: Verified against similar models (LLaVA, Idefics2, etc.)
Quality checks: make fixup passes with no errors
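
For readers unfamiliar with the RoPE mechanism mentioned under Problem, here is a minimal pure-Python sketch of rotary position embeddings in the "rotate-half" layout. It is an illustration only, not the code added in this PR: the function names, the base of 10000, and the choice of layout are assumptions made for the example.

```python
import math


def rope_angles(position, head_dim, base=10000.0):
    """Return (cos, sin) lists of length head_dim for a single position."""
    half = head_dim // 2
    # One rotation frequency per pair of dimensions.
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    angles = [position * f for f in inv_freq]
    # Both halves of the vector share the same angles (rotate-half layout).
    angles = angles + angles
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def rotate_half(x):
    """Map [x1, x2] (two halves) to [-x2, x1]."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rope(x, position, base=10000.0):
    """Rotate one query or key vector according to its token position."""
    cos, sin = rope_angles(position, len(x), base)
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, c, ri, s in zip(x, cos, rotated, sin)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.1, -0.2, 0.3, 0.4]
k = [0.5, 0.6, -0.7, 0.8]
# The attention score depends only on the relative offset (2 in both cases).
score_near = dot(apply_rope(q, 3), apply_rope(k, 1))
score_far = dot(apply_rope(q, 10), apply_rope(k, 8))
```

The two scores agree up to floating-point error, which is the property that makes RoPE a relative position encoding and distinguishes it from the learned absolute position embeddings of the original BERT.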

Who Can Review?

@ArthurZucker @Cyrilvallez (text models)

Co-authored-by: Felix Arkle <felixarkle@icloud.com>

github-actions · 2026-04-01T20:15:29Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

github-actions · 2026-04-01T20:15:45Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

vasqu · 2026-04-01T20:18:41Z

run-slow: jina_embeddings_v3, nomic_bert

github-actions · 2026-04-01T20:20:00Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/jina_embeddings_v3", "models/nomic_bert"]
quantizations: []

github-actions · 2026-04-01T20:28:21Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43067&sha=da43bf

github-actions · 2026-04-01T20:31:02Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	ed2325fb	workflow commit (merge commit)
PR	da43bf34	branch commit (from PR)
main	f38d6639	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

tomaarsen

My understanding is that this incorporates only the non-MoE path? The https://huggingface.co/nomic-ai/nomic-bert-2048/blob/main/modeling_hf_nomic_bert.py modeling code is used for various models, including:

https://huggingface.co/nomic-ai/nomic-embed-text-v1
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe (but it uses MoE parameters, 8 experts, etc.)

These vision models:

And these research checkpoints:

I assume that this work is only aiming for the text portion. That does mean that we're diverging from the original implementation a bit, which also supports vision and MoE. Not strictly an issue, just something to note.
If we move forward, let's try to support not just v1.5 but also v1, which is also widely used.

tomaarsen · 2026-04-02T06:25:05Z

+
+## Overview
+
+The NomicBERT model currently has no academic papers specifically written about it, however, the [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) card clearly describes the model’s architecture and training approach: it extends BERT to a 2048 token context length, and modifies the BERT training procedure. Notable changes include: 


Yes, it does: https://arxiv.org/abs/2402.01613

cc @zanussbaum

Updated the docs 🫡

github-actions · 2026-04-02T12:16:34Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

github-actions · 2026-04-02T12:34:16Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

github-actions · 2026-04-02T12:56:19Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

vasqu · 2026-04-02T12:58:12Z

run-slow: jina_embeddings_v3, nomic_bert

github-actions · 2026-04-02T12:59:25Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/jina_embeddings_v3", "models/nomic_bert"]
quantizations: []

github-actions · 2026-04-02T13:04:47Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

github-actions · 2026-04-02T13:13:45Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	59f0c24d	workflow commit (merge commit)
PR	0b61950f	branch commit (from PR)
main	abc417a4	base commit (on `main`)

⚠️ Model CI failed to report results

The test failure analysis could not be completed. Please check the workflow run for details.

vasqu · 2026-04-02T13:25:36Z

run-slow: jina_embeddings_v3, nomic_bert

github-actions · 2026-04-02T13:27:00Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/jina_embeddings_v3", "models/nomic_bert"]
quantizations: []

github-actions · 2026-04-02T13:54:10Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	02e17259	workflow commit (merge commit)
PR	c27a3aa7	branch commit (from PR)
main	abc417a4	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

github-actions · 2026-04-02T13:58:24Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

tomaarsen

Small nits, the general gist is solid I think.

github-actions · 2026-04-02T14:06:17Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jina_embeddings_v3, nomic_bert

vasqu · 2026-04-02T14:20:49Z

hub has problems and the other test is unrelated, merging

github-actions · 2026-04-02T14:21:27Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43067&sha=67fbce

vasqu · 2026-04-02T14:23:32Z

Thanks a lot to everyone involved @ed22699 @tomaarsen 🤗

ed22699 and others added 12 commits December 27, 2025 18:06

Created nomicBert skeleton

439ebf9

Implement modular_nomic_bert

c441f15

Co-authored-by: Felix Arkle <felixarkle@icloud.com>

Implement convert_nomic_bert_to_hf

cdb1416

Complete nomic_bert conversion script key mappings and fix NomicBertSelfAttention signature

8f4e6dd

Create nomic bert documentation

b35d7bf

Implemented descriptions for the main nomic bert documentation and debugged modular_nomic_bert

Remove redundancies and improve documentation for nomic_bert

6844aa9

Co-authored-by: Felix Arkle <felixarkle@icloud.com>

Update dependencies for nomic_bert

a6677c7

Add einops to setup and add availability checks for a more graceful exit if it is not available

Fix nomic_bert attention mechanism

0e90c68

Previous version overwrote bert, leading to forward_unimplemented

Correct past_key_value to past_key_values in NomicBertSelfAttention

34fd7d2

Implement cache_position into NomicBertSelfAttention

cb12291

Implement transpose_for_scores in NomicBertSelfAttention

e83e1c8

Fix nomicBertSelfAttention

f9763e1

Remove code which broke the encoder only assumption
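
The "Update dependencies for nomic_bert" commit above mentions availability checks for an optional dependency. A common way to implement such a check is sketched below. The helper names `is_package_available` and `require_package` are hypothetical and are not taken from this PR or from the transformers codebase; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name: str, feature: str) -> None:
    """Raise a readable ImportError if an optional dependency is missing."""
    if not is_package_available(name):
        raise ImportError(
            f"{feature} requires the `{name}` package; "
            f"install it with `pip install {name}`."
        )


# Example: guard an optional code path instead of failing at import time.
if is_package_available("einops"):
    backend = "einops"
else:
    backend = "fallback"
```

Checking with `find_spec` avoids importing the package just to test for it, so the failure can be reported at the point of use with a clear message.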

ed22699 force-pushed the bert-rope-model branch from 16fa0de to f9763e1 Compare December 30, 2025 10:21

ed22699 added 16 commits December 30, 2025 10:33

Add kwargs to NomicBertSelfAttention and ignore non-encoder logic

34a6ef3

Add past_key_values to NomicBertSelfAttention output

e85d4fa

Alter head dimension logic for NomicBertSelfAttention

6db2c01

Alter logic so smaller hidden dimensions are still computed correctly and not lost

Attempt to reflect hidden dim in NomicBertSelfAttention output shape

849561d

Attempt to format output head shape for NomicBertSelfAttention

19298f1

Add is_decoder check to NomicBertSelfAttention

6f55f1e

Although NomicBERT is an encoder-only model, BertGeneration also requires it to have decoder capabilities

Update NomicBertSelfAttention to handle dynamic cache

6d119a5

Update layer_idx to be a valid integer in NomicBertSelfAttention

62e1115

Alter output value size for NomicBertSelfAttention

49b00be

Improve seq_len_offset and past key robustness for NomicBertSelfAttention

484cd9f

Implement left-padded batch inference for Nomic_Bert

4c04d4f

Explicitly add helper functions to NomicBert

7c1f982

Fix dynamic cache issues within modular_nomic_bert

7872c67

Debug key errors of modular_nomic_bert

bd7d8d6

Attempt to prevent typeError noneType in modular_nomic_bert

827459d

Fix cache use in modular_nomic_bert

9330347
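
One of the commits above adds left-padded batch inference. With rotary embeddings, padding on the left shifts the real tokens to later slots, so positions are usually derived from the attention mask and not from the raw index. The sketch below shows that general technique in plain Python; it is an assumption about the approach, not the code from commit 4c04d4f, and the function name is hypothetical.

```python
def position_ids_from_mask(attention_mask):
    """Compute per-token positions from a 0/1 attention mask.

    attention_mask: list of rows, 1 for real tokens and 0 for padding.
    Real tokens are numbered 0, 1, 2, ... in order of appearance;
    padding slots get a placeholder of 0 (they are masked out anyway).
    """
    all_ids = []
    for row in attention_mask:
        next_position = 0
        ids = []
        for is_real in row:
            if is_real:
                ids.append(next_position)
                next_position += 1
            else:
                ids.append(0)
        all_ids.append(ids)
    return all_ids


batch_mask = [
    [0, 0, 1, 1, 1],  # left-padded sequence of 3 tokens
    [1, 1, 1, 1, 1],  # full-length sequence
]
batch_positions = position_ids_from_mask(batch_mask)
```

With this numbering, the first real token of every sequence sits at position 0 regardless of how much padding precedes it, so a padded and an unpadded copy of the same text see the same rotary angles.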

ed22699 force-pushed the bert-rope-model branch from ddc59d6 to 9330347 Compare December 31, 2025 00:26

tomaarsen reviewed Apr 2, 2026

v1 tests, same code - slightly different config (on the hub)

5843c37

fix wrong defaults

baf9b38

numbers didnt change on a10

0b61950

fix warning

c27a3aa

update docs

a9d415a

tomaarsen reviewed Apr 2, 2026

Comment thread docs/source/en/model_doc/nomic_bert.md Outdated

Comment thread docs/source/en/model_doc/nomic_bert.md Outdated

Comment thread docs/source/en/model_doc/nomic_bert.md Outdated

update docs per toms review

67fbcea

noooop mentioned this pull request Apr 30, 2026

Fix error in Dynamic NTK scaling vllm-project/vllm#41277

Merged


6 participants