Add NeoBERT#187

Merged
taylormjs merged 21 commits into main from k/neobert on Sep 3, 2025

Conversation

@karinazad (Collaborator) commented Aug 27, 2025

Description

Adds NeoBERT from https://huggingface.co/chandar-lab/NeoBERT/tree/main with a few changes:

  • removes the xformers dependency: it is needed only for SwiGLU, the plain torch version does not seem much slower (see performance of swiglu operator, facebookresearch/xformers#734), and keeping xformers would prevent inference on CPU
  • uses a custom masking function instead of transformers's collator, which expects a tokenizer and pulls in a lot of unnecessary code
  • disables packing for now, since it's not clear to me how they handled removing padding tokens. I opened an issue (How is unpadding handled when unpacking? chandar-lab/NeoBERT#7), but they don't seem to check issues often. For now, let's get the model running without packing so we have a baseline to compare against, even if training is slower
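The first bullet (replacing xformers' fused SwiGLU with plain torch ops) can be sketched roughly as follows; the module and projection names here are illustrative, not the PR's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block in plain PyTorch (no xformers, so it runs on CPU)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Gate and up projections fused into a single matmul, split in forward().
        self.w12 = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.w3 = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = self.w12(x).chunk(2, dim=-1)
        return self.w3(F.silu(gate) * up)

x = torch.randn(2, 8, 768)
out = SwiGLU(768, 3072)(x)  # output has the same shape as the input
```

The linked xformers issue (facebookresearch/xformers#734) suggests the fused kernel's speedup is modest, which is the trade-off accepted here.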

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

@karinazad changed the title from [Draft] Add NeoBERT to Add NeoBERT on Aug 28, 2025

@taylormjs taylormjs left a comment


Looks great!

num_hidden_layers: int = 28,
num_attention_heads: int = 12,
intermediate_size: int = 3072,
embedding_init_range: float = 0.02,

Yeah, UME-medium is probably the best model to compare with. Good defaults.
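For context, the defaults visible in this hunk could be collected in a config object along these lines; the class name is hypothetical, and only the four fields shown in the diff are taken from the PR:

```python
from dataclasses import dataclass

@dataclass
class NeoBERTConfig:  # hypothetical name, for illustration only
    # The four defaults below mirror the values shown in the diff above.
    num_hidden_layers: int = 28
    num_attention_heads: int = 12
    intermediate_size: int = 3072
    embedding_init_range: float = 0.02

cfg = NeoBERTConfig()
```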

    "labels": labels,
    "attention_mask": attention_mask.to(torch.bool),
}
else:

Q: are the ModernBERT base checkpoints trained without packing? Just noting this so we have the fairest comparison.

def _init_weights(self, module):
    if isinstance(module, nn.Linear):
        module.weight.data.uniform_(-self.config.decoder_init_range, self.config.decoder_init_range)
    elif isinstance(module, nn.Embedding):

Oh interesting that uniform initialization is used
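A self-contained sketch of the uniform initialization discussed here, assuming a flat `init_range` constant in place of the PR's `self.config.decoder_init_range`:

```python
import torch
import torch.nn as nn

init_range = 0.02  # stand-in for config.decoder_init_range

def init_weights(module: nn.Module) -> None:
    # Uniform init in [-init_range, init_range] rather than the more common
    # (truncated) normal; biases are zeroed.
    if isinstance(module, nn.Linear):
        module.weight.data.uniform_(-init_range, init_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.uniform_(-init_range, init_range)

layer = nn.Linear(4, 4)
init_weights(layer)
```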

Comment on lines +15 to +16
class NeoBERTLightningModule(LightningModule):
    def __init__(

Nice, good call having a separate LightningModule and nn.Module

@taylormjs taylormjs merged commit ff98e52 into main Sep 3, 2025
4 checks passed
@taylormjs taylormjs deleted the k/neobert branch September 3, 2025 05:03