make flash-attn optional for Ume, enable CPU-only inference and testing#90
Conversation
ncfrey
commented
May 23, 2025
- Add use_flash_attn parameter to Ume and FlexBERT to control flash-attn usage.
- Fall back to padded attention when flash-attn is unavailable, enabling CPU-only operation.
- Fix input shape handling in Ume for both padded and unpadded attention modes.
- Add a test for Ume inference on CPU without flash-attn.
- All existing tests pass, ensuring backward compatibility.
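The fallback behavior described above can be sketched roughly as follows (hypothetical helper name; the PR's actual implementation lives in the Ume/FlexBERT classes):

```python
import warnings


def resolve_use_flash_attn(requested: bool) -> bool:
    """Decide whether flash-attn can actually be used.

    Falls back to padded (standard) attention when the flash_attn package
    is not importable, e.g. in CPU-only environments.
    """
    if not requested:
        return False
    try:
        import flash_attn  # noqa: F401  -- availability check only
    except ImportError:
        warnings.warn(
            "flash_attn is not available; falling back to padded attention "
            "(use_flash_attn=False)."
        )
        return False
    return True
```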
Pull Request Overview
This PR makes flash-attn optional in Ume and FlexBERT, adds a CPU-only fallback for attention, fixes input shape handling in Ume’s embed method, and adds a test for Ume inference on CPU.
- Introduce `use_flash_attn` parameter to Ume and propagate to FlexBERT via `use_fa2`
- Fall back to padded attention when flash-attn is unavailable and adjust embed input shape handling
- Add `test_embed_sequences_cpu` to verify CPU-only `embed_sequences` functionality
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/lobster/model/test__ume.py | Add test_embed_sequences_cpu to cover embed_sequences without flash-attn |
| src/lobster/model/modern_bert/_modern_bert.py | Warn and disable flash-attn fallback when unavailable; remove strict assert |
| src/lobster/model/_ume.py | Add use_flash_attn param, set padding mode, and update embed shape handling |
Comments suppressed due to low confidence (2)
src/lobster/model/_ume.py:357
- The unpadded attention code path in `embed` is not covered by existing tests. Consider adding a test for `use_flash_attn=True` (when flash-attn is available) to exercise this branch and verify output shapes.
input_ids, attention_mask, cu_seqlens = self.model._prepare_inputs(x["input_ids"], x["attention_mask"])
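For context on the unpadded path, here is a minimal pure-Python illustration of flash-attn-style unpadding (a hypothetical helper, not the repo's `_prepare_inputs`): tokens from a padded batch are concatenated, and `cu_seqlens` records cumulative sequence lengths that delimit each sequence.

```python
def unpad(input_ids, attention_mask):
    """Flatten a padded batch into (tokens, cu_seqlens) for variable-length
    attention. Pure-Python sketch of the idea, not production code."""
    tokens, cu_seqlens = [], [0]
    for ids, mask in zip(input_ids, attention_mask):
        kept = [tok for tok, m in zip(ids, mask) if m]  # drop padding
        tokens.extend(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))  # running offset
    return tokens, cu_seqlens


tokens, cu_seqlens = unpad([[5, 6, 0], [7, 8, 9]], [[1, 1, 0], [1, 1, 1]])
# tokens == [5, 6, 7, 8, 9]; cu_seqlens == [0, 2, 5]
```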
src/lobster/model/modern_bert/_modern_bert.py:121
- [nitpick] The warning refers to the internal `use_fa2` flag; consider referencing the user-facing `use_flash_attn` parameter in the message for clearer guidance.
warnings.warn("flash_attn not available but use_fa2=True. Setting use_fa2=False. "
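A sketch of the suggested rewording (hypothetical function name; the actual warning lives in `_modern_bert.py`):

```python
import warnings


def warn_flash_attn_unavailable() -> None:
    # Reference the user-facing `use_flash_attn` parameter rather than the
    # internal `use_fa2` flag, so users know which argument to change.
    warnings.warn(
        "flash_attn is not installed but use_flash_attn=True was requested. "
        "Falling back to padded attention (use_flash_attn=False)."
    )
```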
Changes in `_ume.py` look good to me, though I don't see any changed files for modern_bert?
Realized this when testing the code out: we might want to enable passing the `use_flash_attn` flag when loading a checkpoint. I think the most common scenario is that we train a model with FA and then run inference without it. For that, we might need to override Lightning's `load_from_checkpoint` to pass it there?
Edit: you actually already handle this when flash-attn is not installed (though it still might be nice to optionally disable it when loading checkpoints even when it's installed in the environment; I opened a similar issue last week for ESM). That being said, it looks like I still get the same error when trying to embed sequences without FA?
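For the checkpoint-loading scenario: Lightning forwards extra keyword arguments of `load_from_checkpoint` as hyperparameter overrides (when the module used `save_hyperparameters()`), so something like `Ume.load_from_checkpoint(ckpt_path, use_flash_attn=False)` may already suffice. The underlying pattern is just a kwargs override; a minimal framework-free sketch:

```python
def load_with_overrides(saved_hparams: dict, **overrides) -> dict:
    """Merge hyperparameters stored in a checkpoint with user overrides,
    letting inference-time settings (like disabling flash-attn) win."""
    merged = dict(saved_hparams)
    merged.update(overrides)
    return merged


# A checkpoint trained with flash-attn, loaded for CPU-only inference:
hparams = load_with_overrides({"use_flash_attn": True}, use_flash_attn=False)
# hparams["use_flash_attn"] is now False
```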