[MODEL] Add Falcon H1 #38249
Conversation
```python
class FalconH1ModelIntegrationTest(unittest.TestCase):
    # TODO: add integration tests for all model sizes
    pass
```
The integration tests are still missing before we can merge.

QQ: Will there be a transformers release for this model soon? Trying to figure out when to update the transformers version for vLLM.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Thanks all for the contribution!!! 🤗

@DarkLight1337 we just did one yesterday 😢

We are gonna do a model-based release!

@younesbelkada

I am getting the following output, is it normal?

Looks like a decoding issue: locally the model generates valid completions (e.g., a detailed summary of the French Revolution). Could be related to model loading precision (it should be bf16) or a tokenizer mismatch.

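As an aside on the precision point: bf16 matters mainly because it keeps float32's 8-bit exponent range while giving up significand bits, so large intermediate values that would overflow fp16 stay finite. A minimal pure-Python sketch of the two formats (the rounding helpers below are illustrative, not part of transformers):

```python
import struct

def to_bf16(x):
    # bfloat16: float32's 8-bit exponent but only 8 significand bits;
    # emulate by truncating the low 16 bits of the float32 encoding.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def to_fp16(x):
    # IEEE half precision via struct's 'e' format; values above ~65504
    # do not fit and raise OverflowError when packed.
    return struct.unpack("<e", struct.pack("<e", x))[0]

big = 1.0e5  # larger than fp16's max (~65504), easy for bf16
bf16_big = to_bf16(big)  # stays finite, within ~0.5% of 1e5
fp16_overflows = False
try:
    to_fp16(big)
except OverflowError:
    fp16_overflows = True
print(f"bf16(1e5) = {bf16_big}, fp16 overflows: {fp16_overflows}")
```

When loading with transformers, passing `torch_dtype=torch.bfloat16` to `from_pretrained` is the usual way to select bf16.
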
@dhiaEddineRhaiem Do you want to run the test, or will you check it with a different code snippet? I am getting the same strange outputs on both T4 and A10.

Hello again @ydshieh, one other potential cause is that Falcon H1 is particularly sensitive to temperature values above 0.3 or 0.4, likely because it already produces well-calibrated and sharply peaked logits by default. Basically:
🔹 Its raw logits are already well separated, so lowering the temperature (e.g. to 0.1) keeps that separation strong → stable behavior.
🔹 Increasing T above 0.3 or 0.4 flattens it, letting weaker tokens sneak in → instability.
Empirically, I would advise setting T=0.1! To experiment with chatting with the FalconH1 series of models, please use this playground.

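The temperature effect described above can be sketched with a plain softmax; the logit values are made up for illustration and are not taken from the model:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before the softmax: low T sharpens the
    # distribution, high T flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy, well-separated logits standing in for sharply peaked model
# outputs (illustrative values only).
logits = [5.0, 3.0, 1.0]
p_sharp = softmax_with_temperature(logits, 0.1)  # top token dominates
p_flat = softmax_with_temperature(logits, 1.0)   # weaker tokens gain mass
print(f"T=0.1 top prob: {p_sharp[0]:.4f}, T=1.0 top prob: {p_flat[0]:.4f}")
```

With these toy logits the top-token probability is essentially 1.0 at T=0.1 but drops noticeably at T=1.0, which is the flattening effect described above.
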
I suggest running the test, or even the same code in a script (see below); I didn't change anything from what is written in this PR. If the test or the following script still passes or gives normal outputs, we can discuss what might be the cause. Otherwise, it's best if you or @younesbelkada take a look at what is happening and update the test.

I have just now tested locally with the exact same script; happy to have a deeper discussion about it @ydshieh

Oh my god... this issue is going to be tough. Thank you for checking. Could you first share your machine type (T4/A10/A100/H100 etc.)? And copy-paste the output of

I am seeing
Maybe it's the cause. I will check further.

I think it is also the cause; we saw similar behaviour locally when the fast path is not used.

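One quick way to check whether the fast path can be used at all is to test for the optional kernel packages. The names `mamba_ssm` and `causal_conv1d` are the usual Mamba2 fast-path dependencies, but they are an assumption here; check the model's own warning message for the exact requirements:

```python
from importlib.util import find_spec

def mamba_fast_path_available():
    # True only if both optional kernel packages are importable
    # (assumed package names; adjust to the model's actual warning).
    return all(
        find_spec(pkg) is not None
        for pkg in ("mamba_ssm", "causal_conv1d")
    )

print("fast path available:", mamba_fast_path_available())
```
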
Sure, but a question: do we want to maintain the slow path (even if the results are not identical, at least outputs that look normal)? I changed the prompts and still get nonsense outputs 😰

I would say we might want to retain the slow path, as the same logic applies across hybrid and pure Mamba2 models like Zamba, Bamba and others. Mamba2 is highly sensitive to numerical precision: key components like A_log, dt, and the internal ssm_states operate in fp32. Without the Triton fast path, the fallback accumulates precision errors across tokens, especially in long contexts, which likely leads to the degraded outputs. We will debug further internally to see if other causes can be found and overcome.

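The precision-sensitivity argument can be illustrated with a toy linear recurrence h ← a*h + x, a stand-in for an SSM state update (the coefficients are made up for illustration): rounding the state to half precision at every step drifts visibly away from the fp64 reference over a thousand tokens.

```python
import struct

def fp16(x):
    # Round a Python float to IEEE half precision via struct's 'e' format,
    # emulating a low-precision state buffer.
    return struct.unpack("<e", struct.pack("<e", x))[0]

a, x = 0.999, 0.001   # illustrative decay / input, not model values
h64, h16 = 0.0, 0.0
a16, x16 = fp16(a), fp16(x)
for _ in range(1000):
    h64 = a * h64 + x               # full-precision reference
    h16 = fp16(fp16(a16 * h16) + x16)  # state rounded at every step

drift = abs(h16 - h64)
print(f"fp64 state: {h64:.6f}, fp16 state: {h16:.6f}, drift: {drift:.6f}")
```

The drift comes both from quantizing the coefficients once and from re-rounding the state every step, which is why keeping A_log, dt, and ssm_states in fp32 matters for the slow path.
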
Got it. I confirmed that with the fast path, the outputs look normal.

@dhiaEddineRhaiem I think the slow path is missing a recent fix, see #37533. Tl;dr: the repeat pattern on the mamba heads has been wrong --> we need a repeat interleave, not a simple repeat. I will try to make the mamba model paths inheritable in the future; the copy-pasting atm is very error-prone 😢

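For reference, the difference between the two repeat patterns, sketched with plain Python lists standing in for head tensors (`torch.Tensor.repeat` tiles the whole sequence, while `torch.repeat_interleave` duplicates each element in place):

```python
def repeat(pattern, n):
    # torch.Tensor.repeat-style: the whole sequence is tiled n times.
    return pattern * n

def repeat_interleave(pattern, n):
    # torch.repeat_interleave-style: each element is duplicated n times
    # before moving to the next one.
    return [x for x in pattern for _ in range(n)]

heads = ["h0", "h1"]
print(repeat(heads, 2))             # ['h0', 'h1', 'h0', 'h1']
print(repeat_interleave(heads, 2))  # ['h0', 'h0', 'h1', 'h1']
```

Mixing the two silently pairs the wrong state with the wrong head, which matches the kind of degraded-but-not-crashing outputs seen on the slow path.
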
Hey @vasqu, many thanks for pointing this out!
I will soon raise a PR to fix that.

Nice, glad to hear that :D

Thanks all for diving in! 🚀