
[MODEL] Add Falcon H1#38249

Merged
ArthurZucker merged 27 commits into huggingface:main from younesbelkada:add-falcon-h1
May 21, 2025

Conversation

@younesbelkada
Contributor

No description provided.

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM

Comment on lines +483 to +485
class FalconH1ModelIntegrationTest(unittest.TestCase):
# TODO: add integration tests for all model sizes
    pass
Collaborator


missing before we can merge

Contributor


done @ArthurZucker !

@ArthurZucker ArthurZucker marked this pull request as ready for review May 21, 2025 06:09
@DarkLight1337

QQ: Will there be a transformers release for this model soon? Trying to figure out when to update the transformers version for vLLM.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker merged commit 6829936 into huggingface:main May 21, 2025
16 of 18 checks passed
@ArthurZucker
Collaborator

Thanks all for the contribution!!! 🤗

@ArthurZucker
Collaborator

@DarkLight1337 we just did one yesterday 😢

@ArthurZucker
Collaborator

We are gonna do a model based release!

@ydshieh
Collaborator

ydshieh commented May 21, 2025

@younesbelkada test_llama_3_1_hard 😆

@ydshieh
Collaborator

ydshieh commented May 21, 2025

I am getting the following output, is it normal?

(Pdb) print(generated_text)
user
Tell me about the french revolution.
     of U PT  SI      di  the  the Sl    D  �  " ACK \   S  S   S     MOD OF A D\  Ont reli

 \   \   S$
 \) S   es\ $ $\  S\ S\) �  (?   S shortens \ {        $ " $$$ $$                            }{ $       $$$ $] $   $]]$\ � $\   ES    � ] ]  Sus  Space  of Corpor $  Of  Un    User   User Item Un
    Box Med   Med      Med Right or  )                      splits     halfway                                    Box  ]  �   �    ∩                stack               Y ma Bu    for  )  1 User Item—     U  ] S      Food      for     $ �           for       S     O

@dhiaEddineRhaiem
Contributor

Looks like a decoding issue: locally the model generates valid completions (e.g., a detailed summary of the French Revolution). It could be related to model loading precision (it should be bf16) or a tokenizer mismatch.

@dhiaEddineRhaiem
Contributor

Thank you for pointing out the typo, @ydshieh.
It has now been addressed and fixed in this PR

@ydshieh
Collaborator

ydshieh commented May 21, 2025

@dhiaEddineRhaiem Did you run the test, or did you check it with a different code snippet? I am getting the same strange outputs on both T4 and A10.

@dhiaEddineRhaiem
Contributor

dhiaEddineRhaiem commented May 21, 2025

Hello again @ydshieh,
Thanks for your remark!

One other potential cause is that Falcon H1 is particularly sensitive to temperature values above 0.3 or 0.4, likely because it already produces well-calibrated and sharply peaked logits by default. Basically:

🔹 Its raw logits are already well-separated, so a low temperature (e.g. 0.1) keeps that separation strong → stable behavior.

🔹 Increasing T above 0.3 or 0.4 flattens the distribution, letting weaker tokens sneak in → instability.

Empirically, I would advise setting T=0.1!
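For illustration, here is a minimal sketch of the point above, using toy logits (not actual model outputs) to show how a low temperature preserves the separation of a sharply peaked distribution while a higher one flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits standing in for a well-separated model distribution.
logits = [4.0, 2.0, 1.0]
print(softmax_with_temperature(logits, 0.1))  # top token dominates almost entirely
print(softmax_with_temperature(logits, 0.8))  # probability mass leaks to weaker tokens
```

At T=0.1 the top token keeps essentially all of the probability mass; at T=0.8 the weaker tokens become samplable, which matches the instability described above.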

To experiment with chatting with the FalconH1 series of models, please use this playground

@ydshieh
Collaborator

ydshieh commented May 21, 2025

@dhiaEddineRhaiem

I suggest running the test

RUN_SLOW=1 python3 -m pytest -v tests/models/falcon_h1/test_modeling_falcon_h1.py::FalconH1ModelIntegrationTest::test_falcon_h1_hard

or even running the same code in a script (see below).

I didn't change anything from what is written in this PR. It's best if you or @younesbelkada take a look at what is happening and update the test, so falcon_h1 can be maintained on the transformers side.

If the test or the following script still passes or gives normal outputs on your side, we can discuss what the cause might be.

import torch
from transformers import AutoTokenizer, FalconH1ForCausalLM

model_id = "tiiuae/Falcon-H1-1.5B-Deep-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FalconH1ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
device = "cuda"
messages = [{"role": "user", "content": "Tell me about the french revolution."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

@dhiaEddineRhaiem
Contributor

I have just tested locally with the exact same script and got this output:

The fast path for FalconH1 will be used when running the model on a GPU
user
Tell me about the french revolution.
assistant
The French Revolution (1789–1799) was a period of radical social and political upheaval in France that fundamentally transformed the nation and had profound effects on the rest of Europe and the world. Here are the key aspects of the revolution:

### **Causes**
1. **Economic Crisis**: France was in severe financial trouble due to costly wars (particularly the American Revolution), extravagant spending by the monarchy, and inefficient taxation.
2. **Social Inequality**: The rigid class system (the Ancien Régime) divided society into the privileged nobility and clergy (First Estate) and the common people (Third Estate), who bore the brunt of taxation and had few rights.
3. **Enlightenment Ideas**: Philosophers like Rousseau, Voltaire, and Montesquieu inspired ideas of liberty, equality, and popular sovereignty.
4. **Settlement of 1789**: The Estates-General convened to address the financial crisis, leading to the Third Estate's assertion of its rights and the eventual formation of the National Assembly.

### **Key Events**
1. **Opening of the Revolution (1789)**:
  - **Storming of the Bastille**: Symbolic of the fall of royal tyranny.
  - **Declaration of the Rights of Man and of the Citizen**: Proclaimed universal rights to liberty, property, and security.
  - **Creation of the National Assembly**: The Third Estate declared itself the representative body of France.

2. **Radical Phase (1792–1794)**:
  - **Reign of Terror**: Led by Maximilien Robespierre, the Committee of Public Safety enforced radical egalitarianism through the guillotine, executing thousands of perceived enemies of the revolution (monarchists, clergy, aristocrats, and counter-revolutionaries).
  - **Execution of Louis XVI**: The king was guillotined in June 1793, symbolizing the end of the monarchy.

3. **Thermidorian Reaction (July

happy to have deeper discussion about it @ydshieh

@ydshieh
Collaborator

ydshieh commented May 21, 2025

Oh my god... this issue is going to be tough. Thank you for checking.

Could you first share your machine type? T4/A10/A100/H100 etc?

And could you copy-paste the output of transformers-cli env 🙏?

@dhiaEddineRhaiem
Contributor

  1. Machine is H100
  2. transformers-cli env gives:
- `transformers` version: 4.53.0.dev0
- Platform: Linux-6.6.72+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.31.4
- Safetensors version: 0.4.5
- Accelerate version: 1.1.1
- Accelerate config:    not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.4.0a0+3bcc3cddb5.nv24.07 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
- Using GPU in script?: <fill in>
- GPU type: NVIDIA H100 80GB HBM3

@ydshieh
Collaborator

ydshieh commented May 21, 2025

I am seeing

The fast path is not available because on of (selective_state_update, causal_conv1d_fn, causal_conv1d_update) is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d

maybe it's the cause. I will check further

@dhiaEddineRhaiem
Contributor

I also think that is the cause; we saw similar behaviour locally when the fast path is not used.
Please try to install mamba-ssm and causal-conv1d and retry.

@ydshieh
Collaborator

ydshieh commented May 21, 2025

Sure, but a question: do we want to maintain the slow path (even if the results are not identical, the outputs should at least look normal)?

I changed the prompts, and still get nonsense outputs 😰

@dhiaEddineRhaiem
Contributor

dhiaEddineRhaiem commented May 21, 2025

I would say we might want to retain the slow path, as the same logic applies across hybrid and pure Mamba2 models like Zamba, Bamba and others.

Mamba2 is highly sensitive to numerical precision: key components like A_log, dt, and the internal ssm_states operate in fp32. Without the Triton fast path, the fallback accumulates precision errors across tokens, especially in long contexts, which likely leads to the degraded outputs.

We will debug further internally to see if other causes can be found and overcome.
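The accumulation argument can be illustrated with a toy scalar recurrence (not the actual Mamba2 state update; the rounding is a crude stand-in for low-precision state storage) showing how per-step precision loss can swallow small contributions over a long context:

```python
def run_recurrence(decay, inputs, round_digits=None):
    """Scalar stand-in for a recurrent state update h = decay * h + x.

    If round_digits is set, the running state is rounded after every step,
    crudely emulating storing the state at reduced precision.
    """
    h = 0.0
    for x in inputs:
        h = decay * h + x
        if round_digits is not None:
            h = round(h, round_digits)
    return h

inputs = [0.0001] * 5000  # many small per-token contributions
full = run_recurrence(0.99, inputs)                    # converges toward ~0.01
coarse = run_recurrence(0.99, inputs, round_digits=3)  # each update rounds away to 0
print(full, coarse)
```

In the coarse-precision run every per-token update is smaller than the representable step, so the state never moves at all, while the full-precision run accumulates them correctly; this is the same failure mode, in miniature, as running the fp32-sensitive SSM state at lower precision.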

@ydshieh
Collaborator

ydshieh commented May 21, 2025

Got it. I confirmed that with fast path, the outputs look normal.

@vasqu
Contributor

vasqu commented May 22, 2025

@dhiaEddineRhaiem I think the slow path is missing a recent fix, see #37533

Tl;dr: the repeat pattern on the mamba heads has been wrong --> we need a repeat interleave, not a simple repeat

Will try to make the mamba model paths inheritable in the future. The copy-pasting is very error prone atm 😢
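To spell out the tl;dr, here is a pure-Python sketch of the two repeat patterns (the torch analogues are noted in comments; the group-to-head mapping is illustrative, not the actual FalconH1 code):

```python
def repeat_tile(xs, n):
    """Tile the whole sequence n times (analogue of torch.Tensor.repeat)."""
    return xs * n

def repeat_interleave(xs, n):
    """Repeat each element n times in place (analogue of torch.repeat_interleave)."""
    return [x for x in xs for _ in range(n)]

# Expanding 2 SSM groups to 4 heads: consecutive heads must share their own
# group's parameters, so group g should map to a contiguous run of heads.
groups = ["g0", "g1"]
print(repeat_tile(groups, 2))        # ['g0', 'g1', 'g0', 'g1']  (wrong pairing)
print(repeat_interleave(groups, 2))  # ['g0', 'g0', 'g1', 'g1']  (correct pairing)
```

With the tiled pattern, head 1 silently reads group 1's parameters instead of group 0's, which produces plausible-looking but wrong math rather than a crash, exactly the kind of bug that survives copy-pasting.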

@dhiaEddineRhaiem
Contributor

Hey @vasqu,

Many thanks for pointing this out:

  1. I confirm torch.repeat gives an undesirable repeat pattern in the mamba heads
  2. Another major issue is that the muP multipliers we released with the weights are not applied in the slow-path forward pass, which explains the weird generations.

I will soon raise a PR to fix that.
Thanks again.

@vasqu
Contributor

vasqu commented May 22, 2025

Nice, glad to hear that :D

@ArthurZucker
Collaborator

Thanks all for diving in! 🚀

@dhiaEddineRhaiem
Contributor

dhiaEddineRhaiem commented May 23, 2025

Hello @vasqu, @ydshieh,
as discussed, here is the PR fixing the two problems found in the slow path of FalconH1.
