
Delta net precision #997

Merged
angeloskath merged 2 commits into main from delta-net-precision
Mar 15, 2026

Conversation

@angeloskath

Member

Since both the batch and vectorized paths go through the kernel, there is almost zero overhead from switching to an fp32 state.

Qwen/Qwen3.5-9B

Before
Averages: prompt_tps=1567.420, generation_tps=39.407, peak_memory=19.544

After
Averages: prompt_tps=1568.009, generation_tps=39.199, peak_memory=19.571
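To put the two runs side by side, the relative change of each average can be computed directly from the numbers above. This is only a small arithmetic sketch; the dictionary layout and the helper name `relative_change_pct` are illustrative and not part of the benchmark tooling.

```python
# Averages copied from the "Before" and "After" runs above.
before = {"prompt_tps": 1567.420, "generation_tps": 39.407, "peak_memory": 19.544}
after = {"prompt_tps": 1568.009, "generation_tps": 39.199, "peak_memory": 19.571}


def relative_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


changes = {name: relative_change_pct(before[name], after[name]) for name in before}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

All three metrics move by less than 1%: roughly +0.04% for prompt throughput, -0.53% for generation throughput, and +0.14% for peak memory.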

I haven't noticed any real difference in daily use with or without this update.

This does affect finetuning fairly heavily, but I think we need a kernel for that to be an enjoyable experience anyway.
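As a rough illustration of why the precision of the recurrent state can matter, here is a minimal sketch of a plain (ungated) delta-rule recurrence, S ← S + β (v − S k) kᵀ, run with the stored state rounded to half precision and to single precision, and compared against a float64 reference. This is not the kernel from this PR: the ungated update, the tiny dimensions, the random inputs, and the use of IEEE fp16 as a stand-in for a low-precision model dtype are all assumptions made for the example, and only the stored state is rounded (the arithmetic itself runs in float64).

```python
import random
import struct


def round_to(fmt, x):
    """Round a float to the precision of a struct format ('e' = fp16, 'f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def delta_rule_step(state, k, v, beta, fmt=None):
    """One plain delta-rule update: S <- S + beta * (v - S k) k^T.

    state is a list of rows (d_v x d_k). If fmt is given, the stored state is
    rounded to that precision after the update.
    """
    new_state = []
    for row, v_i in zip(state, v):
        pred = sum(s * k_j for s, k_j in zip(row, k))  # (S k)_i
        err = beta * (v_i - pred)
        new_row = [s + err * k_j for s, k_j in zip(row, k)]
        if fmt is not None:
            new_row = [round_to(fmt, x) for x in new_row]
        new_state.append(new_row)
    return new_state


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def run(steps=500, d=4, seed=0):
    rng = random.Random(seed)
    ref = [[0.0] * d for _ in range(d)]  # float64 reference state
    s16 = [[0.0] * d for _ in range(d)]  # state stored in fp16
    s32 = [[0.0] * d for _ in range(d)]  # state stored in fp32
    for _ in range(steps):
        k = [rng.uniform(-1, 1) for _ in range(d)]
        norm = sum(x * x for x in k) ** 0.5
        k = [x / norm for x in k]  # unit-norm key keeps the update stable
        v = [rng.uniform(-1, 1) for _ in range(d)]
        beta = rng.uniform(0.1, 0.9)
        ref = delta_rule_step(ref, k, v, beta)
        s16 = delta_rule_step(s16, k, v, beta, fmt="e")
        s32 = delta_rule_step(s32, k, v, beta, fmt="f")
    return max_abs_diff(ref, s16), max_abs_diff(ref, s32)


err_fp16, err_fp32 = run()
print(f"fp16 state error: {err_fp16:.2e}")
print(f"fp32 state error: {err_fp32:.2e}")
```

Because the state is read and rewritten at every step, rounding error in the stored state is carried forward through the whole sequence, so the half-precision run drifts further from the reference than the single-precision one.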

@nastya236

Collaborator

thank you! looks great!
