Breakdown of speedups #9

Hey there,

Thanks for releasing this! 

Going through the list of kernels:

1. CrossEntropyLoss
2. RMSNorm
3. RopeEmbedding
4. SwiGLU
5. FastLoRA

I'm trying to understand how each of these optimizations contributes to the overall performance improvement. Is there a chart that shows the gains from #5 (FastLoRA) alone?

Secondly, could you please explain what is included in each of the PRO and MAX tiers? The wording in the blog post is very imprecise.

Thanks!
