Hey there,
Thanks for releasing this!
Going through the list of kernels:
- CrossEntropyLoss
- RMS NORM
- RopeEmbedding
- Swiglu
- FastLoRA
I'm trying to understand how each of these optimizations contributes to the overall performance improvement. Is there a chart that shows the gains from #5 (FastLoRA) alone?
Secondly, could you please explain what is done/included in both the PRO and MAX tiers? The wording in the blog post is very imprecise.
Thanks!