Breakdown of speedups #9

@vgoklani

Hey there,

Thanks for releasing this!

Going through the list of kernels:

  1. CrossEntropyLoss
  2. RMS NORM
  3. RopeEmbedding
  4. Swiglu
  5. FastLoRA

I'm trying to understand how each optimization contributes to the overall performance improvement. Is there a chart that shows the gains from #5 (FastLoRA) alone?

Secondly, could you please explain what's being done/included in both the PRO and MAX tiers? The wording in the blog post is very imprecise.

Thanks!
