this worklog explores optimizing a simple snake-1d activation kernel (used in many neural audio codecs and text-to-speech systems) in triton on an nvidia h100 80gb gpu.
it covers tricks such as 7th-degree polynomial approximations of the sine function in order to squeeze out
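
as a rough illustration of the idea, here is a plain-python sketch (not the triton kernel from this worklog). it assumes the commonly cited snake definition, `x + sin(alpha * x)^2 / alpha`, and uses a 7th-degree taylor polynomial with simple range reduction as a stand-in for whatever coefficients the actual kernel uses. the names `sin_poly7`, `snake_ref` and `snake_poly7` are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = 0.5 * math.pi


def sin_poly7(x: float) -> float:
    """Approximate sin(x) with a 7th-degree odd polynomial (Taylor coefficients)."""
    # Range reduction: bring x into [-pi, pi] by subtracting whole periods.
    r = x - round(x / TWO_PI) * TWO_PI
    # Fold into [-pi/2, pi/2], where the truncated series is most accurate,
    # using sin(pi - r) == sin(r) and sin(-pi - r) == sin(r).
    if r > HALF_PI:
        r = math.pi - r
    elif r < -HALF_PI:
        r = -math.pi - r
    r2 = r * r
    # Horner form of r - r^3/3! + r^5/5! - r^7/7!
    return r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 + r2 * (-1.0 / 5040.0))))


def snake_ref(x: float, alpha: float) -> float:
    """Reference snake activation using the library sine."""
    s = math.sin(alpha * x)
    return x + (s * s) / alpha


def snake_poly7(x: float, alpha: float) -> float:
    """Snake activation with the polynomial sine swapped in."""
    s = sin_poly7(alpha * x)
    return x + (s * s) / alpha


if __name__ == "__main__":
    # Compare the two versions at a few sample points.
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, snake_ref(x, 1.0), snake_poly7(x, 1.0))
```

with taylor coefficients the worst-case sine error after folding is on the order of 1e-4 (at the edges of the reduced range); a fitted minimax polynomial of the same degree would likely do better, but that is outside what this sketch shows.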


