Backbone: Attention Backend, batch_invariant library integration
Communication (NCCL)
Radix Cache Support
Model Support
Quantization
Parallelism
Spec Decoding
Perf
Issues
Usability & Documentation
Related resources
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
https://github.com/thinking-machines-lab/batch_invariant_ops