Zyphra
267 posts
Full stack open superintelligence
- Replying to @ZyphraAI: Zamba2-VL is released under Apache 2.0 at all three scales. Blog: zyphra.com/our-work/zamba… Technical report: arxiv.org/abs/2606.00390 Weights: huggingface.co/collections/Zy… Code: github.com/Zyphra/transfo…
- Replying to @ZyphraAI: Hybrid SSM-Transformer models combine SSM layers for speed and efficiency with a few attention layers for precise recall. Zamba2-VL includes 1.2B, 2.7B, and 7B models and is the first open family of vision-language models built natively on this hybrid architecture.
- Replying to @ZyphraAI: Stay tuned as we extend this to larger dense and MoE models, the backward pass for training, and validation within production serving environments. Zyphra will continue pushing performance across new hardware ecosystems. Technical details on the blog:
- Replying to @ZyphraAI: The optimization philosophy we use on other stacks (topology-aware parallelism, custom kernels, communication scheduling) also applies to Trainium/Inferentia. This demonstrates our heterogeneous silicon capabilities and shows one way Neuron can be improved for the wider ecosystem.
- Replying to @ZyphraAI: We built on existing NKI kernels in AWS's Neuron stack and added a Domino-style schedule that overlaps compute with chip-to-chip communication. Within each transformer block, the compute engines stay busy while data moves between accelerators, instead of stopping to wait.
- Replying to @ZyphraAI: Trainium/Inferentia runs communication on dedicated cores in parallel with its tensor, vector, and scalar engines. Combined with large HBM capacity and a fast scale-up fabric, this suits workloads limited more by data movement than by compute, such as decode, MoE, and long-context attention.
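
The compute/communication overlap described in the posts above can be illustrated with a toy sketch. This is not Zyphra's NKI implementation and does not use any Neuron API: `compute`, `communicate`, and `overlapped_block` are hypothetical names, a Python thread stands in for the dedicated communication cores, and a short sleep stands in for link latency. It only shows the scheduling idea, assuming the work is split into micro-chunks: start the transfer for one chunk, then compute the next chunk while that transfer is in flight.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def compute(chunk):
    # Hypothetical stand-in for one micro-chunk of a block's compute work.
    return [x * 2 for x in chunk]


def communicate(partial):
    # Hypothetical stand-in for a chip-to-chip reduction.
    # The sleep models link latency; the sum models the reduced value.
    time.sleep(0.01)
    return sum(partial)


def overlapped_block(chunks):
    """Compute chunk i+1 while the communication for chunk i is in flight."""
    results = []
    # A single worker thread plays the role of a dedicated communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm_engine:
        pending = None
        for chunk in chunks:
            partial = compute(chunk)  # runs while the previous transfer is in flight
            if pending is not None:
                results.append(pending.result())  # collect the previous transfer
            pending = comm_engine.submit(communicate, partial)
        if pending is not None:
            results.append(pending.result())  # drain the last transfer
    return results


print(overlapped_block([[1, 2], [3, 4], [5, 6]]))
```

In a serial schedule each chunk would pay compute time plus transfer time. Here the transfer for one chunk hides behind the compute for the next, so only the final transfer is fully exposed.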












