Sai Surya Duvvuri (@dvsaisurya) / X

Sai Surya Duvvuri

@dvsaisurya

Automating AI research @CoreAutoAI | CS PhD student at UT Austin. Prev SR at Google, Meta and RF at MSR | CS IIT KGP saisuryadv.com

Austin, Texas

Joined November 2011

Pinned
Sai Surya Duvvuri
@dvsaisurya
Feb 15
Excited to share LUCID — a new attention mechanism that improves retrieval and reasoning in long-context LLMs! [1/9]🧵 Here's how it works:
127K
Sai Surya Duvvuri
@dvsaisurya
May 3, 2024
Excited to present CASPR - a new optimization algorithm approximating full-matrix Adagrad, theoretically and empirically better than Shampoo. Drop by our poster at ICLR 2024 on May 7th, 4:30pm-6:30pm. Full paper: openreview.net/pdf?id=8j9hz8D…. Here’s a 🧵:
17K
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Standard attention is softmax(QKᵀ)V. We explore 2-simplicial attention using additional keys K' for richer interactions: softmax(Q(K⊗K')ᵀ). 🧵
Aurko Roy
@aurko79
Jul 4, 2025
Excited to share what I worked on during my time at Meta. - We introduce a Triton-accelerated Transformer with *2-simplicial attention*—a tri-linear generalization of dot-product attention - We show how to adapt RoPE to tri-linear forms - We show 2-simplicial attention scales
6.4K
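A minimal NumPy sketch of the two formulas in the post above, under stated assumptions: Q(K⊗K')ᵀ is read here as the trilinear logits `logits[i, j, k] = sum_d Q[i, d] * K[j, d] * K2[k, d]`, with the softmax taken jointly over all (j, k) pairs. The names `K2`, `standard_attention`, and `two_simplicial_weights` are illustrative, and scaling, masking, and the value aggregation are omitted because the post does not spell them out. This is not the paper's Triton kernel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # softmax(Q K^T) V, written as in the post (no 1/sqrt(d) scaling).
    return softmax(Q @ K.T, axis=-1) @ V

def two_simplicial_weights(Q, K, K2):
    # Assumed reading of softmax(Q (K ⊗ K')^T):
    # logits[i, j, k] = sum_d Q[i, d] * K[j, d] * K2[k, d]
    logits = np.einsum("id,jd,kd->ijk", Q, K, K2)
    n = Q.shape[0]
    # Softmax over all (j, k) pairs jointly for each query i.
    weights = softmax(logits.reshape(n, -1), axis=-1)
    return weights.reshape(logits.shape)

rng = np.random.default_rng(0)
n, d = 4, 8  # toy sizes: sequence length and head dimension
Q, K, K2, V = (rng.standard_normal((n, d)) for _ in range(4))

out = standard_attention(Q, K, V)      # shape (n, d)
W = two_simplicial_weights(Q, K, K2)   # shape (n, n, n): a 3D tensor
```

The `(n, n, n)` weight tensor is the 3D object the follow-up reply in this thread refers to: memory and compute grow cubically in sequence length if it is materialized naively.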
Sai Surya Duvvuri
@dvsaisurya
Jul 17, 2025
📢 Thrilled to share our new paper, LASER: Attention with Exponential Transformation, accepted at ICML2025, work done at Google. Come by our poster presentation! 🗓️ Thurs, July 17th, 4:30-7pm 📍 West Exhibition Hall B2-B3, # W-915 Read the full paper here: arxiv.org/abs/2411.03493
6.1K
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Excited to be a part of this effort. TLDR: better scaling law than transformers for logical tasks.
Omead Pooladzandi
@HessianFree
Jul 4, 2025
This will probs be one of the most influential papers of the year
1.1K
Sai Surya Duvvuri
@dvsaisurya
May 20, 2024
This highlights the importance of higher-order preconditioning methods for sample-efficient pretraining, especially in smaller models such as Gemini 1.5 Flash.
Tianle Cai
@tianle_cai
May 20, 2024
Just finished reading the Gemini 1.5 report and I'm blown away by the depth of information shared in such a competitive environment! 🤯 Most surprising was the revelation about their optimizer - they didn't just use Adam! Optimization is still alive and kicking! Kudos to the team
4.2K
Sai Surya Duvvuri
@dvsaisurya
Aug 1, 2024
Full-matrix adaptive regularization at its best: Distributed Shampoo wins the AlgoPerf competition with 28% wall-time improvements over strong first-order baselines on 8 diverse benchmarks: 2 ImageNet, 2 LibriSpeech, DLRM, WMT translation, U-Net fMRI reconstruction, and GNN.
Sourabh Medapati @ Neurips 2025
@activelifetribe
Aug 1, 2024
this has been in the making for a while and the results are super exciting!!! second-order optimizers are back on top beating adam style methods in a wall-time based benchmark 😄🥳 , kudos to @GeorgeEDahl @KasimbegPriya for making this happen!!
1.6K
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Replying to @dvsaisurya
The problem? This creates 3D tensors, which are inefficient on GPUs. Our work shows how to utilize tensor cores and scale up 🚀.
319
Sai Surya Duvvuri
@dvsaisurya
Jul 29, 2025
I think LoRA-RITE (led by Jui-Nan Yen) is one of the most interesting/novel works I've been part of. Consider a function f(AB^T), where A and B are tall and thin parameter matrices. This function is invariant to matrix transformations M, A = A'M, B = B'M^{-T}. But is the optimization
610
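A quick numerical check of the invariance stated in the post above, as a sketch: with A = A'M and B = B'M^{-T}, the product AB^T = A'M M^{-1} B'^T = A'B'^T, so any f(AB^T) sees the same input. The shapes, the variable names, and the particular invertible M below are illustrative assumptions, not taken from the LoRA-RITE paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2  # tall-and-thin factors: n x r and m x r

A_prime = rng.standard_normal((n, r))
B_prime = rng.standard_normal((m, r))

# Any invertible r x r matrix works; this one has determinant 6.
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Reparameterize the factors: A = A' M, B = B' M^{-T}.
A = A_prime @ M
B = B_prime @ np.linalg.inv(M).T

# The factors change, but the product A B^T does not.
factors_changed = not np.allclose(A, A_prime)
product_gap = np.max(np.abs(A @ B.T - A_prime @ B_prime.T))
print(factors_changed, product_gap)
```

The gap is zero up to floating-point error, which is why the two parameterizations represent the same function even though an optimizer sees different A and B.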
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Replying to @dvsaisurya
This improves the beta coefficient in the scaling law for reasoning tasks, compared to the Transformer.
280
Sai Surya Duvvuri
@dvsaisurya
May 3, 2024
The parameter efficiency in this work seems like magic.
Nilesh Gupta
@nileshgupta2797
May 2, 2024
Excited to announce DEXML, dual-encoders for extreme multi-label learning, poster @iclr_conf TLDR; - InfoNCE/BCE are not suitable for multi-label retrieval (we propose appropriate changes) - Just dual encoders are enough for challenging XMC benchmarks arxiv.org/abs/2310.10636
370
Sai Surya Duvvuri
@dvsaisurya
Sep 23, 2023
Replying to @_arohan_
Congrats!
73
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Replying to @_arohan_ and @clashluke
Hoping I won’t get scooped, at least over the long weekend.
616
Sai Surya Duvvuri
@dvsaisurya
Jul 4, 2025
Sample-efficient attention mechanisms/architectures (for logical/reasoning tasks) are crucial for data-bound scenarios.
Evan Walters
@evaninwords
Jul 4, 2025
Excited to try this new form of attention, plus they put the entire triton kernel in the paper! 🤯 Some great authors including @Happylemon56775 @dvsaisurya and @_arohan_ arxiv.org/abs/2507.02754
444