A more serious thread on the DeepSeek-OCR hype / serious misinterpretation going on.
1.
On token reduction via representing text in images, researchers from Cambridge have previously shown that 500x prompt token compression is possible (ACL'25, Li, Su, and Collier).
Without
Excited to share our latest research on Looped Transformers for Length Generalization!
TL;DR: We trained a Looped Transformer that dynamically adjusts the number of iterations based on input difficulty, and it achieves near-perfect length generalization on various tasks!
🧵
Excited to share our work on Encoder-only Next Token Prediction (ENTP)!
While most successful LLMs are decoder-based, we asked: Can encoder-only TFs be used for next-token prediction?
Yes!
Moreover, ENTP might be better than decoder-only models!!!
Finetuning a pretrained language model (e.g., GPT-3) has become a popular approach to solving many text-based tasks. This paradigm is making ML very accessible, as all you need to prepare is text data for finetuning.
Does it also work for non-text tasks? Surprisingly, yes!!!
(1/8)
1/ Super excited to share our new work "LLM-Lasso," led by my collaborators from Stanford!
tl;dr: We've reimagined the classic Lasso algorithm (by @robtibshirani), which uses ℓ1 regularization to select a sparse subset of features!
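For readers who have not seen the classic Lasso, here is a minimal sketch of how an ℓ1 penalty ends up selecting a sparse subset of features. It assumes the textbook special case of an orthonormal design (X^T X = I), where the Lasso solution reduces to soft-thresholding the least-squares coefficients. The function names and the coefficient values are hypothetical, and this illustrates only the classic algorithm, not the LLM-Lasso method itself.

```python
# Textbook Lasso in the special case of an orthonormal design (X^T X = I).
# Assumption: this shows classic l1-induced sparsity only; it is NOT LLM-Lasso.


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs: list[float], lam: float) -> list[float]:
    """Minimizer of 0.5*||y - Xb||^2 + lam*||b||_1 when X^T X = I,
    given the least-squares coefficients X^T y."""
    return [soft_threshold(c, lam) for c in ols_coefs]


# Hypothetical least-squares coefficients for five features.
ols = [3.0, -0.4, 0.2, -2.5, 0.9]
sparse = lasso_orthonormal(ols, lam=1.0)

# Features whose coefficient survived the penalty are the "selected" ones.
selected = [j for j, b in enumerate(sparse) if b != 0.0]

print(sparse)
print(selected)
```

Coefficients smaller in magnitude than the penalty are set exactly to zero, which is what makes the ℓ1 penalty act as a feature selector rather than merely shrinking everything.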
DLLMs seem promising... but parallel generation is not always possible
Diffusion-based LLMs can generate many tokens at different positions at once, while most autoregressive LLMs generate tokens one by one.
This makes diffusion-based LLMs highly attractive when we need fast
Is 4o Image Generation really good at native in-context learning (as written on the whiteboard)?
About a year ago, @yzeng58 et al. proposed a very challenging Text-to-Image in-context learning benchmark called CoBSAT...
All models completely failed it.
4o crushed it. 🧵
🧵 Let me explain why the early ascent phenomenon occurs.
We must first understand that in-context learning exhibits two distinct modes.
When given samples from a novel task, the model actually learns the pattern from the examples.
We call this mode the "task learning" mode.
Happy to share that I got tenured last month!
While every phase in life is special, this one feels a bit more meaningful, and it made me reflect on the past 15+ years in academia. I'd like to thank @UWMadison and @UWMadisonECE for tremendous support throughout the past six
1/10: The summer break is the perfect time to share recent research from my lab. Our first story revolves around a fresh interpretation of diffusion-based generative modeling by my brilliant student @yingfan_bot. She proposed "diffusion models are solving a control problem".
I'm honored to receive the NSF CAREER Award!
Our group will develop a unified theory and new algorithms with provable guarantees for learning with frozen pretrained models, also known as foundation models.
Huge thanks to NSF and my amazing collaborators and students! 🥳