TechTalks | Ben Dickson | Substack

Inside Nvidia's new technique to optimize long-context inference and continual learning

By treating language modeling as a continual learning problem, the TTT-E2E architecture achieves the accuracy of full-attention Transformers on 128k…

Jan 13 • Ben Dickson

Meta’s new VL-JEPA model shifts from generating tokens to predicting concepts

Meta’s VL-JEPA outperforms massive vision-language models on world modeling tasks by learning to predict "thought vectors" instead of text tokens.

Jan 4 • Ben Dickson

How reinforcement learning changed LLM tool-use

A look at the evolution of LLM tool-use, from supervised fine-tuning to Reinforcement Learning (RLVR) and agentic applications in large and specialized…

Dec 30, 2025 • Ben Dickson

Inside URM, the architecture beating standard Transformers on reasoning tasks

The key to solving complex reasoning isn't stacking more transformer layers, but refining the "thought process" through efficient recurrent loops.

Dec 24, 2025 • Ben Dickson

What (I think) makes Gemini 3 Flash so good and fast

Google didn’t reveal a lot of information about its new Flash model. So we had to speculate a lot on what is going on under the hood.

Dec 22, 2025 • Ben Dickson
