Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Researchers from Peking University and DeepSeek-AI developed Engram, a conditional memory module for Large Language Models that integrates efficient knowledge lookup as a new axis of sparsity. This architecture demonstrably improves performance across diverse benchmarks and enables aggressive parameter scaling with minimal inference overhead.
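
The summary does not specify the lookup mechanism; as a rough sketch of conditional memory via scalable lookup, the snippet below hashes each trailing n-gram of the input into a large embedding table and gates the retrieved vector into the hidden state. The rolling hash, table size, and gating scheme are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class NGramMemory(nn.Module):
    """Hypothetical sketch: conditional memory via hashed n-gram lookup.

    A large table is accessed sparsely, only at the rows the current
    n-grams hash to, which is one way lookup can add an axis of sparsity.
    """

    def __init__(self, d_model: int, table_size: int = 1 << 20, n: int = 2):
        super().__init__()
        self.n = n
        self.table_size = table_size
        self.memory = nn.Embedding(table_size, d_model)  # large, sparsely hit
        self.gate = nn.Linear(d_model, 1)                # condition on context

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq); hidden: (batch, seq, d_model)
        idx = torch.zeros_like(token_ids)
        for k in range(self.n):                          # simple rolling hash
            shifted = torch.roll(token_ids, shifts=k, dims=1)
            shifted[:, :k] = 0                           # pad before sequence start
            idx = (idx * 1000003 + shifted) % self.table_size
        retrieved = self.memory(idx)                     # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(hidden))             # context-conditional gate
        return hidden + g * retrieved
```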

Dr. Zero: Self-Evolving Search Agents without Training Data

Dr. Zero, a framework developed by Meta Superintelligence Labs and UIUC, enables multi-turn search agents to self-evolve without any human-curated training data. It utilizes a proposer-solver co-evolution mechanism and a novel Hop-Grouped Relative Policy Optimization, allowing agents to match or surpass supervised baselines on open-domain QA tasks while drastically reducing computational overhead.
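
Hop-Grouped Relative Policy Optimization is only named here; one plausible reading, sketched below, normalizes solver rewards within groups of rollouts that share a hop count, so easy single-hop and hard multi-hop questions do not share a baseline. The grouping rule and field names are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def hop_grouped_advantages(rollouts):
    """Hypothetical sketch of hop-grouped relative advantages.

    rollouts: list of dicts like {"hops": int, "reward": float}.
    Each reward is z-normalized against other rollouts with the same
    hop count (our assumption about what "hop-grouped" means).
    """
    groups = defaultdict(list)
    for r in rollouts:
        groups[r["hops"]].append(r["reward"])
    advantages = []
    for r in rollouts:
        rewards = groups[r["hops"]]
        mu, sigma = mean(rewards), pstdev(rewards)
        advantages.append((r["reward"] - mu) / (sigma + 1e-6))
    return advantages

print(hop_grouped_advantages([
    {"hops": 1, "reward": 1.0}, {"hops": 1, "reward": 0.0},
    {"hops": 3, "reward": 0.2}, {"hops": 3, "reward": 0.4},
]))
```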

Reward Modeling from Natural Language Human Feedback

The paper addresses the issue of "outcome-process inconsistency" in Generative Reward Models (GRMs), where models predict correct preferences but generate flawed critiques. Researchers from Tongyi Lab, Alibaba Group, developed Reward Modeling from Natural Language Human Feedback (RM-NLHF), a framework that uses an Online Meta Reward Model to scale process-level supervision from limited human critiques, significantly enhancing GRM performance, critique quality, and reasoning alignment across various benchmarks.

Ministral 3
13 Jan 2026

The Ministral 3 series introduces a family of parameter-efficient language models, ranging from 3B to 14B parameters, with integrated multimodal and long-context capabilities. These models achieve strong performance, competitive with or surpassing larger open-weight models, by leveraging an efficient Cascade Distillation training method from a 24B parent model.
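
Cascade Distillation is only named in this summary; the sketch below shows a generic temperature-scaled distillation loss of the kind such a pipeline would apply at each stage (e.g., 24B teacher into a 14B student, then onward down the cascade). The chaining and the loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Standard KD recipe, not Ministral's actual objective: soft teacher
    targets at temperature T plus hard-label cross-entropy."""
    teacher_logits = teacher_logits.detach()  # teacher is frozen
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```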

STEP3-VL-10B Technical Report
14 Jan 2026

StepFun's STEP3-VL-10B is a 10-billion-parameter multimodal large language model that achieves frontier-level performance in visual perception and reasoning, frequently matching or surpassing models 10-20 times larger as well as leading proprietary systems. This work demonstrates that advanced multimodal intelligence can be attained with compact efficiency through strategic architectural choices, high-quality data, and advanced reinforcement learning, including a novel Parallel Coordinated Reasoning (PaCoRe) method.
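
Parallel Coordinated Reasoning is only named in this summary; as a loose illustration, the sketch below runs several reasoning branches concurrently and coordinates them by plurality vote over final answers, similar in spirit to self-consistency. The sampling interface and vote-based coordination are assumptions, not the paper's method.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def parallel_coordinated_answer(generate, prompt: str, branches: int = 8):
    """generate(prompt) -> final answer string (e.g., one sampled model call).

    Runs independent reasoning branches concurrently and coordinates them
    with a plurality vote, an assumption standing in for PaCoRe's actual
    coordination mechanism.
    """
    with ThreadPoolExecutor(max_workers=branches) as pool:
        answers = list(pool.map(lambda _: generate(prompt), range(branches)))
    return Counter(answers).most_common(1)[0][0]

# Toy usage with a stub in place of a real model call.
stub = lambda prompt: random.choice(["42", "42", "41"])
print(parallel_coordinated_answer(stub, "What is 6 * 7?"))
```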

Controlled LLM Training on Spectral Sphere
13 Jan 2026

The Spectral Sphere Optimizer (SSO) is introduced to stabilize large language model (LLM) training by strictly enforcing Maximal Update Parametrization (μP) through module-wise spectral norm constraints on weights and updates. SSO achieves lower validation loss and faster convergence, for instance, reaching the same validation loss 19% faster than AdamW on a 1.7B model, while consistently maintaining activation stability and enabling stable μP learning rate transfer.
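
As a rough sketch of the constraint being enforced, the snippet below estimates each module's spectral norm by power iteration, rescales the update to a target spectral norm, and retracts the weight back onto the spectral sphere after the step. The target radius and retraction-by-rescaling are assumptions, not the paper's exact algorithm.

```python
import torch

def spectral_norm(w: torch.Tensor, iters: int = 10) -> torch.Tensor:
    """Estimate the largest singular value of a 2-D weight by power iteration."""
    v = torch.randn(w.shape[1])
    for _ in range(iters):
        u = torch.nn.functional.normalize(w @ v, dim=0)
        v = torch.nn.functional.normalize(w.T @ u, dim=0)
    return u @ w @ v

@torch.no_grad()
def sso_step(weight: torch.Tensor, update: torch.Tensor,
             lr: float, radius: float = 1.0):
    """Hypothetical module-wise step: cap the update's spectral norm, apply
    it, then rescale the weight back onto the sphere ||W||_2 = radius."""
    update = update * (radius / (spectral_norm(update) + 1e-8))
    weight -= lr * update
    weight *= radius / (spectral_norm(weight) + 1e-8)
```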

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
14 Jan 2026

Fast-ThinkAct proposes an efficient embodied reasoning framework that uses verbalizable latent planning to address the high inference latency in Vision-Language-Action (VLA) models. The approach reduces inference latency by up to 89.3% while achieving state-of-the-art performance in robotic manipulation, long-horizon planning, failure recovery, and few-shot adaptation tasks.

Controlled Self-Evolution for Algorithmic Code Optimization
14 Jan 2026

Controlled Self-Evolution (CSE) is introduced as a framework that enables Large Language Models to generate algorithmically optimized code by enhancing self-evolutionary processes. The framework integrates diversified initialization, feedback-guided genetic operations, and hierarchical memory, consistently yielding code with superior time and space complexity on the EffiBench-X benchmark compared to existing methods.
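
The three components are only named in this summary; the skeleton below shows one way they might fit together, with the LLM mutation call left as a caller-supplied function. Every name here is hypothetical, not the paper's code.

```python
def controlled_self_evolution(seed_programs, evaluate, mutate,
                              generations: int = 10, population: int = 8):
    """Skeleton of a CSE-style loop (our reading of the summary).

    evaluate(program) -> cost, e.g. measured time or memory on test inputs;
    mutate(program, feedback) -> variant, e.g. an LLM rewrite guided by
    profiler feedback. A sorted list stands in for hierarchical memory.
    """
    memory = [(evaluate(p), p) for p in seed_programs]   # diversified init
    for _ in range(generations):
        memory.sort(key=lambda pair: pair[0])
        for cost, prog in list(memory[:population]):     # fittest survive
            child = mutate(prog, feedback=f"current cost = {cost}")
            memory.append((evaluate(child), child))      # feedback-guided op
    return min(memory, key=lambda pair: pair[0])
```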

The motivic class of the space of genus 0 maps to the flag variety
12 Jan 2026

The paper calculates the motivic class of the space of genus zero maps to the complete flag variety, showing that this space is motivically equivalent to a product of the general linear group and an affine space. The result was obtained through a human-AI collaborative research process in which AI systems provided substantial input to proof discovery.
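
Restated schematically (our notation; the exponent of the affine-space factor is not given in this summary):

```latex
% Schematic shape of the stated result in the Grothendieck ring K_0(Var_k),
% writing L = [A^1]; the exponent N is not specified in this summary.
\bigl[\mathrm{Maps}_{g=0}(\mathbb{P}^1,\ \mathrm{Fl}_n)\bigr]
  \;=\; [\mathrm{GL}_n]\cdot \mathbb{L}^{N}
```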

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
08 Jan 2026

GDPO (Group reward-Decoupled Normalization Policy Optimization) improves multi-reward reinforcement learning for large language model alignment by addressing "reward signal collapse" inherent in existing GRPO methods. It achieves this by decoupling the normalization of individual reward components, leading to more stable training and enhanced performance across various tasks, such as increasing mathematical reasoning accuracy by up to 6.3% on AIME.
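
A minimal sketch of the decoupling as we read it: instead of group-normalizing the summed reward (GRPO-style), z-normalize each reward component within the group before combining, so no single component's scale or variance collapses the others' signal. Shapes and the uniform weighting are assumptions.

```python
import numpy as np

def gdpo_advantages(rewards: np.ndarray, weights=None) -> np.ndarray:
    """rewards: (group_size, n_components) for one prompt's rollouts.

    Decoupled normalization (our reading of GDPO): z-normalize each reward
    component across the group first, then combine, instead of normalizing
    one pre-summed scalar reward.
    """
    mu = rewards.mean(axis=0, keepdims=True)
    sigma = rewards.std(axis=0, keepdims=True) + 1e-6
    z = (rewards - mu) / sigma                    # per-component normalization
    if weights is None:
        weights = np.ones(rewards.shape[1]) / rewards.shape[1]
    return z @ weights                            # (group_size,) advantages

# Toy: component 0 varies wildly, component 1 is a subtle format reward.
r = np.array([[10.0, 1.0], [0.0, 0.0], [5.0, 1.0], [9.0, 0.0]])
print(gdpo_advantages(r))
```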

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
12 Jan 2026

Multi-Head Linear Attention (MHLA) addresses the expressivity limitations of linear attention by employing token-level multi-head partitioning and a learnable mixing strategy for key-value summaries. This method achieves state-of-the-art performance across image classification, image/video generation, and natural language processing tasks, demonstrating improved representational diversity and focus while maintaining linear computational complexity.
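
A minimal sketch of multi-head linear attention with a learnable matrix mixing the per-head key-value summaries (non-causal for brevity); the exact partitioning scheme and where the mixing is applied are assumptions about the paper's design.

```python
import torch

def mhla(q, k, v, mix):
    """Sketch: multi-head linear attention with mixed KV summaries.

    q, k, v: (seq, heads, dim); mix: (heads, heads), learnable.
    phi is a simple positive feature map; with mix = identity this
    reduces to plain per-head linear attention.
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = torch.einsum("shd,she->hde", k, v)        # per-head KV summary
    kv = torch.einsum("gh,hde->gde", mix, kv)      # mix summaries across heads
    z = torch.einsum("gh,hd->gd", mix, k.sum(0))   # mix normalizers to match
    out = torch.einsum("shd,hde->she", q, kv)
    norm = torch.einsum("shd,hd->sh", q, z).unsqueeze(-1) + 1e-6
    return out / norm

s, h, d = 16, 4, 8
q, k, v = (torch.randn(s, h, d) for _ in range(3))
print(mhla(q, k, v, torch.eye(h)).shape)  # torch.Size([16, 4, 8])
```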

BabyVision: Visual Reasoning Beyond Language

BabyVision introduces a benchmark evaluating Multimodal Large Language Models (MLLMs) on foundational visual reasoning tasks, revealing that even state-of-the-art models significantly underperform human children on basic visual perception, often due to a "verbalization bottleneck" that limits true visual understanding.

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
13 Jan 2026

Researchers from the University of Pennsylvania and Microsoft Research developed Multiplex Thinking, a method that uses stochastic continuous "multiplex tokens" formed by aggregating multiple sampled discrete tokens, enabling more effective on-policy reinforcement learning for LLMs. This approach consistently outperforms discrete and continuous baselines on mathematical reasoning tasks, achieving superior Pass@1 performance and higher exploration potential while generating shorter reasoning sequences.
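
A minimal sketch of the multiplex-token construction as described: sample several candidate next tokens and aggregate their embeddings into one stochastic continuous input. The number of samples and the probability-weighted average are assumptions about the aggregation.

```python
import torch

def multiplex_token(logits: torch.Tensor, embedding: torch.nn.Embedding,
                    k: int = 4, temperature: float = 1.0) -> torch.Tensor:
    """Sample k candidate next tokens and blend their embeddings into one
    stochastic continuous 'multiplex token' (our reading of the paper)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    ids = torch.multinomial(probs, num_samples=k, replacement=True)  # (k,)
    w = probs[ids]
    w = w / w.sum()                                  # renormalize over samples
    return (w.unsqueeze(-1) * embedding(ids)).sum(0)  # (d_model,)

emb = torch.nn.Embedding(100, 16)
print(multiplex_token(torch.randn(100), emb).shape)  # torch.Size([16])
```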

Learning Latent Action World Models In The Wild

Latent action world models are developed to learn from large-scale, unlabeled "in-the-wild" videos, discovering universal, spatially localized, camera-relative action representations. These models achieve planning performance on robotic manipulation and navigation tasks that is competitive with, or surpasses, baselines trained on action-labeled, domain-specific data.

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

A new benchmark, VideoDR, is introduced to evaluate multimodal large language models (MLLMs) on "video deep research," which requires combining multi-frame visual cues from videos with iterative web search and multi-hop reasoning. Experiments on VideoDR show that while the agentic paradigm can achieve higher accuracy for strong MLLMs (e.g., 76% for Gemini-3-pro-preview), its effectiveness is not universal: across models it is often undermined by goal drift and long-horizon consistency issues.

Motion Attribution for Video Generation

Motive introduces a framework to attribute specific motion characteristics in generated videos to their training data, enabling a deeper understanding of temporal dynamics in video generative models. This method facilitates targeted data curation, which significantly improves the motion fidelity and temporal consistency of generated video content.

KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
12 Jan 2026

Researchers at NVIDIA developed KVzap, an input-adaptive KV cache pruning method that achieves 2-4x compression with negligible accuracy loss. The system is designed for efficient integration into LLM inference engines, outperforming prior approaches on the NVIDIA KVpress Leaderboard.
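
KVzap's scoring rule is not described in this summary; the sketch below illustrates generic input-adaptive KV pruning, scoring each cached position by the attention mass a window of recent queries places on it and keeping a top fraction. The scoring heuristic is an assumption, not KVzap's criterion.

```python
import torch

def prune_kv_cache(keys, values, recent_queries, keep_ratio: float = 0.25):
    """keys/values: (seq, d); recent_queries: (w, d).

    Generic input-adaptive pruning sketch: rank cache positions by the
    attention mass recent queries assign to them, keep the top fraction.
    """
    attn = torch.softmax(recent_queries @ keys.T / keys.shape[-1] ** 0.5, dim=-1)
    scores = attn.sum(dim=0)                         # (seq,) accumulated mass
    n_keep = max(1, int(keep_ratio * keys.shape[0]))
    idx = scores.topk(n_keep).indices.sort().values  # preserve original order
    return keys[idx], values[idx]

k, v, q = torch.randn(1024, 64), torch.randn(1024, 64), torch.randn(8, 64)
pk, pv = prune_kv_cache(k, v, q)
print(pk.shape)  # torch.Size([256, 64]), i.e. 4x compression
```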

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

MemGovern is a framework that allows code agents to learn from governed human debugging experiences scraped from GitHub, transforming unstructured data into structured "experience cards" and enabling an "agentic search" mechanism. It boosted the average bug resolution rate of various LLM-backed agents on SWE-bench Verified by 4.65% by facilitating more effective and reliable debugging strategies.
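
The summary describes "experience cards" distilled from GitHub debugging history and an "agentic search" over them; below is a minimal sketch of such a store. The card fields and the keyword scoring are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceCard:
    """Hypothetical structured record distilled from one debugging session."""
    symptom: str        # e.g. a one-line traceback or failing-test summary
    root_cause: str
    fix_strategy: str
    tags: set = field(default_factory=set)

def search_cards(cards, query: str, top_k: int = 3):
    """Toy keyword retrieval standing in for the paper's agentic search."""
    terms = set(query.lower().split())
    def score(card):
        return len(terms & set(card.symptom.lower().split())) + len(terms & card.tags)
    return sorted(cards, key=score, reverse=True)[:top_k]

cards = [ExperienceCard("KeyError in config loader", "missing default",
                        "add dict.get fallback", {"config", "keyerror"})]
print(search_cards(cards, "KeyError when loading config")[0].fix_strategy)
```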

Demystifying the Slash Pattern in Attention: The Role of RoPE

A study provides a dual empirical and theoretical explanation for the emergence of slash attention patterns in large language models, demonstrating that these patterns are intrinsic and that Rotary Position Embeddings (RoPE) play a critical role: high and medium frequencies act on approximately rank-one pre-RoPE query/key matrices.
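
The rank-one mechanism can be checked in a few lines: if every pre-RoPE query equals a fixed vector u and every key a fixed vector w (rank one), the post-RoPE score between positions i and j depends only on i - j, which renders as a slash (diagonal-stripe) pattern. A toy 2-D demo, assuming the standard RoPE rotation:

```python
import numpy as np

def rot(theta):
    """2-D RoPE rotation matrix for angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Rank-one pre-RoPE queries/keys: the same vector at every position.
u, w = np.array([1.0, 0.3]), np.array([0.7, -0.2])
freq, n = 0.5, 8
scores = np.array([[rot(i * freq) @ u @ (rot(j * freq) @ w) for j in range(n)]
                   for i in range(n)])
# Entries with the same i - j are equal: the slash (diagonal) pattern.
print(np.round(scores, 3))
```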

Video Generation Models in Robotics - Applications, Research Challenges, Future Directions

A comprehensive survey analyzes video generation models as embodied world models in robotics, detailing their applications in areas like imitation learning and visual planning, while also identifying critical research challenges such as physical consistency and uncertainty quantification. It concludes by proposing future directions for their trustworthy integration into safety-critical robotic systems.
