Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Researchers from Peking University and DeepSeek-AI developed Engram, a conditional memory module for Large Language Models that integrates efficient knowledge lookup as a new axis of sparsity. This architecture demonstrably improves performance across diverse benchmarks and enables aggressive parameter scaling with minimal inference overhead.
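
The summary does not specify the lookup mechanism; as a rough sketch of conditional memory via scalable lookup, the snippet below hashes each trailing n-gram of the input into a large embedding table and gates the retrieved vector into the hidden state. The rolling hash, table size, and gating scheme are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class NGramMemory(nn.Module):
    """Hypothetical sketch: conditional memory via hashed n-gram lookup.

    A large table is accessed sparsely, only at the rows the current
    n-grams hash to, which is one way lookup can add an axis of sparsity.
    """

    def __init__(self, d_model: int, table_size: int = 1 << 20, n: int = 2):
        super().__init__()
        self.n = n
        self.table_size = table_size
        self.memory = nn.Embedding(table_size, d_model)  # large, sparsely hit
        self.gate = nn.Linear(d_model, 1)                # condition on context

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq); hidden: (batch, seq, d_model)
        idx = torch.zeros_like(token_ids)
        for k in range(self.n):                          # simple rolling hash
            shifted = torch.roll(token_ids, shifts=k, dims=1)
            shifted[:, :k] = 0                           # pad before sequence start
            idx = (idx * 1000003 + shifted) % self.table_size
        retrieved = self.memory(idx)                     # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(hidden))             # context-conditional gate
        return hidden + g * retrieved
```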

Dr. Zero: Self-Evolving Search Agents without Training Data

Dr. Zero, a framework developed by Meta Superintelligence Labs and UIUC, enables multi-turn search agents to self-evolve without any human-curated training data. It utilizes a proposer-solver co-evolution mechanism and a novel Hop-Grouped Relative Policy Optimization, allowing agents to match or surpass supervised baselines on open-domain QA tasks while drastically reducing computational overhead.
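
Hop-Grouped Relative Policy Optimization is only named here; one plausible reading, sketched below, normalizes solver rewards within groups of rollouts that share a hop count, so easy single-hop and hard multi-hop questions do not share a baseline. The grouping rule and field names are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def hop_grouped_advantages(rollouts):
    """Hypothetical sketch of hop-grouped relative advantages.

    rollouts: list of dicts like {"hops": int, "reward": float}.
    Each reward is z-normalized against other rollouts with the same
    hop count (our assumption about what "hop-grouped" means).
    """
    groups = defaultdict(list)
    for r in rollouts:
        groups[r["hops"]].append(r["reward"])
    advantages = []
    for r in rollouts:
        rewards = groups[r["hops"]]
        mu, sigma = mean(rewards), pstdev(rewards)
        advantages.append((r["reward"] - mu) / (sigma + 1e-6))
    return advantages

print(hop_grouped_advantages([
    {"hops": 1, "reward": 1.0}, {"hops": 1, "reward": 0.0},
    {"hops": 3, "reward": 0.2}, {"hops": 3, "reward": 0.4},
]))
```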

Reward Modeling from Natural Language Human Feedback

The paper addresses the issue of "outcome-process inconsistency" in Generative Reward Models (GRMs), where models predict correct preferences but generate flawed critiques. Researchers from Tongyi Lab, Alibaba Group, developed Reward Modeling from Natural Language Human Feedback (RM-NLHF), a framework that uses an Online Meta Reward Model to scale process-level supervision from limited human critiques, significantly enhancing GRM performance, critique quality, and reasoning alignment across various benchmarks.

Ministral 3
13 Jan 2026

The Ministral 3 series introduces a family of parameter-efficient language models, ranging from 3B to 14B parameters, with integrated multimodal and long-context capabilities. These models achieve strong performance, competitive with or surpassing larger open-weight models, by leveraging an efficient Cascade Distillation training method from a 24B parent model.
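
Cascade Distillation is only named in this summary; the sketch below shows a generic temperature-scaled distillation loss of the kind such a pipeline would apply at each stage (e.g., 24B teacher into a 14B student, then onward down the cascade). The chaining and the loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Standard KD recipe, not Ministral's actual objective: soft teacher
    targets at temperature T plus hard-label cross-entropy."""
    teacher_logits = teacher_logits.detach()  # teacher is frozen
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```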

STEP3-VL-10B Technical Report
14 Jan 2026

StepFun's STEP3-VL-10B is a 10-billion-parameter multimodal large language model that achieves frontier-level performance in visual perception and reasoning, frequently matching or surpassing models 10-20 times larger as well as leading proprietary systems. This work demonstrates that advanced multimodal intelligence can be attained with compact efficiency through strategic architectural choices, high-quality data, and advanced reinforcement learning, including a novel Parallel Coordinated Reasoning (PaCoRe) method.
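
Parallel Coordinated Reasoning is only named in this summary; as a loose illustration, the sketch below runs several reasoning branches concurrently and coordinates them by plurality vote over final answers, similar in spirit to self-consistency. The sampling interface and vote-based coordination are assumptions, not the paper's method.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def parallel_coordinated_answer(generate, prompt: str, branches: int = 8):
    """generate(prompt) -> final answer string (e.g., one sampled model call).

    Runs independent reasoning branches concurrently and coordinates them
    with a plurality vote, an assumption standing in for PaCoRe's actual
    coordination mechanism.
    """
    with ThreadPoolExecutor(max_workers=branches) as pool:
        answers = list(pool.map(lambda _: generate(prompt), range(branches)))
    return Counter(answers).most_common(1)[0][0]

# Toy usage with a stub in place of a real model call.
stub = lambda prompt: random.choice(["42", "42", "41"])
print(parallel_coordinated_answer(stub, "What is 6 * 7?"))
```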

Controlled LLM Training on Spectral Sphere
13 Jan 2026

The Spectral Sphere Optimizer (SSO) is introduced to stabilize large language model (LLM) training by strictly enforcing Maximal Update Parametrization (μP) through module-wise spectral norm constraints on weights and updates. SSO achieves lower validation loss and faster convergence, for instance, reaching the same validation loss 19% faster than AdamW on a 1.7B model, while consistently maintaining activation stability and enabling stable μP learning rate transfer.
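
As a rough sketch of the constraint being enforced, the snippet below estimates each module's spectral norm by power iteration, rescales the update to a target spectral norm, and retracts the weight back onto the spectral sphere after the step. The target radius and retraction-by-rescaling are assumptions, not the paper's exact algorithm.

```python
import torch

def spectral_norm(w: torch.Tensor, iters: int = 10) -> torch.Tensor:
    """Estimate the largest singular value of a 2-D weight by power iteration."""
    v = torch.randn(w.shape[1])
    for _ in range(iters):
        u = torch.nn.functional.normalize(w @ v, dim=0)
        v = torch.nn.functional.normalize(w.T @ u, dim=0)
    return u @ w @ v

@torch.no_grad()
def sso_step(weight: torch.Tensor, update: torch.Tensor,
             lr: float, radius: float = 1.0):
    """Hypothetical module-wise step: cap the update's spectral norm, apply
    it, then rescale the weight back onto the sphere ||W||_2 = radius."""
    update = update * (radius / (spectral_norm(update) + 1e-8))
    weight -= lr * update
    weight *= radius / (spectral_norm(weight) + 1e-8)
```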

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
14 Jan 2026

Fast-ThinkAct proposes an efficient embodied reasoning framework that uses verbalizable latent planning to address the high inference latency in Vision-Language-Action (VLA) models. The approach reduces inference latency by up to 89.3% while achieving state-of-the-art performance in robotic manipulation, long-horizon planning, failure recovery, and few-shot adaptation tasks.

Controlled Self-Evolution for Algorithmic Code Optimization
14 Jan 2026

Controlled Self-Evolution (CSE) is introduced as a framework that enables Large Language Models to generate algorithmically optimized code by enhancing self-evolutionary processes. The framework integrates diversified initialization, feedback-guided genetic operations, and hierarchical memory, consistently yielding code with superior time and space complexity on the EffiBench-X benchmark compared to existing methods.
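
The three components are only named in this summary; the skeleton below shows one way they might fit together, with the LLM mutation call left as a caller-supplied function. Every name here is hypothetical, not the paper's code.

```python
def controlled_self_evolution(seed_programs, evaluate, mutate,
                              generations: int = 10, population: int = 8):
    """Skeleton of a CSE-style loop (our reading of the summary).

    evaluate(program) -> cost, e.g. measured time or memory on test inputs;
    mutate(program, feedback) -> variant, e.g. an LLM rewrite guided by
    profiler feedback. A sorted list stands in for hierarchical memory.
    """
    memory = [(evaluate(p), p) for p in seed_programs]   # diversified init
    for _ in range(generations):
        memory.sort(key=lambda pair: pair[0])
        for cost, prog in list(memory[:population]):     # fittest survive
            child = mutate(prog, feedback=f"current cost = {cost}")
            memory.append((evaluate(child), child))      # feedback-guided op
    return min(memory, key=lambda pair: pair[0])
```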

The motivic class of the space of genus 0 maps to the flag variety
12 Jan 2026

The paper calculates the motivic class of the space of genus zero maps to the complete flag variety, showing that this space is motivically equivalent to a product of the general linear group and an affine space. The result was obtained through a human-AI collaborative research process in which AI systems provided substantial input to proof discovery.
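
Restated schematically (our notation; the exponent of the affine-space factor is not given in this summary):

```latex
% Schematic shape of the stated result in the Grothendieck ring K_0(Var_k),
% writing L = [A^1]; the exponent N is not specified in this summary.
\bigl[\mathrm{Maps}_{g=0}(\mathbb{P}^1,\ \mathrm{Fl}_n)\bigr]
  \;=\; [\mathrm{GL}_n]\cdot \mathbb{L}^{N}
```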

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
08 Jan 2026

GDPO (Group reward-Decoupled Normalization Policy Optimization) improves multi-reward reinforcement learning for large language model alignment by addressing "reward signal collapse" inherent in existing GRPO methods. It achieves this by decoupling the normalization of individual reward components, leading to more stable training and enhanced performance across various tasks, such as increasing mathematical reasoning accuracy by up to 6.3% on AIME.
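
A minimal sketch of the decoupling as we read it: instead of group-normalizing the summed reward (GRPO-style), z-normalize each reward component within the group before combining, so no single component's scale or variance collapses the others' signal. Shapes and the uniform weighting are assumptions.

```python
import numpy as np

def gdpo_advantages(rewards: np.ndarray, weights=None) -> np.ndarray:
    """rewards: (group_size, n_components) for one prompt's rollouts.

    Decoupled normalization (our reading of GDPO): z-normalize each reward
    component across the group first, then combine, instead of normalizing
    one pre-summed scalar reward.
    """
    mu = rewards.mean(axis=0, keepdims=True)
    sigma = rewards.std(axis=0, keepdims=True) + 1e-6
    z = (rewards - mu) / sigma                    # per-component normalization
    if weights is None:
        weights = np.ones(rewards.shape[1]) / rewards.shape[1]
    return z @ weights                            # (group_size,) advantages

# Toy: component 0 varies wildly, component 1 is a subtle format reward.
r = np.array([[10.0, 1.0], [0.0, 0.0], [5.0, 1.0], [9.0, 0.0]])
print(gdpo_advantages(r))
```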

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
12 Jan 2026

Multi-Head Linear Attention (MHLA) addresses the expressivity limitations of linear attention by employing token-level multi-head partitioning and a learnable mixing strategy for key-value summaries. This method achieves state-of-the-art performance across image classification, image/video generation, and natural language processing tasks, demonstrating improved representational diversity and focus while maintaining linear computational complexity.
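
A minimal sketch of multi-head linear attention with a learnable matrix mixing the per-head key-value summaries (non-causal for brevity); the exact partitioning scheme and where the mixing is applied are assumptions about the paper's design.

```python
import torch

def mhla(q, k, v, mix):
    """Sketch: multi-head linear attention with mixed KV summaries.

    q, k, v: (seq, heads, dim); mix: (heads, heads), learnable.
    phi is a simple positive feature map; with mix = identity this
    reduces to plain per-head linear attention.
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = torch.einsum("shd,she->hde", k, v)        # per-head KV summary
    kv = torch.einsum("gh,hde->gde", mix, kv)      # mix summaries across heads
    z = torch.einsum("gh,hd->gd", mix, k.sum(0))   # mix normalizers to match
    out = torch.einsum("shd,hde->she", q, kv)
    norm = torch.einsum("shd,hd->sh", q, z).unsqueeze(-1) + 1e-6
    return out / norm

s, h, d = 16, 4, 8
q, k, v = (torch.randn(s, h, d) for _ in range(3))
print(mhla(q, k, v, torch.eye(h)).shape)  # torch.Size([16, 4, 8])
```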

BabyVision: Visual Reasoning Beyond Language

BabyVision introduces a benchmark evaluating Multimodal Large Language Models (MLLMs) on foundational visual reasoning tasks, revealing that even state-of-the-art models significantly underperform human children on basic visual perception, often due to a "verbalization bottleneck" that limits true visual understanding.

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
13 Jan 2026

Researchers from the University of Pennsylvania and Microsoft Research developed Multiplex Thinking, a method that uses stochastic continuous "multiplex tokens" formed by aggregating multiple sampled discrete tokens, enabling more effective on-policy reinforcement learning for LLMs. This approach consistently outperforms discrete and continuous baselines on mathematical reasoning tasks, achieving superior Pass@1 performance and higher exploration potential while generating shorter reasoning sequences.
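
A minimal sketch of the multiplex-token construction as described: sample several candidate next tokens and aggregate their embeddings into one stochastic continuous input. The number of samples and the probability-weighted average are assumptions about the aggregation.

```python
import torch

def multiplex_token(logits: torch.Tensor, embedding: torch.nn.Embedding,
                    k: int = 4, temperature: float = 1.0) -> torch.Tensor:
    """Sample k candidate next tokens and blend their embeddings into one
    stochastic continuous 'multiplex token' (our reading of the paper)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    ids = torch.multinomial(probs, num_samples=k, replacement=True)  # (k,)
    w = probs[ids]
    w = w / w.sum()                                  # renormalize over samples
    return (w.unsqueeze(-1) * embedding(ids)).sum(0)  # (d_model,)

emb = torch.nn.Embedding(100, 16)
print(multiplex_token(torch.randn(100), emb).shape)  # torch.Size([16])
```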

Learning Latent Action World Models In The Wild

Latent action world models are developed to learn from large-scale, unlabeled "in-the-wild" videos, discovering universal, spatially localized, camera-relative action representations. These models achieve planning performance on robotic manipulation and navigation tasks that is competitive with, or surpasses, baselines trained on action-labeled, domain-specific data.

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

A new benchmark, VideoDR, is introduced to evaluate multimodal large language models (MLLMs) on "video deep research," which requires combining multi-frame visual cues from videos with iterative web search and multi-hop reasoning. Experiments on VideoDR show that while the agentic paradigm can achieve higher accuracy for strong MLLMs (e.g., 76% for Gemini-3-pro-preview), its effectiveness is not universal: across models it is often undermined by goal drift and long-horizon consistency issues.

Motion Attribution for Video Generation

Motive introduces a framework to attribute specific motion characteristics in generated videos to their training data, enabling a deeper understanding of temporal dynamics in video generative models. This method facilitates targeted data curation, which significantly improves the motion fidelity and temporal consistency of generated video content.

KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
12 Jan 2026

Researchers at NVIDIA developed KVzap, an input-adaptive KV cache pruning method that achieves 2-4x compression with negligible accuracy loss. The system is designed for efficient integration into LLM inference engines, outperforming prior approaches on the NVIDIA KVpress Leaderboard.
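
KVzap's scoring rule is not described in this summary; the sketch below illustrates generic input-adaptive KV pruning, scoring each cached position by the attention mass a window of recent queries places on it and keeping a top fraction. The scoring heuristic is an assumption, not KVzap's criterion.

```python
import torch

def prune_kv_cache(keys, values, recent_queries, keep_ratio: float = 0.25):
    """keys/values: (seq, d); recent_queries: (w, d).

    Generic input-adaptive pruning sketch: rank cache positions by the
    attention mass recent queries assign to them, keep the top fraction.
    """
    attn = torch.softmax(recent_queries @ keys.T / keys.shape[-1] ** 0.5, dim=-1)
    scores = attn.sum(dim=0)                         # (seq,) accumulated mass
    n_keep = max(1, int(keep_ratio * keys.shape[0]))
    idx = scores.topk(n_keep).indices.sort().values  # preserve original order
    return keys[idx], values[idx]

k, v, q = torch.randn(1024, 64), torch.randn(1024, 64), torch.randn(8, 64)
pk, pv = prune_kv_cache(k, v, q)
print(pk.shape)  # torch.Size([256, 64]), i.e. 4x compression
```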

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

MemGovern is a framework that allows code agents to learn from governed human debugging experiences scraped from GitHub, transforming unstructured data into structured "experience cards" and enabling an "agentic search" mechanism. It boosted the average bug resolution rate of various LLM-backed agents on SWE-bench Verified by 4.65% by facilitating more effective and reliable debugging strategies.
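
The summary describes "experience cards" distilled from GitHub debugging history and an "agentic search" over them; below is a minimal sketch of such a store. The card fields and the keyword scoring are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceCard:
    """Hypothetical structured record distilled from one debugging session."""
    symptom: str        # e.g. a one-line traceback or failing-test summary
    root_cause: str
    fix_strategy: str
    tags: set = field(default_factory=set)

def search_cards(cards, query: str, top_k: int = 3):
    """Toy keyword retrieval standing in for the paper's agentic search."""
    terms = set(query.lower().split())
    def score(card):
        return len(terms & set(card.symptom.lower().split())) + len(terms & card.tags)
    return sorted(cards, key=score, reverse=True)[:top_k]

cards = [ExperienceCard("KeyError in config loader", "missing default",
                        "add dict.get fallback", {"config", "keyerror"})]
print(search_cards(cards, "KeyError when loading config")[0].fix_strategy)
```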

Demystifying the Slash Pattern in Attention: The Role of RoPE

A study provides a dual empirical and theoretical explanation for the emergence of slash attention patterns in large language models, demonstrating that these patterns are intrinsic and that Rotary Position Embeddings (RoPE) play a critical role: high and medium frequencies act on approximately rank-one pre-RoPE query/key matrices.
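
The rank-one mechanism can be checked in a few lines: if every pre-RoPE query equals a fixed vector u and every key a fixed vector w (rank one), the post-RoPE score between positions i and j depends only on i - j, which renders as a slash (diagonal-stripe) pattern. A toy 2-D demo, assuming the standard RoPE rotation:

```python
import numpy as np

def rot(theta):
    """2-D RoPE rotation matrix for angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Rank-one pre-RoPE queries/keys: the same vector at every position.
u, w = np.array([1.0, 0.3]), np.array([0.7, -0.2])
freq, n = 0.5, 8
scores = np.array([[rot(i * freq) @ u @ (rot(j * freq) @ w) for j in range(n)]
                   for i in range(n)])
# Entries with the same i - j are equal: the slash (diagonal) pattern.
print(np.round(scores, 3))
```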

Video Generation Models in Robotics - Applications, Research Challenges, Future Directions

A comprehensive survey analyzes video generation models as embodied world models in robotics, detailing their applications in areas like imitation learning and visual planning, while also identifying critical research challenges such as physical consistency and uncertainty quantification. It concludes by proposing future directions for their trustworthy integration into safety-critical robotic systems.
