Haolin Liu

Hi! I am Haolin Liu, a third-year PhD student at the University of Virginia, where I am fortunate to be advised by Prof. Chen-Yu Wei. Prior to this, I received my bachelor’s degree in Computer Science from ShanghaiTech University, where I studied chemistry for 1.5 years before transitioning to computer science for 2.5 years.

I am interested in developing principled and practical algorithms for Reinforcement Learning (RL), and in understanding the training dynamics of these algorithms. Recently, I have focused mainly on RL theory and on RL for LLM reasoning and agents.

On the theoretical side, I study unified principles for RL algorithm design and seek to characterize the minimal structure required for sample-efficient learning. My recent papers ([1], [2]) propose the most unified RL-theory frameworks to date, capable of handling both model-based and model-free RL in stationary and non-stationary environments.
On the practical side, I develop scalable RL pipelines for self-improving LLM agents with advanced reasoning and continual learning capabilities. My prior work has centered on new RL methods with fine-grained supervision and enhanced exploration mechanisms for LLM reasoning. I am currently building new environments and RL algorithms that enable agents to tackle long-horizon, complex tasks.

Currently, I am an intern at ByteDance Seed-LLM in San Jose, where I work on RL for multi-turn interactive agents. Previously, I was an intern at Tencent AI Lab in Seattle.

selected publications

Preprint

On the Complexity of Offline Reinforcement Learning with Q^*-Approximation and Partial Coverage

(α-β) Haolin Liu, Braham Snyder, and Chen-Yu Wei

2026

PDF
ICLR

An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs

(α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert

ICLR, 2026

PDF
MATH-AI

One Token to Fool LLM-as-a-Judge

Yulai Zhao^*, Haolin Liu^*, Dian Yu, S.Y. Kung, Haitao Mi, and Dong Yu

NeurIPS 2025 MATH-AI Workshop, 2025

PDF
COLT

Decision Making in Hybrid Environments: A Model Aggregation Approach

(α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert

COLT, 2025

PDF
NeurIPS

Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback

(α-β) Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, and Julian Zimmert

NeurIPS, 2024

PDF
NeurIPS

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification

(α-β) Haolin Liu, Artin Tajdini, Andrew Wagenmaker, and Chen-Yu Wei

NeurIPS, 2024

PDF
ICLR

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback (Spotlight)

(α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert

ICLR, 2024

PDF
NeurIPS

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

(α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert

NeurIPS, 2023

PDF