Haolin Liu

Department of Computer Science, University of Virginia

Hi! I am Haolin Liu, a third-year PhD student at the University of Virginia, where I am fortunate to be advised by Prof. Chen-Yu Wei. Prior to this, I received my bachelor’s degree in Computer Science from ShanghaiTech University, where I studied chemistry for 1.5 years before transitioning to computer science for 2.5 years.

I am interested in developing principled and practical algorithms for Reinforcement Learning (RL), and in understanding the training dynamics of these algorithms. Recently, I have focused mainly on RL theory and on RL for LLM reasoning and agents.

  • On the theoretical side, I study unified principles for RL algorithm design and seek to characterize the minimal structure required for sample-efficient learning. My recent papers ([1], [2]) propose the most unified RL-theory frameworks to date, capable of handling both model-based and model-free RL in stationary and non-stationary environments.
  • On the practical side, I develop scalable RL pipelines for self-improving LLM agents with advanced reasoning and continual learning capabilities. My prior work has centered on new RL methods with fine-grained supervision and enhanced exploration mechanisms for LLM reasoning. I am currently building new environments and RL algorithms that enable agents to tackle long-horizon, complex tasks.

Currently, I am an intern at ByteDance Seed-LLM in San Jose, where I work on RL for multi-turn interactive agents. Previously, I was an intern at Tencent AI Lab in Seattle.

selected publications

  1. Preprint
    On the Complexity of Offline Reinforcement Learning with Q*-Approximation and Partial Coverage
    (α-β) Haolin Liu, Braham Snyder, and Chen-Yu Wei
    2026
  2. ICLR
    An Improved Model-Free Decision-Estimation Coefficient with Applications in Adversarial MDPs
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    ICLR, 2026
  3. MATH-AI
    One Token to Fool LLM-as-a-Judge
    Yulai Zhao*, Haolin Liu*, Dian Yu, S.Y. Kung, Haitao Mi, and Dong Yu
    NeurIPS 2025 MATH-AI Workshop, 2025
  4. COLT
    Decision Making in Hybrid Environments: A Model Aggregation Approach
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    COLT, 2025
  5. NeurIPS
    Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback
    (α-β) Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, and Julian Zimmert
    NeurIPS, 2024
  6. NeurIPS
    Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
    (α-β) Haolin Liu, Artin Tajdini, Andrew Wagenmaker, and Chen-Yu Wei
    NeurIPS, 2024
  7. ICLR
    Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback (Spotlight)
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    ICLR, 2024
  8. NeurIPS
    Bypassing the simulator: Near-optimal adversarial linear contextual bandits
    (α-β) Haolin Liu, Chen-Yu Wei, and Julian Zimmert
    NeurIPS, 2023