Hello! I am Jisen Li, an undergraduate student majoring in Mathematics and Computer Science at the University of Illinois Urbana-Champaign (UIUC), graduating in May 2026. My research interests focus on inference, retrieval and ranking, and AI agents. I am currently a Research Intern at Together AI, working on speculative decoding and model quantization under the supervision of Dr. Shirley Wu and Prof. Shuaiwen Leon Song.

In industry research, I have interned at Nexa AI, where I worked on on-device models and AI agents; at Snowflake, focusing on embedded search models and hybrid retrieval systems; and at TikTok, where I developed multimodal recommendation models.

In academic research, I have worked at UIUC U-Lab under the supervision of Prof. Jiaxuan You, researching AI agent communication protocols and multi-agent benchmarking. I have also conducted research at the University of Chicago under the supervision of Dr. Eliu A. Huerta, focusing on AI deployment, and at Tsinghua University under the guidance of Prof. Yongqiang Lyu, focusing on computer vision and digital signal processing.

Outside of research, I enjoy fitness and strategy games such as Go. Please feel free to reach out at jisenli2@illinois.edu—I would be happy to discuss collaboration opportunities.

News 🎉

  • [Feb 2026] Aurora Day Zero support for Qwen3-Coder-Next-FP8! [Checkpoint].
  • [Jan 2026] Joined Together AI as Research Intern!
  • [Jan 2026] Kitty paper accepted to MLSys 2026!
  • [Dec 2025] Grateful for OpenAI offer, but chose different path!

Research Interests

  • Inference: Speculative Decoding, KV Cache Quantization, and efficient model serving.
  • LLM Fine-tuning: Instruction tuning, contrastive learning, CoT, and RLHF for alignment.
  • Multimodal: Vision-text models and cross-modal alignment using frozen LLMs.
  • Agentic AI: Agents with memory, planning, and decision-making for collaboration.

Industry Experience

Together AI Together AI – Research Intern Jan 2026 – Present

Speculative Decoding (Aurora): Developed a unified training-serving system that continuously learns speculators from live inference traces, achieving 1.45× speedup on frontier models and 1.25× additional speedup under distribution shift. [Paper]

Nexa AI Nexa AI – AI Research Engineer Intern Oct 2025 – Dec 2025

On-Device AI Agent: Deployed local Qwen3 VL model via Nexa SDK for on-device agent with tool use, and contributed to nexa-sdk for efficient NPU-based inference.

Snowflake Snowflake – Software Engineer Intern (AI/ML) May 2025 – Aug 2025

Cortex Search: Fine-tuned Arctic embed v2.0 for semantic retrieval using contrastive learning and CoT-generated queries, improving nDCG by 15%, and developed a Gaussian Process Optimization system for auto-tuning hybrid search parameters deployed to Global Service.

TikTok TikTok – Machine Learning Engineer Intern Aug 2024 – Dec 2024

Multimodal Recommendation: Developed Rec-Qwen model based on Qwen2.5-0.5B/7B with a LLaVA-inspired architecture to align user profiles and content embeddings under a frozen LLM backbone, achieving a 28% CTR accuracy gain in item recommendation.

Academic Research

Together AI Together AI Sep 2025 – Present

KV Cache Quantization (Kitty): Developed Kitty, a mixed-precision KV caching system. Introduced Dynamic Channel-wise Precision Boost, achieving 8× memory reduction with near-zero accuracy loss and 2.1×–4.1× throughput improvement.

Outcome: Core Contributor, MLSys 2026. Supervised by Dr. Shirley Wu. [Paper] [Code]

UIUC University of Illinois Urbana-Champaign Jul 2025 – Present

AI Agent Protocol Benchmarking: Developed a unified benchmark to evaluate agent communication protocols across Document QA, Collaborative Coding, and MAPF. Contributed to ANP protocol adaptation and Meta Protocol integration.

Outcome: Co-First Author, ICML 2026 Under Review. Supervised by Prof. Jiaxuan You [Paper]

UChicago University of Chicago (Globus Lab) Jun 2024 – Aug 2024

Automated Deployment for Science Models: Implemented a pipeline to deploy AI/ML models on the Garden platform, integrating Delta HPC for real-time inference with GPU support.

Supervised by Dr. Eliu A. Huerta.

Tsinghua Tsinghua University Jan 2020 – Apr 2021

Non-contact HRV Monitoring: Developed a non-contact HRV monitoring algorithm using computer vision and Eulerian Video Magnification, improving accuracy by 12%.

Supervised by Prof. Yongqiang Lyu.

Education

UIUC University of Illinois Urbana-Champaign (UIUC) Aug 2023 – May 2026
B.S. in Mathematics + Computer Science, GPA: 3.98/4.0

🌱 Gap Year Jan 2023 – Aug 2023

HUST Huazhong University of Science and Technology (HUST) Sep 2021 – Jan 2023
B.S. in Computer Science (Transferred)