Ph.D. Student @ University of Texas at Dallas
Prev. RA @ Singapore Management University
Prev. Research Intern @ THUNLP
Excited about the future of AI and eager to be a part of it!
Email: kelvin.yangzhiyu@outlook.com
Hometown: Chengdu, China
GitHub | Google Scholar | LinkedIn
Ph.D. in Computer Science and Technology
University of Texas at Dallas
August 2025 - Present
M.Eng. in Computer Science and Technology
Beijing Language and Culture University
September 2021 - July 2024
B.Eng. in Computer Science and Technology
Sichuan University
September 2017 - July 2021
Singapore Management University
September 2024 - April 2025
- Conducted LLM research under the supervision of Professor Yang Deng.
- Explored LLMs' capabilities to identify and explain multi-hop and multiple logical errors in data analysis code.
Modelbest Co. Ltd. & OpenBMB
April 2024 - July 2024
- Contributed to the initial phase of LLM×MapReduce, an agent framework that adapts regular LLMs to process long-context inputs.
- Served as a team leader: devised research plans, mentored interns joining our research group, and collaborated with fellow senior interns.
THUNLP, Tsinghua University
April 2023 - July 2024
- Conducted NLP research under the supervision of Shuo Wang, Ph.D.
- Distilled table reasoning skills from LLMs to small PLMs.
- Developed LLM agents for scientific data visualization.
- Curated multilingual SFT data.
- Participated in devising an agent framework for processing long-context inputs.
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors - EMNLP 2025 Oral (First Author)
- Authors: Zhiyu Yang, Shuo Wang, Yukun Yan, Yang Deng.
- Summary: Introduced DSDBench, a challenging benchmark built via an automated framework to test LLMs on realistic data science code with multiple, multi-hop bugs. Our findings reveal that even top models struggle to trace error origins and achieve complete bug detection, exposing a critical gap in their reasoning and debugging capabilities.
- Contribution: I designed the DSDBench dataset construction pipeline, implemented the automated error injection framework, and conducted experiments to evaluate state-of-the-art LLMs, revealing critical performance gaps in dynamic debugging.
MatPlotAgent - ACL 2024 Findings (First Author)
- Authors: Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun.
- Summary: Introduced MatPlotBench for automatic evaluation of AI methods for scientific data visualization. Proposed MatPlotAgent, a framework that uses visual feedback to enhance LLM performance on visualization tasks.
- Contribution: Designed the agent framework and evaluation method, conducted experiments, and curated data for MatPlotBench.
UltraLink - ACL 2024 (Fifth Author)
- Authors: Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun.
- Summary: Developed a multilingual SFT dataset with language-specific and language-agnostic subsets, using knowledge-enhanced data augmentation methods with Wikipedia as the knowledge source.
- Contribution: Helped concretize the paper's core idea, designed the initial prompt templates for data synthesis, and revised the paper.
Enhancing Free-Form Table Question Answering Models by Distilling Relevant-Cell-Based Rationales - CCL 2024 (First Author)
- Authors: Zhiyu Yang, Shuo Wang, Yukun Yan, Pengyuan Liu, Dong Yu.
- Summary: Proposed a knowledge distillation method for table QA tasks using relevant-cell-based rationales, achieving SOTA results on the FeTaQA benchmark.
- Contribution: Developed the distillation method, conducted experiments, and authored the paper.
- Converted MatPlotAgent into an interactive online demo.
- Demonstrated its workflow and performance to scholars attending CCL 2024.
- Explored various pre-trained language models for understanding the plausibility of implicit and underspecified texts.
- Fine-tuned Facebook AI’s MUPPET model for optimal performance.
- Proposed a novel garbage classification deep neural network architecture.
- Outperformed mainstream models on a Huawei Cloud Garbage Classification Competition dataset.
- Mandarin: Native
- English: Fluent (IELTS Overall Band 8.0, Reading: 9, Listening: 9, Writing: 7.5, Speaking: 7)
- Python, C++, Java
- PyTorch, Hugging Face, PyG, Keras, TensorFlow 2, Linux, Android Studio, vLLM, Matplotlib, NumPy, Pandas
- Convolutional Neural Networks, Pre-trained NLU and NLG models, LLMs
