Welcome to Awesome-Human-Agent-Collaboration-Interaction-Systems! 🚀 This is the repository accompanying our survey, LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey.
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges: (1) limited reliability due to hallucinations, (2) difficulty handling complex tasks, and (3) substantial safety and ethical risks. These challenges limit their feasibility and trustworthiness in real-world applications.
LLM-based human-agent collaboration systems are interactive frameworks where humans actively provide (1) additional information, (2) feedback, or (3) control during interaction with LLM-powered agents to enhance system performance, reliability, and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. For a detailed introduction, please refer to our survey paper: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey.
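To make this collaboration pattern concrete, here is a minimal, hypothetical Python sketch (all names are illustrative and not taken from the survey) in which an agent proposes actions while a human can inject additional information, give corrective feedback, or take direct control:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class HumanInput:
    """One round of (hypothetical) human intervention."""
    info: Optional[str] = None      # (1) additional information for the agent
    feedback: Optional[str] = None  # (2) feedback on the agent's proposal
    override: Optional[str] = None  # (3) direct control: replaces the action

def collaborate(agent_step: Callable[[str], str],
                get_human_input: Callable[[str], HumanInput],
                task: str, max_turns: int = 3) -> str:
    """Run an agent loop with a human in the loop.

    The agent proposes an action from the current context; the human may
    override it, enrich the context, or object via feedback. If the human
    raises no objection, the proposal is accepted.
    """
    context = task
    action = ""
    for _ in range(max_turns):
        action = agent_step(context)
        human = get_human_input(action)
        if human.override is not None:   # human takes control
            return human.override
        if human.info:                   # human adds information
            context += f"\n[info] {human.info}"
        if human.feedback:               # human gives feedback; agent retries
            context += f"\n[feedback on '{action}'] {human.feedback}"
        else:
            break                        # no objection: accept the action
    return action
```

The loop illustrates how the three intervention channels compose: control short-circuits the loop, while information and feedback steer the next proposal.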
Our goal with this project is to build an exhaustive collection of awesome resources relevant to LLM-based human-agent/AI collaboration and interaction systems, encompassing papers, repositories, and more, to foster further research and innovation in this rapidly evolving interdisciplinary field of human-AI collaboration. 🤗 Contributions are welcome! 🤗 If you have recommended papers, resources, or suggestions, please submit a pull request, open an issue, or contact us. We will keep updating our repository and survey paper.
(©️click here back to table of contents👆🏻)
- [1 Apr 2026] [arXiv 2026] When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation
- [30 Mar 2026] [arXiv 2026] ViviDoc: Generating Interactive Documents through Human-Agent Collaboration
- [18 Feb 2026] [arXiv 2026] Learning Personalized Agents from Human Feedback
- [30 Nov 2025] [arXiv 2025] HAI-Eval: Measuring Human-AI Synergy in Collaborative Coding
- [4 Nov 2025] [arXiv 2025] Training Proactive and Personalized LLM Agents
- [15 Oct 2025] [arXiv 2025] Training LLM Agents to Empower Humans
- [10 Oct 2025] [arXiv 2025] How can we assess human-agent interactions? Case studies in software agent design
- [7 Oct 2025] [arXiv 2025] RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
- [24 Sep 2025] [arXiv 2025] UserRL: Training Proactive User-Centric Agent via Reinforcement Learning
- [26 Aug 2025] [arXiv 2025] MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use
- [20 Aug 2025] [arXiv 2025] aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
- [31 Jul 2025] [arXiv 2025] MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying
- [30 Jul 2025] [arXiv 2025] Magentic-UI: Towards Human-in-the-loop Agentic Systems
- [29 Jul 2025] [arXiv 2025] UserBench: An Interactive Gym Environment for User-Centric Agents
- [28 Jul 2025] [arXiv 2025] GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
- [23 Jul 2025] [arXiv 2025] Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
- [21 Jul 2025] [arXiv 2025] Interaction as Intelligence: Deep Research With Human-AI Partnership
- [13 Jun 2025] [arXiv 2025] Interaction, Process, Infrastructure: A Unified Architecture for Human-Agent Collaboration
- [11 Jun 2025] [arXiv 2025] A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
- [9 Jun 2025] [arXiv 2025] τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment
- [6 Jun 2025] [arXiv 2025] Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce
- [24 May 2025] [ICLR 2025] Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [Code]
- [23 May 2025] [arXiv 2025] Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control
- [21 May 2025] [arXiv 2025] Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild
- [16 May 2025] [arXiv 2025] XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision
- [5 May 2025] [arXiv 2025] SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration
- [1 May 2025] [arXiv 2025] LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
- [13 Apr 2025] [arXiv 2025] EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
- [11 Apr 2025] [arXiv 2025] MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
- [4 Apr 2025] [arXiv 2025] APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
- [24 Mar 2025] [ACL 2025 Findings] SPHERE: An Evaluation Card for Human-AI Systems
- [19 Mar 2025] [arXiv 2025] SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
- [10 Mar 2025] [arXiv 2025] Experimental Exploration: Investigating Cooperative Interaction Behavior Between Humans and Large Language Model Agents
- [4 Mar 2025] [arXiv 2025] FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting
- [3 Mar 2025] [ICML 2025] M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality
- [27 Feb 2025] [ICLR 2025] ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
- [17 Feb 2025] [ACL 2025] Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
- [2 Feb 2025] [ICML 2025] CollabLLM: From Passive Responders to Active Collaborators
- [28 Jan 2025] [NAACL 2025 Demo] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
- [25 Dec 2024] [IROS 2024] To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions
- [20 Dec 2024] [arXiv 2024] Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
- [8 Dec 2024] [arXiv 2024] Towards Modeling Human-Agentic Collaborative Workflows: A BPMN Extension
- [26 Nov 2024] [arXiv 2024] Effect of Adaptive Communication Support on LLM-powered Human-Robot Collaboration
- [31 Oct 2024] [ICLR 2025] PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
- [30 Oct 2024] [ICLR 2025] ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
- [16 Oct 2024] [ICLR 2025] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
- [26 Sep 2024] [arXiv 2024] AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
- [25 Sep 2024] [arXiv 2024] AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
- [13 Sep 2024] [arXiv 2024] Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task
- [27 Aug 2024] [EMNLP 2024] Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
- [12 Jul 2024] [SME 2024] Human-LLM collaboration in generative design for customization
- [20 Jun 2024] [RAL 2024] Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
- [18 Jun 2024] [ICLR 2025] τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- [17 Jun 2024] [EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning
- [14 Jun 2024] [NeurIPS 2024] DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
- [4 Jun 2024] [CASE 2024] Enhancing Human-Robot Collaborative Assembly in Manufacturing Systems Using Large Language Models
- [30 May 2024] [arXiv 2024] Safe Multi-agent Reinforcement Learning with Natural Language Constraints
- [27 May 2024] [AAAI 2025] REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity
- [23 Apr 2024] [NeurIPS 2024] Aligning LLM Agents by Learning Latent Preference from User Edits
- [18 Apr 2024] [arXiv 2024] AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration
- [5 Apr 2024] [IUI 2024] PDFChatAnnotator: A Human-LLM Collaborative Multi-Modal Data Annotation Tool for PDF-Format Catalogs
- [19 Mar 2024] [arXiv 2024] Embodied LLM Agents Learn to Cooperate in Organized Teams
- [8 Feb 2024] [arXiv 2024] WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
- [7 Feb 2024] [NeurIPS 2024] Can Large Language Model Agents Simulate Human Trust Behavior?
- [25 Jan 2024] [arXiv 2024] A2C: A Modular Multi-stage Collaborative Decision Framework for Human-AI Teams
- [23 Dec 2023] [AAMAS 2024] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
- [18 Oct 2023] [ICLR 2024] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
- [19 Sep 2023] [WACV 2024] Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
- [19 Sep 2023] [ICLR 2024] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
- [18 Sep 2023] [NAACL 2024] MindAgent: Emergent Gaming Interaction
- [1 Aug 2023] [ICML 2023] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
- [5 Jul 2023] [ICLR 2024] Building Cooperative Embodied Agents Modularly with Large Language Models
- [4 Jul 2023] [ICML 2023] Embodied Task Planning with Large Language Models
- [1 Jun 2023] [IEEE 2023] Improved Trust in Human-Robot Collaboration With ChatGPT
- [22 May 2023] [EACL 2024] Investigating Agency of LLMs in Human-AI Collaboration Tasks
- [21 Apr 2023] [EACL 2024] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback
(©️click here back to table of contents👆🏻)
- [1 Apr 2026] [arXiv 2026] When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation
- [7 Oct 2025] [arXiv 2025] RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
- [30 Jul 2025] [arXiv 2025] Magentic-UI: Towards Human-in-the-loop Agentic Systems
- [19 Mar 2025] [arXiv 2025] SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
- [27 Feb 2025] [ICLR 2025] ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
- [19 Sep 2023] [ICLR 2024] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
- [26 Jun 2023] [NeurIPS 2023] InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- [31 Oct 2024] [ICLR 2025] PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
- [19 Sep 2023] [ICLR 2024] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
- [5 Jul 2023] [ICLR 2024] Building Cooperative Embodied Agents Modularly with Large Language Models
- [4 Jul 2023] [arXiv 2023] Embodied Task Planning with Large Language Models
- [21 Apr 2023] [EACL 2024 Findings] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback
- [24 Sep 2025] [arXiv 2025] UserRL: Training Proactive User-Centric Agent via Reinforcement Learning
- [27 Aug 2024] [EMNLP 2024] Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
- [24 May 2025] [ICLR 2025] Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [Code]
- [8 Feb 2024] [arXiv 2024] WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
- [19 Sep 2023] [ICLR 2024] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
- [11 Apr 2025] [arXiv 2025] MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
- [18 Sep 2023] [ICLR 2024] MindAgent: Emergent Gaming Interaction
- [4 Mar 2025] [arXiv 2025] FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting [Data Link]
- [28 Jul 2025] [arXiv 2025] GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
- [13 Apr 2025] [arXiv 2025] EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
- [9 Jun 2025] [arXiv 2025] τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment
- [18 Jun 2024] [ICLR 2025] τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- [29 Jul 2025] [arXiv 2025] UserBench: An Interactive Gym Environment for User-Centric Agents
- [9 Jun 2025] [arXiv 2025] τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment
- [18 Jun 2024] [ICLR 2025] τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- [21 May 2025] [arXiv 2025] Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild
- [16 May 2025] [arXiv 2025] XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision
(©️click here back to table of contents👆🏻)
For a detailed introduction of the taxonomy, please refer to Section 3 in our survey paper: LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey.
(©️click here back to table of contents👆🏻)
Human feedback can take different forms, arrive at different granularities, and occur during different phases of a task. In the following table, we summarize these dimensions of human feedback in LLM-based human-agent systems: feedback type, granularity, and phase. For each dimension, we provide a summary, key characteristics, and example works for comparison. More details are in Section 3.2 of our survey paper.
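As an illustration only, the three dimensions can be encoded as a small Python data model. The member names below are our own shorthand, not the survey's exact category labels:

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative (not the survey's exact taxonomy labels): the three
# dimensions along which human feedback varies in human-agent systems.

class FeedbackType(Enum):
    NATURAL_LANGUAGE = "free-form natural language comments"
    PREFERENCE = "choosing among candidate outputs"
    EDIT = "direct edits to the agent's output"

class Granularity(Enum):
    TURN_LEVEL = "a single dialogue turn or action"
    TASK_LEVEL = "the overall task outcome"

class Phase(Enum):
    PRE_EXECUTION = "before the agent acts (e.g., plan approval)"
    IN_EXECUTION = "while the agent acts (e.g., interruption)"
    POST_EXECUTION = "after the agent acts (e.g., output rating)"

@dataclass(frozen=True)
class Feedback:
    """One piece of human feedback, located in the three-dimensional taxonomy."""
    type: FeedbackType
    granularity: Granularity
    phase: Phase
    content: str

# Example: a turn-level edit given after the agent produced its output.
fb = Feedback(FeedbackType.EDIT, Granularity.TURN_LEVEL,
              Phase.POST_EXECUTION, "fix the second bullet")
```

Any concrete system in the table can be placed in this space by asking which type of feedback it collects, at what granularity, and in which phase.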
(©️click here back to table of contents👆🏻)
Contributions are welcome! If you have relevant papers, code, or insights, feel free to submit a pull request 🤗.
(©️click here back to table of contents👆🏻)
If you find this repository useful, please consider citing our papers 💕:
```bibtex
@misc{zou2025llmbasedhumanagentcollaborationinteraction,
  title={LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey},
  author={Henry Peng Zou and Wei-Chieh Huang and Yaozu Wu and Yankai Chen and Chunyu Miao and Hoang Nguyen and Yue Zhou and Weizhi Zhang and Liancheng Fang and Langzhou He and Yangning Li and Dongyuan Li and Renhe Jiang and Xue Liu and Philip S. Yu},
  year={2025},
  eprint={2505.00753},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.00753},
}

@misc{zou2025collaborativeintelligencehumanagentsystems,
  title={A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy},
  author={Henry Peng Zou and Wei-Chieh Huang and Yaozu Wu and Chunyu Miao and Dongyuan Li and Aiwei Liu and Yue Zhou and Yankai Chen and Weizhi Zhang and Yangning Li and Liancheng Fang and Renhe Jiang and Philip S. Yu},
  year={2025},
  eprint={2506.09420},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.09420},
}
```



