Note

Go to the end to download the full example code.

Agent¶

In this tutorial, we first focus on introducing the ReAct agent in AgentScope, then we briefly introduce how to customize your own agent from scratch.

ReAct Agent¶

In AgentScope, the ReActAgent class integrates various features into a final implementation, including

Features of `ReActAgent`¶
Feature	Reference
Support realtime steering
Support memory compression
Support parallel tool calls
Support structured output
Support fine-grained MCP control	MCP
Support agent-controlled tools management (Meta tool)	Tool
Support self-controlled long-term memory	Long-Term Memory
Support automatic state management	State/Session Management

Due to limited space, in this tutorial we only demonstrate the first three features of ReActAgent class, leaving the others to the corresponding sections listed above.

import asyncio
import json
import os
from datetime import datetime
import time

from pydantic import BaseModel, Field

from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.memory import InMemoryMemory
from agentscope.message import TextBlock, Msg
from agentscope.model import DashScopeChatModel
from agentscope.tool import Toolkit, ToolResponse

Realtime Steering¶

The realtime steering allows user to interrupt the agent’s reply at any time, which is implemented based on the asyncio cancellation mechanism.

Specifically, when calling the interrupt method of the agent, it will cancel the current reply task, and execute the handle_interrupt method for postprocessing.

Hint

With the feature of supporting streaming tool results in Tool, users can interrupt the tool execution if it takes too long or deviates from user expectations by Ctrl+C in the terminal or calling the interrupt method of the agent in your code.

The interruption logic has been implemented in the AgentBase class as a basic feature, leaving a handle_interrupt method for users to customize the post-processing of interruption as follows:

# code snippet of AgentBase
class AgentBase:
    ...
    async def __call__(self, *args: Any, **kwargs: Any) -> Msg:
        ...
        reply_msg: Msg | None = None
        try:
            self._reply_task = asyncio.current_task()
            reply_msg = await self.reply(*args, **kwargs)

        except asyncio.CancelledError:
            # Catch the interruption and handle it by the handle_interrupt method
            reply_msg = await self.handle_interrupt(*args, **kwargs)

        ...

    @abstractmethod
    async def handle_interrupt(self, *args: Any, **kwargs: Any) -> Msg:
        pass

In ReActAgent class, we return a fixed message “I noticed that you have interrupted me. What can I do for you?” as follows:

You can override it with your own implementation, for example, calling the LLM to generate a simple response to the interruption.

Memory Compression¶

As conversations grow longer, the token count in memory can exceed model context limits or slow down inference. ReActAgent provides an automatic memory compression feature to address this issue.

Basic Usage

To enable memory compression, provide a CompressionConfig instance when initializing the ReActAgent:

from agentscope.agent import ReActAgent
from agentscope.token import CharTokenCounter

agent = ReActAgent(
    name="Assistant",
    sys_prompt="You are a helpful assistant.",
    model=model,
    formatter=formatter,
    compression_config=ReActAgent.CompressionConfig(
        enable=True,
        agent_token_counter=CharTokenCounter(),  # The token counter for the agent
        trigger_threshold=10000,  # Trigger compression when exceeding 10000 tokens
        keep_recent=3,            # Keep the most recent 3 messages uncompressed
    ),
)

When memory compression is enabled, the agent monitors the token count in its memory. Once it exceeds the trigger_threshold, the agent automatically:

Identifies messages that haven’t been compressed yet (via exclude_mark)
Keeps the most recent keep_recent messages uncompressed (to preserve recent context)
Sends older messages to an LLM to generate a structured summary
Marks the compressed messages with MemoryMark.COMPRESSED (via update_messages_mark)
Stores the summary in memory (via update_compressed_summary)

Important

The compression uses a marking mechanism rather than replacing messages. Old messages are marked as compressed and excluded from future retrievals via exclude_mark=MemoryMark.COMPRESSED, while the generated summary is stored separately and retrieved when needed. This approach preserves the original messages and allows flexible memory management. For more details about the mark functionality, please refer to Memory.

By default, the compressed summary is structured into five key fields:

task_overview: The user’s core request and success criteria
current_state: What has been completed so far, including files and outputs
important_discoveries: Technical constraints, decisions, errors, and failed approaches
next_steps: Specific actions needed to complete the task
context_to_preserve: User preferences, domain details, and promises made

Customizing Compression

You can customize how compression works by specifying summary_schema, summary_template, and compression_prompt parameters.

compression_prompt: Guides the LLM on how to generate the summary
summary_schema: Defines the structure of the compressed summary using a Pydantic model
summary_template: Formats how the compressed summary is presented back to the agent

Here’s an example of customizing the compression:

from pydantic import BaseModel, Field

# Define a custom summary structure
class CustomSummary(BaseModel):
    main_topic: str = Field(
        max_length=200,
        description="The main topic of the conversation"
    )
    key_points: str = Field(
        max_length=400,
        description="Important points discussed"
    )
    pending_tasks: str = Field(
        max_length=200,
        description="Tasks that remain to be done"
    )

# Create agent with custom compression configuration
agent = ReActAgent(
    name="Assistant",
    sys_prompt="You are a helpful assistant.",
    model=model,
    formatter=formatter,
    compression_config=ReActAgent.CompressionConfig(
        enable=True,
        agent_token_counter=CharTokenCounter(),
        trigger_threshold=10000,
        keep_recent=3,
        # Custom schema for structured summary
        summary_schema=CustomSummary,
        # Custom prompt to guide compression
        compression_prompt=(
            "<system-hint>Please summarize the above conversation "
            "focusing on the main topic, key discussion points, "
            "and any pending tasks.</system-hint>"
        ),
        # Custom template to format the summary
        summary_template=(
            "<system-info>Conversation Summary:\n"
            "Main Topic: {main_topic}\n\n"
            "Key Points:\n{key_points}\n\n"
            "Pending Tasks:\n{pending_tasks}"
            "</system-info>"
        ),
    ),
)

The summary_template uses the fields defined in summary_schema as placeholders (e.g., {main_topic}, {key_points}). After the LLM generates the structured summary, these placeholders will be replaced with the actual values.

Note

The agent ensures that tool use and tool result pairs are kept together during compression to maintain the integrity of the conversation flow.

Tip

You can use a smaller, faster model for compression by specifying a different compression_model and compression_formatter to reduce costs and latency.

Parallel Tool Calls¶

ReActAgent supports parallel tool calls by providing a parallel_tool_calls argument in its constructor. When multiple tool calls are generated, and parallel_tool_calls is set to True, they will be executed in parallel by the asyncio.gather function.

Note

The parallel tool execution in ReActAgent is implemented based on asyncio.gather. Therefore, to maximize the effect of parallel tool execution, both the tool function itself and the logic within it must be asynchronous.

Note

When running, please ensure that parallel tool calling is supported at the model level and the corresponding parameters are set correctly (can be passed through generate_kwargs). For example, for the DashScope API, you need to set parallel_tool_calls to True, otherwise parallel tool calling will not be possible.

# prepare a tool function
async def example_tool_function(tag: str) -> ToolResponse:
    """A sample example tool function"""
    start_time = datetime.now().strftime("%H:%M:%S.%f")

    # Sleep for 3 seconds to simulate a long-running task
    await asyncio.sleep(3)

    end_time = datetime.now().strftime("%H:%M:%S.%f")
    return ToolResponse(
        content=[
            TextBlock(
                type="text",
                text=f"Tag {tag} started at {start_time} and ended at {end_time}. ",
            ),
        ],
    )


toolkit = Toolkit()
toolkit.register_tool_function(example_tool_function)

# Create an ReAct agent
agent = ReActAgent(
    name="Jarvis",
    sys_prompt="You're a helpful assistant named Jarvis.",
    model=DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        # Preset the generation kwargs to enable parallel tool calls
        generate_kwargs={
            "parallel_tool_calls": True,
        },
    ),
    memory=InMemoryMemory(),
    formatter=DashScopeChatFormatter(),
    toolkit=toolkit,
    parallel_tool_calls=True,
)


async def example_parallel_tool_calls() -> None:
    """Example of parallel tool calls"""
    # prompt the agent to generate two tool calls at once
    await agent(
        Msg(
            "user",
            "Generate two tool calls of the 'example_tool_function' function with tag as 'tag1' and 'tag2' AT ONCE so that they can execute in parallel.",
            "user",
        ),
    )


asyncio.run(example_parallel_tool_calls())

Jarvis: {
    "type": "tool_use",
    "id": "call_a8aef8600668439fabd6d8",
    "name": "example_tool_function",
    "input": {
        "tag": "tag1"
    }
}
Jarvis: {
    "type": "tool_use",
    "id": "call_bb1829d12995476aaa88b2",
    "name": "example_tool_function",
    "input": {
        "tag": "tag2"
    }
}
system: {
    "type": "tool_result",
    "id": "call_a8aef8600668439fabd6d8",
    "name": "example_tool_function",
    "output": [
        {
            "type": "text",
            "text": "Tag tag1 started at 12:09:08.595958 and ended at 12:09:11.599223. "
        }
    ]
}
system: {
    "type": "tool_result",
    "id": "call_bb1829d12995476aaa88b2",
    "name": "example_tool_function",
    "output": [
        {
            "type": "text",
            "text": "Tag tag2 started at 12:09:08.596052 and ended at 12:09:11.600030. "
        }
    ]
}
Jarvis: The 'example_tool_function' has been called with two tags 'tag1' and 'tag2' in parallel. Here are the results:

- For 'tag1', the function started at 12:09:08.595958 and ended at 12:09:11.599223.
- For 'tag2', the function started at 12:09:08.596052 and ended at 12:09:11.600030.

Structured Output¶

To generate a structured output, the ReActAgent instance receives a child class of the pydantic.BaseModel as the structured_model argument in its __call__ function. Then we can get the structured output from the metadata field of the returned message.

Taking introducing Einstein as an example:

# Create an ReAct agent
agent = ReActAgent(
    name="Jarvis",
    sys_prompt="You're a helpful assistant named Jarvis.",
    model=DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        # Preset the generation kwargs to enable parallel tool calls
        generate_kwargs={
            "parallel_tool_calls": True,
        },
    ),
    memory=InMemoryMemory(),
    formatter=DashScopeChatFormatter(),
    toolkit=Toolkit(),
    parallel_tool_calls=True,
)


# The structured model
class Model(BaseModel):
    name: str = Field(description="The name of the person")
    description: str = Field(
        description="A one-sentence description of the person",
    )
    age: int = Field(description="The age")
    honor: list[str] = Field(description="A list of honors of the person")


async def example_structured_output() -> None:
    """The example structured output"""
    res = await agent(
        Msg(
            "user",
            "Introduce Einstein",
            "user",
        ),
        structured_model=Model,
    )
    print("\nThe structured output:")
    print(json.dumps(res.metadata, indent=4))


asyncio.run(example_structured_output())

/home/runner/work/agentscope/agentscope/src/agentscope/model/_dashscope_model.py:182: DeprecationWarning: 'required' is not supported by DashScope API. It will be converted to 'auto'.
  warnings.warn(
Jarvis: {
    "type": "tool_use",
    "id": "call_1285992fb386489c921057",
    "name": "generate_response",
    "input": {
        "name": "Albert Einstein",
        "description": "A renowned physicist who developed the theory of relativity",
        "age": 76,
        "honor": [
            "Nobel Prize in Physics (1921)",
            "Copley Medal (1925)",
            "Max Planck Medal (1929)"
        ]
    }
}
system: {
    "type": "tool_result",
    "id": "call_1285992fb386489c921057",
    "name": "generate_response",
    "output": [
        {
            "type": "text",
            "text": "Successfully generated response."
        }
    ]
}
Jarvis: Allow me to introduce Albert Einstein, a renowned physicist who developed the theory of relativity. Born in 1879, Einstein became one of the most influential scientists of the 20th century. He was awarded the Nobel Prize in Physics in 1921, primarily for his work on the photoelectric effect, and not directly for his theories of relativity as some might assume. Throughout his illustrious career, he also received the Copley Medal in 1925 and the Max Planck Medal in 1929, among many other honors. Einstein's contributions to science have left an indelible mark, and he passed away in 1955 at the age of 76, leaving behind a legacy that continues to inspire and challenge new generations of thinkers and scientists.

The structured output:
{
    "name": "Albert Einstein",
    "description": "A renowned physicist who developed the theory of relativity",
    "age": 76,
    "honor": [
        "Nobel Prize in Physics (1921)",
        "Copley Medal (1925)",
        "Max Planck Medal (1929)"
    ]
}

Customizing Agent¶

AgentScope provides two base classes, AgentBase and ReActAgentBase, which differ in the abstract methods they define and the hooks they support. Specifically, the ReActAgentBase extends AgentBase with additional _reasoning and _acting abstract methods, as well as their pre- and post- hooks.

Developers can choose to inherit from either of these base classes based on their needs. We summarize the agent under agentscope.agent module as follows:

Agent classes in AgentScope¶
Class	Abstract Method	Support Hooks	Description
`AgentBase`	`reply` `observe` `print` `handle_interrupt`	pre_/post_reply pre_/post_observe pre_/post_print	The base class for all agents, providing the basic interface and hooks.
`ReActAgentBase`	`reply` `observe` `print` `handle_interrupt` `_reasoning` `_acting`	pre_/post_reply pre_/post_observe pre_/post_print pre_/post_reasoning pre_/post_acting	The abstract class for ReAct agent, extending `AgentBase` with reasoning and acting abstract methods and their hooks.
`ReActAgent`	-	pre_/post_reply pre_/post_observe pre_/post_print pre_/post_reasoning pre_/post_acting	An implementation of `ReActAgentBase`
`UserAgent`			A special agent that represents the user, used to interact with the agent
`A2aAgent`	-	pre_/post_reply pre_/post_observe pre_/post_print	Agent for communicating with remote A2A agents, see A2A Agent

Agent¶

ReAct Agent¶

Realtime Steering¶

Memory Compression¶

Parallel Tool Calls¶

Structured Output¶

Customizing Agent¶

Further Reading¶