Note
Go to the end to download the full example code.
Agent¶
In this tutorial, we first focus on introducing the ReAct agent in AgentScope, then we briefly introduce how to customize your own agent from scratch.
ReAct Agent¶
In AgentScope, the ReActAgent class integrates various features into a final implementation, including
Feature |
Reference |
|---|---|
Support realtime steering |
|
Support memory compression |
|
Support parallel tool calls |
|
Support structured output |
|
Support fine-grained MCP control |
|
Support agent-controlled tools management (Meta tool) |
|
Support self-controlled long-term memory |
|
Support automatic state management |
Due to limited space, in this tutorial we only demonstrate the first three
features of ReActAgent class, leaving the others to the corresponding sections
listed above.
import asyncio
import json
import os
from datetime import datetime
import time
from pydantic import BaseModel, Field
from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.memory import InMemoryMemory
from agentscope.message import TextBlock, Msg
from agentscope.model import DashScopeChatModel
from agentscope.tool import Toolkit, ToolResponse
Realtime Steering¶
The realtime steering allows user to interrupt the agent’s reply at any time, which is implemented based on the asyncio cancellation mechanism.
Specifically, when calling the interrupt method of the agent, it will
cancel the current reply task, and execute the handle_interrupt method
for postprocessing.
Hint
With the feature of supporting streaming tool results in
Tool, users can interrupt the tool execution if it takes too long or
deviates from user expectations by Ctrl+C in the terminal or calling the
interrupt method of the agent in your code.
The interruption logic has been implemented in the AgentBase class as a
basic feature, leaving a handle_interrupt method for users to customize the
post-processing of interruption as follows:
# code snippet of AgentBase
class AgentBase:
...
async def __call__(self, *args: Any, **kwargs: Any) -> Msg:
...
reply_msg: Msg | None = None
try:
self._reply_task = asyncio.current_task()
reply_msg = await self.reply(*args, **kwargs)
except asyncio.CancelledError:
# Catch the interruption and handle it by the handle_interrupt method
reply_msg = await self.handle_interrupt(*args, **kwargs)
...
@abstractmethod
async def handle_interrupt(self, *args: Any, **kwargs: Any) -> Msg:
pass
In ReActAgent class, we return a fixed message “I noticed that you have
interrupted me. What can I do for you?” as follows:
Example of interruption¶
You can override it with your own implementation, for example, calling the LLM to generate a simple response to the interruption.
Memory Compression¶
As conversations grow longer, the token count in memory can exceed model context
limits or slow down inference. ReActAgent provides an automatic memory compression
feature to address this issue.
Basic Usage
To enable memory compression, provide a CompressionConfig instance when initializing
the ReActAgent:
from agentscope.agent import ReActAgent
from agentscope.token import CharTokenCounter
agent = ReActAgent(
name="Assistant",
sys_prompt="You are a helpful assistant.",
model=model,
formatter=formatter,
compression_config=ReActAgent.CompressionConfig(
enable=True,
agent_token_counter=CharTokenCounter(), # The token counter for the agent
trigger_threshold=10000, # Trigger compression when exceeding 10000 tokens
keep_recent=3, # Keep the most recent 3 messages uncompressed
),
)
When memory compression is enabled, the agent monitors the token count in its memory.
Once it exceeds the trigger_threshold, the agent automatically:
Identifies messages that haven’t been compressed yet (via
exclude_mark)Keeps the most recent
keep_recentmessages uncompressed (to preserve recent context)Sends older messages to an LLM to generate a structured summary
Marks the compressed messages with
MemoryMark.COMPRESSED(viaupdate_messages_mark)Stores the summary in memory (via
update_compressed_summary)
Important
The compression uses a marking mechanism rather than replacing messages. Old messages are marked as compressed and excluded from future retrievals via exclude_mark=MemoryMark.COMPRESSED, while the generated summary is stored separately and retrieved when needed. This approach preserves the original messages and allows flexible memory management. For more details about the mark functionality, please refer to Memory.
By default, the compressed summary is structured into five key fields:
task_overview: The user’s core request and success criteria
current_state: What has been completed so far, including files and outputs
important_discoveries: Technical constraints, decisions, errors, and failed approaches
next_steps: Specific actions needed to complete the task
context_to_preserve: User preferences, domain details, and promises made
Customizing Compression
You can customize how compression works by specifying summary_schema,
summary_template, and compression_prompt parameters.
compression_prompt: Guides the LLM on how to generate the summary
summary_schema: Defines the structure of the compressed summary using a Pydantic model
summary_template: Formats how the compressed summary is presented back to the agent
Here’s an example of customizing the compression:
from pydantic import BaseModel, Field
# Define a custom summary structure
class CustomSummary(BaseModel):
main_topic: str = Field(
max_length=200,
description="The main topic of the conversation"
)
key_points: str = Field(
max_length=400,
description="Important points discussed"
)
pending_tasks: str = Field(
max_length=200,
description="Tasks that remain to be done"
)
# Create agent with custom compression configuration
agent = ReActAgent(
name="Assistant",
sys_prompt="You are a helpful assistant.",
model=model,
formatter=formatter,
compression_config=ReActAgent.CompressionConfig(
enable=True,
agent_token_counter=CharTokenCounter(),
trigger_threshold=10000,
keep_recent=3,
# Custom schema for structured summary
summary_schema=CustomSummary,
# Custom prompt to guide compression
compression_prompt=(
"<system-hint>Please summarize the above conversation "
"focusing on the main topic, key discussion points, "
"and any pending tasks.</system-hint>"
),
# Custom template to format the summary
summary_template=(
"<system-info>Conversation Summary:\n"
"Main Topic: {main_topic}\n\n"
"Key Points:\n{key_points}\n\n"
"Pending Tasks:\n{pending_tasks}"
"</system-info>"
),
),
)
The summary_template uses the fields defined in summary_schema as placeholders
(e.g., {main_topic}, {key_points}). After the LLM generates the structured summary,
these placeholders will be replaced with the actual values.
Note
The agent ensures that tool use and tool result pairs are kept together during compression to maintain the integrity of the conversation flow.
Tip
You can use a smaller, faster model for compression by specifying a different compression_model and compression_formatter to reduce costs and latency.
Parallel Tool Calls¶
ReActAgent supports parallel tool calls by providing a parallel_tool_calls
argument in its constructor.
When multiple tool calls are generated, and parallel_tool_calls is set to True,
they will be executed in parallel by the asyncio.gather function.
Note
The parallel tool execution in ReActAgent is implemented based on asyncio.gather. Therefore, to maximize the effect of parallel tool execution, both the tool function itself and the logic within it must be asynchronous.
Note
When running, please ensure that parallel tool calling is supported at the model level and the corresponding parameters are set correctly (can be passed through generate_kwargs). For example, for the DashScope API, you need to set parallel_tool_calls to True, otherwise parallel tool calling will not be possible.
# prepare a tool function
async def example_tool_function(tag: str) -> ToolResponse:
"""A sample example tool function"""
start_time = datetime.now().strftime("%H:%M:%S.%f")
# Sleep for 3 seconds to simulate a long-running task
await asyncio.sleep(3)
end_time = datetime.now().strftime("%H:%M:%S.%f")
return ToolResponse(
content=[
TextBlock(
type="text",
text=f"Tag {tag} started at {start_time} and ended at {end_time}. ",
),
],
)
toolkit = Toolkit()
toolkit.register_tool_function(example_tool_function)
# Create an ReAct agent
agent = ReActAgent(
name="Jarvis",
sys_prompt="You're a helpful assistant named Jarvis.",
model=DashScopeChatModel(
model_name="qwen-max",
api_key=os.environ["DASHSCOPE_API_KEY"],
# Preset the generation kwargs to enable parallel tool calls
generate_kwargs={
"parallel_tool_calls": True,
},
),
memory=InMemoryMemory(),
formatter=DashScopeChatFormatter(),
toolkit=toolkit,
parallel_tool_calls=True,
)
async def example_parallel_tool_calls() -> None:
"""Example of parallel tool calls"""
# prompt the agent to generate two tool calls at once
await agent(
Msg(
"user",
"Generate two tool calls of the 'example_tool_function' function with tag as 'tag1' and 'tag2' AT ONCE so that they can execute in parallel.",
"user",
),
)
asyncio.run(example_parallel_tool_calls())
Jarvis: {
"type": "tool_use",
"id": "call_a8aef8600668439fabd6d8",
"name": "example_tool_function",
"input": {
"tag": "tag1"
}
}
Jarvis: {
"type": "tool_use",
"id": "call_bb1829d12995476aaa88b2",
"name": "example_tool_function",
"input": {
"tag": "tag2"
}
}
system: {
"type": "tool_result",
"id": "call_a8aef8600668439fabd6d8",
"name": "example_tool_function",
"output": [
{
"type": "text",
"text": "Tag tag1 started at 12:09:08.595958 and ended at 12:09:11.599223. "
}
]
}
system: {
"type": "tool_result",
"id": "call_bb1829d12995476aaa88b2",
"name": "example_tool_function",
"output": [
{
"type": "text",
"text": "Tag tag2 started at 12:09:08.596052 and ended at 12:09:11.600030. "
}
]
}
Jarvis: The 'example_tool_function' has been called with two tags 'tag1' and 'tag2' in parallel. Here are the results:
- For 'tag1', the function started at 12:09:08.595958 and ended at 12:09:11.599223.
- For 'tag2', the function started at 12:09:08.596052 and ended at 12:09:11.600030.
Structured Output¶
To generate a structured output, the ReActAgent instance receives a child class
of the pydantic.BaseModel as the structured_model argument in its __call__ function.
Then we can get the structured output from the metadata field of the returned message.
Taking introducing Einstein as an example:
# Create an ReAct agent
agent = ReActAgent(
name="Jarvis",
sys_prompt="You're a helpful assistant named Jarvis.",
model=DashScopeChatModel(
model_name="qwen-max",
api_key=os.environ["DASHSCOPE_API_KEY"],
# Preset the generation kwargs to enable parallel tool calls
generate_kwargs={
"parallel_tool_calls": True,
},
),
memory=InMemoryMemory(),
formatter=DashScopeChatFormatter(),
toolkit=Toolkit(),
parallel_tool_calls=True,
)
# The structured model
class Model(BaseModel):
name: str = Field(description="The name of the person")
description: str = Field(
description="A one-sentence description of the person",
)
age: int = Field(description="The age")
honor: list[str] = Field(description="A list of honors of the person")
async def example_structured_output() -> None:
"""The example structured output"""
res = await agent(
Msg(
"user",
"Introduce Einstein",
"user",
),
structured_model=Model,
)
print("\nThe structured output:")
print(json.dumps(res.metadata, indent=4))
asyncio.run(example_structured_output())
/home/runner/work/agentscope/agentscope/src/agentscope/model/_dashscope_model.py:182: DeprecationWarning: 'required' is not supported by DashScope API. It will be converted to 'auto'.
warnings.warn(
Jarvis: {
"type": "tool_use",
"id": "call_1285992fb386489c921057",
"name": "generate_response",
"input": {
"name": "Albert Einstein",
"description": "A renowned physicist who developed the theory of relativity",
"age": 76,
"honor": [
"Nobel Prize in Physics (1921)",
"Copley Medal (1925)",
"Max Planck Medal (1929)"
]
}
}
system: {
"type": "tool_result",
"id": "call_1285992fb386489c921057",
"name": "generate_response",
"output": [
{
"type": "text",
"text": "Successfully generated response."
}
]
}
Jarvis: Allow me to introduce Albert Einstein, a renowned physicist who developed the theory of relativity. Born in 1879, Einstein became one of the most influential scientists of the 20th century. He was awarded the Nobel Prize in Physics in 1921, primarily for his work on the photoelectric effect, and not directly for his theories of relativity as some might assume. Throughout his illustrious career, he also received the Copley Medal in 1925 and the Max Planck Medal in 1929, among many other honors. Einstein's contributions to science have left an indelible mark, and he passed away in 1955 at the age of 76, leaving behind a legacy that continues to inspire and challenge new generations of thinkers and scientists.
The structured output:
{
"name": "Albert Einstein",
"description": "A renowned physicist who developed the theory of relativity",
"age": 76,
"honor": [
"Nobel Prize in Physics (1921)",
"Copley Medal (1925)",
"Max Planck Medal (1929)"
]
}
Customizing Agent¶
AgentScope provides two base classes, AgentBase and ReActAgentBase, which
differ in the abstract methods they define and the hooks they support.
Specifically, the ReActAgentBase extends AgentBase with additional _reasoning and _acting
abstract methods, as well as their pre- and post- hooks.
Developers can choose to inherit from either of these base classes based on their needs.
We summarize the agent under agentscope.agent module as follows:
Class |
Abstract Method |
Support Hooks |
Description |
|---|---|---|---|
|
replyobserveprinthandle_interrupt |
pre_/post_reply
pre_/post_observe
pre_/post_print
|
The base class for all agents, providing the basic interface and hooks. |
|
replyobserveprinthandle_interrupt_reasoning_acting |
pre_/post_reply
pre_/post_observe
pre_/post_print
pre_/post_reasoning
pre_/post_acting
|
The abstract class for ReAct agent, extending |
|
- |
pre_/post_reply
pre_/post_observe
pre_/post_print
pre_/post_reasoning
pre_/post_acting
|
An implementation of |
|
A special agent that represents the user, used to interact with the agent |
||
|
- |
pre_/post_reply
pre_/post_observe
pre_/post_print
|
Agent for communicating with remote A2A agents, see A2A Agent |
Further Reading¶
Total running time of the script: (0 minutes 19.777 seconds)