Problem
When using agent caching with custom tools (via the tools option in agent()), custom tool executions are not recorded in the cache and are skipped during replay. This causes agent replay to fail when custom tools perform essential actions like form filling with credentials.
Reproduction
- Create an agent with custom tools (e.g., for filling login credentials)
- Enable caching via the cacheDir option
- Execute the agent - custom tools work correctly
- Re-run the same instruction (cache hit) - custom tool calls are skipped
Root Cause
I traced through the source code and found two issues:
1. Custom tools are not recorded during execution
In v3CuaAgentHandler.ts, the custom_tool case just returns success without recording:
    case "custom_tool": {
      // Custom tools are handled by the agent client directly
      return { success: true };
    }
Compare this with other action types such as goto, scroll, and wait, all of which call this.v3.recordAgentReplayStep().
2. Custom tools are not replayed
In AgentCache.ts, the executeAgentReplayStep method's switch statement doesn't handle custom_tool:
    switch (step.type) {
      case "act": ...
      case "fillForm": ...
      case "goto": ...
      // ... other cases ...
      default:
        this.logger({ message: `agent cache skipping step type: ${step.type}` });
        return step; // Custom tools fall through here and are skipped
    }
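To make the fall-through concrete, here is a minimal, self-contained sketch of the same dispatch pattern. It is not the actual AgentCache code: the `Step` type, the `dispatch` function, and the short list of handled cases are simplified stand-ins assumed purely for illustration.

```typescript
// Simplified stand-in for a cached replay step (hypothetical, not the library's type).
type Step = { type: string; [key: string]: unknown };

// Returns "replayed" for step types that have a dedicated case, "skipped" otherwise.
function dispatch(step: Step): "replayed" | "skipped" {
  switch (step.type) {
    case "act":
    case "fillForm":
    case "goto":
      return "replayed";
    default:
      // There is no case for "custom_tool", so it lands here and is never executed.
      return "skipped";
  }
}

const sampleSteps: Step[] = [
  { type: "goto", url: "https://example.com" },
  { type: "custom_tool", name: "fillCredentials" },
];
const dispatchResults = sampleSteps.map((step) => dispatch(step));
console.log(dispatchResults);
```

In this sketch the goto step is replayed while the custom_tool step is only reported as skipped, which mirrors the behavior described above.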
Proposed Solution
1. Add new type for custom tool steps
In types/private/cache.ts:
    export interface AgentReplayCustomToolStep {
      type: "custom_tool";
      name: string;
      arguments: Record<string, unknown>;
    }

    export type AgentReplayStep =
      | AgentReplayActStep
      // ... existing types ...
      | AgentReplayCustomToolStep
      | { type: string; [key: string]: unknown };
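Because the union keeps a catch-all member whose type is any string, checking step.type === "custom_tool" may not be enough on its own for TypeScript to narrow a step to the new interface. One possible way to handle this is a small type guard that also validates the runtime shape. The sketch below redeclares trimmed local copies of the proposed types, and `isCustomToolStep` is a hypothetical helper name, not something that exists in the codebase.

```typescript
// Local, trimmed copies of the shapes proposed above (only what the guard needs).
interface AgentReplayCustomToolStep {
  type: "custom_tool";
  name: string;
  arguments: Record<string, unknown>;
}

type AgentReplayStep =
  | AgentReplayCustomToolStep
  | { type: string; [key: string]: unknown };

// Hypothetical helper: verifies the runtime shape, since cached steps come back as plain data.
function isCustomToolStep(
  step: AgentReplayStep,
): step is AgentReplayCustomToolStep {
  const args = step["arguments"];
  return (
    step.type === "custom_tool" &&
    typeof step["name"] === "string" &&
    typeof args === "object" &&
    args !== null &&
    !Array.isArray(args)
  );
}

const wellFormedStep: AgentReplayStep = {
  type: "custom_tool",
  name: "fillCredentials",
  arguments: {},
};
const malformedStep: AgentReplayStep = { type: "custom_tool", name: 42 };

const wellFormedOk = isCustomToolStep(wellFormedStep);
const malformedOk = isCustomToolStep(malformedStep);
console.log(wellFormedOk, malformedOk);
```

A guard like this would let the replay switch pass a properly typed step to the custom tool handler without a cast.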
2. Record custom tool executions
In v3CuaAgentHandler.ts, or wherever the action is processed after tool execution:
    case "custom_tool": {
      if (recording) {
        this.v3.recordAgentReplayStep({
          type: "custom_tool",
          name: action.name as string,
          arguments: action.arguments as Record<string, unknown>,
        });
      }
      return { success: true };
    }
3. Add replay support for custom tools
This requires passing the tools object to AgentCache so it can re-execute tools during replay:
    private async replayAgentCustomToolStep(
      step: AgentReplayCustomToolStep,
      tools: ToolSet,
    ): Promise<void> {
      const tool = tools[step.name];
      // execute is optional on a tool definition, so check it before calling
      if (tool?.execute) {
        await tool.execute(step.arguments, { toolCallId: `replay_${Date.now()}`, messages: [] });
      }
    }
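Steps 2 and 3 together amount to a record-then-replay loop. The following is a minimal, self-contained sketch of that loop under simplified assumptions: `SimpleTool`, `SimpleToolSet`, `recordCustomTool`, and `replayCustomTools` are hypothetical names standing in for the real ToolSet and cache plumbing, and the stub tool only collects its input in an array.

```typescript
// Hypothetical minimal shapes; the real ToolSet and step types live in the library.
interface CustomToolStep {
  type: "custom_tool";
  name: string;
  arguments: Record<string, unknown>;
}

interface SimpleTool {
  execute?: (args: Record<string, unknown>) => Promise<unknown>;
}

type SimpleToolSet = Record<string, SimpleTool>;

// Recording: append one step per custom tool call.
function recordCustomTool(
  steps: CustomToolStep[],
  name: string,
  args: Record<string, unknown>,
): void {
  steps.push({ type: "custom_tool", name, arguments: args });
}

// Replay: re-run each recorded step against the tools supplied at replay time.
// Throws on a missing tool instead of skipping it (a design choice of this sketch).
async function replayCustomTools(
  steps: CustomToolStep[],
  tools: SimpleToolSet,
): Promise<unknown[]> {
  const outputs: unknown[] = [];
  for (const step of steps) {
    const tool = tools[step.name];
    if (!tool?.execute) {
      throw new Error(`no executable tool named "${step.name}"`);
    }
    outputs.push(await tool.execute(step.arguments));
  }
  return outputs;
}

// Usage: one recorded call, replayed with a stub tool.
const recordedSteps: CustomToolStep[] = [];
recordCustomTool(recordedSteps, "fillField", { selector: "#user" });

const filledSelectors: string[] = [];
const stubTools: SimpleToolSet = {
  fillField: {
    execute: async (args) => {
      filledSelectors.push(String(args["selector"]));
      return "ok";
    },
  },
};

const replayDone = replayCustomTools(recordedSteps, stubTools);
```

The sketch throws when a recorded tool is missing at replay time, whereas the proposal above skips it silently. Which behavior is preferable is a question for the maintainers.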
Use Case
I'm building a system that uses custom tools for filling forms with runtime credentials. Without custom tool caching, the cache records button clicks but skips the credential filling, causing login failures during replay.
Willingness to Contribute
I'm happy to submit a PR implementing this fix if the approach looks reasonable to the maintainers.