Chat agent with memory
This example builds a chat agent that uses Agent Memory to remember user preferences and provide personalized responses across conversations.
The result is a Cloudflare Worker that:
- Ingests each conversation turn into Agent Memory.
- Uses `getSummary()` to build a personalized system prompt.
- Uses `recall()` to fetch relevant context for the current question.
- Generates a response using Workers AI with full memory context.
wrangler.jsonc:

    {
      "name": "memory-chat-agent",
      "main": "src/index.ts",
      // Set this to today's date
      "compatibility_date": "2026-04-22",
      "agent_memory": [
        {
          "binding": "MEMORY",
          "namespace": "<NAMESPACE_NAME>"
        }
      ],
      "ai": {
        "binding": "AI",
        "remote": true
      }
    }

wrangler.toml:

    name = "memory-chat-agent"
    main = "src/index.ts"
    # Set this to today's date
    compatibility_date = "2026-04-22"

    [[agent_memory]]
    binding = "MEMORY"
    namespace = "<NAMESPACE_NAME>"

    [ai]
    binding = "AI"
    remote = true

JavaScript:

    export default {
      async fetch(request, env, ctx) {
        if (request.method !== "POST") {
          return new Response("Method not allowed", { status: 405 });
        }

        const { userId, sessionId, message } = await request.json();
        const profile = await env.MEMORY.getProfile(userId);

        // Step 1: Get the profile summary for the system prompt
        const { summary } = await profile.getSummary({ sessionId });

        // Step 2: Recall specific context relevant to the user's message
        const memory = await profile.recall(message, {
          thinkingLevel: "medium",
          responseLength: "short",
        });

        // Step 3: Build the system prompt with memory context
        let systemPrompt = "You are a helpful assistant.";
        if (summary) {
          systemPrompt += `\n\nHere is what you know about the user:\n\n${summary}`;
        }
        if (memory.answer) {
          systemPrompt += `\n\nRelevant context for this question:\n${memory.answer}`;
        }

        // Step 4: Generate a response using Workers AI
        const aiResponse = await env.AI.run(
          "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
          {
            messages: [
              { role: "system", content: systemPrompt },
              { role: "user", content: message },
            ],
          },
        );

        const assistantMessage =
          typeof aiResponse === "object" &&
          aiResponse !== null &&
          "response" in aiResponse
            ? aiResponse.response
            : "";

        // Step 5: Ingest this conversation turn for future recall
        await profile.ingest(
          [
            { role: "user", content: message },
            { role: "assistant", content: assistantMessage || "" },
          ],
          { sessionId },
        );

        return Response.json({
          response: assistantMessage,
        });
      },
    };

TypeScript:

    type ChatRequest = {
      userId: string;
      sessionId: string;
      message: string;
    };

    export default {
      async fetch(request, env, ctx): Promise<Response> {
        if (request.method !== "POST") {
          return new Response("Method not allowed", { status: 405 });
        }

        const { userId, sessionId, message } =
          (await request.json()) as ChatRequest;
        const profile = await env.MEMORY.getProfile(userId);

        // Step 1: Get the profile summary for the system prompt
        const { summary } = await profile.getSummary({ sessionId });

        // Step 2: Recall specific context relevant to the user's message
        const memory = await profile.recall(message, {
          thinkingLevel: "medium",
          responseLength: "short",
        });

        // Step 3: Build the system prompt with memory context
        let systemPrompt = "You are a helpful assistant.";
        if (summary) {
          systemPrompt += `\n\nHere is what you know about the user:\n\n${summary}`;
        }
        if (memory.answer) {
          systemPrompt += `\n\nRelevant context for this question:\n${memory.answer}`;
        }

        // Step 4: Generate a response using Workers AI
        const aiResponse = await env.AI.run(
          "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
          {
            messages: [
              { role: "system", content: systemPrompt },
              { role: "user", content: message },
            ],
          },
        );

        const assistantMessage =
          typeof aiResponse === "object" &&
          aiResponse !== null &&
          "response" in aiResponse
            ? aiResponse.response
            : "";

        // Step 5: Ingest this conversation turn for future recall
        await profile.ingest(
          [
            { role: "user", content: message },
            { role: "assistant", content: assistantMessage || "" },
          ],
          { sessionId },
        );

        return Response.json({
          response: assistantMessage,
        });
      },
    } satisfies ExportedHandler<Env>;
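The Worker code above reads `userId`, `sessionId`, and `message` from the request body without checking them. A minimal sketch of a runtime check follows. `parseChatRequest` is a hypothetical helper, not part of Agent Memory or the Workers runtime, and the rule it applies (all three fields must be non-empty strings) is an assumption about what the example expects.

```typescript
type ChatRequest = {
  userId: string;
  sessionId: string;
  message: string;
};

// Hypothetical helper: returns a ChatRequest, or null when the body is malformed.
function parseChatRequest(body: unknown): ChatRequest | null {
  if (typeof body !== "object" || body === null) {
    return null;
  }

  const { userId, sessionId, message } = body as Record<string, unknown>;

  // Assumed rule: every field must be a string with visible content.
  const isNonEmptyString = (value: unknown): value is string =>
    typeof value === "string" && value.trim().length > 0;

  if (
    !isNonEmptyString(userId) ||
    !isNonEmptyString(sessionId) ||
    !isNonEmptyString(message)
  ) {
    return null;
  }

  return { userId, sessionId, message };
}
```

In the handler, a `null` result could map to a 400 response before any memory call is made. Note that `request.json()` itself throws on invalid JSON, so a `try`/`catch` around it may also be worthwhile.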
- Profile summary as system prompt: `getSummary()` returns a Markdown summary of the user's key facts, recent events, active tasks, and stored instructions. This gives the LLM broad context about the user without requiring a specific query.
- Targeted recall: `recall()` searches stored memories for content specifically relevant to the current message. This provides focused context beyond what the general summary includes.
- Post-conversation ingestion: After generating a response, the conversation turn is ingested into Agent Memory. The system extracts any new facts, events, or instructions from the exchange.
- Memory accumulation: Over multiple conversations, the agent builds up a rich profile of the user. Facts like "prefers TypeScript" or "works on the billing team" persist across sessions and surface automatically in future interactions.
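How the profile summary and the targeted recall combine into one system prompt can be sketched as a pure function, which is easy to unit test without any bindings. `buildSystemPrompt` is a hypothetical helper that mirrors Step 3 of the Worker. The sample summary and recall strings are invented for illustration and do not reflect the actual output format of Agent Memory.

```typescript
// Hypothetical helper mirroring Step 3 of the Worker: each piece of memory
// context is appended only when it is present and non-empty.
function buildSystemPrompt(
  summary?: string | null,
  recalledAnswer?: string | null,
): string {
  let systemPrompt = "You are a helpful assistant.";
  if (summary) {
    systemPrompt += `\n\nHere is what you know about the user:\n\n${summary}`;
  }
  if (recalledAnswer) {
    systemPrompt += `\n\nRelevant context for this question:\n${recalledAnswer}`;
  }
  return systemPrompt;
}

// Made-up inputs standing in for getSummary() and recall() results.
const examplePrompt = buildSystemPrompt(
  "- Prefers TypeScript\n- Works on the billing team",
  "The user is working on the billing API.",
);
console.log(examplePrompt);
```

For a new user with no stored memories, both inputs would presumably be empty and the function falls back to the base prompt alone.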
Send a first message that mentions a project and a preference:

    curl -X POST "http://localhost:8787" \
      -H "Content-Type: application/json" \
      -d '{
        "userId": "alice",
        "sessionId": "session-1",
        "message": "I am working on the billing API and prefer TypeScript."
      }'

In a subsequent request, the agent remembers the context:

    curl -X POST "http://localhost:8787" \
      -H "Content-Type: application/json" \
      -d '{
        "userId": "alice",
        "sessionId": "session-2",
        "message": "Can you help me with the project I mentioned?"
      }'

The agent responds with awareness of the billing API project and the TypeScript preference, even though those details came from a previous session.
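The same two requests can also be sent from a script. The sketch below assumes a runtime with a global `fetch` (for example Node.js 18 or later) and a Worker listening on `http://localhost:8787`. `buildChatRequestInit` and `sendChat` are hypothetical helper names, and the response is assumed to be the `{ response }` JSON object the Worker returns.

```typescript
type ChatRequest = {
  userId: string;
  sessionId: string;
  message: string;
};

type ChatRequestInit = {
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Builds the same POST request the curl commands send.
function buildChatRequestInit(chat: ChatRequest): ChatRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(chat),
  };
}

// Sends one chat turn and returns the assistant text, or "" if it is absent.
async function sendChat(baseUrl: string, chat: ChatRequest): Promise<string> {
  const res = await fetch(baseUrl, buildChatRequestInit(chat));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const data: unknown = await res.json();
  if (typeof data === "object" && data !== null && "response" in data) {
    const value = (data as { response: unknown }).response;
    return typeof value === "string" ? value : "";
  }
  return "";
}

// Usage (requires the Worker to be running locally):
//   await sendChat("http://localhost:8787", {
//     userId: "alice",
//     sessionId: "session-1",
//     message: "I am working on the billing API and prefer TypeScript.",
//   });
//   const reply = await sendChat("http://localhost:8787", {
//     userId: "alice",
//     sessionId: "session-2",
//     message: "Can you help me with the project I mentioned?",
//   });
```

Keeping `userId` constant while changing `sessionId` is what exercises the cross-session behavior described above.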