Chat agent with memory

This example builds a chat agent that uses Agent Memory to remember user preferences and provide personalized responses across conversations.

What you will build

A Cloudflare Worker that:

Ingests each conversation turn into Agent Memory.
Uses getSummary() to build a personalized system prompt.
Uses recall() to fetch relevant context for the current question.
Generates a response using Workers AI with full memory context.

Bind Agent Memory and Workers AI in the Wrangler configuration file. The same settings are shown first as wrangler.jsonc, then as wrangler.toml:

{
  "name": "memory-chat-agent",
  "main": "src/index.ts",
  // Set this to today's date
  "compatibility_date": "2026-04-22",
  "agent_memory": [
    {
      "binding": "MEMORY",
      "namespace": "<NAMESPACE_NAME>"
    }
  ],
  "ai": {
    "binding": "AI",
    "remote": true
  }
}

name = "memory-chat-agent"
main = "src/index.ts"
# Set this to today's date
compatibility_date = "2026-04-22"

[[agent_memory]]
binding = "MEMORY"
namespace = "<NAMESPACE_NAME>"

[ai]
binding = "AI"
remote = true

Worker code

The handler is shown first in JavaScript, then in TypeScript.

export default {
  async fetch(request, env, ctx) {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }

    const { userId, sessionId, message } = await request.json();
    const profile = await env.MEMORY.getProfile(userId);

    // Step 1: Get the profile summary for the system prompt
    const { summary } = await profile.getSummary({ sessionId });

    // Step 2: Recall specific context relevant to the user's message
    const memory = await profile.recall(message, {
      thinkingLevel: "medium",
      responseLength: "short",
    });

    // Step 3: Build the system prompt with memory context
    let systemPrompt = "You are a helpful assistant.";
    if (summary) {
      systemPrompt += `\n\nHere is what you know about the user:\n\n${summary}`;
    }
    if (memory.answer) {
      systemPrompt += `\n\nRelevant context for this question:\n${memory.answer}`;
    }

    // Step 4: Generate a response using Workers AI
    const aiResponse = await env.AI.run(
      "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
      {
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: message },
        ],
      },
    );

    const assistantMessage =
      typeof aiResponse === "object" &&
      aiResponse !== null &&
      "response" in aiResponse
        ? aiResponse.response
        : "";

    // Step 5: Ingest this conversation turn for future recall
    await profile.ingest(
      [
        { role: "user", content: message },
        { role: "assistant", content: assistantMessage || "" },
      ],
      { sessionId },
    );

    return Response.json({
      response: assistantMessage,
    });
  },
};

type ChatRequest = {
  userId: string;
  sessionId: string;
  message: string;
};

export default {
  async fetch(request, env, ctx): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }

    const { userId, sessionId, message } = (await request.json()) as ChatRequest;
    const profile = await env.MEMORY.getProfile(userId);

    // Step 1: Get the profile summary for the system prompt
    const { summary } = await profile.getSummary({ sessionId });

    // Step 2: Recall specific context relevant to the user's message
    const memory = await profile.recall(message, {
      thinkingLevel: "medium",
      responseLength: "short",
    });

    // Step 3: Build the system prompt with memory context
    let systemPrompt = "You are a helpful assistant.";
    if (summary) {
      systemPrompt += `\n\nHere is what you know about the user:\n\n${summary}`;
    }
    if (memory.answer) {
      systemPrompt += `\n\nRelevant context for this question:\n${memory.answer}`;
    }

    // Step 4: Generate a response using Workers AI
    const aiResponse = await env.AI.run(
      "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
      {
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: message },
        ],
      },
    );

    const assistantMessage =
      typeof aiResponse === "object" && aiResponse !== null && "response" in aiResponse ? aiResponse.response : "";

    // Step 5: Ingest this conversation turn for future recall
    await profile.ingest(
      [
        { role: "user", content: message },
        { role: "assistant", content: assistantMessage || "" },
      ],
      { sessionId },
    );

    return Response.json({
      response: assistantMessage,
    });
  },
} satisfies ExportedHandler<Env>;
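
The handler above trusts that the request body contains userId, sessionId, and message. A minimal sketch of validating the body before touching memory is shown below. The parseChatRequest helper and the ParseResult type are hypothetical names introduced for illustration and are not part of Agent Memory.

```typescript
type ChatRequest = {
  userId: string;
  sessionId: string;
  message: string;
};

type ParseResult =
  | { ok: true; value: ChatRequest }
  | { ok: false; error: string };

// Hypothetical helper: checks that the decoded JSON body has the three
// non-empty string fields that the handler destructures.
function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const record = body as Record<string, unknown>;
  for (const field of ["userId", "sessionId", "message"] as const) {
    const value = record[field];
    if (typeof value !== "string" || value.trim() === "") {
      return { ok: false, error: `Missing or empty field: ${field}` };
    }
  }
  return {
    ok: true,
    value: {
      userId: record.userId as string,
      sessionId: record.sessionId as string,
      message: record.message as string,
    },
  };
}
```

In the handler, one option is to wrap request.json() in a try/catch, pass the result to parseChatRequest, and return a 400 response with the error text when ok is false, so that malformed requests never reach getProfile().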

How it works

Profile summary as system prompt — getSummary() returns a Markdown summary of the user's key facts, recent events, active tasks, and stored instructions. This gives the LLM broad context about the user without requiring a specific query.
Targeted recall — recall() searches stored memories for content specifically relevant to the current message. This provides focused context beyond what the general summary includes.
Post-conversation ingestion — After generating a response, the conversation turn is ingested into Agent Memory. The system extracts any new facts, events, or instructions from the exchange.
Memory accumulation — Over multiple conversations, the agent builds up a rich profile of the user. Facts like "prefers TypeScript" or "works on the billing team" persist across sessions and surface automatically in future interactions.
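
The four steps above can be sketched as a single function that is independent of the Worker runtime. This is a sketch under stated assumptions: ProfileLike describes only the three calls the Worker code makes and is not the official binding type, while buildSystemPrompt, runTurn, and the generate callback (standing in for the Workers AI call) are hypothetical names.

```typescript
// Minimal structural types covering only the calls the Worker makes.
// These are assumptions for this sketch, not the official binding types.
type Turn = { role: "user" | "assistant"; content: string };

interface ProfileLike {
  getSummary(options: { sessionId: string }): Promise<{ summary?: string }>;
  recall(
    query: string,
    options: { thinkingLevel: string; responseLength: string },
  ): Promise<{ answer?: string }>;
  ingest(turns: Turn[], options: { sessionId: string }): Promise<unknown>;
}

// Hypothetical helper: appends each memory section only when it is non-empty.
function buildSystemPrompt(summary?: string, recalled?: string): string {
  let prompt = "You are a helpful assistant.";
  if (summary) {
    prompt += `\n\nHere is what you know about the user:\n\n${summary}`;
  }
  if (recalled) {
    prompt += `\n\nRelevant context for this question:\n${recalled}`;
  }
  return prompt;
}

// Hypothetical helper: runs one turn in the same order as the Worker:
// summary, recall, generate, then ingest.
async function runTurn(
  profile: ProfileLike,
  generate: (systemPrompt: string, message: string) => Promise<string>,
  sessionId: string,
  message: string,
): Promise<string> {
  const { summary } = await profile.getSummary({ sessionId });
  const memory = await profile.recall(message, {
    thinkingLevel: "medium",
    responseLength: "short",
  });
  const reply = await generate(buildSystemPrompt(summary, memory.answer), message);
  // Ingest after generating, so the stored turn includes the assistant reply.
  await profile.ingest(
    [
      { role: "user", content: message },
      { role: "assistant", content: reply },
    ],
    { sessionId },
  );
  return reply;
}
```

Keeping the prompt assembly in a pure function makes it possible to check the prompt text with stub objects, without calling Agent Memory or Workers AI.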

Test the agent

With the Worker running locally (for example with npx wrangler dev, which listens on port 8787 by default), send a first message that states a project and a preference:

curl -X POST "http://localhost:8787" \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "alice",
    "sessionId": "session-1",
    "message": "I am working on the billing API and prefer TypeScript."
  }'

In a subsequent request, the agent remembers the context:

curl -X POST "http://localhost:8787" \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "alice",
    "sessionId": "session-2",
    "message": "Can you help me with the project I mentioned?"
  }'

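Because both requests use the same userId, the second request reads the same memory profile. The agent responds with awareness of the billing API project and the TypeScript preference, even though those details came from a previous session (session-1).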
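
The same two-request check can be scripted instead of run with curl. The sketch below assumes the Worker returns the { response } JSON shape produced by the handler. The sendMessage helper and the HttpPost type are hypothetical names; the HTTP function is passed in as an argument so the helper can be exercised without a running Worker.

```typescript
type ChatRequest = { userId: string; sessionId: string; message: string };

// Narrow view of fetch covering only what this helper needs. The global
// fetch in Workers and recent Node.js versions should satisfy it.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: posts one chat message and returns the reply text.
async function sendMessage(
  post: HttpPost,
  url: string,
  payload: ChatRequest,
): Promise<string> {
  const res = await post(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Worker returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response?: unknown };
  // Fall back to an empty string if the field is missing or not a string.
  return typeof data.response === "string" ? data.response : "";
}
```

To reproduce the test above, call sendMessage with the global fetch and http://localhost:8787 twice: once with sessionId "session-1" and the billing API message, then with sessionId "session-2" and the follow-up question, keeping userId set to "alice" in both calls.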