
fix(evals): add checkpoint drain wait between memory formation and recall eval phases #437

@Aaronontheweb

Problem

The eval suite runs memory formation and recall cases back-to-back without waiting for the asynchronous memory curation pipeline to finish processing. Since #410 moved memory formation into a background session observer, memories written in one eval case may not yet be available for recall in the next, because the checkpoint queue has not drained.

This race was always latent, since curation has always run asynchronously via checkpoints, but the session observer architecture makes it more pronounced because formation now happens entirely in the background rather than inline.

Observed

Starting from a fresh database (during #412 testing), memory recall evals fail because the curation pipeline has not yet processed any checkpoints. Even with an existing database, timing-dependent eval failures are possible whenever formation and recall cases run back-to-back.

Proposed fix

  1. Split eval phases: run all memory-write evals first, then all recall evals
  2. Add a drain wait: between phases, poll the pending checkpoint count (netclaw status or equivalent) and wait for the queue to reach zero before proceeding
  3. Configurable timeout: NETCLAW_EVAL_DRAIN_TIMEOUT (default 60s). Fail loudly if checkpoints don't drain in time rather than silently running recall against stale data
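A minimal Python sketch of the proposed flow, assuming the eval harness can expose three hooks: a predicate that tags each case as memory-write or recall, a function that runs one case, and a callable that returns the pending checkpoint count (for example by wrapping netclaw status or an equivalent query; the issue does not specify that command's output, so parsing it is left out). The names split_phases, wait_for_drain, run_evals, and CheckpointDrainTimeout are hypothetical, and NETCLAW_EVAL_DRAIN_TIMEOUT is assumed to hold a plain number of seconds.

```python
import os
import time
from typing import Callable, Iterable, List, Mapping, Optional, Tuple, TypeVar

# Hypothetical sketch: none of these names come from the NetClaw codebase.
Case = TypeVar("Case")

DEFAULT_DRAIN_TIMEOUT_S = 60.0  # "default 60s" from the proposal


class CheckpointDrainTimeout(RuntimeError):
    """Raised when pending checkpoints do not reach zero before the deadline."""


def drain_timeout_from_env(env: Optional[Mapping[str, str]] = None) -> float:
    # Assumption: the variable holds a plain number of seconds, e.g. "90".
    env = os.environ if env is None else env
    raw = env.get("NETCLAW_EVAL_DRAIN_TIMEOUT", "").strip()
    return float(raw) if raw else DEFAULT_DRAIN_TIMEOUT_S


def split_phases(
    cases: Iterable[Case], is_write: Callable[[Case], bool]
) -> Tuple[List[Case], List[Case]]:
    # Step 1: memory-write cases first, recall cases second;
    # relative order within each phase is preserved.
    items = list(cases)
    return [c for c in items if is_write(c)], [c for c in items if not is_write(c)]


def wait_for_drain(
    pending_count: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Step 2: poll until the checkpoint queue reports zero pending items.
    deadline = clock() + timeout_s
    while True:
        pending = pending_count()
        if pending == 0:
            return
        remaining = deadline - clock()
        if remaining <= 0:
            # Step 3: fail loudly instead of running recall against stale data.
            raise CheckpointDrainTimeout(
                f"{pending} checkpoint(s) still pending after {timeout_s:g}s"
            )
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))


def run_evals(cases, is_write, run_case, pending_count,
              timeout_s=None, poll_interval_s=1.0):
    if timeout_s is None:
        timeout_s = drain_timeout_from_env()
    writes, recalls = split_phases(cases, is_write)
    results = [run_case(c) for c in writes]
    # Drain gate between the write phase and the recall phase.
    wait_for_drain(pending_count, timeout_s, poll_interval_s)
    results.extend(run_case(c) for c in recalls)
    return results
```

Injecting clock and sleep keeps the drain logic testable without real delays, and raising an exception rather than returning a flag means a stalled pipeline surfaces as an eval failure instead of a misleading recall result.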

Related

Labels: memory (Memory formation, recall, curation pipeline)