fix: implement caching for system prompt to avoid repeated context reprocessing (fixes #607)#617

Merged
yinwm merged 2 commits into sipeed:main from Zhaoyikaiii:fix/repeated-context-reprocessing
Feb 25, 2026
Conversation

@Zhaoyikaiii (Collaborator)

Summary

  • Add system prompt caching with BuildSystemPromptWithCache() using double-check locking pattern
  • Separate conversation summary from system prompt (now an independent assistant message)
  • Add InvalidateCache() method for when workspace files are modified
  • Add comprehensive test coverage for the caching mechanism

Problem

Issue #607: The prompt context keeps changing, so the model has to reprocess thousands of tokens every turn. The system prompt was rebuilt on every BuildMessages() call, and the summary was appended directly to it, rendering the KV cache ineffective.

Solution

  1. Cache the system prompt after first build
  2. Move summary to a separate message so system prompt remains constant
  3. This enables the KV cache to work properly: only the changing parts (history, current message) need reprocessing
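The caching step above can be sketched as follows. This is a simplified illustration, not the PR's actual code: contextBuilder and its fields are stand-ins, and the real buildSystemPrompt() assembles identity, bootstrap files, skills, and memory.

```go
package main

import (
	"fmt"
	"sync"
)

// contextBuilder is a simplified stand-in for the PR's ContextBuilder;
// only the caching fields are sketched here.
type contextBuilder struct {
	mu                 sync.RWMutex
	cachedSystemPrompt string
}

func (b *contextBuilder) buildSystemPrompt() string {
	// Placeholder for the expensive assembly being cached.
	return "You are a helpful agent."
}

// BuildSystemPromptWithCache returns the cached prompt, building it at
// most once via double-checked locking.
func (b *contextBuilder) BuildSystemPromptWithCache() string {
	// Fast path: concurrent readers only take the read lock.
	b.mu.RLock()
	if p := b.cachedSystemPrompt; p != "" {
		b.mu.RUnlock()
		return p
	}
	b.mu.RUnlock()

	// Slow path: re-check under the write lock, since another
	// goroutine may have built the prompt while we waited.
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.cachedSystemPrompt == "" {
		b.cachedSystemPrompt = b.buildSystemPrompt()
	}
	return b.cachedSystemPrompt
}

// InvalidateCache forces a rebuild on the next call, e.g. after
// workspace files are modified.
func (b *contextBuilder) InvalidateCache() {
	b.mu.Lock()
	b.cachedSystemPrompt = ""
	b.mu.Unlock()
}

func main() {
	b := &contextBuilder{}
	fmt.Println(b.BuildSystemPromptWithCache() == b.BuildSystemPromptWithCache()) // true
}
```

The double check matters because two goroutines can both miss on the fast path; re-checking under the write lock ensures only one of them pays the build cost.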

Test plan

  • TestSystemPromptCaching - Verifies system prompt consistency across calls
  • TestBuildMessagesWithSummary - Confirms summary doesn't modify system prompt
  • TestSystemPromptStructure - Validates summary is in a separate message
  • TestCachedSystemPromptStability - Ensures cache works correctly
  • TestSummaryAsSeparateMessage - Confirms KV cache friendly structure
  • TestInvalidateCache - Validates cache invalidation mechanism
  • BenchmarkBuildMessages - Performance testing

References

Fixes #607

@Zhaoyikaiii Zhaoyikaiii requested a review from yinwm February 22, 2026 04:25
@yinwm (Collaborator) commented Feb 24, 2026

PR Review Comment:

Issue: Multiple system messages are incompatible with Anthropic API

This PR splits the message structure into multiple system messages to optimize KV cache, but this creates a compatibility issue with the Anthropic API.

Reason:

According to the official Anthropic documentation, the Messages API does not support the system role in the messages array:

"Note that if you want to include a system prompt, you can use the top-level system parameter — there is no system role for input messages in the Messages API."

Source: https://docs.anthropic.com/en/api/messages

The messages parameter only accepts user and assistant roles. If messages with system role are sent, the API will return an error:

{
  "error": {
    "message": "messages: Unexpected role \"system\". The Messages API accepts a top-level system parameter, not \"system\" as an input message role.",
    "type": "invalid_request_error"
  }
}

Impact:

  • Direct Anthropic API usage
  • AWS Bedrock Claude integration
  • Google Vertex AI Claude integration

All these scenarios will fail with a 400 error when multiple system messages are sent.

Recommendation:

The format differences between APIs should be handled at the provider adapter layer, rather than enforcing a unified structure in the core message format.
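A minimal sketch of that adapter-layer approach, assuming a hypothetical provider-agnostic Message type; the real adapter code in this repo may differ:

```go
package main

import "fmt"

// Message is a hypothetical provider-agnostic chat message used to
// illustrate the adapter-layer normalization suggested above.
type Message struct {
	Role    string
	Content string
}

// toAnthropic moves system-role messages into the top-level system
// parameter and keeps only user/assistant messages in the array,
// since the Anthropic Messages API rejects role "system" inside
// the messages parameter.
func toAnthropic(msgs []Message) (system string, rest []Message) {
	for _, m := range msgs {
		if m.Role == "system" {
			if system != "" {
				system += "\n\n"
			}
			system += m.Content
			continue
		}
		rest = append(rest, m)
	}
	return system, rest
}

func main() {
	sys, rest := toAnthropic([]Message{
		{Role: "system", Content: "You are an agent."},
		{Role: "user", Content: "hi"},
	})
	fmt.Println(sys, len(rest)) // You are an agent. 1
}
```

With this in place, the core message format can stay provider-neutral while each adapter emits whatever shape its API requires.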


@nikolasdehor nikolasdehor left a comment
Good approach to the caching problem. The double-checked locking pattern is correct, the mtime-based invalidation is a pragmatic choice, and sorting tool names for deterministic output is an important detail for KV cache stability.

However, there are some issues worth discussing:

The KV cache benefit is overstated in the PR description.
The comment in buildDynamicContext already acknowledges this honestly:

"since dynamic context is merged into the single system message, the caching benefit is local only (avoids repeated file I/O / string building), not LLM-side KV cache reuse."

Since Current Time changes every minute and is concatenated into the same system message, the LLM prefix hash changes every call regardless. The real benefit here is avoiding repeated file I/O and string construction — which is still worthwhile, but the PR title/description should be updated to reflect this accurately, especially since issue #607 specifically mentions LLM-side KV cache reuse.

To actually achieve LLM-side KV cache benefits, you would need to split the static content into a separate system message (role: system, cache_control: ephemeral for Anthropic) from the dynamic one. Some providers support this, some do not. The PR comment explains why a single system message was chosen (Anthropic/Codex compatibility), which is a reasonable tradeoff.

Removed loadSkills() method — is it still called elsewhere?
The PR removes the loadSkills() method from context.go. I see buildSkillsSection() exists separately, but please confirm no other code references loadSkills() or this will break the build.

sourceFilesChanged() is called under RLock but does file I/O:
The os.Stat calls inside sourceFilesChanged() are technically fine under RLock (it only blocks writers, not other readers), but if the filesystem is slow (NFS, network mount), this could cause reader contention. For a local filesystem this is a non-issue.

Race window between mtime check and cache read:
Between sourceFilesChanged() returning false and reading cachedSystemPrompt, a file could be modified. This is acceptable — the cache is eventually consistent and will catch the change on the next call.

Test coverage is thorough. The mtime auto-invalidation tests, cache stability tests, single-system-message verification, and benchmark are all well-designed.

Sorted tool names — good catch. Non-deterministic map iteration causing different tool definition ordering is a subtle but real issue for KV cache (and also makes diffs/debugging harder). This change in registry.go is correct and independently valuable.
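The deterministic-ordering fix amounts to sorting map keys before iterating; a simplified sketch (the registry type here is an illustrative stand-in for the PR's registry.go):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedToolNames returns a registry's tool names in deterministic
// order. Go randomizes map iteration order, so ranging over the map
// directly would emit tool definitions in a different order each run
// and change the prompt prefix.
func sortedToolNames(tools map[string]struct{}) []string {
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	tools := map[string]struct{}{"write_file": {}, "bash": {}, "read_file": {}}
	fmt.Println(sortedToolNames(tools)) // [bash read_file write_file]
}
```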

Overall: the code is well-written, the caching mechanism is correct, and the test suite is comprehensive. The main suggestion is to align the PR description with the actual benefit (local I/O avoidance, not LLM KV cache reuse) since the dynamic context prevents true prefix caching.

@Zhaoyikaiii Zhaoyikaiii force-pushed the fix/repeated-context-reprocessing branch 4 times, most recently from 9b2500a to 3200f0d Compare February 25, 2026 03:04
@Zhaoyikaiii Zhaoyikaiii force-pushed the fix/repeated-context-reprocessing branch from 3200f0d to a719938 Compare February 25, 2026 03:18
}

// Log system prompt summary for debugging (debug mode only)
fullSystemPrompt := strings.Join(systemParts, "\n\n---\n\n")
Collaborator
fullSystemPrompt includes the dynamic part every time, which means the final system prompt always changes. Can this still hit the KV cache?

@Zhaoyikaiii Zhaoyikaiii force-pushed the fix/repeated-context-reprocessing branch from a719938 to e48bf57 Compare February 25, 2026 07:18
Avoid rebuilding the entire system prompt on every BuildMessages() call
by caching the static portion (identity, bootstrap, skills summary,
memory) and only recomputing it when workspace source files change.

Key changes:

- ContextBuilder caches the static prompt behind an RWMutex with
  double-checked locking. Source file changes are detected via cheap
  os.Stat mtime checks so no explicit invalidation is needed.

- Track file existence at cache time (existedAtCache map) so that
  newly created or deleted bootstrap/memory files also trigger a
  rebuild — the old modifiedSince() silently returned false on
  os.IsNotExist.

- Walk the skills directory recursively with filepath.WalkDir to
  catch content-only edits at any nesting depth; directory mtime
  alone misses in-place file modifications on most filesystems.

- ToolRegistry.sortedToolNames() sorts tool names before iteration,
  ensuring deterministic tool definition order across calls — a
  prerequisite for LLM-side prefix/KV cache reuse.

- Merge all context (static + dynamic + summary) into a single
  system message for provider compatibility: the Anthropic adapter
  extracts messages[0] as the top-level system parameter, and Codex
  reads only the first system message as instructions.

- Fix a data race in BuildMessages() where cachedSystemPrompt was
  read without holding the lock in a debug log statement.

- Add tests: single system message invariant, mtime auto-invalidation,
  new-file creation detection, skill file content change, explicit
  InvalidateCache, cache stability, concurrent access (20 goroutines
  x 50 iterations, passes go test -race), and a benchmark.
@Zhaoyikaiii Zhaoyikaiii force-pushed the fix/repeated-context-reprocessing branch from e48bf57 to 1f7cbd9 Compare February 25, 2026 07:30
@yinwm yinwm left a comment

LGTM

@yinwm yinwm merged commit 53578da into sipeed:main Feb 25, 2026
2 checks passed
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
fix: implement caching for system prompt to avoid repeated context reprocessing (fixes sipeed#607)
Development

Successfully merging this pull request may close these issues.

[BUG] repeated reprocessing of the entire context

3 participants