fix: implement caching for system prompt to avoid repeated context reprocessing (fixes #607) #617
Conversation
PR Review Comment: Issue: Multiple system messages are incompatible with the Anthropic API

This PR splits the message structure into multiple system messages to optimize the KV cache, but this creates a compatibility issue with the Anthropic API.

Reason: per the official Anthropic documentation, the Messages API does not support the "system" role inside the messages array; system content must be passed via the top-level system parameter.

Source: https://docs.anthropic.com/en/api/messages

The messages parameter only accepts "user" and "assistant" roles. Sending a message with role "system" returns:

{
  "error": {
    "message": "messages: Unexpected role \"system\". The Messages API accepts a top-level system parameter, not \"system\" as an input message role.",
    "type": "invalid_request_error"
  }
}

Impact: all these scenarios will fail with a 400 error when multiple system messages are sent.

Recommendation: handle format differences between APIs at the provider adapter layer, rather than enforcing a unified structure in the core message format.
nikolasdehor left a comment:
Good approach to the caching problem. The double-checked locking pattern is correct, the mtime-based invalidation is a pragmatic choice, and sorting tool names for deterministic output is an important detail for KV cache stability.
However, there are some issues worth discussing:
The KV cache benefit is overstated in the PR description.
The comment in buildDynamicContext already acknowledges this honestly:
"since dynamic context is merged into the single system message, the caching benefit is local only (avoids repeated file I/O / string building), not LLM-side KV cache reuse."
Since Current Time changes every minute and is concatenated into the same system message, the LLM prefix hash changes every call regardless. The real benefit here is avoiding repeated file I/O and string construction — which is still worthwhile, but the PR title/description should be updated to reflect this accurately, especially since issue #607 specifically mentions LLM-side KV cache reuse.
To actually achieve LLM-side KV cache benefits, you would need to split the static content into a separate system message (role: system, cache_control: ephemeral for Anthropic) from the dynamic one. Some providers support this, some do not. The PR comment explains why a single system message was chosen (Anthropic/Codex compatibility), which is a reasonable tradeoff.
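For reference, a minimal sketch of what that split could look like against the Anthropic Messages API, which accepts the top-level `system` parameter as an array of text blocks with optional `cache_control`. The struct and function names here are illustrative, not code from this PR, and the prompt strings are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheControl and systemBlock mirror the shape of Anthropic's top-level
// system content blocks.
type cacheControl struct {
	Type string `json:"type"` // "ephemeral"
}

type systemBlock struct {
	Type         string        `json:"type"` // always "text"
	Text         string        `json:"text"`
	CacheControl *cacheControl `json:"cache_control,omitempty"`
}

// buildSystemBlocks splits the prompt into a cacheable static block and an
// uncached dynamic block, so the static prefix can be reused server-side.
func buildSystemBlocks(static, dynamic string) []systemBlock {
	return []systemBlock{
		{Type: "text", Text: static, CacheControl: &cacheControl{Type: "ephemeral"}},
		{Type: "text", Text: dynamic},
	}
}

func main() {
	blocks := buildSystemBlocks(
		"identity + bootstrap + skills summary + memory", // stable across calls
		"Current Time: 2024-01-01T00:00Z",                // changes every call
	)
	out, _ := json.MarshalIndent(blocks, "", "  ")
	fmt.Println(string(out))
}
```

Providers that do not support block-level caching would need the adapter to re-join the two blocks, which is the compatibility cost the PR chose to avoid.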
Removed loadSkills() method — is it still called elsewhere?
The PR removes the loadSkills() method from context.go. I see buildSkillsSection() exists separately, but please confirm no other code references loadSkills() or this will break the build.
sourceFilesChanged() is called under RLock but does file I/O:
The os.Stat calls inside sourceFilesChanged() are technically fine under RLock (it only blocks writers, not other readers), but if the filesystem is slow (NFS, network mount), this could cause reader contention. For a local filesystem this is a non-issue.
Race window between mtime check and cache read:
Between sourceFilesChanged() returning false and reading cachedSystemPrompt, a file could be modified. This is acceptable — the cache is eventually consistent and will catch the change on the next call.
Test coverage is thorough. The mtime auto-invalidation tests, cache stability tests, single-system-message verification, and benchmark are all well-designed.
Sorted tool names — good catch. Non-deterministic map iteration causing different tool definition ordering is a subtle but real issue for KV cache (and also makes diffs/debugging harder). This change in registry.go is correct and independently valuable.
Overall: the code is well-written, the caching mechanism is correct, and the test suite is comprehensive. The main suggestion is to align the PR description with the actual benefit (local I/O avoidance, not LLM KV cache reuse) since the dynamic context prevents true prefix caching.
Force-pushed: 9b2500a → 3200f0d
Force-pushed: 3200f0d → a719938
pkg/agent/context.go (Outdated)

    }

    // Log system prompt summary for debugging (debug mode only)
    fullSystemPrompt := strings.Join(systemParts, "\n\n---\n\n")
The fullSystemPrompt always includes the dynamic part, which means the final system prompt changes on every call. Can this still hit the KV cache?
Force-pushed: a719938 → e48bf57
Avoid rebuilding the entire system prompt on every BuildMessages() call by caching the static portion (identity, bootstrap, skills summary, memory) and only recomputing it when workspace source files change.

Key changes:
- ContextBuilder caches the static prompt behind an RWMutex with double-checked locking. Source file changes are detected via cheap os.Stat mtime checks, so no explicit invalidation is needed.
- Track file existence at cache time (existedAtCache map) so that newly created or deleted bootstrap/memory files also trigger a rebuild — the old modifiedSince() silently returned false on os.IsNotExist.
- Walk the skills directory recursively with filepath.WalkDir to catch content-only edits at any nesting depth; directory mtime alone misses in-place file modifications on most filesystems.
- ToolRegistry.sortedToolNames() sorts tool names before iteration, ensuring deterministic tool definition order across calls — a prerequisite for LLM-side prefix/KV cache reuse.
- Merge all context (static + dynamic + summary) into a single system message for provider compatibility: the Anthropic adapter extracts messages[0] as the top-level system parameter, and Codex reads only the first system message as instructions.
- Fix a data race in BuildMessages() where cachedSystemPrompt was read without holding the lock in a debug log statement.
- Add tests: single system message invariant, mtime auto-invalidation, new-file creation detection, skill file content change, explicit InvalidateCache, cache stability, concurrent access (20 goroutines x 50 iterations, passes go test -race), and a benchmark.
Force-pushed: e48bf57 → 1f7cbd9
fix: implement caching for system prompt to avoid repeated context reprocessing (fixes sipeed#607)
Summary
- BuildSystemPromptWithCache() using a double-checked locking pattern
- Summary kept in a separate assistant message
- InvalidateCache() method for when workspace files are modified

Problem
Issue #607: the prompt context keeps changing, so the model has to reprocess thousands of tokens every turn. The system prompt was being rebuilt on every BuildMessages() call, and the summary was appended directly to it, making the KV cache ineffective.

Solution
Test plan
- TestSystemPromptCaching - verifies system prompt consistency across calls
- TestBuildMessagesWithSummary - confirms the summary doesn't modify the system prompt
- TestSystemPromptStructure - validates the summary is in a separate message
- TestCachedSystemPromptStability - ensures the cache works correctly
- TestSummaryAsSeparateMessage - confirms the KV-cache-friendly structure
- TestInvalidateCache - validates the cache invalidation mechanism
- BenchmarkBuildMessages - performance testing

References
Fixes #607