perf: stabilize system prompt timestamp across compression cycles#8689
Closed
malaiwah wants to merge 1 commit into
Closed
perf: stabilize system prompt timestamp across compression cycles#8689malaiwah wants to merge 1 commit into
malaiwah wants to merge 1 commit into
Conversation
The "Conversation started" timestamp was rebuilt with _hermes_now()
on every context compression, changing the system prompt and
invalidating prefix caches on local LLM backends.
Fix: use self.session_start (set once at __init__) for the stable
"Conversation started" line. Add "Last context compaction" with
current time and count only when compression has occurred.
Evidence from MITM proxy analysis of 709 LLM requests:
- The timestamp was the ONLY difference between system prompts
across sessions (same size, different hash by 1 minute)
- On LM Studio (M4 Max): 3.6x latency penalty per cache miss
(2.2s HIT vs 8.0s MISS for phi-3.5-mini with ~17K token prompt)
- On sglang: RadixAttention handles partial prefix matching,
so the impact is lower but still wastes compute
After fix, MITM validation confirms:
- "Conversation started" stays stable across all prompt variants
- Post-compression prompt adds exactly one line ("Last context
compaction: {date} (#{count})"), preserving the prefix before it
Relates-to: NousResearch#3353, NousResearch#4319
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 29, 2026
|
Linking this into the #17459/#17476 plan. This PR is compatible with the selected direction if it strictly stabilizes the session-start anchor across compression cycles. The broader live-current-time fix should come from ephemeral runtime/user-message context (#15872/#17476), not from updating the cached system prompt. Please make sure tests distinguish:
|
T02200059
pushed a commit
to T02200059/hermes-agent
that referenced
this pull request
Apr 30, 2026
Cherry-pick logic from upstream PR NousResearch#8689 (fixes NousResearch#8687). Before: 'Conversation started' timestamp used _hermes_now() which changes every call, invalidating prefix cache on DashScope/Qwen and local LLM backends after every context compression. After: use self.session_start (set once at __init__) so the timestamp stays stable. After compression, append a separate 'Last context compaction' line so the prefix before it remains cache-friendly while still providing timing info. Impact for our setup: DashScope implicit KV cache reuse improves on repeated turns after compression, reducing token cost and TTFT. [yangtb-patch] NOTE: Once official PR NousResearch#8689 is merged into upstream main, prefer the upstream version and remove this local patch during the next yangtb sync.
This was referenced May 17, 2026
teknium1
added a commit
that referenced
this pull request
May 18, 2026
…ogging
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR #20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes #20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
Contributor
|
Superseded by PR #27675 (merged commit 4a3f13b). Your |
Lillard01
pushed a commit
to Lillard01/hermes-agent
that referenced
this pull request
May 21, 2026
…ogging
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR NousResearch#20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization),
NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp
caching question), NousResearch#8687 (compression timestamp), NousResearch#27339
(claim NousResearch#3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…ogging
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR NousResearch#20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization),
NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp
caching question), NousResearch#8687 (compression timestamp), NousResearch#27339
(claim NousResearch#3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stabilizes the "Conversation started" timestamp in the system prompt across compression cycles, preventing prefix cache invalidation on local LLM backends.
Before
After
Evidence
MITM proxy analysis showed the timestamp was the ONLY difference between system prompts. On LM Studio with phi-3.5-mini (~17K token prompt): 3.6x latency penalty per cache miss (2.2s vs 8.0s).
After fix, validated via MITM: "Conversation started" stays stable. Post-compression adds exactly one line, preserving the prefix before it for partial cache matching.
Test plan
Closes #8687. Relates to #3353, #4319.