Skip to content

perf: stabilize system prompt timestamp across compression cycles#8689

Closed
malaiwah wants to merge 1 commit into
NousResearch:mainfrom
malaiwah:upstream/perf/stable-system-prompt-timestamp
Closed

perf: stabilize system prompt timestamp across compression cycles#8689
malaiwah wants to merge 1 commit into
NousResearch:mainfrom
malaiwah:upstream/perf/stable-system-prompt-timestamp

Conversation

@malaiwah

Copy link
Copy Markdown
Contributor

Summary

Stabilizes the "Conversation started" timestamp in the system prompt across compression cycles, preventing prefix cache invalidation on local LLM backends.

Before

now = _hermes_now()  # current time — changes after every compression!
timestamp_line = f"Conversation started: {now.strftime(...)}"

After

_start = getattr(self, "session_start", None) or _hermes_now()  # set once at __init__
timestamp_line = f"Conversation started: {_start.strftime(...)}"
# Only after compression:
timestamp_line += f"\nLast context compaction: {_now.strftime(...)} (#{count})"

Evidence

MITM proxy analysis showed the timestamp was the ONLY difference between system prompts. On LM Studio with phi-3.5-mini (~17K token prompt): 3.6x latency penalty per cache miss (2.2s vs 8.0s).

After fix, validated via MITM: "Conversation started" stays stable. Post-compression adds exactly one line, preserving the prefix before it for partial cache matching.

Test plan

  • MITM proxy validation: "Conversation started" stable across compression
  • Post-compression prompt adds "Last context compaction: {date} (#{count})"
  • Defensive: handles missing session_start and context_compressor gracefully
  • CI tests pass

Closes #8687. Relates to #3353, #4319.

The "Conversation started" timestamp was rebuilt with _hermes_now()
on every context compression, changing the system prompt and
invalidating prefix caches on local LLM backends.

Fix: use self.session_start (set once at __init__) for the stable
"Conversation started" line. Add "Last context compaction" with
current time and count only when compression has occurred.

Evidence from MITM proxy analysis of 709 LLM requests:
- The timestamp was the ONLY difference between system prompts
  across sessions (same size, different hash by 1 minute)
- On LM Studio (M4 Max): 3.6x latency penalty per cache miss
  (2.2s HIT vs 8.0s MISS for phi-3.5-mini with ~17K token prompt)
- On sglang: RadixAttention handles partial prefix matching,
  so the impact is lower but still wastes compute

After fix, MITM validation confirms:
- "Conversation started" stays stable across all prompt variants
- Post-compression prompt adds exactly one line ("Last context
  compaction: {date} (#{count})"), preserving the prefix before it

Relates-to: NousResearch#3353, NousResearch#4319

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@markojak

Copy link
Copy Markdown

Linking this into the #17459/#17476 plan.

This PR is compatible with the selected direction if it strictly stabilizes the session-start anchor across compression cycles. The broader live-current-time fix should come from ephemeral runtime/user-message context (#15872/#17476), not from updating the cached system prompt.

Please make sure tests distinguish:

  • stable session-start timestamp in the cached prompt/cacheable prefix;
  • volatile current-time context outside that cached prefix.

Related: #8687, #15866, #10421, #17459, #17476.

T02200059 pushed a commit to T02200059/hermes-agent that referenced this pull request Apr 30, 2026
Cherry-pick logic from upstream PR NousResearch#8689 (fixes NousResearch#8687).

Before: 'Conversation started' timestamp used _hermes_now() which
changes every call, invalidating prefix cache on DashScope/Qwen and
local LLM backends after every context compression.

After: use self.session_start (set once at __init__) so the timestamp
stays stable. After compression, append a separate
'Last context compaction' line so the prefix before it remains
cache-friendly while still providing timing info.

Impact for our setup: DashScope implicit KV cache reuse improves on
repeated turns after compression, reducing token cost and TTFT.

[yangtb-patch] NOTE: Once official PR NousResearch#8689 is merged into upstream
main, prefer the upstream version and remove this local patch during
the next yangtb sync.
teknium1 added a commit that referenced this pull request May 18, 2026
…ogging

The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR #20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes #20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
@teknium1

Copy link
Copy Markdown
Contributor

Superseded by PR #27675 (merged commit 4a3f13b). Your session_start pinning idea correctly identified the timestamp as the cache killer; the merged fix takes a simpler shape (date-only granularity instead of pinning a moment-in-time) so the same prompt is byte-stable across same-day sessions without needing to thread a session_start field through compression. Thanks for the analysis.

@teknium1 teknium1 closed this May 18, 2026
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
…ogging

The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR NousResearch#20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization),
NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp
caching question), NousResearch#8687 (compression timestamp), NousResearch#27339
(claim NousResearch#3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ogging

The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR NousResearch#20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization),
NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp
caching question), NousResearch#8687 (compression timestamp), NousResearch#27339
(claim NousResearch#3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/perf Performance improvement or optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: system prompt timestamp changes after compression, invalidating prefix cache on local LLM backends

4 participants