perf: stabilize system prompt timestamp across compression cycles by malaiwah · Pull Request #8689 · NousResearch/hermes-agent

malaiwah · 2026-04-13T00:26:58Z

Summary

Stabilizes the "Conversation started" timestamp in the system prompt across compression cycles, preventing prefix cache invalidation on local LLM backends.

Before

now = _hermes_now()  # current time — changes after every compression!
timestamp_line = f"Conversation started: {now.strftime(...)}"

After

_start = getattr(self, "session_start", None) or _hermes_now()  # set once at __init__
timestamp_line = f"Conversation started: {_start.strftime(...)}"
# Only after compression:
timestamp_line += f"\nLast context compaction: {_now.strftime(...)} (#{count})"

Evidence

MITM proxy analysis showed the timestamp was the ONLY difference between system prompts. On LM Studio with phi-3.5-mini (~17K token prompt): 3.6x latency penalty per cache miss (2.2s vs 8.0s).

After fix, validated via MITM: "Conversation started" stays stable. Post-compression adds exactly one line, preserving the prefix before it for partial cache matching.

Test plan

MITM proxy validation: "Conversation started" stable across compression
Post-compression prompt adds "Last context compaction: {date} (#{count})"
Defensive: handles missing session_start and context_compressor gracefully
CI tests pass

Closes #8687. Relates to #3353, #4319.

The "Conversation started" timestamp was rebuilt with _hermes_now() on every context compression, changing the system prompt and invalidating prefix caches on local LLM backends. Fix: use self.session_start (set once at __init__) for the stable "Conversation started" line. Add "Last context compaction" with current time and count only when compression has occurred. Evidence from MITM proxy analysis of 709 LLM requests: - The timestamp was the ONLY difference between system prompts across sessions (same size, different hash by 1 minute) - On LM Studio (M4 Max): 3.6x latency penalty per cache miss (2.2s HIT vs 8.0s MISS for phi-3.5-mini with ~17K token prompt) - On sglang: RadixAttention handles partial prefix matching, so the impact is lower but still wastes compute After fix, MITM validation confirms: - "Conversation started" stays stable across all prompt variants - Post-compression prompt adds exactly one line ("Last context compaction: {date} (#{count})"), preserving the prefix before it Relates-to: NousResearch#3353, NousResearch#4319 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

markojak · 2026-04-29T14:09:57Z

Linking this into the #17459/#17476 plan.

This PR is compatible with the selected direction if it strictly stabilizes the session-start anchor across compression cycles. The broader live-current-time fix should come from ephemeral runtime/user-message context (#15872/#17476), not from updating the cached system prompt.

Please make sure tests distinguish:

stable session-start timestamp in the cached prompt/cacheable prefix;
volatile current-time context outside that cached prefix.

Related: #8687, #15866, #10421, #17459, #17476.

Cherry-pick logic from upstream PR NousResearch#8689 (fixes NousResearch#8687). Before: 'Conversation started' timestamp used _hermes_now() which changes every call, invalidating prefix cache on DashScope/Qwen and local LLM backends after every context compression. After: use self.session_start (set once at __init__) so the timestamp stays stable. After compression, append a separate 'Last context compaction' line so the prefix before it remains cache-friendly while still providing timing info. Impact for our setup: DashScope implicit KV cache reuse improves on repeated turns after compression, reducing token cost and TTFT. [yangtb-patch] NOTE: Once official PR NousResearch#8689 is merged into upstream main, prefer the upstream version and remove this local patch during the next yangtb sync.

@iamfoz

…ogging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR #20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes #20451 (date-only), #18547 (prefix stabilization), #8689 (stabilize timestamp across compression), #15866 (timestamp caching question), #8687 (compression timestamp), #27339 (claim #3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>

teknium1 · 2026-05-18T06:21:10Z

Superseded by PR #27675 (merged commit 4a3f13b). Your session_start pinning idea correctly identified the timestamp as the cache killer; the merged fix takes a simpler shape (date-only granularity instead of pinning a moment-in-time) so the same prompt is byte-stable across same-day sessions without needing to thread a session_start field through compression. Thanks for the analysis.

@iamfoz

…ogging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR NousResearch#20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization), NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp caching question), NousResearch#8687 (compression timestamp), NousResearch#27339 (claim NousResearch#3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>

@iamfoz

…ogging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR NousResearch#20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization), NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp caching question), NousResearch#8687 (compression timestamp), NousResearch#27339 (claim NousResearch#3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>

This was referenced Apr 13, 2026

perf(ttft): move runtime metadata out of the cached system prompt #3353

Closed

[Performance] KV cache invalidation on compression hurts local MoE models — defer unnecessary system prompt rebuilds #4319

Open

alt-glitch mentioned this pull request Apr 26, 2026

Question: does the minute-precision timestamp in _build_system_prompt invalidate prompt caching for upstream inference servers? #15866

Closed

alt-glitch added type/perf Performance improvement or optimization P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 28, 2026

This was referenced Apr 29, 2026

Rework quiet-hours/time awareness: surface time to agent/tools, don't enforce in control plane #17459

Open

Consolidate live-time PRs around one ephemeral runtime context path #17476

Open

markojak mentioned this pull request Apr 29, 2026

perf: system prompt timestamp changes after compression, invalidating prefix cache on local LLM backends #8687

Closed

alt-glitch mentioned this pull request May 1, 2026

fix(prompt): stabilize system prompt prefix for KV cache reuse #18547

Closed

This was referenced May 17, 2026

[Bug]: Prompt Cache / KV Cache Invalidation on Follow-Up Messages Due to Dynamic Tool Shuffling #27339

Closed

perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging #27675

Merged

teknium1 closed this May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: stabilize system prompt timestamp across compression cycles#8689

perf: stabilize system prompt timestamp across compression cycles#8689
malaiwah wants to merge 1 commit into
NousResearch:mainfrom
malaiwah:upstream/perf/stable-system-prompt-timestamp

malaiwah commented Apr 13, 2026

Uh oh!

markojak commented Apr 29, 2026

Uh oh!

teknium1 commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants