fix: date-only timestamp to preserve KV-cache across same-day sessions #20451

Closed
iamfoz wants to merge 1 commit into NousResearch:main from iamfoz:pr/kv-cache-timestamp-fix

iamfoz commented May 5, 2026

Problem

Every new session injects a dynamic timestamp into the system prompt:

timestamp_line = f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"

Because the %I:%M %p component changes from one session to the next, the system prompt is never byte-identical across sessions, so the cached KV state for it cannot be reused. Prompt caching matches on an exact prefix, so for providers that support it (Anthropic, Google, etc.) the ~2-3k system prompt tokens are re-computed on every new session instead of being served from cache.
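
A minimal sketch of why one changed minute is enough to lose the cache, assuming a prefix cache that can only reuse content up to the first differing character. The prompt layout (timestamp line first, static instructions after) and the names `build_prompt` and `shared_prefix_chars` are hypothetical; the real prompt builder is not shown in this PR.

```python
import os
from datetime import datetime

# Hypothetical stand-in for the static part of the system prompt.
STATIC_INSTRUCTIONS = "You are a helpful agent.\n" * 50


def build_prompt(now: datetime) -> str:
    """Build a prompt using the original minute-precision format."""
    stamp = now.strftime("%A, %B %d, %Y %I:%M %p")
    return f"Conversation started: {stamp}\n{STATIC_INSTRUCTIONS}"


def shared_prefix_chars(a: str, b: str) -> int:
    """Length of the longest common prefix, a rough proxy for reusable cache."""
    return len(os.path.commonprefix([a, b]))


# Two sessions started one minute apart on the same day.
first = build_prompt(datetime(2026, 5, 5, 13, 15))
second = build_prompt(datetime(2026, 5, 5, 13, 16))

reusable = shared_prefix_chars(first, second)
print(f"reusable prefix: {reusable} of {len(first)} characters")
print(repr(first[:reusable]))
```

In this layout the shared prefix ends inside the timestamp, so everything after it would have to be recomputed. Where the real timestamp line sits in the prompt determines how much is actually lost.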

Fix

Drop the time component from the timestamp and keep only the date:

timestamp_line = f"Conversation started: {now.strftime('%A, %B %d, %Y')}"

The date still changes daily, but sessions within the same day now share cache hits on the timestamp line. The precise time is available to the agent via hermes_time tools if needed.
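
The effect of the change can be checked with a short sketch. The two format strings are taken from the lines above; the helper `timestamp_line` and the sample datetimes are illustrative only.

```python
from datetime import datetime

MINUTE_FORMAT = "%A, %B %d, %Y %I:%M %p"  # original format
DATE_ONLY_FORMAT = "%A, %B %d, %Y"        # format proposed in this PR


def timestamp_line(now: datetime, fmt: str) -> str:
    """Render the 'Conversation started' line for a given format."""
    return f"Conversation started: {now.strftime(fmt)}"


morning = datetime(2026, 5, 5, 9, 30)
afternoon = datetime(2026, 5, 5, 13, 15)
next_day = datetime(2026, 5, 6, 9, 30)

# Minute precision: two same-day sessions produce different lines.
print(timestamp_line(morning, MINUTE_FORMAT) == timestamp_line(afternoon, MINUTE_FORMAT))

# Date only: same-day sessions produce identical lines...
print(timestamp_line(morning, DATE_ONLY_FORMAT) == timestamp_line(afternoon, DATE_ONLY_FORMAT))

# ...and the line still changes when the date rolls over.
print(timestamp_line(morning, DATE_ONLY_FORMAT) == timestamp_line(next_day, DATE_ONLY_FORMAT))
```

The first and third comparisons print False and the second prints True, so the line is stable within a day but not across days.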

Impact

  • KV-cache: System prompt timestamp line now stable within a day → cache hit instead of re-compute on same-day sessions
  • Latency: Marginal improvement on same-day sessions (fewer tokens re-processed)
  • Cost: Slight reduction for providers that bill cached input tokens at a lower rate than uncached ones
  • User-facing: Agent sees "Conversation started: Tuesday, May 05, 2026" instead of "Conversation started: Tuesday, May 05, 2026 01:15 PM" — still useful, just less granular

The %I:%M %p time component changes every session, invalidating KV-cache
attention for the system prompt. Removing it means same-day sessions share
cache hits on the ~2-3k system prompt tokens.
alt-glitch added the type/perf (Performance improvement or optimization), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 5, 2026
iamfoz commented May 18, 2026

Thanks @teknium1, great to see this expanded into the full stability story. Closing in favour of #27675.

iamfoz closed this on May 18, 2026
teknium1 added a commit that referenced this pull request May 18, 2026
…ogging

The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR #20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).
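
Change 2 can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the commit names an update_system_prompt call, but its signature is not shown, so the `session_db.update_system_prompt(session_id, prompt)` shape, the wrapper `persist_system_prompt`, and the `FullDiskDB` stub are all hypothetical.

```python
import logging

logger = logging.getLogger("agent")


def persist_system_prompt(session_db, session_id: str, prompt: str) -> bool:
    """Store the prompt; on failure, warn loudly instead of logging at DEBUG.

    Returns True when the write succeeded, False otherwise.
    """
    try:
        # Assumed call shape; the real signature is not shown in the commit.
        session_db.update_system_prompt(session_id, prompt)
        return True
    except Exception as exc:
        # WARNING is visible at default log levels, so a persistent failure
        # (disk full, locked database, schema drift) shows up without verbose mode.
        logger.warning(
            "Failed to persist system prompt for session %s: %s", session_id, exc
        )
        return False


class FullDiskDB:
    """Hypothetical stub that always fails, to exercise the warning path."""

    def update_system_prompt(self, session_id: str, prompt: str) -> None:
        raise OSError("disk full")


ok = persist_system_prompt(FullDiskDB(), "session-123", "Conversation started: ...")
print("stored:", ok)
```

Returning a flag rather than raising keeps the turn running while still leaving a visible trace of why later turns may rebuild the prompt.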

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
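
A rough sketch of the three-way distinction from change 3, written as a standalone function. The real _restore_or_build_system_prompt() is not shown in this thread, so the name `restore_or_build`, its parameters, and the assumption that an existing row means a continuing session are all hypothetical.

```python
import logging
from typing import Callable, Mapping, Optional

logger = logging.getLogger("agent")


def restore_or_build(
    session_row: Optional[Mapping[str, Optional[str]]],
    session_id: str,
    build: Callable[[], str],
) -> str:
    """Return the stored system prompt, or build a fresh one.

    Distinguishes the states that 'row.get("system_prompt") or None' merged.
    """
    if session_row is None:
        # Missing row: treated here as a brand-new session, so no warning.
        return build()

    stored = session_row.get("system_prompt")
    if stored is None:
        # Null column on an existing row: an earlier write never persisted.
        logger.warning("Session %s has a NULL stored system prompt; rebuilding", session_id)
        return build()
    if stored == "":
        # Empty string on an existing row: also unusable as a cache prefix.
        logger.warning("Session %s has an empty stored system prompt; rebuilding", session_id)
        return build()

    # Valid stored prompt: reuse it unchanged so the prefix stays byte-identical.
    return stored


fresh = lambda: "Conversation started: Sunday, May 17, 2026"
print(restore_or_build(None, "s1", fresh))
print(restore_or_build({"system_prompt": "stored prompt"}, "s1", fresh))
```

Keeping the restore logic in one function like this is what allows each state to be covered by a separate unit test.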

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes #20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
@teknium1

Your date-only timestamp fix shipped in PR #27675 (merged commit 4a3f13b) — your authorship is preserved as Co-authored-by. Thank you for the clean diagnosis and minimal repro. The PR also bundled stronger gateway-side logging (loud warnings on session DB write failures + three-way stored-state distinction on read) so future regressions in this area surface in agent.log instead of silently breaking prefix-cache reuse.

Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
iamfoz deleted the pr/kv-cache-timestamp-fix branch on June 2, 2026 20:58