feat: Add estimated kv cache hit metric events #30

Merged: rmccorm4 merged 15 commits into main from rmccormick/kv_cache_hit/events on Mar 6, 2025

rmccorm4 commented Mar 5, 2025

Overview:

Adds per-worker and cumulative estimated KV cache hit metrics, based on ISL blocks and overlapping blocks, to Grafana.

Details:

Changes:

  • Adds an EventSubscriber trait and implements it for Namespace
  • Publishes ISL block and overlap block counts as events from the KV Router whenever it selects a worker for a request
  • Subscribes to those events from the metrics aggregator (count)
  • Adds corresponding Prometheus metrics and Grafana dashboard panels
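
For readers who want a concrete picture of the aggregation step, here is a minimal sketch of how per-worker and cumulative hit rates could be derived from these events. It is not the code in this PR: `KvHitEvent`, `BlockTotals`, and `HitRateAggregator` are hypothetical names, and it assumes the estimated hit rate is overlap blocks divided by ISL blocks, which the description above implies but does not spell out.

```rust
use std::collections::HashMap;

/// Hypothetical event payload, one per routing decision.
/// Field names are assumptions, not the types used in this PR.
#[derive(Debug, Clone, Copy)]
pub struct KvHitEvent {
    pub worker_id: u64,
    /// Number of blocks in the request's input sequence (ISL).
    pub isl_blocks: u64,
    /// Number of those blocks estimated to already be cached on the worker.
    pub overlap_blocks: u64,
}

/// Running block counts for one worker, or for all workers combined.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct BlockTotals {
    pub isl_blocks: u64,
    pub overlap_blocks: u64,
}

impl BlockTotals {
    /// Assumed definition: overlap blocks / ISL blocks.
    /// Returns None until at least one ISL block has been seen.
    pub fn hit_rate(&self) -> Option<f64> {
        if self.isl_blocks == 0 {
            None
        } else {
            Some(self.overlap_blocks as f64 / self.isl_blocks as f64)
        }
    }
}

/// Accumulates events into per-worker and cumulative totals.
#[derive(Debug, Default)]
pub struct HitRateAggregator {
    per_worker: HashMap<u64, BlockTotals>,
    cumulative: BlockTotals,
}

impl HitRateAggregator {
    /// Adds one event to both the worker's totals and the cumulative totals.
    pub fn record(&mut self, event: KvHitEvent) {
        let totals = self.per_worker.entry(event.worker_id).or_default();
        totals.isl_blocks += event.isl_blocks;
        totals.overlap_blocks += event.overlap_blocks;
        self.cumulative.isl_blocks += event.isl_blocks;
        self.cumulative.overlap_blocks += event.overlap_blocks;
    }

    /// Hit rate for a single worker, or None if no blocks were recorded for it.
    pub fn worker_hit_rate(&self, worker_id: u64) -> Option<f64> {
        self.per_worker.get(&worker_id).and_then(BlockTotals::hit_rate)
    }

    /// Hit rate across all workers.
    pub fn cumulative_hit_rate(&self) -> Option<f64> {
        self.cumulative.hit_rate()
    }
}
```

In this sketch the cumulative rate is computed from summed block counts rather than by averaging per-worker rates, so workers that handle more input blocks weigh more heavily. Whether the dashboard panels use the same weighting is not stated here.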

Future Work:

  • Add event pub/sub at a more granular level, such as Component/Endpoint, instead of only Namespace
  • Improve mock_worker.rs to publish mock kv cache hit events for lighter debug/testing

Example Grafana Dashboard with new timeseries panels and gauge for Per-worker and Cumulative KV Cache Hit Rate estimation:

[image: Grafana dashboard screenshot]

github-actions Bot commented Mar 5, 2025

Test Results

  2 files    2 suites   54s ⏱️
 80 tests  80 ✅ 0 💤 0 ❌
102 runs  101 ✅ 1 💤 0 ❌

Results for commit f2dcb27.

♻️ This comment has been updated with latest results.

… count and log estimated cumulative hit rate
rmccorm4 marked this pull request as ready for review March 6, 2025 03:46
Comment thread applications/llm/count/src/lib.rs
Comment thread applications/llm/count/src/main.rs
Comment thread lib/runtime/src/component.rs
Comment thread lib/runtime/src/component/namespace.rs
2 participants