feat(telemetry): add Prometheus metrics exporter via observer pattern #806
Conversation
qin-ctx
left a comment
Thanks for adding Prometheus metrics support — standard observability is a genuine gap in OpenViking. The framework here (config, observer, endpoint) is well-structured and the manual Prometheus text format implementation is correct. However, there are several design issues that need to be addressed before this can be merged.
Key concern: the record_* methods on PrometheusObserver are never called from anywhere in the codebase, so /metrics will always return all-zero values. The existing RetrievalStatsCollector and TokenUsageTracker already collect the data this observer wants to expose — bridging those would be more valuable than building parallel counters.
```python
        self._retrieval_latency_seconds.observe(float(latency_seconds))

    def record_embedding(self, latency_seconds: float) -> None:
        with self._lock:
```
[Design] (blocking)
record_retrieval(), record_embedding(), record_vlm_call(), record_cache_hit(), record_cache_miss() — none of these methods are called from anywhere in the codebase. The observer is created and stored on app.state, but no business logic writes data into it. /metrics will always return all-zero values.
OpenViking already has data collection in place:
- `RetrievalStatsCollector` (`openviking/retrieve/retrieval_stats.py`) records query counts, latency, scores
- `TokenUsageTracker` (`openviking/models/vlm/base.py`) records VLM token usage
Consider either:
- Adding instrumentation calls at the actual retrieval/embedding/VLM call sites, or
- Bridging the existing collectors — e.g.,
having `render_metrics()` read from `RetrievalStatsCollector.snapshot()` instead of maintaining parallel counters
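A minimal sketch of the bridging approach. The field names `query_count` and `total_latency_seconds` are assumptions for illustration, not the real `RetrievalStatsCollector.snapshot()` schema:

```python
# Hypothetical bridge: render Prometheus text from an existing stats
# snapshot instead of maintaining parallel counters. The snapshot field
# names (query_count, total_latency_seconds) are assumptions, not the
# actual RetrievalStatsCollector API.
def render_metrics(snapshot: dict) -> str:
    lines = [
        "# TYPE openviking_retrieval_requests_total counter",
        f"openviking_retrieval_requests_total {snapshot['query_count']}",
        "# TYPE openviking_retrieval_latency_seconds summary",
        f"openviking_retrieval_latency_seconds_sum {snapshot['total_latency_seconds']}",
        f"openviking_retrieval_latency_seconds_count {snapshot['query_count']}",
    ]
    return "\n".join(lines) + "\n"
```

This keeps a single source of truth: the collectors keep recording as they do today, and the exporter is a read-only view over them.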
openviking/server/app.py
```python
app.state.prometheus_observer = PrometheusObserver()
if not getattr(app.state, "metrics_router_registered", False):
    app.include_router(metrics_router)
    app.state.metrics_router_registered = True
```
[Design] (blocking)
Conditionally registering the router inside lifespan() is inconsistent with the rest of the app — all other routers are registered statically in create_app() (lines 169-181). The metrics_router_registered guard suggests concern about lifespan running multiple times, but FastAPI's lifespan executes exactly once.
Since the /metrics handler already returns 404 when the observer is disabled, you can simplify this by always registering the router statically alongside the others:
```python
# In the router registration block at the bottom of create_app()
app.include_router(metrics_router)
```
and remove the conditional registration + guard from lifespan entirely.
```python
config = ServerConfig(
    host=server_data.get("host", "127.0.0.1"),
    port=server_data.get("port", 1933),
    workers=server_data.get("workers", 1),
```
[Design] (blocking)
These two lines add with_bot and bot_api_url to the config loader — this is an independent bug fix (these fields were declared on ServerConfig but never read from the config file). It should be in a separate PR to keep the change history clean and make git blame meaningful.
```python
    with_bot=server_data.get("with_bot", False),
    bot_api_url=server_data.get("bot_api_url", "http://localhost:18790"),
```

```python
lines.append(f'{name}_bucket{{le="{upper:g}"}} {count}')
lines.append(f'{name}_bucket{{le="+Inf"}} {total_count}')
lines.append(f"{name}_sum {total_sum}")
lines.append(f"{name}_count {total_count}")
```
[Suggestion] (non-blocking)
No tests are included in this PR. At minimum, the following should be covered:
- `_Histogram` cumulative bucket counting logic
- `render_metrics()` output format (Prometheus is strict about format)
- `/metrics` endpoint behavior when enabled vs disabled
- Thread-safety of concurrent `record_*` + `render_metrics()` calls
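The cumulative-bucket test could start from a stand-in like this. The bucket bounds and class shape here are assumptions for illustration; the PR's `_Histogram` may differ:

```python
# Illustrative stand-in for the cumulative-bucket check, not the PR's
# actual _Histogram. Prometheus buckets are cumulative: the le="X"
# count includes every observation <= X.
import bisect
import threading


class TinyHistogram:
    def __init__(self, bounds=(0.1, 0.5, 1.0)):  # bounds are assumptions
        self.bounds = bounds
        self.counts = [0] * len(bounds)  # per-bucket, non-cumulative
        self.total = 0
        self.sum = 0.0
        self._lock = threading.Lock()

    def observe(self, v: float) -> None:
        with self._lock:
            # First bound with v <= bound gets the observation.
            i = bisect.bisect_left(self.bounds, v)
            if i < len(self.counts):
                self.counts[i] += 1
            self.total += 1
            self.sum += v

    def cumulative(self):
        out, running = [], 0
        for c in self.counts:
            running += c
            out.append(running)
        return out


h = TinyHistogram()
for v in (0.05, 0.2, 0.7, 2.0):
    h.observe(v)
assert h.cumulative() == [1, 2, 3]  # le=0.1, le=0.5, le=1.0
assert h.total == 4  # the +Inf bucket count
```

A format test can then assert the rendered `_bucket` lines are non-decreasing and that the `le="+Inf"` count equals `_count`.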
```python
        """Return Prometheus exposition for observer status views."""
        return self.render_metrics()

    def is_healthy(self) -> bool:
```
[Design] (non-blocking)
PrometheusObserver inherits BaseObserver but bypasses the existing ObserverService registration in debug_service.py. All other observers are managed through ObserverService and exposed via /api/v1/observer/system. This observer lives on app.state with its own /metrics endpoint — effectively a parallel observability system.
Consider registering it through ObserverService so it appears in the system-level health/status aggregation.
```diff
@@ -0,0 +1,176 @@
+# Copyright (c) 2026 Beijing Volcano Engine Technology Co., Ltd.
```
[Suggestion] (non-blocking)
PR description references #350 and #420 as motivation, but #350 is about decoupling ingestion from indexing and #420 is about integration test coverage — neither is directly about Prometheus/observability. Consider creating a dedicated issue to track this feature for clearer traceability.
Addressed all blocking feedback in b41738f:
Re: ObserverService integration (non-blocking) - agreed this should use the existing pattern. Happy to refactor in a follow-up once the core metrics are validated.
qin-ctx
left a comment
Thanks for addressing the prior review feedback — the main structural issues are resolved (record_* methods wired, router statically registered, unrelated config changes removed, tests added).
However, the VLM duration metric is broken: duration_seconds always defaults to 0.0 because no VLM backend passes actual timing data. CI lint also fails on 2 files. See inline comment for details.
```diff
 def update_token_usage(
-    self, model_name: str, provider: str, prompt_tokens: int, completion_tokens: int
+    self, model_name: str, provider: str, prompt_tokens: int, completion_tokens: int,
+    duration_seconds: float = 0.0,
```
[Bug] (blocking)
duration_seconds defaults to 0.0, but none of the VLM backends (OpenAIVLM, VolcEngineVLM, LiteLLMVLMProvider) pass actual timing data when calling update_token_usage(). All calls go through _update_token_usage_from_response(response) which only passes model_name, provider, prompt_tokens, and completion_tokens.
This means openviking_vlm_call_duration_seconds histogram will record all VLM calls with 0-second duration — the count will be correct but the sum and bucket distribution will be meaningless.
Additionally, mixing call duration into a token-usage tracking method conflates two different concerns. Consider instead timing the VLM call at the call site (in get_completion / get_completion_async) and recording to Prometheus directly:
```python
# In VLM backend call sites:
t0 = time.monotonic()
response = await client.chat.completions.create(**kwargs)
elapsed = time.monotonic() - t0
try:
    from openviking.storage.observers.prometheus_observer import get_prometheus_observer

    prom = get_prometheus_observer()
    if prom is not None:
        prom.record_vlm_call(elapsed)
except Exception:
    pass
self._update_token_usage_from_response(response)
```
This also avoids changing the `update_token_usage()` signature.
[Bug] (blocking)
CI lint fails: ruff format --check reports this file and tests/server/test_prometheus_metrics.py need reformatting. Please run ruff format on the changed files.
Adds PrometheusObserver implementing BaseObserver with thread-safe counters and histograms for retrieval, embedding, VLM, and cache metrics. Exposes /metrics endpoint in Prometheus text exposition format. Opt-in via server.telemetry.prometheus.enabled config. No new dependencies - generates Prometheus text format manually.
…ess review feedback
- Hook observer into RetrievalStatsCollector and other data paths
- Register metrics router statically in create_app()
- Remove unrelated with_bot/bot_api_url config changes
Time each VLM API call using time.perf_counter() and pass the measured duration through to update_token_usage(), which records it in the Prometheus histogram. Previously duration_seconds always defaulted to 0.0 because no backend passed actual timing data. Now all three backends (OpenAI, VolcEngine, LiteLLM) measure wall-clock time around the API call in get_completion, get_completion_async, get_vision_completion, and get_vision_completion_async. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from b41738f to 3e1da74
Fixed in 3e1da74: timed each VLM API call and passed the measured duration through to update_token_usage(). Also rebased on upstream/main to resolve the CI lint failures (they were from unrelated bot/ files in the diff).
Thanks for the thorough review and merge! The observer pattern suggestion made this much cleaner.
Summary
Adds a Prometheus-compatible metrics exporter that integrates with OpenViking's existing observer infrastructure. Exposes a `/metrics` endpoint in Prometheus text exposition format. Opt-in via config.
Metrics exposed
- `openviking_retrieval_requests_total`
- `openviking_retrieval_latency_seconds`
- `openviking_embedding_requests_total`
- `openviking_embedding_latency_seconds`
- `openviking_vlm_calls_total`
- `openviking_vlm_call_duration_seconds`
- `openviking_cache_hits_total{level}`
- `openviking_cache_misses_total{level}`
Why this matters
OpenViking has a rich internal observer system (`openviking/storage/observers/`) with 5 existing implementations and a telemetry module that computes metrics internally. But there's zero Prometheus or OpenTelemetry integration: `grep` for both returns nothing. The `/health` endpoint returns pass/fail only, no quantitative metrics.
LangChain ships `langchain-opentelemetry` and CrewAI has built-in tracing. This brings standard observability to OpenViking for production deployments.
#350 (3 thumbs-up, highest-voted feature) and #420 both imply production-readiness concerns that metrics address.
Changes
- `openviking/storage/observers/prometheus_observer.py`: new `PrometheusObserver` implementing `BaseObserver`. Thread-safe counters and histograms. Manual Prometheus text format generation (no new dependencies).
- `openviking/server/routers/metrics.py`: GET `/metrics` endpoint returning `text/plain; version=0.0.4`.
- `openviking/server/config.py`: `PrometheusConfig` and `TelemetryConfig` dataclasses.
- `openviking/server/app.py`: lifespan creates observer when enabled, includes metrics router.
- `__init__.py` exports updated.
Configuration
Disabled by default. No new pip dependencies.
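A sketch of the config shape implied above. The dataclass names come from the PR; everything beyond the `enabled` flag is an assumption:

```python
# Hypothetical sketch of the opt-in config. PrometheusConfig and
# TelemetryConfig are named in the PR; the exact fields are assumptions.
from dataclasses import dataclass, field


@dataclass
class PrometheusConfig:
    enabled: bool = False  # opt-in: disabled unless the config says otherwise


@dataclass
class TelemetryConfig:
    prometheus: PrometheusConfig = field(default_factory=PrometheusConfig)


cfg = TelemetryConfig()
assert cfg.prometheus.enabled is False  # default-off, matching the PR
```

Defaulting to off means existing deployments see no behavior change until they explicitly set `server.telemetry.prometheus.enabled`.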
This contribution was developed with AI assistance (Claude Code).