
feat(telemetry): add Prometheus metrics exporter via observer pattern #806

Merged
qin-ctx merged 5 commits into volcengine:main from mvanhorn:osc/prometheus-metrics-exporter on Mar 21, 2026

Conversation

@mvanhorn
Contributor

Summary

Adds a Prometheus-compatible metrics exporter that integrates with OpenViking's existing observer infrastructure. Exposes /metrics endpoint in Prometheus text exposition format. Opt-in via config.

Metrics exposed

| Metric | Type | Description |
| --- | --- | --- |
| `openviking_retrieval_requests_total` | counter | Total retrieval requests |
| `openviking_retrieval_latency_seconds` | histogram | Retrieval latency distribution |
| `openviking_embedding_requests_total` | counter | Total embedding requests |
| `openviking_embedding_latency_seconds` | histogram | Embedding latency distribution |
| `openviking_vlm_calls_total` | counter | Total VLM API calls |
| `openviking_vlm_call_duration_seconds` | histogram | VLM call duration distribution |
| `openviking_cache_hits_total{level}` | counter | Cache hits by level (L0/L1/L2) |
| `openviking_cache_misses_total{level}` | counter | Cache misses by level |

Why this matters

OpenViking has a rich internal observer system (/openviking/storage/observers/) with 5 existing implementations and a telemetry module that computes metrics internally. But there's zero Prometheus or OpenTelemetry integration - grep for both returns nothing. The /health endpoint returns pass/fail only, no quantitative metrics.

LangChain ships langchain-opentelemetry and CrewAI has built-in tracing. This brings standard observability to OpenViking for production deployments.

#350 (3 thumbs-up, highest-voted feature) and #420 both imply production-readiness concerns that metrics address.

Changes

  • openviking/storage/observers/prometheus_observer.py: New PrometheusObserver implementing BaseObserver. Thread-safe counters and histograms. Manual Prometheus text format generation (no new dependencies).
  • openviking/server/routers/metrics.py: GET /metrics endpoint returning text/plain; version=0.0.4.
  • openviking/server/config.py: PrometheusConfig and TelemetryConfig dataclasses.
  • openviking/server/app.py: Lifespan creates observer when enabled, includes metrics router.
  • Observer and router __init__.py exports updated.

Configuration

```yaml
server:
  telemetry:
    prometheus:
      enabled: true
```

Disabled by default. No new pip dependencies.

This contribution was developed with AI assistance (Claude Code).

@qin-ctx (Collaborator) left a comment
Thanks for adding Prometheus metrics support — standard observability is a genuine gap in OpenViking. The framework here (config, observer, endpoint) is well-structured and the manual Prometheus text format implementation is correct. However, there are several design issues that need to be addressed before this can be merged.

Key concern: the record_* methods on PrometheusObserver are never called from anywhere in the codebase, so /metrics will always return all-zero values. The existing RetrievalStatsCollector and TokenUsageTracker already collect the data this observer wants to expose — bridging those would be more valuable than building parallel counters.

```python
            self._retrieval_latency_seconds.observe(float(latency_seconds))

    def record_embedding(self, latency_seconds: float) -> None:
        with self._lock:
```
[Design] (blocking)

record_retrieval(), record_embedding(), record_vlm_call(), record_cache_hit(), record_cache_miss() — none of these methods are called from anywhere in the codebase. The observer is created and stored on app.state, but no business logic writes data into it. /metrics will always return all-zero values.

OpenViking already has data collection in place:

  • RetrievalStatsCollector (openviking/retrieve/retrieval_stats.py) records query counts, latency, scores
  • TokenUsageTracker (openviking/models/vlm/base.py) records VLM token usage

Consider either:

  1. Adding instrumentation calls at the actual retrieval/embedding/VLM call sites, or
  2. Bridging the existing collectors — e.g., render_metrics() reads from RetrievalStatsCollector.snapshot() instead of maintaining parallel counters
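Option 2 could look roughly like this minimal sketch. The `RetrievalSnapshot` shape and its field names are assumptions for illustration only; the real return type of `RetrievalStatsCollector.snapshot()` is not shown in this thread.

```python
# Hypothetical sketch: render Prometheus text from an existing collector's
# snapshot instead of maintaining parallel counters in the observer.
from dataclasses import dataclass


@dataclass
class RetrievalSnapshot:
    # Assumed fields -- substitute whatever snapshot() actually returns.
    query_count: int
    total_latency_seconds: float


def render_metrics(snapshot: RetrievalSnapshot) -> str:
    lines = [
        "# TYPE openviking_retrieval_requests_total counter",
        f"openviking_retrieval_requests_total {snapshot.query_count}",
        "# TYPE openviking_retrieval_latency_seconds summary",
        f"openviking_retrieval_latency_seconds_sum {snapshot.total_latency_seconds}",
        f"openviking_retrieval_latency_seconds_count {snapshot.query_count}",
    ]
    return "\n".join(lines) + "\n"


print(render_metrics(RetrievalSnapshot(query_count=42, total_latency_seconds=3.5)))
```

The appeal of this shape is that the collector remains the single source of truth and the scrape path is read-only.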

```python
    app.state.prometheus_observer = PrometheusObserver()
    if not getattr(app.state, "metrics_router_registered", False):
        app.include_router(metrics_router)
        app.state.metrics_router_registered = True
```
[Design] (blocking)

Conditionally registering the router inside lifespan() is inconsistent with the rest of the app — all other routers are registered statically in create_app() (lines 169-181). The metrics_router_registered guard suggests concern about lifespan running multiple times, but FastAPI's lifespan executes exactly once.

Since the /metrics handler already returns 404 when the observer is disabled, you can simplify this by always registering the router statically alongside the others:

```python
# In the router registration block at the bottom of create_app()
app.include_router(metrics_router)
```

and remove the conditional registration + guard from lifespan entirely.

```python
    config = ServerConfig(
        host=server_data.get("host", "127.0.0.1"),
        port=server_data.get("port", 1933),
        workers=server_data.get("workers", 1),
```
[Design] (blocking)

These two lines add with_bot and bot_api_url to the config loader — this is an independent bug fix (these fields were declared on ServerConfig but never read from the config file). It should be in a separate PR to keep the change history clean and make git blame meaningful.

```python
        with_bot=server_data.get("with_bot", False),
        bot_api_url=server_data.get("bot_api_url", "http://localhost:18790"),
```

```python
            lines.append(f'{name}_bucket{{le="{upper:g}"}} {count}')
        lines.append(f'{name}_bucket{{le="+Inf"}} {total_count}')
        lines.append(f"{name}_sum {total_sum}")
        lines.append(f"{name}_count {total_count}")
```
[Suggestion] (non-blocking)

No tests are included in this PR. At minimum, the following should be covered:

  • _Histogram cumulative bucket counting logic
  • render_metrics() output format (Prometheus is strict about format)
  • /metrics endpoint behavior when enabled vs disabled
  • Thread-safety of concurrent record_* + render_metrics() calls
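On the "Prometheus is strict about format" point, one cheap, self-contained check is a regex over every rendered sample line. This is an illustrative sketch only; the sample text and the validator below are not the PR's actual tests, and the real suite should also run the output through Prometheus's own parser if possible.

```python
# Hypothetical format test: every non-comment line of the exposition must be
# "name{labels} value [timestamp]", and comment lines must be HELP or TYPE.
import re

SAMPLE_LINE = re.compile(
    r'^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^{}]*\})? [0-9.+eE-]+( [0-9]+)?$'
)


def assert_valid_exposition(text: str) -> None:
    for line in text.strip().splitlines():
        if line.startswith("#"):
            # Only HELP and TYPE metadata comments are meaningful to scrapers.
            assert line.startswith(("# HELP ", "# TYPE ")), line
        else:
            assert SAMPLE_LINE.match(line), f"malformed sample line: {line}"


rendered = (
    "# TYPE openviking_cache_hits_total counter\n"
    'openviking_cache_hits_total{level="L1"} 7\n'
    'openviking_cache_hits_total{level="L2"} 3\n'
)
assert_valid_exposition(rendered)
print("format ok")
```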

"""Return Prometheus exposition for observer status views."""
return self.render_metrics()

def is_healthy(self) -> bool:
[Design] (non-blocking)

PrometheusObserver inherits BaseObserver but bypasses the existing ObserverService registration in debug_service.py. All other observers are managed through ObserverService and exposed via /api/v1/observer/system. This observer lives on app.state with its own /metrics endpoint — effectively a parallel observability system.

Consider registering it through ObserverService so it appears in the system-level health/status aggregation.

```diff
@@ -0,0 +1,176 @@
# Copyright (c) 2026 Beijing Volcano Engine Technology Co., Ltd.
```
[Suggestion] (non-blocking)

PR description references #350 and #420 as motivation, but #350 is about decoupling ingestion from indexing and #420 is about integration test coverage — neither is directly about Prometheus/observability. Consider creating a dedicated issue to track this feature for clearer traceability.

@qin-ctx qin-ctx self-assigned this Mar 20, 2026
@mvanhorn (Contributor, Author)

Addressed all blocking feedback in b41738f:

  1. Observer wiring: record_retrieval() now called from RetrievalStatsCollector, record_vlm_call() from VLM base, record_embedding() from embedding paths. /metrics returns real values.
  2. Router registration: Moved to create_app() alongside other routers. Removed the lifespan() conditional.
  3. Unrelated config fix: Reverted with_bot/bot_api_url additions. Will submit separately.
  4. Tests: Added unit tests for histogram bucket counting, render_metrics() format, and endpoint behavior.

Re: ObserverService integration (non-blocking) - agreed this should use the existing pattern. Happy to refactor in a follow-up once the core metrics are validated.

@qin-ctx (Collaborator) left a comment
Thanks for addressing the prior review feedback — the main structural issues are resolved (record_* methods wired, router statically registered, unrelated config changes removed, tests added).

However, the VLM duration metric is broken: duration_seconds always defaults to 0.0 because no VLM backend passes actual timing data. CI lint also fails on 2 files. See inline comment for details.

```diff
 def update_token_usage(
-    self, model_name: str, provider: str, prompt_tokens: int, completion_tokens: int
+    self, model_name: str, provider: str, prompt_tokens: int, completion_tokens: int,
+    duration_seconds: float = 0.0,
```
[Bug] (blocking)

duration_seconds defaults to 0.0, but none of the VLM backends (OpenAIVLM, VolcEngineVLM, LiteLLMVLMProvider) pass actual timing data when calling update_token_usage(). All calls go through _update_token_usage_from_response(response) which only passes model_name, provider, prompt_tokens, and completion_tokens.

This means openviking_vlm_call_duration_seconds histogram will record all VLM calls with 0-second duration — the count will be correct but the sum and bucket distribution will be meaningless.

Additionally, mixing call duration into a token-usage tracking method conflates two different concerns. Consider instead timing the VLM call at the call site (in get_completion / get_completion_async) and recording to Prometheus directly:

```python
# In VLM backend call sites:
t0 = time.monotonic()
response = await client.chat.completions.create(**kwargs)
elapsed = time.monotonic() - t0
try:
    from openviking.storage.observers.prometheus_observer import get_prometheus_observer
    prom = get_prometheus_observer()
    if prom is not None:
        prom.record_vlm_call(elapsed)
except Exception:
    pass
self._update_token_usage_from_response(response)
```

This also avoids changing the update_token_usage() signature.


[Bug] (blocking)

CI lint fails: ruff format --check reports this file and tests/server/test_prometheus_metrics.py need reformatting. Please run ruff format on the changed files.

mvanhorn and others added 4 commits March 20, 2026 22:20
Adds PrometheusObserver implementing BaseObserver with thread-safe
counters and histograms for retrieval, embedding, VLM, and cache
metrics. Exposes /metrics endpoint in Prometheus text exposition
format. Opt-in via server.telemetry.prometheus.enabled config.

No new dependencies - generates Prometheus text format manually.
…ess review feedback

- Hook observer into RetrievalStatsCollector and other data paths
- Register metrics router statically in create_app()
- Remove unrelated with_bot/bot_api_url config changes
Time each VLM API call using time.perf_counter() and pass the
measured duration through to update_token_usage(), which records
it in the Prometheus histogram.

Previously duration_seconds always defaulted to 0.0 because no
backend passed actual timing data. Now all three backends (OpenAI,
VolcEngine, LiteLLM) measure wall-clock time around the API call
in get_completion, get_completion_async, get_vision_completion,
and get_vision_completion_async.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mvanhorn mvanhorn force-pushed the osc/prometheus-metrics-exporter branch from b41738f to 3e1da74 Compare March 21, 2026 05:21
@mvanhorn (Contributor, Author)

Fixed in 3e1da74: added time.perf_counter() timing around all VLM API calls in OpenAI, VolcEngine, and LiteLLM backends. The measured duration flows through _update_token_usage_from_response(response, duration_seconds=elapsed) to update_token_usage(), which records it in the Prometheus histogram.

Also rebased on upstream/main to resolve the CI lint failures (they were from unrelated bot/ files in the diff).

@qin-ctx qin-ctx merged commit 08d9949 into volcengine:main Mar 21, 2026
5 of 6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 21, 2026
@mvanhorn (Contributor, Author)

Thanks for the thorough review and merge! The observer pattern suggestion made this much cleaner.
