[Chore][Docs]: daily drift check — multi-process mode #3076
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the multiprocess observability documentation to reflect the transition from the PrometheusController and TelemetryController architecture to a new EventBus and OpenTelemetry (OTel) provider-based system. It updates architectural diagrams, configuration examples, extension guides, and file references to align with the new implementation. Feedback was provided to correct documentation inaccuracies regarding the Prometheus and OTel endpoints, which are implemented as mutually exclusive but were described as potentially concurrent in the text and examples.
| calls bind to the real provider. Metrics are exported both to an | ||
| in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``, | ||
| default 9090) and, when ``--otlp-endpoint`` is set, pushed to an OTel | ||
| collector. |
There was a problem hiding this comment.
The documentation states that metrics are exported to both the Prometheus endpoint and the OTel collector when --otlp-endpoint is set. However, the implementation in lmcache/v1/mp_observability/otel_init.py (lines 43-79) shows that these modes are mutually exclusive: the Prometheus HTTP server is only started if otlp_endpoint is None. This inaccuracy should be corrected to reflect the actual behavior.
| calls bind to the real provider. Metrics are exported both to an | |
| in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``, | |
| default 9090) and, when ``--otlp-endpoint`` is set, pushed to an OTel | |
| collector. | |
| calls bind to the real provider. Metrics are exported to an | |
| in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``, | |
| default 9090) or, when ``--otlp-endpoint`` is set, pushed to an OTel | |
| collector. |
References
- The style guide requires fixing inaccurate documentation for user-facing changes. (link)
| --prometheus-port 9090 \ | ||
| --prometheus-log-interval 10 \ | ||
| --enable-telemetry \ | ||
| --telemetry-processor '{"type": "logging", "log_level": "DEBUG"}' | ||
| --metrics-sample-rate 0.01 \ | ||
| --enable-tracing \ | ||
| --otlp-endpoint http://localhost:4317 |
There was a problem hiding this comment.
In this "Full Example", both --prometheus-port and --otlp-endpoint are provided. Since the implementation in otel_init.py makes these mutually exclusive, the --prometheus-port flag will be ignored when an OTLP endpoint is specified. To avoid confusion in a user-facing example, I suggest removing the redundant flag.
| --prometheus-port 9090 \ | |
| --prometheus-log-interval 10 \ | |
| --enable-telemetry \ | |
| --telemetry-processor '{"type": "logging", "log_level": "DEBUG"}' | |
| --metrics-sample-rate 0.01 \ | |
| --enable-tracing \ | |
| --otlp-endpoint http://localhost:4317 | |
| --metrics-sample-rate 0.01 \ | |
| --enable-tracing \ | |
| --otlp-endpoint http://localhost:4317 |
References
- The style guide requires fixing inaccurate documentation for user-facing changes. (link)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
7a4911a to
90c698f
Compare
Summary
Auto-generated by the daily multi-process docs drift-check routine. Aligns MP user docs with the current code in
lmcache/v1/multiprocess/andlmcache/v1/mp_observability/. Docs only — no code changes. Needs a human reviewer before merge.docs/src/(user-facing)docs/source/mp/configuration.rst— Full ExampleStale flags no longer in the MP argparse surface:
--prometheus-log-interval--enable-telemetry--telemetry-processorCurrent equivalents in
add_observability_args():--metrics-sample-rate--enable-tracing--otlp-endpointEvidence: argparse group at
lmcache/v1/mp_observability/config.py#L79-L211.grep -rn "prometheus-log-interval\|enable-telemetry\|telemetry-processor" lmcache/→ 0 matches. Running the old Full Example fails withargparse: unrecognized arguments.docs/source/mp/architecture.rstDescribes an obsolete
PrometheusController/TelemetryControllerstack that has been replaced by the EventBus observability stack. None of the referenced symbols or files exist ondev:PrometheusController + TelemetryController (observability)add_prometheus_args(parser)+add_telemetry_args(parser)PrometheusController,TelemetryController,--telemetry-max-queue-sizeTelemetryProcessorConfig,register_telemetry_processor_type,TelemetryProcessorPrometheusConfig,lmcache/v1/mp_observability/prometheus_controller.py,lmcache/v1/mp_observability/telemetry/config.py,lmcache/v1/mp_observability/telemetry/controller.pyEvidence:
grep -rn "PrometheusController\|TelemetryController\|register_telemetry_processor_type" lmcache/→ 0 matches. The real architecture:lmcache/v1/mp_observability/event_bus.pylmcache/v1/mp_observability/subscribers/init_observability():lmcache/v1/mp_observability/config.py#L253-L340docs/source/mp/observability.rst— "Trace Recording"The new trace section (added in #3063) describes the trace file
Headeras carrying "the LMCache version" in two places. The actualHeaderstruct has atrace_schema_versionfield — not anlmcacheversion. The mirrored design doc already uses the correct name.Evidence:
lmcache/v1/mp_observability/trace/format.py#L48-L80vsdocs/design/v1/mp_observability/trace.md#L264-L275.docs/design/No new drift observed this run — the design docs under
docs/design/v1/mp_observability/trace.mdalready matchlmcache/v1/mp_observability/trace/format.py.Proposed fix
Applied in this PR. Surgical doc edits only — no code touched.
Notes for reviewers
docs/src/mp/drifts are pre-existing (they predate the last 24 h window), not regressions from recent commits. Flagged because they are user-visible and the stale flags inconfiguration.rstwould actively break the documented command.Test plan
docs/without new warnings.mp/architecture.rst"Observability Internals" and "Adding an observability subscriber" sections.mp/configuration.rstruns against a locallmcache serverwithoutunrecognized arguments.Note
Low Risk
Docs-only updates that align CLI flags and observability architecture descriptions with the current implementation; no runtime behavior changes.
Overview
Updates multiprocess-mode docs to reflect the current EventBus + OpenTelemetry observability stack, replacing outdated references to
PrometheusController/TelemetryController, their CLI wiring, and the old “telemetry processor” extension path.Refreshes the
mp/configuration.rstfull command example to remove stale flags and use the current observability options (e.g.--metrics-sample-rate,--enable-tracing,--otlp-endpoint). Also fixes the trace recording docs to describe the header’strace_schema_versionfield correctly.Reviewed by Cursor Bugbot for commit 09cde38. Bugbot is set up for automated code reviews on this repo. Configure here.