Skip to content

[Chore][Docs]: daily drift check — multi-process mode #3076

Merged
ApostaC merged 2 commits intodevfrom
claude/determined-lovelace-l6lOx
Apr 21, 2026
Merged

[Chore][Docs]: daily drift check — multi-process mode #3076
ApostaC merged 2 commits intodevfrom
claude/determined-lovelace-l6lOx

Conversation

@ApostaC
Copy link
Copy Markdown
Contributor

@ApostaC ApostaC commented Apr 18, 2026

Summary

Auto-generated by the daily multi-process docs drift-check routine. Aligns MP user docs with the current code in lmcache/v1/multiprocess/ and lmcache/v1/mp_observability/. Docs only — no code changes. Needs a human reviewer before merge.

docs/src/ (user-facing)

docs/source/mp/configuration.rst — Full Example

Stale flags no longer in the MP argparse surface:

  • --prometheus-log-interval
  • --enable-telemetry
  • --telemetry-processor

Current equivalents in add_observability_args():

  • --metrics-sample-rate
  • --enable-tracing
  • --otlp-endpoint

Evidence: argparse group at lmcache/v1/mp_observability/config.py#L79-L211. grep -rn "prometheus-log-interval\|enable-telemetry\|telemetry-processor" lmcache/ → 0 matches. Running the old Full Example fails with argparse: unrecognized arguments.

docs/source/mp/architecture.rst

Describes an obsolete PrometheusController / TelemetryController stack that has been replaced by the EventBus observability stack. None of the referenced symbols or files exist on dev:

  • High-level diagram: PrometheusController + TelemetryController (observability)
  • Config snippet: add_prometheus_args(parser) + add_telemetry_args(parser)
  • "Observability Internals" prose: PrometheusController, TelemetryController, --telemetry-max-queue-size
  • "How to Extend → Adding a telemetry processor": TelemetryProcessorConfig, register_telemetry_processor_type, TelemetryProcessor
  • "Key Source Files" table: PrometheusConfig, lmcache/v1/mp_observability/prometheus_controller.py, lmcache/v1/mp_observability/telemetry/config.py, lmcache/v1/mp_observability/telemetry/controller.py

Evidence: grep -rn "PrometheusController\|TelemetryController\|register_telemetry_processor_type" lmcache/ → 0 matches. The real architecture:

docs/source/mp/observability.rst — "Trace Recording"

The new trace section (added in #3063) describes the trace file Header as carrying "the LMCache version" in two places. The actual Header struct has a trace_schema_version field — not an lmcache version. The mirrored design doc already uses the correct name.

Evidence: lmcache/v1/mp_observability/trace/format.py#L48-L80 vs docs/design/v1/mp_observability/trace.md#L264-L275.

docs/design/

No new drift observed this run — the design docs under docs/design/v1/mp_observability/trace.md already match lmcache/v1/mp_observability/trace/format.py.

Proposed fix

Applied in this PR. Surgical doc edits only — no code touched.

Notes for reviewers

  • All three docs/src/mp/ drifts are pre-existing (they predate the last 24 h window), not regressions from recent commits. Flagged because they are user-visible and the stale flags in configuration.rst would actively break the documented command.
  • This was auto-generated by the daily drift-check routine and needs human review before merge.

Test plan

  • Sphinx builds docs/ without new warnings.
  • Eyeball the rendered mp/architecture.rst "Observability Internals" and "Adding an observability subscriber" sections.
  • Confirm the updated Full Example in mp/configuration.rst runs against a local lmcache server without unrecognized arguments.

Note

Low Risk
Docs-only updates that align CLI flags and observability architecture descriptions with the current implementation; no runtime behavior changes.

Overview
Updates multiprocess-mode docs to reflect the current EventBus + OpenTelemetry observability stack, replacing outdated references to PrometheusController/TelemetryController, their CLI wiring, and the old “telemetry processor” extension path.

Refreshes the mp/configuration.rst full command example to remove stale flags and use the current observability options (e.g. --metrics-sample-rate, --enable-tracing, --otlp-endpoint). Also fixes the trace recording docs to describe the header’s trace_schema_version field correctly.

Reviewed by Cursor Bugbot for commit 09cde38. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the multiprocess observability documentation to reflect the transition from the PrometheusController and TelemetryController architecture to a new EventBus and OpenTelemetry (OTel) provider-based system. It updates architectural diagrams, configuration examples, extension guides, and file references to align with the new implementation. Feedback was provided to correct documentation inaccuracies regarding the Prometheus and OTel endpoints, which are implemented as mutually exclusive but were described as potentially concurrent in the text and examples.

Comment on lines +303 to +306
calls bind to the real provider. Metrics are exported both to an
in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``,
default 9090) and, when ``--otlp-endpoint`` is set, pushed to an OTel
collector.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation states that metrics are exported to both the Prometheus endpoint and the OTel collector when --otlp-endpoint is set. However, the implementation in lmcache/v1/mp_observability/otel_init.py (lines 43-79) shows that these modes are mutually exclusive: the Prometheus HTTP server is only started if otlp_endpoint is None. This inaccuracy should be corrected to reflect the actual behavior.

Suggested change
calls bind to the real provider. Metrics are exported both to an
in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``,
default 9090) and, when ``--otlp-endpoint`` is set, pushed to an OTel
collector.
calls bind to the real provider. Metrics are exported to an
in-process Prometheus ``/metrics`` endpoint (``--prometheus-port``,
default 9090) or, when ``--otlp-endpoint`` is set, pushed to an OTel
collector.
References
  1. The style guide requires fixing inaccurate documentation for user-facing changes. (link)

Comment on lines 399 to +402
--prometheus-port 9090 \
--prometheus-log-interval 10 \
--enable-telemetry \
--telemetry-processor '{"type": "logging", "log_level": "DEBUG"}'
--metrics-sample-rate 0.01 \
--enable-tracing \
--otlp-endpoint http://localhost:4317
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In this "Full Example", both --prometheus-port and --otlp-endpoint are provided. Since the implementation in otel_init.py makes these mutually exclusive, the --prometheus-port flag will be ignored when an OTLP endpoint is specified. To avoid confusion in a user-facing example, I suggest removing the redundant flag.

Suggested change
--prometheus-port 9090 \
--prometheus-log-interval 10 \
--enable-telemetry \
--telemetry-processor '{"type": "logging", "log_level": "DEBUG"}'
--metrics-sample-rate 0.01 \
--enable-tracing \
--otlp-endpoint http://localhost:4317
--metrics-sample-rate 0.01 \
--enable-tracing \
--otlp-endpoint http://localhost:4317
References
  1. The style guide requires fixing inaccurate documentation for user-facing changes. (link)

@ApostaC ApostaC marked this pull request as ready for review April 21, 2026 18:22
@ApostaC ApostaC changed the title docs: daily drift check — multi-process mode (2026-04-18) [Chore][Docs]: daily drift check — multi-process mode Apr 21, 2026
Copy link
Copy Markdown
Contributor

@sammshen sammshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@royyhuang royyhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@royyhuang royyhuang added the full Run comprehensive tests on this PR label Apr 21, 2026
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC force-pushed the claude/determined-lovelace-l6lOx branch from 7a4911a to 90c698f Compare April 21, 2026 20:55
@ApostaC ApostaC merged commit 4b2f067 into dev Apr 21, 2026
30 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants