mooncake: read preferred_segment from environment by aoshen02 · Pull Request #37 · ivanium/vllm

aoshen02 · 2026-05-06T00:23:03Z

Problem

Owner-client deployments need MooncakeStore puts to prefer the node-local owner segment. With a shared YAML recipe, preferred_segment is per-instance state and should be injected by the launcher or owner wrapper after it determines the node-local advertised host.

Changes

Keep explicit extra_config.preferred_segment as the highest-priority override for backwards compatibility and manual overrides.
Export MOONCAKE_PREFERRED_SEGMENT=<owner_host>:<owner_segment_port> from run_vllm_with_mooncake_owner.sh.
Read MOONCAKE_PREFERRED_SEGMENT as the narrow vLLM fallback when no explicit preferred_segment is configured.
Let the owner wrapper skip default --kv-transfer-config injection when the wrapped vLLM command already supplies one, so PD MultiConnector recipes can still use the managed owner path.
Avoid scraping Mooncake master /metrics; topology selection stays with the component that starts the node-local owner and knows its advertised host:port.

Why this is not duplicating an existing PR

I checked the related open work in this fork before updating the PR. This is the small wrapper + vLLM consumer path for the owner-wrapper-provided MOONCAKE_PREFERRED_SEGMENT hint, and no open PR in this fork currently carries that complete path.

Validation

bash -n scripts/mooncake/run_vllm_with_mooncake_owner.sh

uv run --active --no-sync /home/aoshen/code/uv_envs/py312/bin/python -m pytest \
  tests/v1/kv_connector/unit/test_mooncake_store_worker.py \
  -k 'get_configured_preferred_segment' -v

Result: 4 passed, 26 deselected.

AI assistance

AI assistance was used. The submitting human should review every changed line and validate the deployment behavior before merge.

Co-authored-by: OpenAI Codex codex@openai.com

Today the only way to route owner-client puts to the local Mooncake segment is to set MOONCAKE_PREFERRED_SEGMENT or extra_config.preferred_segment manually. Both require an external wrapper to know the right host:port, which makes the optimization invisible to vanilla vLLM users and unsafe in multi-NIC environments where the wrapper guesses the wrong IP. Add a best-effort auto-detection path: 1. Enumerate every non-loopback local IPv4 via psutil.net_if_addrs() so multi-NIC hosts work without manual config. 2. GET the master /metrics endpoint (URL is read from extra_config.master_metrics_url or the MOONCAKE_MASTER_METRICS_URL env var) and parse `segment_total_capacity_bytes{segment="host:port"}`. 3. If a segment's host matches any local IP, return it as the preferred segment. Otherwise return None and fall back to the random allocator. Existing override paths are unchanged: an explicit extra_config["preferred_segment"] still takes precedence. If neither the override nor a metrics URL is configured the function returns None exactly as before, so this is purely additive for current deployments. Why this matters: - Owner-client deployments get the put-locality benefit without shipping a wrapper that exports MOONCAKE_PREFERRED_SEGMENT. - Other Mooncake topologies (shared owner / no owner) opt in by setting master_metrics_url. - Failures (no psutil, master unreachable, no segment match) all degrade to "no preference", so the feature can never break startup. Measured benefit (Kimi-K2.5-NVFP4 reference workload, GB200 NVL72, preferred_segment ON vs OFF, single-variable ablation): P:D TTFT regression OFF -> ON 1p1d +5.7% 3p1d +10.2% The TTFT gain comes from RDMA-path locality (puts land on the local segment so reads stay intra-host); ext_hit barely moves between the two runs, confirming the win is not a hit-rate effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-06T00:23:20Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

gemini-code-assist

Code Review

This pull request implements an auto-detection mechanism for the preferred Mooncake segment by matching local IPv4 addresses against segments listed in master metrics. It introduces utility functions to enumerate local non-loopback IPv4 addresses and fetch/parse Prometheus metrics. Feedback was provided regarding the regular expression used for parsing metrics, which was identified as being too restrictive for real-world Prometheus outputs that may contain multiple labels in varying orders.

gemini-code-assist · 2026-05-06T00:24:52Z

+_SEGMENT_METRIC_RE = re.compile(
+    r'^segment_total_capacity_bytes\{segment="([^"]+)"\}\s+', re.MULTILINE
+)


The regular expression for parsing Prometheus metrics is too restrictive. It assumes that segment is the only label and that there are no spaces within the braces. Prometheus does not guarantee label order, and other labels (such as instance, job, or cluster) are frequently present in real-world deployments. If other labels exist, this auto-detection will silently fail and fall back to the random allocator, negating the performance benefits of this PR.

Updating the regex to allow for other labels before or after the segment label makes the auto-detection much more robust.

Suggested change

_SEGMENT_METRIC_RE = re.compile(

r'^segment_total_capacity_bytes\{segment="([^"]+)"\}\s+', re.MULTILINE

)

_SEGMENT_METRIC_RE = re.compile(

r'^segment_total_capacity_bytes\{[^}]*segment="([^"]+)"[^}]*\}\s+', re.MULTILINE

)

Use MOONCAKE_PREFERRED_SEGMENT as a launcher-provided owner segment hint instead of scraping master metrics. Explicit extra_config.preferred_segment keeps priority over the environment fallback. Co-authored-by: OpenAI Codex <codex@openai.com>

Set MOONCAKE_PREFERRED_SEGMENT from the managed owner host and segment port so MooncakeStore workers can prefer the node-local owner segment. Co-authored-by: OpenAI Codex <codex@openai.com>

Let the owner wrapper skip default kv-transfer-config injection when the wrapped vLLM command already supplies one, so PD MultiConnector recipes can still use the managed owner path. Co-authored-by: OpenAI Codex <codex@openai.com>

…og + perf fixes Squashed forward-port of feat/mooncake-store-int-owner-client + PR-32 (tier-log) + PR-31 (segment readiness probe) + PR-36 (enable_offload gate + count-trigger removal) + PR-37 (preferred_segment env) + bind_gpu_block_pool MultiConnector fix, on top of the rebased feat/mooncake-store-connector. What lands ========== Owner-client topology - scripts/mooncake/: master + owner launcher + RDMA auto-detection helpers - mndp.yaml: Kimi-K2.5-NVFP4 / Qwen3-8B DP=4 vigil recipe - vllm/distributed/.../mooncake/rdma_utils.py: RNIC + GID detection - requester-only setup refactor: vLLM ranks pass global_segment_size=0; the separate mooncake_client owner contributes the CPU pool + SSD tier Disk offload (re-introduced after intentional revert on PR-40900) - batch budget tracking with backpressure - batch-splitting for disk-tier loads - LookupKeyServer wiring (restored after cherry-pick drop) - store_py.cpp + standalone-binary fixes are companion in kvcache-ai#2083 PR-32 observability - VLLM_MOONCAKE_STORE_TIER_LOG=1: per-batch "Mooncake load tier summary" lines showing memory_keys/disk_keys/unknown_keys/success_keys/failed_keys and bytes_by_tier breakdown PR-36 perf - _get_disk_offload_buffer_budget_bytes(enable_offload) returns None when off - enable_offload field on MooncakeStoreConfig (read from JSON or MOONCAKE_ENABLE_OFFLOAD env) - dropped redundant count-based split trigger that was firing on every batch with ≥2 keys, doubling owner GET-RPCs PR-37 preferred_segment env - MOONCAKE_PREFERRED_SEGMENT env var falls through to rdma_utils.py - run_vllm_with_mooncake_owner.sh exports it for the spawned vLLM MultiConnector bind_gpu_block_pool proxy - allow simple_cpu_backend-style child connectors to bind to the GPU block pool when wrapped by MultiConnector (PD-disaggregated setups) Validation ========== End-to-end mndp run (Qwen3-8B DP=4, 100 conv × 3 turns × 32 concurrent, 16K input, --gpu-memory-utilization 0.4 --num-gpu-blocks-override 1024, 4 GiB owner CPU pool to force SSD spillover): - 99/100 conversations completed - 219 tier-log lines, 100% with disk_keys>0, 0 failed_keys - 59.5 GB read back from SSD - After PR-36 layered on: Mean TTFT -61%, Mean E2EL -58%, tier-log batches halved (219 → 115) confirming count-trigger removal Layout ====== The owner-client commits assumed flat module layout (mooncake_store_*.py) while feat/mooncake-store-connector kept store/ subdir. Reconciled to flat layout to match upstream PR-40900 author's pre-revert state. Related ======= - vllm-project#40900 — parent PR (basic connector); this stacks on it - ivanium#31 — segment-port readiness probe (folded in) - ivanium#32 — VLLM_MOONCAKE_STORE_TIER_LOG (folded in) - ivanium#36 — enable_offload gate + count-trigger removal (folded in) - ivanium#37 — MOONCAKE_PREFERRED_SEGMENT env (folded in) - kvcache-ai/Mooncake#2083 — Mooncake-side standalone-binary disk-read fix

Co-authored-by: aoshen524 <aoshen524@gmail.com>

gemini-code-assist Bot reviewed May 6, 2026

View reviewed changes

aoshen02 changed the title ~~mooncake: auto-detect preferred_segment from master /metrics~~ mooncake: read preferred_segment from environment May 6, 2026

aoshen524 and others added 2 commits May 6, 2026 01:44

fix(mooncake): export owner preferred segment

df65623

Set MOONCAKE_PREFERRED_SEGMENT from the managed owner host and segment port so MooncakeStore workers can prefer the node-local owner segment. Co-authored-by: OpenAI Codex <codex@openai.com>

fix(mooncake): allow wrapped kv-transfer-config

5dc651b

Let the owner wrapper skip default kv-transfer-config injection when the wrapped vLLM command already supplies one, so PD MultiConnector recipes can still use the managed owner path. Co-authored-by: OpenAI Codex <codex@openai.com>

ivanium approved these changes May 12, 2026

View reviewed changes

ivanium added the ready label May 12, 2026

ivanium merged commit e638d8f into ivanium:feat/mooncake-store-int-owner-client May 12, 2026
1 of 2 checks passed

zhewenl mentioned this pull request May 12, 2026

[Mooncake] Forward-port owner-client topology + disk-offload + tier-log + perf fixes (squashed, rebased) #47

Closed

zhewenl mentioned this pull request May 13, 2026

disk offloading #48

Open

4 tasks

huangyibo pushed a commit to huangyibo/vllm that referenced this pull request May 21, 2026

mooncake: read preferred_segment from environment (ivanium#37)

e37dc37

Co-authored-by: aoshen524 <aoshen524@gmail.com>

huangyibo pushed a commit to huangyibo/vllm that referenced this pull request Jun 4, 2026

mooncake: read preferred_segment from environment (ivanium#37)

78df1e8

Co-authored-by: aoshen524 <aoshen524@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mooncake: read preferred_segment from environment#37

mooncake: read preferred_segment from environment#37
ivanium merged 4 commits into
ivanium:feat/mooncake-store-int-owner-clientfrom
aoshen02:auto-detect-preferred-segment

aoshen02 commented May 6, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aoshen02 commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Why this is not duplicating an existing PR

Validation

AI assistance

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aoshen02 commented May 6, 2026 •

edited

Loading