feat: kv aware router + disagg router + prefill queue #11

Merged
tedzhouhk merged 82 commits into main from hzhou/disagg_router on Mar 9, 2025
Conversation

@tedzhouhk

Contributor
  1. Integrate the KV-aware router into vLLM disaggregated serving (NIXL).
  2. Implement a naive heuristics-based disaggregation router with an etcd watcher in Rust, and integrate it into vLLM-NIXL disaggregated serving via Python bindings.
  3. Add a prefill queue with pull-based prefill for load balancing.
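
The routing decision and pull-based prefill from items 2 and 3 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: the `DisaggDecision` class, both threshold names and their values, and the in-process `queue.Queue` standing in for the shared prefill queue are all hypothetical, and the real router is written in Rust.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisaggDecision:
    """Hypothetical heuristic; thresholds are illustrative, not from the PR."""

    max_local_prefill_length: int = 1000
    max_prefill_queue_size: int = 2

    def prefill_remote(self, prompt_len: int, prefix_hit_len: int, queue_size: int) -> bool:
        # Only tokens not already covered by cached KV blocks need prefill.
        effective_len = prompt_len - prefix_hit_len
        # Send long prefills to remote workers unless the queue is backed up.
        return (
            effective_len > self.max_local_prefill_length
            and queue_size < self.max_prefill_queue_size
        )


def prefill_worker_step(prefill_queue: queue.Queue) -> Optional[dict]:
    """Pull-based prefill: an idle worker takes the next request itself."""
    try:
        return prefill_queue.get_nowait()
    except queue.Empty:
        return None


router = DisaggDecision()
prefill_queue: queue.Queue = queue.Queue()

# (request id, prompt length, tokens already cached as a prefix hit)
for request_id, prompt_len, hit_len in [("a", 4000, 0), ("b", 200, 0), ("c", 4000, 3500)]:
    if router.prefill_remote(prompt_len, hit_len, prefill_queue.qsize()):
        prefill_queue.put({"id": request_id, "prompt_len": prompt_len})

pulled = prefill_worker_step(prefill_queue)
```

Because workers pull from the queue when they are free, load balancing does not depend on the router guessing which prefill worker is least busy.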

@rmccorm4 rmccorm4 left a comment

Contributor

Approving to unblock, but will review more in depth later.

Please call out any known issues or known areas to follow up on, if any.

@rmccorm4 rmccorm4 left a comment

Contributor

Need to fix copyright and precommits

@tedzhouhk tedzhouhk enabled auto-merge (squash) March 9, 2025 00:56
@tedzhouhk tedzhouhk merged commit 039f9a5 into main Mar 9, 2025
@tedzhouhk tedzhouhk deleted the hzhou/disagg_router branch March 9, 2025 01:09
kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
grahamking added a commit that referenced this pull request Oct 30, 2025
…onal

Dynamo frontend and backend both now run without needing etcd. The next
step is making them talk to each other.

Some features such as KV routing still require etcd.

Discovered and removed old unused `DisaggregatedRouter`. Added in #11 !

Signed-off-by: Graham King <grahamk@nvidia.com>
ranrubin added a commit that referenced this pull request Apr 20, 2026
Fixes all actionable items from the second review:

Bug fixes:
- #1: Change returncode=4 → returncode=2 in pytest_configure exit
  (4 is reserved by pytest for ExitCode.USAGE_ERROR)
- #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe
  (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original)

Test quality:
- #7: Add missing assertions to test_apply_hf_home_layout
  (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE)
- #8: Use monkeypatch in tests 3 & 4 for proper env isolation
  (prevents pre-existing env vars from leaking on test failure)

Design / correctness:
- #3: Fix _models_dir_env docstring ("exactly once" → "once per worker")
- #4: Add comment noting TRANSFORMERS_CACHE deprecation
- #5: Update --models-dir help text and docs to reflect both supported
  layouts (bare HF_HUB_CACHE and HF_HOME), not just bare
- #10: Restore pytest.skip() in download_lora() (test-only infra);
  remove now-redundant guard from minio_lora_service fixture
- #11: Raise hub/ detection log to WARNING with guidance
- #12: Replace shutil.rmtree(ignore_errors=True) with try/except
  so cleanup failures are logged rather than silently swallowed

Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester
test deferred — complex due to conftest dependencies, low severity)
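
The cleanup change in item #12 can be illustrated with a minimal sketch. The helper name `remove_tree` and the logger setup are assumptions for illustration and are not taken from the commit itself.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_tree(path: Path) -> bool:
    """Delete a directory tree; log and return False on failure instead of hiding it."""
    try:
        shutil.rmtree(path)
    except OSError as exc:
        # With ignore_errors=True this failure would have vanished silently.
        logger.warning("Failed to clean up %s: %s", path, exc)
        return False
    return True


scratch = Path(tempfile.mkdtemp())
(scratch / "model.bin").write_bytes(b"\0")

removed = remove_tree(scratch)        # directory exists, so it is deleted
removed_again = remove_tree(scratch)  # already gone: logged, not raised
```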

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: rrubin <rrubin@nvidia.com>
biswapanda added a commit that referenced this pull request May 8, 2026
Quick-win review fixes from PR #9131. Heavy-lift items (#9
prompt_token_ids env-gate, #11 update_weights atomicity, #13
per-choice completion_token_ids) tracked separately as follow-ups.

handlers.py
  - Catch EngineDeadError before the generic except in all 8 RL handlers
    (pause/resume/liveness_probe/get_state/flush_cache/update_weights_from_path/
    load_lora_adapter/unload_lora_adapter): match the existing shutdown
    pattern in this file so admin calls also surface engine death instead
    of leaving a broken worker alive.
  - get_state: fall back to a no-op collective_rpc when check_health is
    absent — same fallback liveness_probe already uses, otherwise older
    engines without check_health always look alive.
  - load_lora_adapter hot-swap path: a remove_lora() failure now returns
    a 400-style error response (was: silent log warn + continue, leaving
    add_lora to no-op against the still-registered ID); a
    reset_prefix_cache() failure after add_lora succeeds also returns
    error (was: log error and continue, leaving stale KV from the old
    adapter routable).
  - unload_lora_adapter: an unregister_model() failure after engine
    remove_lora succeeds now returns error (was: log warn and report
    success, leaving model=<lora_name> still routed to this worker even
    though _resolve_lora_request would now fall back to the base model).
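
The ordering of the `except` clauses described for handlers.py can be sketched as follows. This is a simplified illustration: `EngineDeadError` is defined locally as a stand-in for the engine's own exception type, and the `Handler` class, the `shutdown_requested` flag, and the response dictionaries are hypothetical rather than copied from the file.

```python
import asyncio


class EngineDeadError(Exception):
    """Stand-in for the engine's fatal error type (hypothetical local definition)."""


class Handler:
    def __init__(self, engine_call):
        self._engine_call = engine_call
        self.shutdown_requested = False

    async def flush_cache(self) -> dict:
        try:
            await self._engine_call()
            return {"status": "ok"}
        except EngineDeadError as exc:
            # Must precede the generic clause, otherwise it would never run.
            self.shutdown_requested = True
            return {"status": "error", "message": f"engine dead: {exc}"}
        except Exception as exc:
            # Recoverable failure: report it but keep the worker alive.
            return {"status": "error", "message": str(exc)}


async def dead():
    raise EngineDeadError("core process exited")


async def flaky():
    raise ValueError("bad argument")


dead_handler = Handler(dead)
flaky_handler = Handler(flaky)
dead_result = asyncio.run(dead_handler.flush_cache())
flaky_result = asyncio.run(flaky_handler.flush_cache())
```

Only the dead-engine case marks the worker for shutdown; an ordinary error still returns an error response but leaves the worker running.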

container/deps/vllm/install_vllm.sh
  - Pin prime-rl install to an immutable commit SHA
    (d49f3939e7dca29bceb9ed515cc1782497b67e81 ↔ tag v0.5.1.dev101) so a
    re-pointed tag upstream can't change what we ship. PRIME_RL_REF kept
    in build logs for human readability; PRIME_RL_COMMIT is the
    authoritative pin.
  - Replace `echo "\n=== ..."` with `printf '\n=== ...\n'` (shellcheck SC2028).

lib/llm/src/http/service/openai.rs
  - Force `request.inner.logprobs = Some(true)` unconditionally in both
    RL token-id promotion blocks (was: only when None). RL extraction of
    completion_token_ids depends on logprobs being on at the engine; an
    explicit logprobs=false would otherwise silently drop them.
  - Bound `/v1/rl/ready` per-worker probes with a 5s timeout (override
    via DYN_RL_LIVENESS_TIMEOUT_MS). Was reusing the shared 600s
    http_client, so one wedged worker could block readiness for 10
    minutes instead of failing fast as 503.
  - Tokenize Chat handler: call `request.validate()?` before
    `merged_chat_template_kwargs()` so the
    continue_final_message + add_generation_prompt mutual-exclusion
    constraint is enforced (validate() existed but was never invoked).
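
The bounded readiness probe can be sketched in Python, although the actual change is in Rust. The function names, the use of `asyncio.gather` to probe workers concurrently, and the toy `healthy` / `wedged` workers are assumptions for illustration; only the 5s default, the `DYN_RL_LIVENESS_TIMEOUT_MS` override, and the 503 result come from the description above.

```python
import asyncio
import os


def liveness_timeout_s(default_ms: int = 5000) -> float:
    """Read the timeout override (milliseconds) and convert it to seconds."""
    return int(os.environ.get("DYN_RL_LIVENESS_TIMEOUT_MS", default_ms)) / 1000.0


async def probe(worker, timeout_s: float) -> bool:
    """Probe one worker; a worker that does not answer in time counts as not ready."""
    try:
        return await asyncio.wait_for(worker(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return False


async def ready(workers, timeout_s: float) -> int:
    """Return an HTTP-style status: 200 if every worker answered, else 503."""
    results = await asyncio.gather(*(probe(w, timeout_s) for w in workers))
    return 200 if all(results) else 503


async def healthy() -> bool:
    return True


async def wedged() -> bool:
    await asyncio.sleep(60)  # simulates a worker that never responds in time
    return True


# A short timeout keeps the demonstration fast; production would use liveness_timeout_s().
status_ok = asyncio.run(ready([healthy, healthy], timeout_s=0.05))
status_wedged = asyncio.run(ready([healthy, wedged], timeout_s=0.05))
```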

lib/llm/src/protocols/openai/chat_completions.rs
  - Update stale doc comments on the legacy `tokens` and
    `return_token_ids` fields: they pointed callers at the now-404
    `/v1/chat/completions/tokens` URI. Direct callers to the canonical
    top-level `prompt_token_ids` extension and `nvext.extra_fields`
    instead.

cargo check -p dynamo-llm: clean (1 pre-existing benign warning).
cargo test -p dynamo-llm --test test_common_ext: 15 passed.
kaim-eng added a commit that referenced this pull request May 11, 2026
Pre-Phase-5 ("hardware validation" per powerplanner-design.md §11) housekeeping:
makes the dev environment shareable across teammates by removing personal
identifiers, parameterizing all dev-pod / probe references, and folding 11
review fixes into the three Phase 1-4 design documents.

Dev-env hardening
-----------------
* Personal namespace (`kaim-dynamo-system*`) and pinned cluster node ID
  (`aks-a100a-36888584-vmss000002`) removed from every dev-env asset:
  - Root-level `dev-pod.yaml`, `qwen3-quickstart-dgd.yaml`, and
    `Dockerfile.planner-dev` moved to `deploy/planner/dev/` with
    `${NS}` / `${DGD}` / `${DYN_NS}` envsubst placeholders and inline
    usage instructions.
  - Root-level `test_k8s_access.py` moved to `scripts/dev/` and reads
    `DYN_PARENT_DGD_K8S_NAMESPACE` (or `POD_NAMESPACE`) at runtime.
  - 5 `scripts/inspect_*.py` cluster probes parameterized via
    `DYN_PARENT_DGD_K8S_NAMESPACE`; failure mode is loud (SystemExit)
    rather than a hard-coded namespace.
  - `deploy/power_agent/dev-pod.yaml`: `nodeName` switched to
    `<GPU_NODE_NAME>` placeholder with a `kubectl get pods ...`
    one-liner showing how to discover the right node.
* `.gitignore` hardened to enforce the existing `.tmp-*` "intentionally
  not committed" convention (matches `examples/deployments/powerplanner/
  .tmp-gp-minimal.yaml`) and to block the four common root-level
  personal-scratch files from sneaking back in via `git add .`.
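
The "loud" failure mode for namespace lookup can be sketched as below. The function name `resolve_namespace`, the error message, and the precedence of `DYN_PARENT_DGD_K8S_NAMESPACE` over `POD_NAMESPACE` are assumptions; the commit only states which variables are read and that a missing value raises `SystemExit`.

```python
import os
from typing import Mapping


def resolve_namespace(env: Mapping[str, str] = os.environ) -> str:
    """Return the namespace from the environment, or exit instead of guessing one."""
    for key in ("DYN_PARENT_DGD_K8S_NAMESPACE", "POD_NAMESPACE"):
        value = env.get(key, "").strip()
        if value:
            return value
    # No hard-coded default: a missing variable stops the script immediately.
    raise SystemExit(
        "Set DYN_PARENT_DGD_K8S_NAMESPACE (or POD_NAMESPACE) before running this probe."
    )


primary = resolve_namespace(
    {"DYN_PARENT_DGD_K8S_NAMESPACE": "team-a", "POD_NAMESPACE": "pod-ns"}
)
fallback = resolve_namespace({"POD_NAMESPACE": "pod-ns"})
```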

dpp-dev-env.md updates
----------------------
* All 10 path references rewritten to point at the new
  `deploy/planner/dev/` and `scripts/dev/` homes.
* New §5 ("Deploy the Dev Pod") subsection documenting the
  `${NS}` / `${DGD}` / `${DYN_NS}` placeholder workflow with both an
  `envsubst` (Linux/WSL) path and a Windows edit-in-place path.
* Quick Deploy Checklist filename corrected from the stale-DGDR
  `qwen3-quickstart.yaml` to `qwen3-quickstart-dgd.yaml`, plus a
  cross-reference to the One-Time Setup §4 warning.
* DGD-ready wait command standardized to the programmatic
  `kubectl wait --for=jsonpath='{.status.state}'=successful` form
  in both §4 and the Checklist (removes the manual-`-w` divergence).
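
For readers without `envsubst`, the same `${NS}` / `${DGD}` / `${DYN_NS}` substitution can be approximated with Python's `string.Template`. This is an alternative sketch, not a workflow the document describes, and the manifest fragment and the values passed in are hypothetical.

```python
from string import Template

# Hypothetical manifest fragment; only the three placeholder names come from the docs.
manifest_template = Template(
    "apiVersion: v1\n"
    "kind: Pod\n"
    "metadata:\n"
    "  name: planner-dev\n"
    "  namespace: ${NS}\n"
    "spec:\n"
    "  containers:\n"
    "    - name: dev\n"
    "      env:\n"
    "        - name: DGD_NAME\n"
    "          value: ${DGD}\n"
    "        - name: DYN_NAMESPACE\n"
    "          value: ${DYN_NS}\n"
)

# substitute() raises KeyError if any placeholder is left without a value.
rendered = manifest_template.substitute(
    NS="my-namespace", DGD="qwen3-quickstart", DYN_NS="my-dynamo-ns"
)
```

Unlike `envsubst`, which replaces an unset variable with an empty string, `substitute()` fails when a placeholder has no value, so an incomplete manifest is not applied by accident.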

Design-doc review pass (powerplanner-design.md, powerplanner-testbed-design.md)
-------------------------------------------------------------------------------
* powerplanner-design.md
  - Header `Status` flipped from `Draft` to `Validated (Phases 1-3 -
    590/4 cold + 86/1 testbed; see §9.0 / §9.0.1). Phases 4-5 still
    draft.` with last-validation date.
  - §3.1: added an explicit note that `aic_interpolation` and `mode`
    are pre-existing PlannerConfig fields (owned by
    `monitoring/aic_interpolation.py`) and intentionally absent from
    the Names Registry.
  - §5.7 / §6.7: reworded the cross-reference to failure mode #5 so
    it correctly points at the *throughput regression* case rather
    than reading as a blanket "config revert".
  - §6.5 pseudocode: replaced module-level `self.device_count` with
    `pynvml.nvmlDeviceGetCount()` (the original was syntactically
    incorrect outside the daemon class).
  - §13 Open Question #11: corrected the duplicated
    `scheduled_decode_kv_tokens` typo so the agg-mode gate now reads
    `scheduled_prefill_tokens + scheduled_decode_kv_tokens` matching
    §5.3.
* powerplanner-testbed-design.md
  - §7: added a "Numeric-suffix convention" note explaining the
    intentional D21/E21 ID collision (IDs unique by filename + tuple,
    not renumbered).
  - §11: corrected "Six guards" -> "Seven guards" to match the seven
    items actually listed.
  - §5.2: typo `MNB` -> `MNBT (max_num_batched_tokens)`.
  - §C.14: verbatim test-output `30 PASSED` -> `31 PASSED` (1 wrapper
    + 30 parametrized) so the listing matches the actual run.

Verification
------------
* Testbed (alpha + gamma): 82 passed, 5 skipped (matches the documented
  Windows baseline; gamma auto-skipped without the Rust mocker).
* AIC no-cluster integration (test_aic_power_optimizer.py +
  test_aic_power_e2e_sim.py): 49 passed.
* Unit tests: 456 passed; the 9 remaining failures are pre-existing
  Windows-only environment limitations (cp1252 codec, `os.killpg`
  POSIX-only, missing `filterpy`) confirmed via `git stash` to be
  untouched by this commit.
* All touched .py files: `python3.10 -m py_compile` clean.
* All touched .yaml files: `yaml.safe_load_all` clean.
* `ReadLints` over all 12 touched files: no errors.
* Final `rg "kaim|aks-a100a-36888584"` outside committed
  `tests/fault_tolerance/...` (pre-existing, separate component) and
  `examples/.../.tmp-gp-minimal.yaml` (now gitignored): zero hits.

Co-authored-by: Cursor <cursoragent@cursor.com>
kaim-eng added a commit that referenced this pull request May 12, 2026
kaim-eng added a commit that referenced this pull request May 12, 2026
Signed-off-by: Kai Ma <kaim@nvidia.com>
kaim-eng added a commit that referenced this pull request May 18, 2026
6 participants