Update README.md by nvda-mesharma · Pull Request #1 · ai-dynamo/dynamo

nvda-mesharma · 2025-03-04T00:42:20Z

No description provided.

* add metrics to disconnect * fmt * fmt Signed-off-by: michaelfeil <me@michaelfeil.eu>

Signed-off-by: michaelfeil <me@michaelfeil.eu>

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: ayushag <ayushag@nvidia.com>

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>

Squashed from 8 development commits. See PR description for full context. Infrastructure-only — builtin plugins + dual-path parity tests land in the follow-up PR. Signed-off-by: Kang Zhang <kangz@nvidia.com>

``OrchestratorEngineAdapter.__init__`` hard-coded ``WallClock()``, which left no seam for replay paths to propagate trace time into the plugin layer. Plugin scheduler ``_is_due``, CircuitBreaker cooldown, and HOLD_LAST cache age all read ``self._clock.monotonic()``. Under a fast-forward replay (e.g. 1hr trace in <10s real time), wall-clock barely moves and any plugin with ``execution_interval_seconds`` larger than the real-time duration never re-fires after its first call.

This is invisible in PR #1's current ship surface (PR #1 has no builtin plugins, K8s smoke runs in real time, and PSM-only replay goes through ``_PSMEngineAdapter`` instead) but would block PR #10 (``use_orchestrator=True`` default): once orchestrator becomes the default path, mooncake replay must work.

Fix:
- ``OrchestratorEngineAdapter.__init__`` accepts an optional ``clock: Clock`` kwarg, defaulting to ``WallClock`` so production behaviour is unchanged.
- ``engine_adapter.tick()`` bumps the clock to ``tick_input.now_s`` at the start of every tick when a ``VirtualClock`` is in play (``advance(delta)`` only if ``delta > 0``; backwards trace time is a silent no-op rather than a crash).
- ``ReplayPlannerEngine`` constructs a ``VirtualClock`` and passes it to the adapter on the orchestrator path so plugin scheduler sees trace time.

Regression tests in ``test_engine_adapter.py``:
- ``test_tick_advances_injected_virtual_clock_to_trace_time``: drive two ticks at trace time 180s and 360s, assert clock follows.
- ``test_tick_does_not_advance_clock_backwards``: pre-advance the clock past ``tick_input.now_s``, assert no exception and clock stays put.
- ``test_default_clock_is_wallclock``: lock production default so a future refactor that flips it doesn't silently break K8s.

Full planner suite: 830 passed, 1 skipped, 0 failed.

Signed-off-by: Kang Zhang <kangz@nvidia.com>
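The clock-injection change described in this commit message can be illustrated with a minimal sketch. It is not the code from the PR: the class bodies below are assumptions, ``AdapterSketch`` is a hypothetical stand-in for ``OrchestratorEngineAdapter``, ``tick()`` takes a bare ``now_s`` float instead of a ``tick_input`` object, and ``is_due`` is a simplified stand-in for the scheduler's ``_is_due``.

```python
import time
from typing import Optional, Protocol


class Clock(Protocol):
    """Anything that can report a monotonic time in seconds."""

    def monotonic(self) -> float: ...


class WallClock:
    """Production clock: reads the real monotonic clock."""

    def monotonic(self) -> float:
        return time.monotonic()


class VirtualClock:
    """Replay clock: time only moves when advance() is called."""

    def __init__(self, start_s: float = 0.0) -> None:
        self._now_s = start_s

    def monotonic(self) -> float:
        return self._now_s

    def advance(self, delta_s: float) -> None:
        self._now_s += delta_s


class AdapterSketch:
    """Hypothetical stand-in for the adapter; only the clock handling is shown."""

    def __init__(self, clock: Optional[Clock] = None) -> None:
        # Default to the wall clock so callers that pass nothing are unaffected.
        self._clock: Clock = clock if clock is not None else WallClock()

    def tick(self, now_s: float) -> None:
        # Only a virtual clock is driven by trace time.
        if isinstance(self._clock, VirtualClock):
            delta = now_s - self._clock.monotonic()
            if delta > 0:  # backwards trace time is a silent no-op
                self._clock.advance(delta)


def is_due(clock: Clock, last_run_s: float, interval_s: float) -> bool:
    """Simplified interval check that reads the injected clock."""
    return clock.monotonic() - last_run_s >= interval_s
```

With a ``VirtualClock`` injected, two ticks at trace times 180 s and 360 s move the clock to those values even if almost no real time has passed, so an interval check such as ``is_due(clock, 0.0, 300.0)`` becomes true during a fast-forward replay.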

…g launch summary

Two changes to examples/backends/sglang/launch/agg_multimodal_router.sh:

1. Replace the custom echo banner block with print_launch_banner per .ai/bash-launch-guidelines.md (matches sibling agg_router.sh:60). --no-curl is set because our own wait_ready loop later handles the smoke test, and --multimodal flags the script's nature.
2. Build a KV_EVENTS_PORTS array alongside WORKER_PORTS and reference both in the summary section, so when the harness sets DYN_SYSTEM_PORT{i} the printed URLs match what was actually launched (instead of always showing the default formula). Previously CI logs lied about ports under dynamic test-port allocation.

Addresses #9561 review: Devin comments #1 + #2, CodeRabbit nitpick on agg_multimodal_router.sh:65-68.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- decode_handler._extract_mm_hashes docstring now states 16-char hex for the SGLang path (was incorrectly claiming the vLLM 64-char shape; Devin #5).
- sglang launch banner drops the "Lightseek" prefix; uses "MM Exact Routing (SGLang)" to match the public name (Ryan suggestion #1).
- ModelRuntimeConfig.backend_framework field gains a doc-block on its motivation: frontend uses it for backend-specific routing hints (Ryan #3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cache-realloc (#1) and metric-handle (#2/#3) commits left a few lines unformatted — `longest_prefix_match`'s collapsed signature and the `merged` binding in l1.rs, and the per-worker gauge assignment + a test assert in metrics.rs. `cargo fmt --all --check` (run by the rust-tests CI job) flagged them, failing the job. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…oder reuse, doc fixes

Low-risk cleanup batch from the independent review (no decision-path change):

- #4 chain_augment: add ``predicted_kv_hit_rate`` to ``_PREDICTION_FIELDS`` so it participates in first-writer-wins partial merge like the other three predicted_* fields (was silently dropped in any 2+ plugin PREDICT chain, contradicting the proto/Pydantic contract). +2 chain_augment tests.
- #10 engine_adapter: add ``scale_down_capped_by_throughput`` to ``_aggregate_disagg_load_reason`` priority (PSM disagg emits it; placed between scale_up and scale_down to mirror PSM's _PRIORITY).
- #11 dead code: drop ``contributing_plugin_ids`` (built, never read) in pipeline._run_fanout_stage; drop ``_set_enabled`` + ``_plugin_ids`` (no caller in PR #1; would KeyError if reached).
- #18 _encode_fpm: use the canonical ``dynamo.common.forward_pass_metrics.encode`` (shared module-level encoder) instead of allocating a fresh ``msgspec.msgpack.Encoder`` per tick and re-implementing the encoding. Byte-identical wire format; keeps FPM serialization in lock-step with the rest of dynamo.
- #17 transport ABC docstring: timeout is enforced by the transport (``call()`` wraps ``asyncio.wait_for``), not the orchestrator; the pipeline uses a bare gather to avoid double-counting the deadline.
- #20 scheduler docstring: note the heartbeat-eviction monitor is not wired in this PR (last_heartbeat_at is recorded but unread; monitor is follow-up).
- #21 transport contract test: 7 inputs (not 8) → 14 cases (multi_pool fixture was removed with component_name; comments were stale).
- #22 metrics test: remove the dead no-op ``pass`` loop in _sample_value.

828 planner tests pass (was 825; +3 chain-augment / merge tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
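The first-writer-wins partial merge mentioned under #4 could look roughly like the sketch below. It is an illustration under stated assumptions, not the chain_augment implementation: only ``predicted_kv_hit_rate`` and the name ``_PREDICTION_FIELDS`` come from the commit message, the other three field names are hypothetical placeholders, and plugin outputs are modelled as plain dicts.

```python
from typing import Dict, Iterable, Optional

# Only predicted_kv_hit_rate is named in the source; the other three
# entries are hypothetical placeholders for "the other three predicted_* fields".
_PREDICTION_FIELDS = (
    "predicted_field_a",  # hypothetical
    "predicted_field_b",  # hypothetical
    "predicted_field_c",  # hypothetical
    "predicted_kv_hit_rate",
)


def merge_predictions(
    outputs: Iterable[Dict[str, Optional[float]]],
) -> Dict[str, float]:
    """Merge plugin outputs in chain order; the first non-None value per field wins."""
    merged: Dict[str, float] = {}
    for output in outputs:
        for field in _PREDICTION_FIELDS:
            value = output.get(field)
            # A later plugin may fill a gap but never overwrites an earlier value.
            if value is not None and field not in merged:
                merged[field] = value
    return merged
```

A field that is missing from the tuple is never copied into the merged result, which matches the symptom the commit describes: a value silently dropped whenever two or more plugins run in the chain.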

The cache-realloc (#1) and metric-handle (#2/#3) commits left a few lines unformatted — `longest_prefix_match`'s collapsed signature and the `merged` binding in l1.rs, and the per-worker gauge assignment + a test assert in metrics.rs. `cargo fmt --all --check` (run by the rust-tests CI job) flagged them, failing the job. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

Signed-off-by: Kang Zhang <kangz@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three Dockerfile bugs combined to make the DCGM-mode image unbuildable on a fresh checkout. Fixing any one in isolation leaves the build broken, so they travel together:

1. DCGM_IMAGE default 'nvcr.io/nvidia/cloud-native/dcgm:4.2.3-2-ubuntu22.04' does not exist on NGC (verified 2026-05-21 via 'docker manifest inspect' → 404). Bump to 4.5.1-1-ubuntu22.04, the only resolvable 4.x tag.
2. DCGM 4.5+ relocated python bindings from /usr/local/dcgm/bindings/python3/ to /usr/share/datacenter-gpu-manager-4/bindings/python3/. The previous COPY would silently copy zero files under the new pin. Switch the source path to the 4.5+ location.
3. NGC's DCGM 4.5+ runtime image ships pydcgm with DcgmGroup.py:20 doing 'import logger', but logger.py lives in DCGM's source tree under testing/python3/ and is NOT packaged. Without a shim every DcgmGroup construction raises ModuleNotFoundError. Add a 10-line stdlib-logging adapter at components/power_agent/logger.py and COPY it into /opt/dcgm/python/logger.py during the runtime stage.

This unblocks 'docker build -f components/power_agent/Dockerfile' on a fresh clone (verified locally via 'docker buildx build --build-arg DCGM_IMAGE=...4.5.1-1-ubuntu22.04' against viking-prod-216 on 2026-05-21, image pushed to ttl.sh/dynamo-pa-kaim-dcgm45-v2:24h and used by the Path-B live test on aks-a100b-22138447-vmss000000).

Refs: PR #9790 review, Power Agent live-test findings #1/#2/#6.

Signed-off-by: Kai Ma <kaim@nvidia.com>

Address graham-code-review feedback on PR #10351:

- Drop the secondary `owners` map; store `Owner = (instance_id, lora_slug)` inline with each entry value. One lock, one source of truth, no nested-write-lock hazard, no two-map sync risk.
- `register` takes `&Owner` (one clone inside, not per-file).
- Panic on collision: re-registering the same (slug, suffix, filename) with a different owner is a programming error (two attaches of the same model+suffix in one process would let detach-#1 wipe files detach-#2 still needs). Same-owner re-register is fine and just updates the path.
- Doc + local var naming aligned on `instance_id` to match `local_model.rs`'s existing usage (the value populates `DiscoveryInstance::Model.instance_id`).
- Tests: collision panic + same-owner update path coverage.

Signed-off-by: nnshah1 <neelays@nvidia.com>
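The single-map design described in this commit can be sketched as follows. The original change is in Rust; this is a Python analogue written only to show the idea, so ``FileRegistry``, ``path_for`` and ``release`` are hypothetical names, and raising ``RuntimeError`` stands in for the Rust panic.

```python
import threading
from typing import Dict, List, NamedTuple, Optional, Tuple


class Owner(NamedTuple):
    instance_id: int
    lora_slug: str


# (slug, suffix, filename), as named in the commit message.
Key = Tuple[str, str, str]


class FileRegistry:
    """One map, one lock: the owner is stored inline with each entry's path."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[Key, Tuple[str, Owner]] = {}

    def register(self, key: Key, path: str, owner: Owner) -> None:
        with self._lock:
            existing = self._entries.get(key)
            if existing is not None and existing[1] != owner:
                # Different owner for the same key is treated as a programming error.
                raise RuntimeError(f"{key} is already owned by {existing[1]}")
            # New entry, or same-owner re-register that just updates the path.
            self._entries[key] = (path, owner)

    def path_for(self, key: Key) -> Optional[str]:
        with self._lock:
            entry = self._entries.get(key)
            return entry[0] if entry is not None else None

    def release(self, owner: Owner) -> List[Key]:
        """Remove and return every key held by this owner (hypothetical detach step)."""
        with self._lock:
            keys = [k for k, (_, o) in self._entries.items() if o == owner]
            for k in keys:
                del self._entries[k]
            return keys
```

Because ownership lives in the same entry as the path, there is no second map to keep in sync, and a release for one owner cannot remove an entry that a different owner registered.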

…i-dynamo#10124) Signed-off-by: Kang Zhang <kangz@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: shenls <shenlinshan@kanzhun.com>

Update README.md

c6ef4ad

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 00:42 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:01 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:06 — with GitHub Actions Failure

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:09 — with GitHub Actions Inactive

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:10 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:31 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:33 — with GitHub Actions Failure

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:49 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:57 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 02:00 — with GitHub Actions Inactive

Merge branch 'main' into Update-README.md

6104b6c

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 16:27 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 16:28 — with GitHub Actions Inactive

saturley-hall self-requested a review March 4, 2025 16:46

saturley-hall approved these changes Mar 4, 2025

nnshah1 merged commit 5033457 into main Mar 4, 2025

nnshah1 deleted the Update-README.md branch March 4, 2025 17:16

maobaolong mentioned this pull request Mar 21, 2025

pip install ai-dynamo[all] ERROR，help #321

Closed

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

Update README.md (ai-dynamo#1)

b9ce8dd

hhzhang16 mentioned this pull request Apr 11, 2025

feat: add .devcontainer based off images in container/ #497

Merged

dtransposed mentioned this pull request Jun 23, 2025

[BUG]: AttributeError: type object 'ServiceConfig' has no attribute 'get_parsed_config' #1608

Closed

coderabbitai Bot mentioned this pull request Sep 9, 2025

feat: frontend, disconnect metrics (#1) #2953

Merged

copy-pr-bot Bot pushed a commit that referenced this pull request Sep 9, 2025

Mf/disconnect metrics (#1)

0ac83d5

* add metrics to disconnect * fmt * fmt Signed-off-by: michaelfeil <me@michaelfeil.eu>

grahamking pushed a commit that referenced this pull request Sep 10, 2025

feat: frontend, disconnect metrics (#1) (#2953)

e42746a

Signed-off-by: michaelfeil <me@michaelfeil.eu>

ayushag-nv pushed a commit that referenced this pull request Sep 15, 2025

feat: frontend, disconnect metrics (#1) (#2953)

031f95c

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: ayushag <ayushag@nvidia.com>

zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025

feat: frontend, disconnect metrics (#1) (#2953)

aba1dd1

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

elyasmnvidian added a commit that referenced this pull request Sep 22, 2025

chore: fix unit test #1

265870c

Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>

ajcasagrande mentioned this pull request Nov 5, 2025

[BUG]: Tempo fails to run using docker compose due to permissions error #4124

Closed

saturley-hall mentioned this pull request May 27, 2026

docs: fix broken skill links, convert cross-docs refs to relative paths #10040

Merged

3 tasks

dmitry-tokarev-nv mentioned this pull request May 29, 2026

feat(tests): XPU router e2e test support with TOCTOU port fixes #8675

Merged

zbennett10 mentioned this pull request May 29, 2026

DEP: Semantic KV Cache Donor Interface #10127

Open

tedzhouhk mentioned this pull request Jun 2, 2026

feat(planner): plugin framework infrastructure (PR #1 of 2) #10124

Merged

jthomson04 mentioned this pull request Jun 3, 2026

perf(frontend): profile-guided hot-path optimizations (tokenizer cache, metric handles, log-trace cap) #10273

Merged

kangclzjc added a commit that referenced this pull request Jun 4, 2026

feat(planner): plugin framework infrastructure (PR #1 of 2) (#10124)

2001454

Signed-off-by: Kang Zhang <kangz@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Swipe4057 mentioned this pull request Jun 8, 2026

[BUG]: Dynamo breaks speculative decoding and structured JSON output in sglang when using dynamo preprocessing #10411

Closed

tmonty12 pushed a commit that referenced this pull request Jun 8, 2026

feat(planner): plugin framework infrastructure (PR #1 of 2) (#10124)

251ef69

Signed-off-by: Kang Zhang <kangz@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hutm mentioned this pull request Jun 8, 2026

DEP: Native Gateway API Inference Extension (GAIE) support in Dynamo — standalone vanilla vLLM + Dynamo workers #10426

Open

6 tasks

devin-ai-integration Bot mentioned this pull request Jun 10, 2026

fix(vllm): prevent Ultra DS-tail copy offset overflow #10552

Merged

1 task

nnshah1 merged 2 commits into main from Update-README.md
