ci: remove timeout for cargo deny check by nv-anants · Pull Request #4 · ai-dynamo/dynamo

nv-anants · 2025-03-04T14:38:37Z

What does the PR do?

Checklist

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

Related PRs:

Where should the reviewer start?

Test plan:

CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

github-actions · 2025-03-04T15:04:28Z

Test Results

0 tests 0 ✅ 0s ⏱️
0 suites 0 💤
0 files 0 ❌

Results for commit d750624.

♻️ This comment has been updated with latest results.

saturley-hall · 2025-03-04T16:16:30Z

Merged into PR #3

Signed-off-by: Ryan Olson <rolson@nvidia.com>

… (text mode) (ai-dynamo#4) * fix * ups Signed-off-by: Qidong Su <soodoshll@gmail.com> * upd * update * fix * upd * fix --------- Signed-off-by: Qidong Su <soodoshll@gmail.com>

Fixes all actionable items from the second review: Bug fixes: - #1: Change returncode=4 → returncode=2 in pytest_configure exit (4 is reserved by pytest for EXIT_NOTESTSCOLLECTED) - #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original) Test quality: - #7: Add missing assertions to test_apply_hf_home_layout (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE) - #8: Use monkeypatch in tests 3 & 4 for proper env isolation (prevents pre-existing env vars from leaking on test failure) Design / correctness: - #3: Fix _models_dir_env docstring ("exactly once" → "once per worker") - #4: Add comment noting TRANSFORMERS_CACHE deprecation - #5: Update --models-dir help text and docs to reflect both supported layouts (bare HF_HUB_CACHE and HF_HOME), not just bare - #10: Restore pytest.skip() in download_lora() (test-only infra); remove now-redundant guard from minio_lora_service fixture - #11: Raise hub/ detection log to WARNING with guidance - #12: Replace shutil.rmtree(ignore_errors=True) with try/except so cleanup failures are logged rather than silently swallowed Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester test deferred — complex due to conftest dependencies, low severity) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

Bug fixes: - #1: Add monkeypatch env isolation to test_apply_sets_writable_transformers_cache - #2: Add TRANSFORMERS_CACHE assertion to test_apply_bare_cache_layout (bare-layout path was missing the writable-dir check present in HF_HOME test) Minor cleanups: - #4: Move `import pytest` from inside download_lora() to module top-level (lora_utils.py is test-only infra; pytest is always available) - #5: Replace pytestconfig.getoption("--models-dir") re-checks in predownload_models/predownload_tokenizers with os.environ.get("DYNAMO_MODELS_DIR") (_models_dir_env runs first and sets the var; single source of truth) New coverage tests: - #7: test_models_dir_nonexistent_exits_with_code_2 — subprocess test verifying pytest_configure exits with returncode=2 on bad path - #8: test_download_lora_skips_in_models_dir_mode — verifies download_lora() raises pytest.skip.Exception when DYNAMO_MODELS_DIR is set Not addressed: #3 (keep gpu_0 per project guidelines and previous review retraction), #6 (hook ordering is guaranteed), #9 (complex, low priority) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

…code-waiter cleanup Addresses two correctness issues raised by dynamo-ops review on PR 8965. 1. **Race on PrefillReady re-insert** (model_manager.rs) Both `register_prefill_router` and `activate_prefill_router` previously used a `remove → process → insert` pattern on `prefill_router_activators`. Between the `remove` and the `insert` the entry is gone from the map, so a concurrent `remove_prefill_activator` call (called by the watcher on prefill-component teardown) hits the empty map, skips its cleanup, and we then re-insert a stale `PrefillReady` for a prefill that's already gone. The next decode rebuild's lookup finds the resurrected cache and activates a `PrefillRouter` against a dead endpoint. Refactor both functions to use DashMap's `Entry` API. The shard lock is held for the duration of the OccupiedEntry, so any concurrent `remove_prefill_activator` serializes after us and observes the entry it needs to clear. The PrefillReady-consume path in `register_prefill_router` now reads-and-clones in place without removing; the DecodeWaiting/PrefillReady → PrefillReady transition in `activate_prefill_router` uses `OccupiedEntry::insert(...)` for an atomic value swap that returns the old value. 2. **Decode-waiter cleanup gated on `removed.is_some()`** (watcher.rs) `handle_delete_helper`'s decode branch only called `remove_decode_prefill_waiter` when `remove_worker_set` returned `Some(_)`. If decode registered (creating a `DecodeWaiting` activator entry via `register_prefill_router`) but `handle_add_helper` failed later (e.g., on `kv_chooser_for`, monitor setup, or `build_routed_pipeline`) before `add_worker_set`, no WorkerSet ever landed in the manager — and on later teardown `remove_worker_set` returns `None`, leaving the stale `DecodeWaiting` entry orphaned in the activator map. Drop the `removed.is_some()` guard. The helper itself is state-safe (uses `DashMap::remove_if(|_, v| matches!(v, DecodeWaiting(_)))`) so calling it on a key that's vacant or holds `PrefillReady` is a no-op. The prefill-teardown branch keeps its `removed.is_some()` guard because `remove_prefill_activator` blanket-removes both states, and calling it without a real WorkerSet teardown could orphan a live decode-side `DecodeWaiting`. Behavior is unchanged for all four reproducers (ai-dynamo#1 burst-sweep canary OFF, ai-dynamo#2 Pattern 1 multi-decode restart, ai-dynamo#3 Pattern 2 stale-DecodeWaiting, ai-dynamo#4 burst-sweep canary ON); will re-verify on a fresh build.

…later Reader-first ordering. Old order buried Quick start at position 8 of 9; new order surfaces the runnable commands above all the reference sections (FSx vs EBS, label conventions, aiperf SHA rationale). New section order: 1. Title + 3-config table 2. Pre-requisites (was #3) 3. Quick start (was #8 — promoted) 4. Directory layout (was #7 — now serves as map for the rest) 5. Hardware targets (was #2 — now pure reference; invocation examples moved into Quick start) 6. Storage (was #5) 7. aiperf install (was #6) 8. Naming & ownership (was #4) 9. Notes (unchanged) Also drop the stray "We use ebs by default" sentence — it contradicted both the Storage section and the actual yaml (where the PVC block is fully commented out, no default storage class is set). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…xl v1.0.1 - sglang_ref: e5c2a9b -> v0.5.11 - SGLANG_KERNEL_REF: c668bb67 -> 3fbd4d3372 (latest sgl-kernel-xpu main) - nixl_ref: 0.10.1 -> v1.0.1 Manually re-applied from 7990.patch commit ai-dynamo#4 (skipped during git am due to context drift in container/context.yaml). Signed-off-by: yuanwu <yuan.wu@intel.com>

That reference was tied to the CI fan-out section, which is no longer in this PR. Signed-off-by: Rini Gupta <rinig@nvidia.com>

…cycle Address senior-architect review of the metrics infrastructure: #1 (two parallel Prometheus paths): merge `register_prometheus` + `setup_component_metrics` into one `setup_metrics(ctx) -> MetricsBindings` method on the Rust `LLMEngine` trait. The trait surface shrinks from 7 → 6 methods; both expfmt bridging and the structured component publisher go through one place. The Python ABC keeps its two methods — the PyO3 bridge adapts (engine-author API unchanged for Python). #3 + #8 (framework lifecycle gauges gated on engine opt-in): - `cleanup_time_seconds` and `drain_time_seconds` move to Rust-side `LifecycleGauges` (prometheus crate gauges registered via the runtime's `MetricsRegistry`). Worker constructs them after `engine.start()` succeeds; observes during cleanup/drain. Emits regardless of engine opt-in. - Drop `set_cleanup_time` / `set_drain_time` from the `ComponentMetricsPublisher` trait — engine no longer responsible. - Drop the corresponding methods from `PyComponentMetricsPublisher`. - `model_load_time_seconds` STAYS on Python `LLMBackendMetrics` for parity with the legacy entry points (both legacy `main.py` and the unified bridge populate it). The unified bridge now constructs `LLMBackendMetrics` UNCONDITIONALLY so the gauge emits even when the engine returns no per-rank sources. #5 (kv_cache_hit_rate Optional semantics): clearer docstring on both Rust `ComponentSnapshot` and the Python ABC — tri-state explicitly documented (`None` = no data, `0.0` = legitimate zero-hit). Deferred with rationale (explained in the post-review architect's note): - #2 (HashMap<MetricKey, f64> ComponentSnapshot): real future-proofing but no current bug; 5 fields for 5 signals isn't worth the refactor cost yet. Revisit at ~10+ signals. - #4 (per-source cadence): explicit defer in original review. - #7 (conformance in CI): workflow change, separate from trait surface. Conformance kit collapses `check_register_prometheus` + `check_component_metrics` into `check_setup_metrics`. All 68 backend-common unit tests pass. Mocker example updated. PyO3 bridge wheel rebuilt successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Steady-state RSS of the agent (pynvml + kubernetes client + prometheus exporter) is ~80 MiB on an idle 8-GPU node — the 128Mi default left only ~48 MiB of headroom, breaking under both routine inspection (`kubectl exec --container=power-agent -- python -c "..."` re-imports the kubernetes client in the same cgroup; observed RSS spike > 128 MiB, two OOM-kills during 2026-05-21 live testing on AKS dpp-dev-env) and transient peaks in multi-pod conflict resolution where the UID → annotation dict + PID → UUID map are both held in memory simultaneously (linear in pod count per node). Bump: limits.memory: 128Mi → 256Mi (~3× steady-state RSS headroom) requests.memory: 64Mi → 96Mi (more honest scheduling floor) CPU envelope unchanged — agent reconciles every 15 s and the heavy calls (pynvml + list_pod_for_all_namespaces) are well under 200m on fleet inspection (median: 12 m, p99: 78 m). Fleet cost: at 1000 GPU nodes the extra 128 MiB per node is 125 GiB, which is negligible compared to the operational cost of a silently OOM-killed power-cap controller (the failure mode is a node whose caps slowly drift back to safeDefaultWatts on the next reconcile that doesn't fit the budget, with no obvious alert signal — the agent's Pod restart counter is the only indicator). Verified: helm template + helm lint clean. Refs: PR #9682 review, Power Agent live-test finding #4. Signed-off-by: Kai Ma <kaim@nvidia.com>

…oder reuse, doc fixes Low-risk cleanup batch from the independent review (no decision-path change): - ai-dynamo#4 chain_augment: add ``predicted_kv_hit_rate`` to ``_PREDICTION_FIELDS`` so it participates in first-writer-wins partial merge like the other three predicted_* fields (was silently dropped in any 2+ plugin PREDICT chain, contradicting the proto/Pydantic contract). +2 chain_augment tests. - #10 engine_adapter: add ``scale_down_capped_by_throughput`` to ``_aggregate_disagg_load_reason`` priority (PSM disagg emits it; placed between scale_up and scale_down to mirror PSM's _PRIORITY). - ai-dynamo#11 dead code: drop ``contributing_plugin_ids`` (built, never read) in pipeline._run_fanout_stage; drop ``_set_enabled`` + ``_plugin_ids`` (no caller in PR #1; would KeyError if reached). - ai-dynamo#18 _encode_fpm: use the canonical ``dynamo.common.forward_pass_metrics.encode`` (shared module-level encoder) instead of allocating a fresh ``msgspec.msgpack.Encoder`` per tick and re-implementing the encoding. Byte-identical wire format; keeps FPM serialization in lock-step with the rest of dynamo. - ai-dynamo#17 transport ABC docstring: timeout is enforced by the transport (``call()`` wraps ``asyncio.wait_for``), not the orchestrator — the pipeline uses a bare gather to avoid double-counting the deadline. - ai-dynamo#20 scheduler docstring: note the heartbeat-eviction monitor is not wired in this PR (last_heartbeat_at is recorded but unread; monitor is follow-up). - ai-dynamo#21 transport contract test: 7 inputs (not 8) → 14 cases (multi_pool fixture was removed with component_name; comments were stale). - ai-dynamo#22 metrics test: remove the dead no-op ``pass`` loop in _sample_value. 828 planner tests pass (was 825; +3 chain-augment / merge tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ADP routing In disaggregated serving, TRT-LLM's ConversationAwareADPRouter pins each conversation to an attention-DP rank (KV locality) by reading disaggregated_params.conversation_id. Native trtllm-serve populates this from the client's conversation routing; the Dynamo TRT-LLM workers did not (gap #4), so the engine's conversation-affinity router never engaged and CTX prefill load scattered across ranks -> prefill queue -> high TTFT tail. This plumbs conversation_id end-to-end so the engine router engages: - RoutingHints gains conversation_id, seeded by the OpenAI preprocessor from nvext.session_control.session_id (lib/llm). - Both the unified worker (llm_engine.py) and the legacy handler (request_handlers/handler_base.py) set disaggregated_params.conversation_id on the engine request, on prefill and decode. - Gated by DYN_ENGINE_CONV_AFFINITY=1: when set, the worker stops force-feeding attention_dp_rank (an explicit rank is honored before conversation affinity in the engine and would bypass ConversationAwareADPRouter). - dp_rank.conversation_id_from_request helper + unit test; one-shot INFO log to confirm the runtime invariants without enabling debug. Validated on Lyris GB300 (DSV4-Pro, 3xCTX-DEP4 + 1xGEN-DEP8, c460, perf-off) with engine config kv_cache_routing_conversation_affinity=true: TTFT p95 8.39 -> 4.84s (~42%), matching native trtllm-serve (4.81); p99 7.12 vs 7.00; 46 -> 50 req/s. Runtime-confirmed: conversation_id set + attention_dp_rank not forced, on both PREFILL and DECODE. Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>

remove timeout for cargo deny check

11ff1ce

nv-anants temporarily deployed to GITLAB March 4, 2025 14:38 — with GitHub Actions Inactive

nv-anants temporarily deployed to GITLAB March 4, 2025 14:39 — with GitHub Actions Inactive

fix

c9a3b10

nv-anants had a problem deploying to GITLAB March 4, 2025 14:42 — with GitHub Actions Failure

nv-anants temporarily deployed to GITLAB March 4, 2025 14:46 — with GitHub Actions Inactive

nv-anants temporarily deployed to GITLAB March 4, 2025 14:49 — with GitHub Actions Inactive

rerun: check timing

d750624

nv-anants temporarily deployed to GITLAB March 4, 2025 15:24 — with GitHub Actions Inactive

nv-anants temporarily deployed to GITLAB March 4, 2025 15:25 — with GitHub Actions Inactive

saturley-hall closed this Mar 4, 2025

piotrm-nvidia mentioned this pull request Mar 26, 2025

ERROR _core: Failed to send response: SendError #401

Closed

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

WIP: debugging CI failures. (ai-dynamo#4)

1e4ef91

elyasmnvidian added a commit that referenced this pull request Oct 14, 2025

chore: clippy #4

5b3a031

elyasmnvidian added a commit that referenced this pull request Oct 15, 2025

chore: clippy #4

7affd0a

elyasmnvidian mentioned this pull request Oct 15, 2025

fix: deepseek tool parsing fixed #3557

Merged

dagil-nvidia mentioned this pull request Oct 21, 2025

feat: Add automated dependency version tracking and extraction #3547

Closed

coderabbitai Bot mentioned this pull request Nov 5, 2025

feat: Update Dynamo k8s deployment example to use ModelExpress #4112

Closed

ryanolson added a commit that referenced this pull request Dec 2, 2025

kvbm: checkpoint - e2e pol #4 - 4 instance cluster with find_blocks api

2fb0696

Signed-off-by: Ryan Olson <rolson@nvidia.com>

dagil-nvidia mentioned this pull request Feb 15, 2026

ci: add LGTM Bot for external contributor merge visibility #6313

Draft

nnshah1 mentioned this pull request Apr 3, 2026

feat: request lifecycle logging + metrics capture [DIS-1643] #7840

Merged

4 tasks

krishung5 mentioned this pull request Apr 15, 2026

feat(multimodal): move MM routing into vLLM frontend processor #8065

Merged

tanmayv25 mentioned this pull request Apr 15, 2026

DEP: Backend Interface -- LLMEngine ABC and Worker #8251

Open

keivenchang mentioned this pull request Apr 24, 2026

[INVESTIGATE] sglang + DSv4: kwarg workaround breaks live — needs diagnosis #8671

Closed

4 tasks

yifjiang mentioned this pull request May 4, 2026

fix(prefill-router): persist endpoint after handshake so decode rebuild reactivates #8965

Merged

4 tasks

indrajit96 mentioned this pull request May 7, 2026

[BUG] TRT-LLM disaggregated serving crashes with NIXL agent creation failure (three conflicting NIXL installations) #6671

Closed

nv-rinig added a commit that referenced this pull request May 14, 2026

docs: drop dangling reference to "Required markers" #4

e4c2d2d

That reference was tied to the CI fan-out section, which is no longer in this PR. Signed-off-by: Rini Gupta <rinig@nvidia.com>

This was referenced May 15, 2026

fix(self-host): harvest non-weight siblings into slug_dir (gh-8749) #9610

Merged

fix: MM-aware KV routing for Phi-3, Qwen2-VL, Qwen2.5-VL #9441

Merged

dmitry-tokarev-nv mentioned this pull request May 26, 2026

fix(trtllm): restore deps, default CMD, and reduce layer count after upstream base switch #9889

Merged

15 tasks

nv-yna mentioned this pull request Jun 10, 2026

feat(trtllm): plumb conversation_id to engine for conversation-aware ADP routing #10517

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: remove timeout for cargo deny check#4

ci: remove timeout for cargo deny check#4
nv-anants wants to merge 3 commits into
mainfrom
anants/fix-rust-ci

nv-anants commented Mar 4, 2025

Uh oh!

github-actions Bot commented Mar 4, 2025 •

edited

Loading

Uh oh!

saturley-hall commented Mar 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nv-anants commented Mar 4, 2025

What does the PR do?

Checklist

Commit Type:

Related PRs:

Where should the reviewer start?

Test plan:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Uh oh!

github-actions Bot commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

saturley-hall commented Mar 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Mar 4, 2025 •

edited

Loading