feat: introduce RoutingConstraints + worker taints#9558
Conversation
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThis PR introduces RoutingConstraints—a structure containing required and preferred worker taints—and threads it through request processing, scheduling, worker selection, router pipelines, and Python bindings. The feature enables clients to specify worker compatibility constraints via taints (required sets and soft-preference weights) that are evaluated during worker selection to filter candidates and influence scoring. ChangesRouting Constraints Implementation and Integration
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/kv-router/src/scheduling/local.rs`:
- Around line 189-190: The schedule() signature was changed to require a new
taints: Taints parameter so all local callers must be updated; find calls to
schedule(...) in this file (and tests) and pass an appropriate Taints value
(e.g., Taints::default() or a specific Taints construct) as the new argument,
ensuring the order matches the signature (taints before shared_cache_hits if
applicable), and update any calls that construct or forward SharedCacheHits
(crate::SharedCacheHits) accordingly so compilation succeeds.
In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 250-253: The pinned-worker taints check is rejecting workers when
request.taints is empty; update the conditional around the
workers.get(&worker.worker_id).is_some_and(...) check so it only performs the
taints compatibility test if request.taints is non-empty (e.g., guard with
!request.taints.is_empty()). In practice, modify the if that references workers,
worker.worker_id, config.taints() and request.taints to skip the
is_compatible_with call when request.taints is empty so pinned and non-pinned
paths behave consistently.
In `@lib/kv-router/src/scheduling/types.rs`:
- Line 65: The SchedulingRequest struct gained a new mandatory field taints
(type Taints), so update every SchedulingRequest literal to include a taints:
... entry; search for constructor/struct literals of SchedulingRequest (e.g., in
lib/kv-router/src/scheduling/selector.rs tests and any test helpers/factories)
and add a sensible value (likely Taints::default() or an empty/neutral taints
value provided by the Taints type) to each literal so the code compiles with the
new field.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5bafd7ee-b0e5-4ebe-a55f-dda608952eef
📒 Files selected for processing (19)
lib/bindings/python/rust/lib.rslib/bindings/python/rust/llm/kv.rslib/bindings/python/rust/llm/local_model.rslib/bindings/python/src/dynamo/_core.pyilib/bindings/python/src/dynamo/llm/__init__.pylib/kv-router/src/protocols.rslib/kv-router/src/scheduling/local.rslib/kv-router/src/scheduling/selector.rslib/kv-router/src/scheduling/types.rslib/llm/src/kv_router.rslib/llm/src/kv_router/prefill_router/execution.rslib/llm/src/kv_router/push_router.rslib/llm/src/kv_router/scheduler.rslib/llm/src/local_model/runtime_config.rslib/llm/src/preprocessor.rslib/llm/src/protocols/common/preprocessor.rslib/llm/src/protocols/openai/nvext.rslib/mocker/src/replay/offline/components/router.rslib/mocker/src/replay/online/router.rs
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com> # Conflicts: # lib/llm/src/kv_router.rs
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
@CodeRabbit full review. I changed the PR quite a bit, so a pr from scratch would be helpful. |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/llm/src/kv_router/prefill_router/execution.rs (1)
37-75:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPreselected prefill workers bypass
taints.requiredvalidation.Line 37 directly accepts
preselected_workerand skips taints checks, while taints are only applied in the non-preselected branch (Line 61 onward). This breaks uniform taint eligibility guarantees.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 37 - 75, Preselected prefill workers currently skip taints validation; when preselected_worker is Some(id) you must apply the same taints eligibility checks used in the non-preselected branch: extract taints from req.routing (as done in the else branch), then validate that the selected worker id satisfies taints.required constraints (reusing the same taints matching logic or helper used by query_prefill_worker); if the worker fails the taints check, reject it (return an error or fall through to the shared selection path) and log the rejection (include worker_id and failing taint), otherwise proceed to set (worker_id, dp_rank) as before. Ensure you reference preselected_worker, req.routing, taints, and query_prefill_worker so the validation code is colocated with the existing routing logic.lib/llm/src/kv_router/push_router.rs (1)
379-466:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftTaint eligibility is bypassed on pinned-worker fallback.
Line 397’s fallback path can return/route the pinned worker without a taints compatibility check, while taints are only enforced in the
find_best_match_detailsbranches. This allowstaints.requiredto be skipped for pinned requests in that path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/kv_router/push_router.rs` around lines 379 - 466, The pinned-worker fallback currently skips taints validation (when resolved_pinned_worker is Some or None and WorkerWithDpRank::new is used), allowing routes that ignore taints.required; ensure you run the same taints compatibility check used in find_best_match_details before computing cache hits or returning Ok: validate the pinned_worker_id / worker against the incoming taints and reject or return an error (or fall back) if incompatible, only call chooser.get_cache_hit_estimate and chooser.add_request when the taints check passes, and set scheduler_tracked = true only when the worker passed taints validation (and !is_query_only and resolved_dp_rank.is_some()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Line 2041: Update the doc comment for the "taints" parameter to explicitly
state that only "required" taints are currently enforced by frontend routing and
that "preferred" taints are not acted upon yet; mention the symbols "taints" and
"preferred" so readers know which fields this applies to and add a brief note
advising callers that preferred taints are a non-enforced hint until frontend
routing is implemented to respect them.
---
Outside diff comments:
In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 37-75: Preselected prefill workers currently skip taints
validation; when preselected_worker is Some(id) you must apply the same taints
eligibility checks used in the non-preselected branch: extract taints from
req.routing (as done in the else branch), then validate that the selected worker
id satisfies taints.required constraints (reusing the same taints matching logic
or helper used by query_prefill_worker); if the worker fails the taints check,
reject it (return an error or fall through to the shared selection path) and log
the rejection (include worker_id and failing taint), otherwise proceed to set
(worker_id, dp_rank) as before. Ensure you reference preselected_worker,
req.routing, taints, and query_prefill_worker so the validation code is
colocated with the existing routing logic.
In `@lib/llm/src/kv_router/push_router.rs`:
- Around line 379-466: The pinned-worker fallback currently skips taints
validation (when resolved_pinned_worker is Some or None and
WorkerWithDpRank::new is used), allowing routes that ignore taints.required;
ensure you run the same taints compatibility check used in
find_best_match_details before computing cache hits or returning Ok: validate
the pinned_worker_id / worker against the incoming taints and reject or return
an error (or fall back) if incompatible, only call
chooser.get_cache_hit_estimate and chooser.add_request when the taints check
passes, and set scheduler_tracked = true only when the worker passed taints
validation (and !is_query_only and resolved_dp_rank.is_some()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 36b4ba48-b11d-457e-970f-8da537d407e3
📒 Files selected for processing (19)
lib/bindings/python/rust/lib.rslib/bindings/python/rust/llm/kv.rslib/bindings/python/rust/llm/local_model.rslib/bindings/python/src/dynamo/_core.pyilib/bindings/python/src/dynamo/llm/__init__.pylib/kv-router/src/protocols.rslib/kv-router/src/scheduling/local.rslib/kv-router/src/scheduling/selector.rslib/kv-router/src/scheduling/types.rslib/llm/src/kv_router.rslib/llm/src/kv_router/prefill_router/execution.rslib/llm/src/kv_router/push_router.rslib/llm/src/kv_router/scheduler.rslib/llm/src/local_model/runtime_config.rslib/llm/src/preprocessor.rslib/llm/src/protocols/common/preprocessor.rslib/llm/src/protocols/openai/nvext.rslib/mocker/src/replay/offline/components/router.rslib/mocker/src/replay/online/router.rs
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
/ok to test bd2a681 |
PeaBrane
left a comment
There was a problem hiding this comment.
One broader shape suggestion: since this PR now has two levels of eligibility filtering (allowed_worker_ids plus hard required_taints), I think it would be worth centralizing that in a tiny borrowed helper rather than letting queue admission, selector filtering, and any queue cost estimation each reconstruct the same predicate.
The shape I have in mind is intentionally small and should compile away: something like a RoutingEligibility<'a> holding Option<&HashSet<WorkerId>> and &RoutingConstraints, with #[inline] methods for allows_worker_id(...) and allows_worker(worker_id, config). That keeps the combined rule (allowed_worker_ids + required taints) in one place without allocation, cloning, trait objects, or dynamic dispatch.
That helper would also make the WSPT fix less error-prone, because the same eligibility predicate could be reused when estimating best effective prefill tokens. Separately, RoutingConstraints::is_compatible_with_worker_taints() and preferred_taint_multiplier() should short-circuit the empty/default cases so the common no-constraints path stays essentially zero overhead.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (6)
lib/bindings/python/src/dynamo/_core.pyi (1)
550-568:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMirror the current
preferred_taintsbehavior in the Python docs.This stub still describes
preferred_taintsas actively affecting scoring, but the default selector only consumesrequired_taintstoday. Please markpreferred_taintsas accepted-but-not-enforced until frontend routing actually uses it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/bindings/python/src/dynamo/_core.pyi` around lines 550 - 568, The docstring for class RoutingConstraints inaccurately claims preferred_taints affects scoring; update the RoutingConstraints docstring (and mention in the __init__ signature if present) to state that preferred_taints is currently accepted but not enforced/used by the default selector—i.e., keep required_taints behavior unchanged, but explicitly mark preferred_taints as a no-op/for-future-use until frontend routing consumes it so readers understand it does not impact scoring today.lib/llm/src/kv_router/prefill_router/execution.rs (2)
273-316:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject or filter hard-constrained requests in
SimpleRoutermode.
routing_constraintsis forwarded to the KV-router branch only. In theSimpleRouterbranch,required_taintsare ignored, so non-KV prefill routing can still choose an incompatible worker.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 273 - 316, query_prefill_worker forwards routing_constraints only to the KvRouter branch so the SimpleRouter branch (InnerPrefillRouter::SimpleRouter) ignores required_taints and may pick incompatible workers; update the SimpleRouter path in query_prefill_worker to first inspect routing_constraints.required_taints (and other hard constraints like allowed_worker_ids) and either filter available workers returned by r.select_next_worker()/r.peek_next_worker() to those that satisfy the constraints or reject the request with an error when no compatible worker exists, ensuring behavior matches the KvRouter branch.
24-82:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate
preselected_workeragainstrequired_taints.This method now threads
routing_constraintsthrough the normal selection path, but thepreselected_workerbranch still bypasses that check entirely. A request with hard constraints can therefore bootstrap against an incompatible prefill worker. Reuse the pinned-worker compatibility check here, or rejectpreselected_workerwhen the constraints cannot be validated.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 24 - 82, resolve_prefill_worker currently skips validating a provided preselected_worker against routing constraints/required_taints, allowing incompatible bootstraps; update the preselected_worker branch in resolve_prefill_worker to perform the same compatibility check used by query_prefill_worker (reuse the pinned-worker compatibility logic or call the same validator) using routing_constraints (and required_taints) derived from req.routing and req.block_mm_routing_info(), and if validation fails return PrefillResolveDecision::Unavailable (or the same error path as query_prefill_worker) instead of accepting the worker; reference resolve_prefill_worker, preselected_worker, routing_constraints, required_taints, query_prefill_worker, and PrefillResolveDecision in your change.lib/llm/src/kv_router.rs (1)
690-716:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore the old
find_best_matchentry point.This adds
RoutingConstraintsto a public method, so existing out-of-tree Rust callers will stop compiling. Please keep the old signature as a default-constraints wrapper and move the taints-aware path to a separate method name.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/kv_router.rs` around lines 690 - 716, Restore the original public API by keeping the old find_best_match signature (without RoutingConstraints) as a thin wrapper that calls the existing taints-aware implementation (rename that implementation to find_best_match_with_constraints or similar) and passes a default RoutingConstraints value; specifically, keep a public fn find_best_match(&self, context_id: Option<&str>, tokens: &[u32], block_mm_infos: Option<&[Option<BlockExtraInfo>]>, router_config_override: Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>, priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids: Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that forwards to find_best_match_with_constraints(..., routing_constraints: RoutingConstraints::default()), and rename the current method that accepts RoutingConstraints to the new taints-aware name (e.g., find_best_match_with_constraints) so callers who need taint behavior can use it while existing callers keep compiling.lib/kv-router/src/scheduling/selector.rs (1)
287-296:⚠️ Potential issue | 🟠 Major | ⚡ Quick winApply preferred-taint bias independently of the base cost.
Multiplying
base_scoreby the preferred-taint multiplier makes soft preferences a no-op whenever the computed cost is0.0, so fully idle or fully cached candidates can still tie onto an explicitly avoided taint. This needs an additive bias or another tie-breaker that still differentiates zero-cost workers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/kv-router/src/scheduling/selector.rs` around lines 287 - 296, The get_score closure currently multiplies base_score by preferred_taint_multiplier, which makes taint preferences ineffective when base_score == 0. Update the get_score closure (and related logic around worker_logit and request.routing_constraints.preferred_taint_multiplier) to apply the taint preference as an additive bias independent of base_score — e.g., compute base_score = self.worker_logit(...), compute taint_mult = request.routing_constraints.preferred_taint_multiplier(config.taints()), compute a small additive bias (constant or epsilon-scaled value) derived from taint_mult (for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score + bias so zero-cost workers are still differentiated by taint preference.lib/kv-router/src/protocols.rs (1)
174-180:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftNon-finite weight validation still missing.
Serde will deserialize
NaNand±infinityvalues intopreferred_taintswithout validation. When these flow intopreferred_taint_multiplier(), they propagate throughtanhandexp, producing all-NaNscores that can panic downstream selection logic (e.g., temperature-0 argmax).Add a custom
Deserializeimpl or validation method that rejects or neutralizes non-finite weights at the API boundary.🛡️ Example validation approach
impl RoutingConstraints { pub fn validate(&self) -> Result<(), &'static str> { for (taint, weight) in &self.preferred_taints { if !weight.is_finite() { return Err("preferred_taints must have finite weights"); } } Ok(()) } }Or filter at construction/deserialization time.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/kv-router/src/protocols.rs` around lines 174 - 180, The preferred_taints map in struct RoutingConstraints can contain non-finite floats (NaN/±inf) from serde deserialization; add validation to reject or sanitize these values at the API boundary by either implementing a custom Deserialize for RoutingConstraints or adding a validate() method called right after deserialization that iterates over self.preferred_taints and returns an error if any weight is not finite; ensure callers (or the deserializer) enforce this so preferred_taint_multiplier() never receives non-finite inputs.
🧹 Nitpick comments (1)
lib/kv-router/src/protocols.rs (1)
174-215: ⚡ Quick winConsider adding unit tests for RoutingConstraints methods.
The new
RoutingConstraintsmethods (is_empty,has_hard_constraints,is_compatible_with_worker_taints,preferred_taint_matches,preferred_taint_multiplier) have no test coverage. Adding tests would document expected behavior, catch edge cases (empty sets, boundary conditions, formula correctness), and prevent regressions.📝 Example test structure
#[cfg(test)] mod routing_constraints_tests { use super::*; #[test] fn test_is_compatible_with_worker_taints() { let constraints = RoutingConstraints { required_taints: ["gpu", "us-west"].iter().map(|s| s.to_string()).collect(), ..Default::default() }; let worker_taints_match: HashSet<String> = ["gpu", "us-west", "extra"].iter().map(|s| s.to_string()).collect(); assert!(constraints.is_compatible_with_worker_taints(&worker_taints_match)); let worker_taints_missing: HashSet<String> = ["gpu"].iter().map(|s| s.to_string()).collect(); assert!(!constraints.is_compatible_with_worker_taints(&worker_taints_missing)); } // Add tests for other methods... }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/kv-router/src/protocols.rs` around lines 174 - 215, Add unit tests for RoutingConstraints covering is_empty, has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches, and preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify empty/default constraints, constraints with required_taints that both match and don't match worker_taints (use RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints), verify preferred_taint_matches returns correct counts for overlapping keys, and assert preferred_taint_multiplier produces expected numeric behavior for known weights (include cases like no preferred_taints -> multiplier == (-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp flow). Reference the RoutingConstraints struct and its methods by name when locating code to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 288-306: The doc comment on
RoutingConstraintsSchema::preferred_taints incorrectly documents active scoring
behavior (tanh, summing, positive multiplier) which isn't implemented; update
the comment on the preferred_taints field to a neutral description (e.g.,
"Optional soft preferences keyed by worker taint; implementations may use these
as tie-breakers or heuristics") and remove the specifics about weighting,
summing, and tanh so the schema doesn't promise unimplemented scoring; keep the
required_taints doc as-is and ensure the symbol names referenced are
RoutingConstraintsSchema and preferred_taints.
---
Duplicate comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Around line 550-568: The docstring for class RoutingConstraints inaccurately
claims preferred_taints affects scoring; update the RoutingConstraints docstring
(and mention in the __init__ signature if present) to state that
preferred_taints is currently accepted but not enforced/used by the default
selector—i.e., keep required_taints behavior unchanged, but explicitly mark
preferred_taints as a no-op/for-future-use until frontend routing consumes it so
readers understand it does not impact scoring today.
In `@lib/kv-router/src/protocols.rs`:
- Around line 174-180: The preferred_taints map in struct RoutingConstraints can
contain non-finite floats (NaN/±inf) from serde deserialization; add validation
to reject or sanitize these values at the API boundary by either implementing a
custom Deserialize for RoutingConstraints or adding a validate() method called
right after deserialization that iterates over self.preferred_taints and returns
an error if any weight is not finite; ensure callers (or the deserializer)
enforce this so preferred_taint_multiplier() never receives non-finite inputs.
In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 287-296: The get_score closure currently multiplies base_score by
preferred_taint_multiplier, which makes taint preferences ineffective when
base_score == 0. Update the get_score closure (and related logic around
worker_logit and request.routing_constraints.preferred_taint_multiplier) to
apply the taint preference as an additive bias independent of base_score — e.g.,
compute base_score = self.worker_logit(...), compute taint_mult =
request.routing_constraints.preferred_taint_multiplier(config.taints()), compute
a small additive bias (constant or epsilon-scaled value) derived from taint_mult
(for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score +
bias so zero-cost workers are still differentiated by taint preference.
In `@lib/llm/src/kv_router.rs`:
- Around line 690-716: Restore the original public API by keeping the old
find_best_match signature (without RoutingConstraints) as a thin wrapper that
calls the existing taints-aware implementation (rename that implementation to
find_best_match_with_constraints or similar) and passes a default
RoutingConstraints value; specifically, keep a public fn find_best_match(&self,
context_id: Option<&str>, tokens: &[u32], block_mm_infos:
Option<&[Option<BlockExtraInfo>]>, router_config_override:
Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>,
priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids:
Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that
forwards to find_best_match_with_constraints(..., routing_constraints:
RoutingConstraints::default()), and rename the current method that accepts
RoutingConstraints to the new taints-aware name (e.g.,
find_best_match_with_constraints) so callers who need taint behavior can use it
while existing callers keep compiling.
In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 273-316: query_prefill_worker forwards routing_constraints only to
the KvRouter branch so the SimpleRouter branch
(InnerPrefillRouter::SimpleRouter) ignores required_taints and may pick
incompatible workers; update the SimpleRouter path in query_prefill_worker to
first inspect routing_constraints.required_taints (and other hard constraints
like allowed_worker_ids) and either filter available workers returned by
r.select_next_worker()/r.peek_next_worker() to those that satisfy the
constraints or reject the request with an error when no compatible worker
exists, ensuring behavior matches the KvRouter branch.
- Around line 24-82: resolve_prefill_worker currently skips validating a
provided preselected_worker against routing constraints/required_taints,
allowing incompatible bootstraps; update the preselected_worker branch in
resolve_prefill_worker to perform the same compatibility check used by
query_prefill_worker (reuse the pinned-worker compatibility logic or call the
same validator) using routing_constraints (and required_taints) derived from
req.routing and req.block_mm_routing_info(), and if validation fails return
PrefillResolveDecision::Unavailable (or the same error path as
query_prefill_worker) instead of accepting the worker; reference
resolve_prefill_worker, preselected_worker, routing_constraints,
required_taints, query_prefill_worker, and PrefillResolveDecision in your
change.
---
Nitpick comments:
In `@lib/kv-router/src/protocols.rs`:
- Around line 174-215: Add unit tests for RoutingConstraints covering is_empty,
has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches,
and preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify
empty/default constraints, constraints with required_taints that both match and
don't match worker_taints (use
RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints),
verify preferred_taint_matches returns correct counts for overlapping keys, and
assert preferred_taint_multiplier produces expected numeric behavior for known
weights (include cases like no preferred_taints -> multiplier ==
(-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp
flow). Reference the RoutingConstraints struct and its methods by name when
locating code to test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8ac1def6-94fa-40df-9d21-515ac8551fbb
📒 Files selected for processing (22)
lib/bindings/c/src/lib.rslib/bindings/python/rust/lib.rslib/bindings/python/rust/llm/kv.rslib/bindings/python/rust/llm/local_model.rslib/bindings/python/src/dynamo/_core.pyilib/kv-router/src/protocols.rslib/kv-router/src/scheduling/local.rslib/kv-router/src/scheduling/policy.rslib/kv-router/src/scheduling/queue.rslib/kv-router/src/scheduling/selector.rslib/kv-router/src/scheduling/types.rslib/kv-router/src/test_utils.rslib/llm/src/kv_router.rslib/llm/src/kv_router/prefill_router/execution.rslib/llm/src/kv_router/push_router.rslib/llm/src/kv_router/scheduler.rslib/llm/src/local_model/runtime_config.rslib/llm/src/preprocessor.rslib/llm/src/protocols/common/preprocessor.rslib/llm/src/protocols/openai/nvext.rslib/mocker/src/replay/offline/components/router.rslib/mocker/src/replay/online/router.rs
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
I added a commit. I am not sure if |
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
To use Codex here, create an environment for this repo. |
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
…ueueing computes the right thing Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
/ok to test f802890 |
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
|
/ok to test 6aca329 |
Main PR #9558 (`feat: introduce RoutingConstraints + worker taints`) added `from dynamo._core import RoutingConstraints` to `dynamo.llm.__init__`. Without this symbol in the testbed runtime stub, every planner unit test that transitively imports `dynamo.common.utils.endpoint_types` (-> `dynamo.llm`) fails at collection time on dev machines that don't have the maturin-built native binding. Add `RoutingConstraints` to the stubbed class list in `tests/testbed/_runtime_stub.py`. Idempotent, doesn't affect CI or the dev pod (real `dynamo._core` is loaded there). Signed-off-by: Kai Ma <kaim@nvidia.com>
Adds the synthetic-metrics testbed: deterministic, no GPU, no cluster. Alpha-class (27 scenarios, A1-F26): synthetic fleet + fake metrics + fake actuator drive the planner through fault-injection scenarios covering AIC drift, NVML clamps, K8s RBAC denials, node loss / recovery, Prometheus outages, MDC gaps, budget shrinkage, AIC infeasibility, and drift-threshold boundary cases. Gamma-class (3 scenarios, G1-G3): mocker-driven trace replay with synthetic-power overlay (replay/synthetic_power_overlay.py) and a power-aware replay adapter (replay/power_aware_replay_adapter.py) exercise the closed loop against real Mooncake traces. Infrastructure: - runner.py + scenarios.py + assertions.py + recorder.py + clock.py (run-loop, scenario loader, invariant checks, recording). - synthetic_fleet.py + fake_actuator.py + fake_aic.py + fake_planner_metrics.py + fake_prometheus.py (test doubles for every external dependency of the planner run loop). - _runtime_stub.py installs a stub dynamo._core when the compiled Rust binding is absent, so the testbed runs on developer laptops without a CUDA toolchain (carries every dynamo._core symbol used by dynamo.llm at module-load time, including the post-rebase RoutingConstraints addition from main PR #9558). - grafana/testbed_dashboard.json + systems/ (h100_pcie / h100_sxm / h200_sxm SKUs) + traces/placeholder_h200_disagg_1rps.jsonl provide a complete observability + replay stack. Production code: - offline/replay_adapter.py: 9-line timing fix (now_s = max(tick.at_s, bridge_now_s)) prevents stale-tick loops on sparse traces. 86 testbed tests + 30 scenarios (27 alpha + 3 gamma) ship green at this tip (1 skipped pending env-var; test_aic_real_data.py is module-skipped unless AIC_SANDBOX_DIR is set). Part of the PR #9369 split (PR 4 of 6). See docs/design-docs/pr9369-split-plan.md. Signed-off-by: Kai Ma <kaim@nvidia.com>
Adds the synthetic-metrics testbed: deterministic, no GPU, no cluster. Alpha-class (27 scenarios, A1-F26): synthetic fleet + fake metrics + fake actuator drive the planner through fault-injection scenarios covering AIC drift, NVML clamps, K8s RBAC denials, node loss / recovery, Prometheus outages, MDC gaps, budget shrinkage, AIC infeasibility, and drift-threshold boundary cases. Gamma-class (3 scenarios, G1-G3): mocker-driven trace replay with synthetic-power overlay (replay/synthetic_power_overlay.py) and a power-aware replay adapter (replay/power_aware_replay_adapter.py) exercise the closed loop against real Mooncake traces. Infrastructure: - runner.py + scenarios.py + assertions.py + recorder.py + clock.py (run-loop, scenario loader, invariant checks, recording). - synthetic_fleet.py + fake_actuator.py + fake_aic.py + fake_planner_metrics.py + fake_prometheus.py (test doubles for every external dependency of the planner run loop). - _runtime_stub.py installs a stub dynamo._core when the compiled Rust binding is absent, so the testbed runs on developer laptops without a CUDA toolchain (carries every dynamo._core symbol used by dynamo.llm at module-load time, including the post-rebase RoutingConstraints addition from main PR #9558). - grafana/testbed_dashboard.json + systems/ (h100_pcie / h100_sxm / h200_sxm SKUs) + traces/placeholder_h200_disagg_1rps.jsonl provide a complete observability + replay stack. Production code: - offline/replay_adapter.py: 9-line timing fix (now_s = max(tick.at_s, bridge_now_s)) prevents stale-tick loops on sparse traces. 86 testbed tests + 30 scenarios (27 alpha + 3 gamma) ship green at this tip (1 skipped pending env-var; test_aic_real_data.py is module-skipped unless AIC_SANDBOX_DIR is set). Part of the PR #9369 split (PR 4 of 6). See docs/design-docs/pr9369-split-plan.md. Signed-off-by: Kai Ma <kaim@nvidia.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil 63565275+michaelfeil@users.noreply.github.com
Overview:
This adds worker taints plus request-side routing constraints for KV router selection.
Current shape:
taintsRoutingConstraintsrequired_taintspreferred_taintsis accepted and used to bias traffic towards prefered worekrs.The implementation keeps the model asymmetric:
This is intended to support placement boundaries such as keeping traffic within a specific MDC / NVL domain, while also leaving room for future soft-preference routing.
Motivation
required_taints: hard placement constraints, for example keeping prefill/disagg within a domain boundary. set of taints.preferred_taints: soft heuristics biasing, for example steering different applications toward fast/slow or short/long-context worker pools. Also things like how well two components are networked etc. hashmap of taints and their weight on a (-1,1) scale, with 0 being neutral preference (no impact) values towards +1 steering to pick a worker with this taint, and -1Semantics
A request is eligible for a worker only if every entry in
routing_constraints.required_taintsis present in the worker taint set.preferred_taintscurrently has no effect in the default Nvidia Dynamo router policy.API Changes
ModelRuntimeConfig.taints: HashSet<String>RouterRequest::New { routing_constraints: RoutingConstraints, ... }RoutingConstraints { required_taints: HashSet<String>, preferred_taints: HashSet<String> }-> hope is to have pinned worker or allowed worker ids also in here.ModelRuntimeConfig.taintsstill presents asset[str], but converts to a set internallyRoutingConstraints(...)is available -> We could make it not available if there are containts. I am sceptical exposing e.g. allowed_worker_ids on the http entrypoint.Notes
HashSet<String>) rather than list semantics.preferred_taintsis intentionally accepted now so custom selectors can use it later, but the default selector does not score on it yet. Its up to user to implement something, that is now wired to the core (we will)Details:
Where should the reviewer start?
lib/kv-router/src/protocols.rslib/kv-router/src/scheduling/selector.rslib/llm/src/kv_router.rslib/llm/src/protocols/openai/nvext.rsRelated Issues:
Summary by CodeRabbit