feat: introduce RoutingConstraints + worker taints by michaelfeil · Pull Request #9558 · ai-dynamo/dynamo

michaelfeil · 2026-05-14T17:00:29Z

Signed-off-by: michaelfeil 63565275+michaelfeil@users.noreply.github.com

Overview:

RFC: Adding Routable Taints, Tiering and Tainting Workers #5462

This adds worker taints plus request-side routing constraints for KV router selection.

Current shape:

Workers advertise taints
Requests may carry RoutingConstraints
Routing enforces required_taints
preferred_taints is accepted and used to bias traffic towards prefered worekrs.

The implementation keeps the model asymmetric:

worker side: facts/capabilities
request side: matching policy

This is intended to support placement boundaries such as keeping traffic within a specific MDC / NVL domain, while also leaving room for future soft-preference routing.

Motivation

required_taints: hard placement constraints, for example keeping prefill/disagg within a domain boundary. set of taints.
preferred_taints: soft heuristics biasing, for example steering different applications toward fast/slow or short/long-context worker pools. Also things like how well two components are networked etc. hashmap of taints and their weight on a (-1,1) scale, with 0 being neutral preference (no impact) values towards +1 steering to pick a worker with this taint, and -1

Semantics

A request is eligible for a worker only if every entry in routing_constraints.required_taints is present in the worker taint set.

preferred_taints currently has no effect in the default Nvidia Dynamo router policy.

API Changes

Worker runtime config:
- ModelRuntimeConfig.taints: HashSet<String>
Internal router request:
- RouterRequest::New { routing_constraints: RoutingConstraints, ... }
Request-side constraint type:
- RoutingConstraints { required_taints: HashSet<String>, preferred_taints: HashSet<String> } -> hope is to have pinned worker or allowed worker ids also in here.
Python:
- ModelRuntimeConfig.taints still presents as set[str], but converts to a set internally
- request-side RoutingConstraints(...) is available -> We could make it not available if there are containts. I am sceptical exposing e.g. allowed_worker_ids on the http entrypoint.

Notes

Taints and routing constraints now use set semantics internally (HashSet<String>) rather than list semantics.
OpenAPI exposes a concrete schema for request routing constraints.
preferred_taints is intentionally accepted now so custom selectors can use it later, but the default selector does not score on it yet. Its up to user to implement something, that is now wired to the core (we will)

Details:

Adds worker taints to runtime config and worker config trait surfaces.
Threads request-side routing constraints through router, scheduler, prefill, push-router, Python bindings, and C bindings.
Adds selector and queue coverage for incompatible required-taint routing cases.

Where should the reviewer start?

lib/kv-router/src/protocols.rs
lib/kv-router/src/scheduling/selector.rs
lib/llm/src/kv_router.rs
lib/llm/src/protocols/openai/nvext.rs

Related Issues:

Relates to discussion: RFC: Adding Routable Taints, Tiering and Tainting Workers #5462

Summary by CodeRabbit

New Features
- Request-level routing constraints: clients can provide required and preferred worker taints to influence routing.
- Model/runtime APIs expose taint configuration so models can advertise eligibility.
- Routers and schedulers honor routing constraints: worker selection, queueing, and prefill/push paths prefer or restrict workers accordingly.
- OpenAPI/SDK surfaces routing constraints and optional parameters for routing in requests.

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

copy-pr-bot · 2026-05-14T17:00:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-14T17:00:44Z

👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2026-05-14T17:08:39Z

Walkthrough

This PR introduces RoutingConstraints—a structure containing required and preferred worker taints—and threads it through request processing, scheduling, worker selection, router pipelines, and Python bindings. The feature enables clients to specify worker compatibility constraints via taints (required sets and soft-preference weights) that are evaluated during worker selection to filter candidates and influence scoring.

Changes

Routing Constraints Implementation and Integration

Layer / File(s)	Summary
RoutingConstraints protocol and trait extension `lib/kv-router/src/protocols.rs`	Introduces `RoutingConstraints` struct with `required_taints: HashSet<String>` and `preferred_taints: HashMap<String, f32>`, plus compatibility and multiplier-calculation methods. Extends `WorkerConfigLike` trait with `taints()` accessor. Adds `routing_constraints` field to `RouterRequest::New` with serde omission when empty.
Scheduling request and worker config integration `lib/kv-router/src/scheduling/types.rs`, `lib/llm/src/local_model/runtime_config.rs`, `lib/kv-router/src/test_utils.rs`	`SchedulingRequest` gains `routing_constraints` field. `ModelRuntimeConfig` adds `taints` field and `WorkerConfigLike::taints()` implementation. Test utility `SimpleWorkerConfig` adds `taints` field.
LocalScheduler and queue admission with constraint filtering `lib/kv-router/src/scheduling/local.rs`, `lib/kv-router/src/scheduling/queue.rs`	`LocalScheduler::schedule` accepts and forwards `routing_constraints` parameter. Queue admission (`all_workers_busy`) filters workers by constraint compatibility, rejecting incompatible pinned or candidate workers. All test call sites updated with default constraints.
Worker selection with taints compatibility `lib/kv-router/src/scheduling/selector.rs`	`DefaultWorkerSelector::select_worker` filters non-pinned candidates by compatibility, validates pinned worker taints, and applies preferred taint multiplier scoring. Extended with `TaintedWorkerConfig` test utility and new test cases for required and preferred taints routing.
KvRouter method signatures and request handling `lib/llm/src/kv_router.rs`	`KvRouter::find_best_match_details` and `find_best_match` accept `routing_constraints` and forward to scheduler. `AsyncEngine` handler for `RouterRequest::New` destructures and passes constraints. Test call sites updated.
KvScheduler and prefill router integration `lib/llm/src/kv_router/scheduler.rs`, `lib/llm/src/kv_router/prefill_router/execution.rs`	`KvScheduler::schedule` accepts and forwards `routing_constraints`. Prefill router extracts constraints from request routing hints (with default fallback) and passes to `query_prefill_worker`.
Push router and replay routers `lib/llm/src/kv_router/push_router.rs`, `lib/mocker/src/replay/offline/components/router.rs`, `lib/mocker/src/replay/online/router.rs`	Push router extracts and threads constraints for both KV-overlap and pinned-worker paths, with hard-constraint validation for pinned workers. Replay routers supply default constraints during request construction.
Request protocols and preprocessing `lib/llm/src/protocols/common/preprocessor.rs`, `lib/llm/src/protocols/openai/nvext.rs`, `lib/llm/src/preprocessor.rs`	`RoutingHints` and `NvExt` add optional `routing_constraints` fields. `RoutingConstraintsSchema` added for OpenAPI with validation rejecting weights outside `(-1.0, 1.0)`. `OpenAIPreprocessor` clones constraints into routing hints.
C FFI routing constraints threading `lib/bindings/c/src/lib.rs`	`preprocess_request` returns three-tuple including `routing_constraints`. `RouterHandles` methods accept and forward constraints through prefill and decode paths. `route_prefill_request` and `route_decode_request` destructure and pass constraints.
Python RoutingConstraints binding and KvRouter signatures `lib/bindings/python/rust/llm/local_model.rs`, `lib/bindings/python/rust/llm/kv.rs`, `lib/bindings/python/rust/lib.rs`	New PyO3 `RoutingConstraints` wrapper with `required_taints` and `preferred_taints` fields and `From` conversions. `ModelRuntimeConfig` extended with taints getter/setter. `KvRouter.generate()` and `best_worker()` gain optional `routing_constraints=None` parameter. Module exports `RoutingConstraints` class.
Python type stubs and documentation `lib/bindings/python/src/dynamo/_core.pyi`	Type stub declares `RoutingConstraints` class with field and initializer signatures. `ModelRuntimeConfig` stub adds `taints: Set[str]` field. `KvRouter` method signatures and docstring updated for routing constraints parameter.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description check	✅ Passed	Description covers overview, motivation, semantics, API changes, implementation details, and reviewer guidance. Follows template structure with all key sections populated.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title clearly and concisely summarizes the main feature addition: introducing RoutingConstraints and worker taints for routing logic.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/kv-router/src/scheduling/local.rs`:
- Around line 189-190: The schedule() signature was changed to require a new
taints: Taints parameter so all local callers must be updated; find calls to
schedule(...) in this file (and tests) and pass an appropriate Taints value
(e.g., Taints::default() or a specific Taints construct) as the new argument,
ensuring the order matches the signature (taints before shared_cache_hits if
applicable), and update any calls that construct or forward SharedCacheHits
(crate::SharedCacheHits) accordingly so compilation succeeds.

In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 250-253: The pinned-worker taints check is rejecting workers when
request.taints is empty; update the conditional around the
workers.get(&worker.worker_id).is_some_and(...) check so it only performs the
taints compatibility test if request.taints is non-empty (e.g., guard with
!request.taints.is_empty()). In practice, modify the if that references workers,
worker.worker_id, config.taints() and request.taints to skip the
is_compatible_with call when request.taints is empty so pinned and non-pinned
paths behave consistently.

In `@lib/kv-router/src/scheduling/types.rs`:
- Line 65: The SchedulingRequest struct gained a new mandatory field taints
(type Taints), so update every SchedulingRequest literal to include a taints:
... entry; search for constructor/struct literals of SchedulingRequest (e.g., in
lib/kv-router/src/scheduling/selector.rs tests and any test helpers/factories)
and add a sensible value (likely Taints::default() or an empty/neutral taints
value provided by the Taints type) to each literal so the code compiles with the
new field.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5bafd7ee-b0e5-4ebe-a55f-dda608952eef

📥 Commits

Reviewing files that changed from the base of the PR and between a2ac896 and c796c28.

📒 Files selected for processing (19)

lib/bindings/python/rust/lib.rs
lib/bindings/python/rust/llm/kv.rs
lib/bindings/python/rust/llm/local_model.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/bindings/python/src/dynamo/llm/__init__.py
lib/kv-router/src/protocols.rs
lib/kv-router/src/scheduling/local.rs
lib/kv-router/src/scheduling/selector.rs
lib/kv-router/src/scheduling/types.rs
lib/llm/src/kv_router.rs
lib/llm/src/kv_router/prefill_router/execution.rs
lib/llm/src/kv_router/push_router.rs
lib/llm/src/kv_router/scheduler.rs
lib/llm/src/local_model/runtime_config.rs
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/common/preprocessor.rs
lib/llm/src/protocols/openai/nvext.rs
lib/mocker/src/replay/offline/components/router.rs
lib/mocker/src/replay/online/router.rs

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com> # Conflicts: # lib/llm/src/kv_router.rs

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

michaelfeil · 2026-05-14T18:03:24Z

@CodeRabbit full review. I changed the PR quite a bit, so a pr from scratch would be helpful.

coderabbitai · 2026-05-14T18:03:36Z

@michaelfeil Sure! I'll kick off a fresh full review of your PR from scratch right away.

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

lib/llm/src/kv_router/prefill_router/execution.rs (1)

37-75: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Preselected prefill workers bypass taints.required validation.

Line 37 directly accepts preselected_worker and skips taints checks, while taints are only applied in the non-preselected branch (Line 61 onward). This breaks uniform taint eligibility guarantees.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 37 - 75,
Preselected prefill workers currently skip taints validation; when
preselected_worker is Some(id) you must apply the same taints eligibility checks
used in the non-preselected branch: extract taints from req.routing (as done in
the else branch), then validate that the selected worker id satisfies
taints.required constraints (reusing the same taints matching logic or helper
used by query_prefill_worker); if the worker fails the taints check, reject it
(return an error or fall through to the shared selection path) and log the
rejection (include worker_id and failing taint), otherwise proceed to set
(worker_id, dp_rank) as before. Ensure you reference preselected_worker,
req.routing, taints, and query_prefill_worker so the validation code is
colocated with the existing routing logic.

lib/llm/src/kv_router/push_router.rs (1)

379-466: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Taint eligibility is bypassed on pinned-worker fallback.

Line 397’s fallback path can return/route the pinned worker without a taints compatibility check, while taints are only enforced in the find_best_match_details branches. This allows taints.required to be skipped for pinned requests in that path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/push_router.rs` around lines 379 - 466, The
pinned-worker fallback currently skips taints validation (when
resolved_pinned_worker is Some or None and WorkerWithDpRank::new is used),
allowing routes that ignore taints.required; ensure you run the same taints
compatibility check used in find_best_match_details before computing cache hits
or returning Ok: validate the pinned_worker_id / worker against the incoming
taints and reject or return an error (or fall back) if incompatible, only call
chooser.get_cache_hit_estimate and chooser.add_request when the taints check
passes, and set scheduler_tracked = true only when the worker passed taints
validation (and !is_query_only and resolved_dp_rank.is_some()).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Line 2041: Update the doc comment for the "taints" parameter to explicitly
state that only "required" taints are currently enforced by frontend routing and
that "preferred" taints are not acted upon yet; mention the symbols "taints" and
"preferred" so readers know which fields this applies to and add a brief note
advising callers that preferred taints are a non-enforced hint until frontend
routing is implemented to respect them.

---

Outside diff comments:
In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 37-75: Preselected prefill workers currently skip taints
validation; when preselected_worker is Some(id) you must apply the same taints
eligibility checks used in the non-preselected branch: extract taints from
req.routing (as done in the else branch), then validate that the selected worker
id satisfies taints.required constraints (reusing the same taints matching logic
or helper used by query_prefill_worker); if the worker fails the taints check,
reject it (return an error or fall through to the shared selection path) and log
the rejection (include worker_id and failing taint), otherwise proceed to set
(worker_id, dp_rank) as before. Ensure you reference preselected_worker,
req.routing, taints, and query_prefill_worker so the validation code is
colocated with the existing routing logic.

In `@lib/llm/src/kv_router/push_router.rs`:
- Around line 379-466: The pinned-worker fallback currently skips taints
validation (when resolved_pinned_worker is Some or None and
WorkerWithDpRank::new is used), allowing routes that ignore taints.required;
ensure you run the same taints compatibility check used in
find_best_match_details before computing cache hits or returning Ok: validate
the pinned_worker_id / worker against the incoming taints and reject or return
an error (or fall back) if incompatible, only call
chooser.get_cache_hit_estimate and chooser.add_request when the taints check
passes, and set scheduler_tracked = true only when the worker passed taints
validation (and !is_query_only and resolved_dp_rank.is_some()).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36b4ba48-b11d-457e-970f-8da537d407e3

📥 Commits

Reviewing files that changed from the base of the PR and between 9643e64 and 70c5c71.

📒 Files selected for processing (19)

lib/bindings/python/rust/lib.rs
lib/bindings/python/rust/llm/kv.rs
lib/bindings/python/rust/llm/local_model.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/bindings/python/src/dynamo/llm/__init__.py
lib/kv-router/src/protocols.rs
lib/kv-router/src/scheduling/local.rs
lib/kv-router/src/scheduling/selector.rs
lib/kv-router/src/scheduling/types.rs
lib/llm/src/kv_router.rs
lib/llm/src/kv_router/prefill_router/execution.rs
lib/llm/src/kv_router/push_router.rs
lib/llm/src/kv_router/scheduler.rs
lib/llm/src/local_model/runtime_config.rs
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/common/preprocessor.rs
lib/llm/src/protocols/openai/nvext.rs
lib/mocker/src/replay/offline/components/router.rs
lib/mocker/src/replay/online/router.rs

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

tmonty12 · 2026-05-14T19:07:48Z

/ok to test bd2a681

PeaBrane

One broader shape suggestion: since this PR now has two levels of eligibility filtering (allowed_worker_ids plus hard required_taints), I think it would be worth centralizing that in a tiny borrowed helper rather than letting queue admission, selector filtering, and any queue cost estimation each reconstruct the same predicate.

The shape I have in mind is intentionally small and should compile away: something like a RoutingEligibility<'a> holding Option<&HashSet<WorkerId>> and &RoutingConstraints, with #[inline] methods for allows_worker_id(...) and allows_worker(worker_id, config). That keeps the combined rule (allowed_worker_ids + required taints) in one place without allocation, cloning, trait objects, or dynamic dispatch.

That helper would also make the WSPT fix less error-prone, because the same eligibility predicate could be reused when estimating best effective prefill tokens. Separately, RoutingConstraints::is_compatible_with_worker_taints() and preferred_taint_multiplier() should short-circuit the empty/default cases so the common no-constraints path stays essentially zero overhead.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (6)

lib/bindings/python/src/dynamo/_core.pyi (1)
550-568: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mirror the current preferred_taints behavior in the Python docs.

This stub still describes preferred_taints as actively affecting scoring, but the default selector only consumes required_taints today. Please mark preferred_taints as accepted-but-not-enforced until frontend routing actually uses it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/bindings/python/src/dynamo/_core.pyi` around lines 550 - 568, The
docstring for class RoutingConstraints inaccurately claims preferred_taints
affects scoring; update the RoutingConstraints docstring (and mention in the
__init__ signature if present) to state that preferred_taints is currently
accepted but not enforced/used by the default selector—i.e., keep
required_taints behavior unchanged, but explicitly mark preferred_taints as a
no-op/for-future-use until frontend routing consumes it so readers understand it
does not impact scoring today.
lib/llm/src/kv_router/prefill_router/execution.rs (2)
273-316: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or filter hard-constrained requests in SimpleRouter mode.

routing_constraints is forwarded to the KV-router branch only. In the SimpleRouter branch, required_taints are ignored, so non-KV prefill routing can still choose an incompatible worker.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 273 - 316,
query_prefill_worker forwards routing_constraints only to the KvRouter branch so
the SimpleRouter branch (InnerPrefillRouter::SimpleRouter) ignores
required_taints and may pick incompatible workers; update the SimpleRouter path
in query_prefill_worker to first inspect routing_constraints.required_taints
(and other hard constraints like allowed_worker_ids) and either filter available
workers returned by r.select_next_worker()/r.peek_next_worker() to those that
satisfy the constraints or reject the request with an error when no compatible
worker exists, ensuring behavior matches the KvRouter branch.
24-82: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate preselected_worker against required_taints.

This method now threads routing_constraints through the normal selection path, but the preselected_worker branch still bypasses that check entirely. A request with hard constraints can therefore bootstrap against an incompatible prefill worker. Reuse the pinned-worker compatibility check here, or reject preselected_worker when the constraints cannot be validated.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 24 - 82,
resolve_prefill_worker currently skips validating a provided preselected_worker
against routing constraints/required_taints, allowing incompatible bootstraps;
update the preselected_worker branch in resolve_prefill_worker to perform the
same compatibility check used by query_prefill_worker (reuse the pinned-worker
compatibility logic or call the same validator) using routing_constraints (and
required_taints) derived from req.routing and req.block_mm_routing_info(), and
if validation fails return PrefillResolveDecision::Unavailable (or the same
error path as query_prefill_worker) instead of accepting the worker; reference
resolve_prefill_worker, preselected_worker, routing_constraints,
required_taints, query_prefill_worker, and PrefillResolveDecision in your
change.
lib/llm/src/kv_router.rs (1)
690-716: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the old find_best_match entry point.

This adds RoutingConstraints to a public method, so existing out-of-tree Rust callers will stop compiling. Please keep the old signature as a default-constraints wrapper and move the taints-aware path to a separate method name.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router.rs` around lines 690 - 716, Restore the original public
API by keeping the old find_best_match signature (without RoutingConstraints) as
a thin wrapper that calls the existing taints-aware implementation (rename that
implementation to find_best_match_with_constraints or similar) and passes a
default RoutingConstraints value; specifically, keep a public fn
find_best_match(&self, context_id: Option<&str>, tokens: &[u32], block_mm_infos:
Option<&[Option<BlockExtraInfo>]>, router_config_override:
Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>,
priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids:
Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that
forwards to find_best_match_with_constraints(..., routing_constraints:
RoutingConstraints::default()), and rename the current method that accepts
RoutingConstraints to the new taints-aware name (e.g.,
find_best_match_with_constraints) so callers who need taint behavior can use it
while existing callers keep compiling.
lib/kv-router/src/scheduling/selector.rs (1)
287-296: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply preferred-taint bias independently of the base cost.

Multiplying base_score by the preferred-taint multiplier makes soft preferences a no-op whenever the computed cost is 0.0, so fully idle or fully cached candidates can still tie onto an explicitly avoided taint. This needs an additive bias or another tie-breaker that still differentiates zero-cost workers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/scheduling/selector.rs` around lines 287 - 296, The
get_score closure currently multiplies base_score by preferred_taint_multiplier,
which makes taint preferences ineffective when base_score == 0. Update the
get_score closure (and related logic around worker_logit and
request.routing_constraints.preferred_taint_multiplier) to apply the taint
preference as an additive bias independent of base_score — e.g., compute
base_score = self.worker_logit(...), compute taint_mult =
request.routing_constraints.preferred_taint_multiplier(config.taints()), compute
a small additive bias (constant or epsilon-scaled value) derived from taint_mult
(for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score +
bias so zero-cost workers are still differentiated by taint preference.
lib/kv-router/src/protocols.rs (1)
174-180: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Non-finite weight validation still missing.

Serde will deserialize NaN and ±infinity values into preferred_taints without validation. When these flow into preferred_taint_multiplier(), they propagate through tanh and exp, producing all-NaN scores that can panic downstream selection logic (e.g., temperature-0 argmax).

Add a custom Deserialize impl or validation method that rejects or neutralizes non-finite weights at the API boundary.
🛡️ Example validation approach
impl RoutingConstraints {
    pub fn validate(&self) -> Result<(), &'static str> {
        for (taint, weight) in &self.preferred_taints {
            if !weight.is_finite() {
                return Err("preferred_taints must have finite weights");
            }
        }
        Ok(())
    }
}
Or filter at construction/deserialization time.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/protocols.rs` around lines 174 - 180, The preferred_taints
map in struct RoutingConstraints can contain non-finite floats (NaN/±inf) from
serde deserialization; add validation to reject or sanitize these values at the
API boundary by either implementing a custom Deserialize for RoutingConstraints
or adding a validate() method called right after deserialization that iterates
over self.preferred_taints and returns an error if any weight is not finite;
ensure callers (or the deserializer) enforce this so
preferred_taint_multiplier() never receives non-finite inputs.

🧹 Nitpick comments (1)

lib/kv-router/src/protocols.rs (1)

174-215: ⚡ Quick win

Consider adding unit tests for RoutingConstraints methods.

The new RoutingConstraints methods (is_empty, has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches, preferred_taint_multiplier) have no test coverage. Adding tests would document expected behavior, catch edge cases (empty sets, boundary conditions, formula correctness), and prevent regressions.

📝 Example test structure

#[cfg(test)]
mod routing_constraints_tests {
    use super::*;

    #[test]
    fn test_is_compatible_with_worker_taints() {
        let constraints = RoutingConstraints {
            required_taints: ["gpu", "us-west"].iter().map(|s| s.to_string()).collect(),
            ..Default::default()
        };
        
        let worker_taints_match: HashSet<String> = 
            ["gpu", "us-west", "extra"].iter().map(|s| s.to_string()).collect();
        assert!(constraints.is_compatible_with_worker_taints(&worker_taints_match));
        
        let worker_taints_missing: HashSet<String> = 
            ["gpu"].iter().map(|s| s.to_string()).collect();
        assert!(!constraints.is_compatible_with_worker_taints(&worker_taints_missing));
    }
    
    // Add tests for other methods...
}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/protocols.rs` around lines 174 - 215, Add unit tests for
RoutingConstraints covering is_empty, has_hard_constraints,
is_compatible_with_worker_taints, preferred_taint_matches, and
preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify
empty/default constraints, constraints with required_taints that both match and
don't match worker_taints (use
RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints),
verify preferred_taint_matches returns correct counts for overlapping keys, and
assert preferred_taint_multiplier produces expected numeric behavior for known
weights (include cases like no preferred_taints -> multiplier ==
(-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp
flow). Reference the RoutingConstraints struct and its methods by name when
locating code to test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 288-306: The doc comment on
RoutingConstraintsSchema::preferred_taints incorrectly documents active scoring
behavior (tanh, summing, positive multiplier) which isn't implemented; update
the comment on the preferred_taints field to a neutral description (e.g.,
"Optional soft preferences keyed by worker taint; implementations may use these
as tie-breakers or heuristics") and remove the specifics about weighting,
summing, and tanh so the schema doesn't promise unimplemented scoring; keep the
required_taints doc as-is and ensure the symbol names referenced are
RoutingConstraintsSchema and preferred_taints.

---

Duplicate comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Around line 550-568: The docstring for class RoutingConstraints inaccurately
claims preferred_taints affects scoring; update the RoutingConstraints docstring
(and mention in the __init__ signature if present) to state that
preferred_taints is currently accepted but not enforced/used by the default
selector—i.e., keep required_taints behavior unchanged, but explicitly mark
preferred_taints as a no-op/for-future-use until frontend routing consumes it so
readers understand it does not impact scoring today.

In `@lib/kv-router/src/protocols.rs`:
- Around line 174-180: The preferred_taints map in struct RoutingConstraints can
contain non-finite floats (NaN/±inf) from serde deserialization; add validation
to reject or sanitize these values at the API boundary by either implementing a
custom Deserialize for RoutingConstraints or adding a validate() method called
right after deserialization that iterates over self.preferred_taints and returns
an error if any weight is not finite; ensure callers (or the deserializer)
enforce this so preferred_taint_multiplier() never receives non-finite inputs.

In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 287-296: The get_score closure currently multiplies base_score by
preferred_taint_multiplier, which makes taint preferences ineffective when
base_score == 0. Update the get_score closure (and related logic around
worker_logit and request.routing_constraints.preferred_taint_multiplier) to
apply the taint preference as an additive bias independent of base_score — e.g.,
compute base_score = self.worker_logit(...), compute taint_mult =
request.routing_constraints.preferred_taint_multiplier(config.taints()), compute
a small additive bias (constant or epsilon-scaled value) derived from taint_mult
(for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score +
bias so zero-cost workers are still differentiated by taint preference.

In `@lib/llm/src/kv_router.rs`:
- Around line 690-716: Restore the original public API by keeping the old
find_best_match signature (without RoutingConstraints) as a thin wrapper that
calls the existing taints-aware implementation (rename that implementation to
find_best_match_with_constraints or similar) and passes a default
RoutingConstraints value; specifically, keep a public fn find_best_match(&self,
context_id: Option<&str>, tokens: &[u32], block_mm_infos:
Option<&[Option<BlockExtraInfo>]>, router_config_override:
Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>,
priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids:
Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that
forwards to find_best_match_with_constraints(..., routing_constraints:
RoutingConstraints::default()), and rename the current method that accepts
RoutingConstraints to the new taints-aware name (e.g.,
find_best_match_with_constraints) so callers who need taint behavior can use it
while existing callers keep compiling.

In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 273-316: query_prefill_worker forwards routing_constraints only to
the KvRouter branch so the SimpleRouter branch
(InnerPrefillRouter::SimpleRouter) ignores required_taints and may pick
incompatible workers; update the SimpleRouter path in query_prefill_worker to
first inspect routing_constraints.required_taints (and other hard constraints
like allowed_worker_ids) and either filter available workers returned by
r.select_next_worker()/r.peek_next_worker() to those that satisfy the
constraints or reject the request with an error when no compatible worker
exists, ensuring behavior matches the KvRouter branch.
- Around line 24-82: resolve_prefill_worker currently skips validating a
provided preselected_worker against routing constraints/required_taints,
allowing incompatible bootstraps; update the preselected_worker branch in
resolve_prefill_worker to perform the same compatibility check used by
query_prefill_worker (reuse the pinned-worker compatibility logic or call the
same validator) using routing_constraints (and required_taints) derived from
req.routing and req.block_mm_routing_info(), and if validation fails return
PrefillResolveDecision::Unavailable (or the same error path as
query_prefill_worker) instead of accepting the worker; reference
resolve_prefill_worker, preselected_worker, routing_constraints,
required_taints, query_prefill_worker, and PrefillResolveDecision in your
change.

---

Nitpick comments:
In `@lib/kv-router/src/protocols.rs`:
- Around line 174-215: Add unit tests for RoutingConstraints covering is_empty,
has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches,
and preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify
empty/default constraints, constraints with required_taints that both match and
don't match worker_taints (use
RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints),
verify preferred_taint_matches returns correct counts for overlapping keys, and
assert preferred_taint_multiplier produces expected numeric behavior for known
weights (include cases like no preferred_taints -> multiplier ==
(-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp
flow). Reference the RoutingConstraints struct and its methods by name when
locating code to test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ac1def6-94fa-40df-9d21-515ac8551fbb

📥 Commits

Reviewing files that changed from the base of the PR and between 1c841c4 and 65596fe.

📒 Files selected for processing (22)

lib/bindings/c/src/lib.rs
lib/bindings/python/rust/lib.rs
lib/bindings/python/rust/llm/kv.rs
lib/bindings/python/rust/llm/local_model.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/kv-router/src/protocols.rs
lib/kv-router/src/scheduling/local.rs
lib/kv-router/src/scheduling/policy.rs
lib/kv-router/src/scheduling/queue.rs
lib/kv-router/src/scheduling/selector.rs
lib/kv-router/src/scheduling/types.rs
lib/kv-router/src/test_utils.rs
lib/llm/src/kv_router.rs
lib/llm/src/kv_router/prefill_router/execution.rs
lib/llm/src/kv_router/push_router.rs
lib/llm/src/kv_router/scheduler.rs
lib/llm/src/local_model/runtime_config.rs
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/common/preprocessor.rs
lib/llm/src/protocols/openai/nvext.rs
lib/mocker/src/replay/offline/components/router.rs
lib/mocker/src/replay/online/router.rs

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

michaelfeil · 2026-05-15T20:28:54Z

I added a commit. I am not sure if #inline is still need. I like the helper and adding more things like allowed_worker_id's to it.

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

chatgpt-codex-connector · 2026-05-15T21:02:46Z

To use Codex here, create an environment for this repo.

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

…ueueing computes the right thing Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

PeaBrane · 2026-05-15T22:40:50Z

/ok to test f802890

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

PeaBrane · 2026-05-15T23:53:33Z

/ok to test 6aca329

Main PR #9558 (`feat: introduce RoutingConstraints + worker taints`) added `from dynamo._core import RoutingConstraints` to `dynamo.llm.__init__`. Without this symbol in the testbed runtime stub, every planner unit test that transitively imports `dynamo.common.utils.endpoint_types` (-> `dynamo.llm`) fails at collection time on dev machines that don't have the maturin-built native binding. Add `RoutingConstraints` to the stubbed class list in `tests/testbed/_runtime_stub.py`. Idempotent, doesn't affect CI or the dev pod (real `dynamo._core` is loaded there). Signed-off-by: Kai Ma <kaim@nvidia.com>

Adds the synthetic-metrics testbed: deterministic, no GPU, no cluster. Alpha-class (27 scenarios, A1-F26): synthetic fleet + fake metrics + fake actuator drive the planner through fault-injection scenarios covering AIC drift, NVML clamps, K8s RBAC denials, node loss / recovery, Prometheus outages, MDC gaps, budget shrinkage, AIC infeasibility, and drift-threshold boundary cases. Gamma-class (3 scenarios, G1-G3): mocker-driven trace replay with synthetic-power overlay (replay/synthetic_power_overlay.py) and a power-aware replay adapter (replay/power_aware_replay_adapter.py) exercise the closed loop against real Mooncake traces. Infrastructure: - runner.py + scenarios.py + assertions.py + recorder.py + clock.py (run-loop, scenario loader, invariant checks, recording). - synthetic_fleet.py + fake_actuator.py + fake_aic.py + fake_planner_metrics.py + fake_prometheus.py (test doubles for every external dependency of the planner run loop). - _runtime_stub.py installs a stub dynamo._core when the compiled Rust binding is absent, so the testbed runs on developer laptops without a CUDA toolchain (carries every dynamo._core symbol used by dynamo.llm at module-load time, including the post-rebase RoutingConstraints addition from main PR #9558). - grafana/testbed_dashboard.json + systems/ (h100_pcie / h100_sxm / h200_sxm SKUs) + traces/placeholder_h200_disagg_1rps.jsonl provide a complete observability + replay stack. Production code: - offline/replay_adapter.py: 9-line timing fix (now_s = max(tick.at_s, bridge_now_s)) prevents stale-tick loops on sparse traces. 86 testbed tests + 30 scenarios (27 alpha + 3 gamma) ship green at this tip (1 skipped pending env-var; test_aic_real_data.py is module-skipped unless AIC_SANDBOX_DIR is set). Part of the PR #9369 split (PR 4 of 6). See docs/design-docs/pr9369-split-plan.md. Signed-off-by: Kai Ma <kaim@nvidia.com>

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

add taints

c796c28

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

michaelfeil requested review from a team as code owners May 14, 2026 17:00

pull-request-size Bot added the size/L label May 14, 2026

github-actions Bot added external-contribution Pull request is from an external contributor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` router Relates to routing, KV-aware routing, etc. feat labels May 14, 2026

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread lib/kv-router/src/scheduling/local.rs Outdated

Comment thread lib/kv-router/src/scheduling/selector.rs

Comment thread lib/kv-router/src/scheduling/types.rs Outdated

Merge remote-tracking branch 'upstream/main' into mf/taints

5cd49a9

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com> # Conflicts: # lib/llm/src/kv_router.rs

dynamo-ops reviewed May 14, 2026

View reviewed changes

Comment thread lib/bindings/python/rust/llm/local_model.rs Outdated

Comment thread lib/llm/src/protocols/openai/nvext.rs Outdated

michaelfeil added 4 commits May 14, 2026 10:14

add taints

bcb124c

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

add schema

9bc8f64

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

make taints simplified

c164501

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

do not, commit the codexx file ...

70c5c71

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread lib/bindings/python/src/dynamo/_core.pyi Outdated

michaelfeil added 5 commits May 14, 2026 11:13

fix the tests

304a5b6

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

fix the fmt

848070a

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

Merge branch 'main' into mf/taints

044328e

fix the fmt

6e934bf

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

fix the fmt

bd2a681

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB May 14, 2026 19:07 Inactive

dynamo-ops reviewed May 14, 2026

View reviewed changes

Comment thread lib/bindings/c/src/lib.rs Outdated

PeaBrane reviewed May 15, 2026

View reviewed changes

Comment thread lib/kv-router/src/scheduling/types.rs

Comment thread lib/kv-router/src/scheduling/selector.rs Outdated

Comment thread lib/kv-router/src/scheduling/selector.rs Outdated

Comment thread lib/bindings/python/src/dynamo/_core.pyi

Comment thread lib/kv-router/src/test_utils.rs Outdated

PeaBrane reviewed May 15, 2026

View reviewed changes

PeaBrane approved these changes May 15, 2026

View reviewed changes

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Comment thread lib/llm/src/protocols/openai/nvext.rs

add eligibility helper to bundle checks with usually inline

ce236a3

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

add re-export and test utils import fix

d45ead4

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

michaelfeil changed the title ~~feat: introduce RequestContrainsts + worker taints~~ feat: introduce RoutingConstraints + worker taints May 15, 2026

Merge branch 'main' into mf/taints

0b263d5

michaelfeil added 2 commits May 15, 2026 14:44

make api more clear to only multiply if useful

b471aff

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

make api more clear to only multiply if useful

1922ae4

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

dynamo-ops approved these changes May 15, 2026

View reviewed changes

ove pinned worker in the api, scheduled context helper to make sure q…

f802890

…ueueing computes the right thing Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

pull-request-size Bot added size/XXL and removed size/XL labels May 15, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 15, 2026 22:40 Inactive

michaelfeil added 2 commits May 15, 2026 16:15

add comment why heuristic is choosen

64ccf6d

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

add comment why heuristic is choosen

6aca329

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB May 15, 2026 23:53 Inactive

PeaBrane enabled auto-merge (squash) May 16, 2026 00:29

PeaBrane merged commit a466327 into ai-dynamo:main May 16, 2026
99 checks passed

copy-pr-bot Bot had a problem deploying to GITLAB May 16, 2026 03:23 Failure

tmonty12 mentioned this pull request May 19, 2026

DEP: Topology-Aware KV Transfer Routing Constraints #9118

Open

ryanolson pushed a commit that referenced this pull request May 20, 2026

feat: introduce RoutingConstraints + worker taints (#9558)

33e72ab

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

Conversation

michaelfeil commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Motivation

Semantics

API Changes

Notes

Details:

Where should the reviewer start?

Related Issues:

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 14, 2026

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

michaelfeil commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tmonty12 commented May 14, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PeaBrane left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michaelfeil commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 15, 2026

Uh oh!

PeaBrane commented May 15, 2026

Uh oh!

PeaBrane commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

michaelfeil commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading