Skip to content

feat: introduce RoutingConstraints + worker taints#9558

Merged
PeaBrane merged 39 commits into
ai-dynamo:mainfrom
michaelfeil:mf/taints
May 16, 2026
Merged

feat: introduce RoutingConstraints + worker taints#9558
PeaBrane merged 39 commits into
ai-dynamo:mainfrom
michaelfeil:mf/taints

Conversation

@michaelfeil

@michaelfeil michaelfeil commented May 14, 2026

Copy link
Copy Markdown
Contributor

Signed-off-by: michaelfeil 63565275+michaelfeil@users.noreply.github.com

Overview:

This adds worker taints plus request-side routing constraints for KV router selection.

Current shape:

  • Workers advertise taints
  • Requests may carry RoutingConstraints
  • Routing enforces required_taints
  • preferred_taints is accepted and used to bias traffic towards prefered worekrs.

The implementation keeps the model asymmetric:

  • worker side: facts/capabilities
  • request side: matching policy

This is intended to support placement boundaries such as keeping traffic within a specific MDC / NVL domain, while also leaving room for future soft-preference routing.

Motivation

  1. required_taints: hard placement constraints, for example keeping prefill/disagg within a domain boundary. set of taints.
  2. preferred_taints: soft heuristics biasing, for example steering different applications toward fast/slow or short/long-context worker pools. Also things like how well two components are networked etc. hashmap of taints and their weight on a (-1,1) scale, with 0 being neutral preference (no impact) values towards +1 steering to pick a worker with this taint, and -1

Semantics

A request is eligible for a worker only if every entry in routing_constraints.required_taints is present in the worker taint set.

preferred_taints currently has no effect in the default Nvidia Dynamo router policy.

API Changes

  • Worker runtime config:
    • ModelRuntimeConfig.taints: HashSet<String>
  • Internal router request:
    • RouterRequest::New { routing_constraints: RoutingConstraints, ... }
  • Request-side constraint type:
    • RoutingConstraints { required_taints: HashSet<String>, preferred_taints: HashSet<String> } -> hope is to have pinned worker or allowed worker ids also in here.
  • Python:
    • ModelRuntimeConfig.taints still presents as set[str], but converts to a set internally
    • request-side RoutingConstraints(...) is available -> We could make it not available if there are containts. I am sceptical exposing e.g. allowed_worker_ids on the http entrypoint.

Notes

  • Taints and routing constraints now use set semantics internally (HashSet<String>) rather than list semantics.
  • OpenAPI exposes a concrete schema for request routing constraints.
  • preferred_taints is intentionally accepted now so custom selectors can use it later, but the default selector does not score on it yet. Its up to user to implement something, that is now wired to the core (we will)

Details:

  • Adds worker taints to runtime config and worker config trait surfaces.
  • Threads request-side routing constraints through router, scheduler, prefill, push-router, Python bindings, and C bindings.
  • Adds selector and queue coverage for incompatible required-taint routing cases.

Where should the reviewer start?

  • lib/kv-router/src/protocols.rs
  • lib/kv-router/src/scheduling/selector.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/protocols/openai/nvext.rs

Related Issues:

Summary by CodeRabbit

  • New Features
    • Request-level routing constraints: clients can provide required and preferred worker taints to influence routing.
    • Model/runtime APIs expose taint configuration so models can advertise eligibility.
    • Routers and schedulers honor routing constraints: worker selection, queueing, and prefill/push paths prefer or restrict workers accordingly.
    • OpenAPI/SDK surfaces routing constraints and optional parameters for routing in requests.

Review Change Stack

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@michaelfeil michaelfeil requested review from a team as code owners May 14, 2026 17:00
@copy-pr-bot

copy-pr-bot Bot commented May 14, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Copy link
Copy Markdown
Contributor

👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` router Relates to routing, KV-aware routing, etc. feat labels May 14, 2026
@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This PR introduces RoutingConstraints—a structure containing required and preferred worker taints—and threads it through request processing, scheduling, worker selection, router pipelines, and Python bindings. The feature enables clients to specify worker compatibility constraints via taints (required sets and soft-preference weights) that are evaluated during worker selection to filter candidates and influence scoring.

Changes

Routing Constraints Implementation and Integration

Layer / File(s) Summary
RoutingConstraints protocol and trait extension
lib/kv-router/src/protocols.rs
Introduces RoutingConstraints struct with required_taints: HashSet<String> and preferred_taints: HashMap<String, f32>, plus compatibility and multiplier-calculation methods. Extends WorkerConfigLike trait with taints() accessor. Adds routing_constraints field to RouterRequest::New with serde omission when empty.
Scheduling request and worker config integration
lib/kv-router/src/scheduling/types.rs, lib/llm/src/local_model/runtime_config.rs, lib/kv-router/src/test_utils.rs
SchedulingRequest gains routing_constraints field. ModelRuntimeConfig adds taints field and WorkerConfigLike::taints() implementation. Test utility SimpleWorkerConfig adds taints field.
LocalScheduler and queue admission with constraint filtering
lib/kv-router/src/scheduling/local.rs, lib/kv-router/src/scheduling/queue.rs
LocalScheduler::schedule accepts and forwards routing_constraints parameter. Queue admission (all_workers_busy) filters workers by constraint compatibility, rejecting incompatible pinned or candidate workers. All test call sites updated with default constraints.
Worker selection with taints compatibility
lib/kv-router/src/scheduling/selector.rs
DefaultWorkerSelector::select_worker filters non-pinned candidates by compatibility, validates pinned worker taints, and applies preferred taint multiplier scoring. Extended with TaintedWorkerConfig test utility and new test cases for required and preferred taints routing.
KvRouter method signatures and request handling
lib/llm/src/kv_router.rs
KvRouter::find_best_match_details and find_best_match accept routing_constraints and forward to scheduler. AsyncEngine handler for RouterRequest::New destructures and passes constraints. Test call sites updated.
KvScheduler and prefill router integration
lib/llm/src/kv_router/scheduler.rs, lib/llm/src/kv_router/prefill_router/execution.rs
KvScheduler::schedule accepts and forwards routing_constraints. Prefill router extracts constraints from request routing hints (with default fallback) and passes to query_prefill_worker.
Push router and replay routers
lib/llm/src/kv_router/push_router.rs, lib/mocker/src/replay/offline/components/router.rs, lib/mocker/src/replay/online/router.rs
Push router extracts and threads constraints for both KV-overlap and pinned-worker paths, with hard-constraint validation for pinned workers. Replay routers supply default constraints during request construction.
Request protocols and preprocessing
lib/llm/src/protocols/common/preprocessor.rs, lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/preprocessor.rs
RoutingHints and NvExt add optional routing_constraints fields. RoutingConstraintsSchema added for OpenAPI with validation rejecting weights outside (-1.0, 1.0). OpenAIPreprocessor clones constraints into routing hints.
C FFI routing constraints threading
lib/bindings/c/src/lib.rs
preprocess_request returns three-tuple including routing_constraints. RouterHandles methods accept and forward constraints through prefill and decode paths. route_prefill_request and route_decode_request destructure and pass constraints.
Python RoutingConstraints binding and KvRouter signatures
lib/bindings/python/rust/llm/local_model.rs, lib/bindings/python/rust/llm/kv.rs, lib/bindings/python/rust/lib.rs
New PyO3 RoutingConstraints wrapper with required_taints and preferred_taints fields and From conversions. ModelRuntimeConfig extended with taints getter/setter. KvRouter.generate() and best_worker() gain optional routing_constraints=None parameter. Module exports RoutingConstraints class.
Python type stubs and documentation
lib/bindings/python/src/dynamo/_core.pyi
Type stub declares RoutingConstraints class with field and initializer signatures. ModelRuntimeConfig stub adds taints: Set[str] field. KvRouter method signatures and docstring updated for routing constraints parameter.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description check ✅ Passed Description covers overview, motivation, semantics, API changes, implementation details, and reviewer guidance. Follows template structure with all key sections populated.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly and concisely summarizes the main feature addition: introducing RoutingConstraints and worker taints for routing logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/kv-router/src/scheduling/local.rs`:
- Around line 189-190: The schedule() signature was changed to require a new
taints: Taints parameter so all local callers must be updated; find calls to
schedule(...) in this file (and tests) and pass an appropriate Taints value
(e.g., Taints::default() or a specific Taints construct) as the new argument,
ensuring the order matches the signature (taints before shared_cache_hits if
applicable), and update any calls that construct or forward SharedCacheHits
(crate::SharedCacheHits) accordingly so compilation succeeds.

In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 250-253: The pinned-worker taints check is rejecting workers when
request.taints is empty; update the conditional around the
workers.get(&worker.worker_id).is_some_and(...) check so it only performs the
taints compatibility test if request.taints is non-empty (e.g., guard with
!request.taints.is_empty()). In practice, modify the if that references workers,
worker.worker_id, config.taints() and request.taints to skip the
is_compatible_with call when request.taints is empty so pinned and non-pinned
paths behave consistently.

In `@lib/kv-router/src/scheduling/types.rs`:
- Line 65: The SchedulingRequest struct gained a new mandatory field taints
(type Taints), so update every SchedulingRequest literal to include a taints:
... entry; search for constructor/struct literals of SchedulingRequest (e.g., in
lib/kv-router/src/scheduling/selector.rs tests and any test helpers/factories)
and add a sensible value (likely Taints::default() or an empty/neutral taints
value provided by the Taints type) to each literal so the code compiles with the
new field.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5bafd7ee-b0e5-4ebe-a55f-dda608952eef

📥 Commits

Reviewing files that changed from the base of the PR and between a2ac896 and c796c28.

📒 Files selected for processing (19)
  • lib/bindings/python/rust/lib.rs
  • lib/bindings/python/rust/llm/kv.rs
  • lib/bindings/python/rust/llm/local_model.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/bindings/python/src/dynamo/llm/__init__.py
  • lib/kv-router/src/protocols.rs
  • lib/kv-router/src/scheduling/local.rs
  • lib/kv-router/src/scheduling/selector.rs
  • lib/kv-router/src/scheduling/types.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/prefill_router/execution.rs
  • lib/llm/src/kv_router/push_router.rs
  • lib/llm/src/kv_router/scheduler.rs
  • lib/llm/src/local_model/runtime_config.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/nvext.rs
  • lib/mocker/src/replay/offline/components/router.rs
  • lib/mocker/src/replay/online/router.rs

Comment thread lib/kv-router/src/scheduling/local.rs Outdated
Comment thread lib/kv-router/src/scheduling/selector.rs
Comment thread lib/kv-router/src/scheduling/types.rs Outdated
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>

# Conflicts:
#	lib/llm/src/kv_router.rs
Comment thread lib/bindings/python/rust/llm/local_model.rs Outdated
Comment thread lib/llm/src/protocols/openai/nvext.rs Outdated
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@michaelfeil

Copy link
Copy Markdown
Contributor Author

@CodeRabbit full review. I changed the PR quite a bit, so a pr from scratch would be helpful.

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

@michaelfeil Sure! I'll kick off a fresh full review of your PR from scratch right away.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/llm/src/kv_router/prefill_router/execution.rs (1)

37-75: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Preselected prefill workers bypass taints.required validation.

Line 37 directly accepts preselected_worker and skips taints checks, while taints are only applied in the non-preselected branch (Line 61 onward). This breaks uniform taint eligibility guarantees.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 37 - 75,
Preselected prefill workers currently skip taints validation; when
preselected_worker is Some(id) you must apply the same taints eligibility checks
used in the non-preselected branch: extract taints from req.routing (as done in
the else branch), then validate that the selected worker id satisfies
taints.required constraints (reusing the same taints matching logic or helper
used by query_prefill_worker); if the worker fails the taints check, reject it
(return an error or fall through to the shared selection path) and log the
rejection (include worker_id and failing taint), otherwise proceed to set
(worker_id, dp_rank) as before. Ensure you reference preselected_worker,
req.routing, taints, and query_prefill_worker so the validation code is
colocated with the existing routing logic.
lib/llm/src/kv_router/push_router.rs (1)

379-466: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Taint eligibility is bypassed on pinned-worker fallback.

Line 397’s fallback path can return/route the pinned worker without a taints compatibility check, while taints are only enforced in the find_best_match_details branches. This allows taints.required to be skipped for pinned requests in that path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/push_router.rs` around lines 379 - 466, The
pinned-worker fallback currently skips taints validation (when
resolved_pinned_worker is Some or None and WorkerWithDpRank::new is used),
allowing routes that ignore taints.required; ensure you run the same taints
compatibility check used in find_best_match_details before computing cache hits
or returning Ok: validate the pinned_worker_id / worker against the incoming
taints and reject or return an error (or fall back) if incompatible, only call
chooser.get_cache_hit_estimate and chooser.add_request when the taints check
passes, and set scheduler_tracked = true only when the worker passed taints
validation (and !is_query_only and resolved_dp_rank.is_some()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Line 2041: Update the doc comment for the "taints" parameter to explicitly
state that only "required" taints are currently enforced by frontend routing and
that "preferred" taints are not acted upon yet; mention the symbols "taints" and
"preferred" so readers know which fields this applies to and add a brief note
advising callers that preferred taints are a non-enforced hint until frontend
routing is implemented to respect them.

---

Outside diff comments:
In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 37-75: Preselected prefill workers currently skip taints
validation; when preselected_worker is Some(id) you must apply the same taints
eligibility checks used in the non-preselected branch: extract taints from
req.routing (as done in the else branch), then validate that the selected worker
id satisfies taints.required constraints (reusing the same taints matching logic
or helper used by query_prefill_worker); if the worker fails the taints check,
reject it (return an error or fall through to the shared selection path) and log
the rejection (include worker_id and failing taint), otherwise proceed to set
(worker_id, dp_rank) as before. Ensure you reference preselected_worker,
req.routing, taints, and query_prefill_worker so the validation code is
colocated with the existing routing logic.

In `@lib/llm/src/kv_router/push_router.rs`:
- Around line 379-466: The pinned-worker fallback currently skips taints
validation (when resolved_pinned_worker is Some or None and
WorkerWithDpRank::new is used), allowing routes that ignore taints.required;
ensure you run the same taints compatibility check used in
find_best_match_details before computing cache hits or returning Ok: validate
the pinned_worker_id / worker against the incoming taints and reject or return
an error (or fall back) if incompatible, only call
chooser.get_cache_hit_estimate and chooser.add_request when the taints check
passes, and set scheduler_tracked = true only when the worker passed taints
validation (and !is_query_only and resolved_dp_rank.is_some()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36b4ba48-b11d-457e-970f-8da537d407e3

📥 Commits

Reviewing files that changed from the base of the PR and between 9643e64 and 70c5c71.

📒 Files selected for processing (19)
  • lib/bindings/python/rust/lib.rs
  • lib/bindings/python/rust/llm/kv.rs
  • lib/bindings/python/rust/llm/local_model.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/bindings/python/src/dynamo/llm/__init__.py
  • lib/kv-router/src/protocols.rs
  • lib/kv-router/src/scheduling/local.rs
  • lib/kv-router/src/scheduling/selector.rs
  • lib/kv-router/src/scheduling/types.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/prefill_router/execution.rs
  • lib/llm/src/kv_router/push_router.rs
  • lib/llm/src/kv_router/scheduler.rs
  • lib/llm/src/local_model/runtime_config.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/nvext.rs
  • lib/mocker/src/replay/offline/components/router.rs
  • lib/mocker/src/replay/online/router.rs

Comment thread lib/bindings/python/src/dynamo/_core.pyi Outdated
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@tmonty12

Copy link
Copy Markdown
Contributor

/ok to test bd2a681

Comment thread lib/bindings/c/src/lib.rs Outdated
Comment thread lib/kv-router/src/scheduling/types.rs
Comment thread lib/kv-router/src/scheduling/selector.rs Outdated
Comment thread lib/kv-router/src/scheduling/selector.rs Outdated
Comment thread lib/bindings/python/src/dynamo/_core.pyi
Comment thread lib/kv-router/src/test_utils.rs Outdated

@PeaBrane PeaBrane left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One broader shape suggestion: since this PR now has two levels of eligibility filtering (allowed_worker_ids plus hard required_taints), I think it would be worth centralizing that in a tiny borrowed helper rather than letting queue admission, selector filtering, and any queue cost estimation each reconstruct the same predicate.

The shape I have in mind is intentionally small and should compile away: something like a RoutingEligibility<'a> holding Option<&HashSet<WorkerId>> and &RoutingConstraints, with #[inline] methods for allows_worker_id(...) and allows_worker(worker_id, config). That keeps the combined rule (allowed_worker_ids + required taints) in one place without allocation, cloning, trait objects, or dynamic dispatch.

That helper would also make the WSPT fix less error-prone, because the same eligibility predicate could be reused when estimating best effective prefill tokens. Separately, RoutingConstraints::is_compatible_with_worker_taints() and preferred_taint_multiplier() should short-circuit the empty/default cases so the common no-constraints path stays essentially zero overhead.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
lib/bindings/python/src/dynamo/_core.pyi (1)

550-568: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mirror the current preferred_taints behavior in the Python docs.

This stub still describes preferred_taints as actively affecting scoring, but the default selector only consumes required_taints today. Please mark preferred_taints as accepted-but-not-enforced until frontend routing actually uses it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/bindings/python/src/dynamo/_core.pyi` around lines 550 - 568, The
docstring for class RoutingConstraints inaccurately claims preferred_taints
affects scoring; update the RoutingConstraints docstring (and mention in the
__init__ signature if present) to state that preferred_taints is currently
accepted but not enforced/used by the default selector—i.e., keep
required_taints behavior unchanged, but explicitly mark preferred_taints as a
no-op/for-future-use until frontend routing consumes it so readers understand it
does not impact scoring today.
lib/llm/src/kv_router/prefill_router/execution.rs (2)

273-316: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or filter hard-constrained requests in SimpleRouter mode.

routing_constraints is forwarded to the KV-router branch only. In the SimpleRouter branch, required_taints are ignored, so non-KV prefill routing can still choose an incompatible worker.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 273 - 316,
query_prefill_worker forwards routing_constraints only to the KvRouter branch so
the SimpleRouter branch (InnerPrefillRouter::SimpleRouter) ignores
required_taints and may pick incompatible workers; update the SimpleRouter path
in query_prefill_worker to first inspect routing_constraints.required_taints
(and other hard constraints like allowed_worker_ids) and either filter available
workers returned by r.select_next_worker()/r.peek_next_worker() to those that
satisfy the constraints or reject the request with an error when no compatible
worker exists, ensuring behavior matches the KvRouter branch.

24-82: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate preselected_worker against required_taints.

This method now threads routing_constraints through the normal selection path, but the preselected_worker branch still bypasses that check entirely. A request with hard constraints can therefore bootstrap against an incompatible prefill worker. Reuse the pinned-worker compatibility check here, or reject preselected_worker when the constraints cannot be validated.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router/prefill_router/execution.rs` around lines 24 - 82,
resolve_prefill_worker currently skips validating a provided preselected_worker
against routing constraints/required_taints, allowing incompatible bootstraps;
update the preselected_worker branch in resolve_prefill_worker to perform the
same compatibility check used by query_prefill_worker (reuse the pinned-worker
compatibility logic or call the same validator) using routing_constraints (and
required_taints) derived from req.routing and req.block_mm_routing_info(), and
if validation fails return PrefillResolveDecision::Unavailable (or the same
error path as query_prefill_worker) instead of accepting the worker; reference
resolve_prefill_worker, preselected_worker, routing_constraints,
required_taints, query_prefill_worker, and PrefillResolveDecision in your
change.
lib/llm/src/kv_router.rs (1)

690-716: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the old find_best_match entry point.

This adds RoutingConstraints to a public method, so existing out-of-tree Rust callers will stop compiling. Please keep the old signature as a default-constraints wrapper and move the taints-aware path to a separate method name.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/kv_router.rs` around lines 690 - 716, Restore the original public
API by keeping the old find_best_match signature (without RoutingConstraints) as
a thin wrapper that calls the existing taints-aware implementation (rename that
implementation to find_best_match_with_constraints or similar) and passes a
default RoutingConstraints value; specifically, keep a public fn
find_best_match(&self, context_id: Option<&str>, tokens: &[u32], block_mm_infos:
Option<&[Option<BlockExtraInfo>]>, router_config_override:
Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>,
priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids:
Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that
forwards to find_best_match_with_constraints(..., routing_constraints:
RoutingConstraints::default()), and rename the current method that accepts
RoutingConstraints to the new taints-aware name (e.g.,
find_best_match_with_constraints) so callers who need taint behavior can use it
while existing callers keep compiling.
lib/kv-router/src/scheduling/selector.rs (1)

287-296: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply preferred-taint bias independently of the base cost.

Multiplying base_score by the preferred-taint multiplier makes soft preferences a no-op whenever the computed cost is 0.0, so fully idle or fully cached candidates can still tie onto an explicitly avoided taint. This needs an additive bias or another tie-breaker that still differentiates zero-cost workers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/scheduling/selector.rs` around lines 287 - 296, The
get_score closure currently multiplies base_score by preferred_taint_multiplier,
which makes taint preferences ineffective when base_score == 0. Update the
get_score closure (and related logic around worker_logit and
request.routing_constraints.preferred_taint_multiplier) to apply the taint
preference as an additive bias independent of base_score — e.g., compute
base_score = self.worker_logit(...), compute taint_mult =
request.routing_constraints.preferred_taint_multiplier(config.taints()), compute
a small additive bias (constant or epsilon-scaled value) derived from taint_mult
(for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score +
bias so zero-cost workers are still differentiated by taint preference.
lib/kv-router/src/protocols.rs (1)

174-180: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Non-finite weight validation still missing.

Serde will deserialize NaN and ±infinity values into preferred_taints without validation. When these flow into preferred_taint_multiplier(), they propagate through tanh and exp, producing all-NaN scores that can panic downstream selection logic (e.g., temperature-0 argmax).

Add a custom Deserialize impl or validation method that rejects or neutralizes non-finite weights at the API boundary.

🛡️ Example validation approach
impl RoutingConstraints {
    pub fn validate(&self) -> Result<(), &'static str> {
        for (taint, weight) in &self.preferred_taints {
            if !weight.is_finite() {
                return Err("preferred_taints must have finite weights");
            }
        }
        Ok(())
    }
}

Or filter at construction/deserialization time.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/protocols.rs` around lines 174 - 180, The preferred_taints
map in struct RoutingConstraints can contain non-finite floats (NaN/±inf) from
serde deserialization; add validation to reject or sanitize these values at the
API boundary by either implementing a custom Deserialize for RoutingConstraints
or adding a validate() method called right after deserialization that iterates
over self.preferred_taints and returns an error if any weight is not finite;
ensure callers (or the deserializer) enforce this so
preferred_taint_multiplier() never receives non-finite inputs.
🧹 Nitpick comments (1)
lib/kv-router/src/protocols.rs (1)

174-215: ⚡ Quick win

Consider adding unit tests for RoutingConstraints methods.

The new RoutingConstraints methods (is_empty, has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches, preferred_taint_multiplier) have no test coverage. Adding tests would document expected behavior, catch edge cases (empty sets, boundary conditions, formula correctness), and prevent regressions.

📝 Example test structure
#[cfg(test)]
mod routing_constraints_tests {
    use super::*;

    #[test]
    fn test_is_compatible_with_worker_taints() {
        let constraints = RoutingConstraints {
            required_taints: ["gpu", "us-west"].iter().map(|s| s.to_string()).collect(),
            ..Default::default()
        };
        
        let worker_taints_match: HashSet<String> = 
            ["gpu", "us-west", "extra"].iter().map(|s| s.to_string()).collect();
        assert!(constraints.is_compatible_with_worker_taints(&worker_taints_match));
        
        let worker_taints_missing: HashSet<String> = 
            ["gpu"].iter().map(|s| s.to_string()).collect();
        assert!(!constraints.is_compatible_with_worker_taints(&worker_taints_missing));
    }
    
    // Add tests for other methods...
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/kv-router/src/protocols.rs` around lines 174 - 215, Add unit tests for
RoutingConstraints covering is_empty, has_hard_constraints,
is_compatible_with_worker_taints, preferred_taint_matches, and
preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify
empty/default constraints, constraints with required_taints that both match and
don't match worker_taints (use
RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints),
verify preferred_taint_matches returns correct counts for overlapping keys, and
assert preferred_taint_multiplier produces expected numeric behavior for known
weights (include cases like no preferred_taints -> multiplier ==
(-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp
flow). Reference the RoutingConstraints struct and its methods by name when
locating code to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 288-306: The doc comment on
RoutingConstraintsSchema::preferred_taints incorrectly documents active scoring
behavior (tanh, summing, positive multiplier) which isn't implemented; update
the comment on the preferred_taints field to a neutral description (e.g.,
"Optional soft preferences keyed by worker taint; implementations may use these
as tie-breakers or heuristics") and remove the specifics about weighting,
summing, and tanh so the schema doesn't promise unimplemented scoring; keep the
required_taints doc as-is and ensure the symbol names referenced are
RoutingConstraintsSchema and preferred_taints.

---

Duplicate comments:
In `@lib/bindings/python/src/dynamo/_core.pyi`:
- Around line 550-568: The docstring for class RoutingConstraints inaccurately
claims preferred_taints affects scoring; update the RoutingConstraints docstring
(and mention in the __init__ signature if present) to state that
preferred_taints is currently accepted but not enforced/used by the default
selector—i.e., keep required_taints behavior unchanged, but explicitly mark
preferred_taints as a no-op/for-future-use until frontend routing consumes it so
readers understand it does not impact scoring today.

In `@lib/kv-router/src/protocols.rs`:
- Around line 174-180: The preferred_taints map in struct RoutingConstraints can
contain non-finite floats (NaN/±inf) from serde deserialization; add validation
to reject or sanitize these values at the API boundary by either implementing a
custom Deserialize for RoutingConstraints or adding a validate() method called
right after deserialization that iterates over self.preferred_taints and returns
an error if any weight is not finite; ensure callers (or the deserializer)
enforce this so preferred_taint_multiplier() never receives non-finite inputs.

In `@lib/kv-router/src/scheduling/selector.rs`:
- Around line 287-296: The get_score closure currently multiplies base_score by
preferred_taint_multiplier, which makes taint preferences ineffective when
base_score == 0. Update the get_score closure (and related logic around
worker_logit and request.routing_constraints.preferred_taint_multiplier) to
apply the taint preference as an additive bias independent of base_score — e.g.,
compute base_score = self.worker_logit(...), compute taint_mult =
request.routing_constraints.preferred_taint_multiplier(config.taints()), compute
a small additive bias (constant or epsilon-scaled value) derived from taint_mult
(for example bias = (taint_mult - 1.0) * TAINT_BIAS) and return base_score +
bias so zero-cost workers are still differentiated by taint preference.

In `@lib/llm/src/kv_router.rs`:
- Around line 690-716: Restore the original public API by keeping the old
find_best_match signature (without RoutingConstraints) as a thin wrapper that
calls the existing taints-aware implementation (rename that implementation to
find_best_match_with_constraints or similar) and passes a default
RoutingConstraints value; specifically, keep a public fn find_best_match(&self,
context_id: Option<&str>, tokens: &[u32], block_mm_infos:
Option<&[Option<BlockExtraInfo>]>, router_config_override:
Option<&RouterConfigOverride>, update_states: bool, lora_name: Option<String>,
priority_jump: f64, expected_output_tokens: Option<u32>, allowed_worker_ids:
Option<HashSet<WorkerId>>) -> anyhow::Result<(WorkerWithDpRank, u32)> that
forwards to find_best_match_with_constraints(..., routing_constraints:
RoutingConstraints::default()), and rename the current method that accepts
RoutingConstraints to the new taints-aware name (e.g.,
find_best_match_with_constraints) so callers who need taint behavior can use it
while existing callers keep compiling.

In `@lib/llm/src/kv_router/prefill_router/execution.rs`:
- Around line 273-316: query_prefill_worker forwards routing_constraints only to
the KvRouter branch so the SimpleRouter branch
(InnerPrefillRouter::SimpleRouter) ignores required_taints and may pick
incompatible workers; update the SimpleRouter path in query_prefill_worker to
first inspect routing_constraints.required_taints (and other hard constraints
like allowed_worker_ids) and either filter available workers returned by
r.select_next_worker()/r.peek_next_worker() to those that satisfy the
constraints or reject the request with an error when no compatible worker
exists, ensuring behavior matches the KvRouter branch.
- Around line 24-82: resolve_prefill_worker currently skips validating a
provided preselected_worker against routing constraints/required_taints,
allowing incompatible bootstraps; update the preselected_worker branch in
resolve_prefill_worker to perform the same compatibility check used by
query_prefill_worker (reuse the pinned-worker compatibility logic or call the
same validator) using routing_constraints (and required_taints) derived from
req.routing and req.block_mm_routing_info(), and if validation fails return
PrefillResolveDecision::Unavailable (or the same error path as
query_prefill_worker) instead of accepting the worker; reference
resolve_prefill_worker, preselected_worker, routing_constraints,
required_taints, query_prefill_worker, and PrefillResolveDecision in your
change.

---

Nitpick comments:
In `@lib/kv-router/src/protocols.rs`:
- Around line 174-215: Add unit tests for RoutingConstraints covering is_empty,
has_hard_constraints, is_compatible_with_worker_taints, preferred_taint_matches,
and preferred_taint_multiplier: create a #[cfg(test)] mod with tests that verify
empty/default constraints, constraints with required_taints that both match and
don't match worker_taints (use
RoutingConstraints::is_compatible_with_worker_taints and has_hard_constraints),
verify preferred_taint_matches returns correct counts for overlapping keys, and
assert preferred_taint_multiplier produces expected numeric behavior for known
weights (include cases like no preferred_taints -> multiplier ==
(-0.0f64).exp(), single weight, and multiple weights to validate tanh->exp
flow). Reference the RoutingConstraints struct and its methods by name when
locating code to test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ac1def6-94fa-40df-9d21-515ac8551fbb

📥 Commits

Reviewing files that changed from the base of the PR and between 1c841c4 and 65596fe.

📒 Files selected for processing (22)
  • lib/bindings/c/src/lib.rs
  • lib/bindings/python/rust/lib.rs
  • lib/bindings/python/rust/llm/kv.rs
  • lib/bindings/python/rust/llm/local_model.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/kv-router/src/protocols.rs
  • lib/kv-router/src/scheduling/local.rs
  • lib/kv-router/src/scheduling/policy.rs
  • lib/kv-router/src/scheduling/queue.rs
  • lib/kv-router/src/scheduling/selector.rs
  • lib/kv-router/src/scheduling/types.rs
  • lib/kv-router/src/test_utils.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/prefill_router/execution.rs
  • lib/llm/src/kv_router/push_router.rs
  • lib/llm/src/kv_router/scheduler.rs
  • lib/llm/src/local_model/runtime_config.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/nvext.rs
  • lib/mocker/src/replay/offline/components/router.rs
  • lib/mocker/src/replay/online/router.rs

Comment thread lib/llm/src/protocols/openai/nvext.rs
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@michaelfeil

Copy link
Copy Markdown
Contributor Author

I added a commit. I am not sure if #inline is still need. I like the helper and adding more things like allowed_worker_id's to it.

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@michaelfeil michaelfeil changed the title feat: introduce RequestContrainsts + worker taints feat: introduce RoutingConstraints + worker taints May 15, 2026
@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
…ueueing computes the right thing

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@PeaBrane

Copy link
Copy Markdown
Contributor

/ok to test f802890

Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
@PeaBrane

Copy link
Copy Markdown
Contributor

/ok to test 6aca329

@PeaBrane PeaBrane enabled auto-merge (squash) May 16, 2026 00:29
@PeaBrane PeaBrane merged commit a466327 into ai-dynamo:main May 16, 2026
99 checks passed
kaim-eng added a commit that referenced this pull request May 18, 2026
Main PR #9558 (`feat: introduce RoutingConstraints + worker taints`)
added `from dynamo._core import RoutingConstraints` to
`dynamo.llm.__init__`. Without this symbol in the testbed runtime
stub, every planner unit test that transitively imports
`dynamo.common.utils.endpoint_types` (-> `dynamo.llm`) fails at
collection time on dev machines that don't have the maturin-built
native binding.

Add `RoutingConstraints` to the stubbed class list in
`tests/testbed/_runtime_stub.py`. Idempotent, doesn't affect CI or
the dev pod (real `dynamo._core` is loaded there).

Signed-off-by: Kai Ma <kaim@nvidia.com>
kaim-eng added a commit that referenced this pull request May 18, 2026
Adds the synthetic-metrics testbed: deterministic, no GPU, no cluster.

Alpha-class (27 scenarios, A1-F26): synthetic fleet + fake metrics
+ fake actuator drive the planner through fault-injection scenarios
covering AIC drift, NVML clamps, K8s RBAC denials, node loss / recovery,
Prometheus outages, MDC gaps, budget shrinkage, AIC infeasibility,
and drift-threshold boundary cases.

Gamma-class (3 scenarios, G1-G3): mocker-driven trace replay with
synthetic-power overlay (replay/synthetic_power_overlay.py) and a
power-aware replay adapter (replay/power_aware_replay_adapter.py)
exercise the closed loop against real Mooncake traces.

Infrastructure:
- runner.py + scenarios.py + assertions.py + recorder.py + clock.py
  (run-loop, scenario loader, invariant checks, recording).
- synthetic_fleet.py + fake_actuator.py + fake_aic.py +
  fake_planner_metrics.py + fake_prometheus.py (test doubles for
  every external dependency of the planner run loop).
- _runtime_stub.py installs a stub dynamo._core when the compiled
  Rust binding is absent, so the testbed runs on developer laptops
  without a CUDA toolchain (carries every dynamo._core symbol used
  by dynamo.llm at module-load time, including the post-rebase
  RoutingConstraints addition from main PR #9558).
- grafana/testbed_dashboard.json + systems/ (h100_pcie / h100_sxm /
  h200_sxm SKUs) + traces/placeholder_h200_disagg_1rps.jsonl provide
  a complete observability + replay stack.

Production code:
- offline/replay_adapter.py: 9-line timing fix
  (now_s = max(tick.at_s, bridge_now_s)) prevents stale-tick loops
  on sparse traces.

86 testbed tests + 30 scenarios (27 alpha + 3 gamma) ship green at
this tip (1 skipped pending env-var; test_aic_real_data.py is
module-skipped unless AIC_SANDBOX_DIR is set).

Part of the PR #9369 split (PR 4 of 6). See docs/design-docs/pr9369-split-plan.md.

Signed-off-by: Kai Ma <kaim@nvidia.com>
kaim-eng added a commit that referenced this pull request May 19, 2026
Adds the synthetic-metrics testbed: deterministic, no GPU, no cluster.

Alpha-class (27 scenarios, A1-F26): synthetic fleet + fake metrics
+ fake actuator drive the planner through fault-injection scenarios
covering AIC drift, NVML clamps, K8s RBAC denials, node loss / recovery,
Prometheus outages, MDC gaps, budget shrinkage, AIC infeasibility,
and drift-threshold boundary cases.

Gamma-class (3 scenarios, G1-G3): mocker-driven trace replay with
synthetic-power overlay (replay/synthetic_power_overlay.py) and a
power-aware replay adapter (replay/power_aware_replay_adapter.py)
exercise the closed loop against real Mooncake traces.

Infrastructure:
- runner.py + scenarios.py + assertions.py + recorder.py + clock.py
  (run-loop, scenario loader, invariant checks, recording).
- synthetic_fleet.py + fake_actuator.py + fake_aic.py +
  fake_planner_metrics.py + fake_prometheus.py (test doubles for
  every external dependency of the planner run loop).
- _runtime_stub.py installs a stub dynamo._core when the compiled
  Rust binding is absent, so the testbed runs on developer laptops
  without a CUDA toolchain (carries every dynamo._core symbol used
  by dynamo.llm at module-load time, including the post-rebase
  RoutingConstraints addition from main PR #9558).
- grafana/testbed_dashboard.json + systems/ (h100_pcie / h100_sxm /
  h200_sxm SKUs) + traces/placeholder_h200_disagg_1rps.jsonl provide
  a complete observability + replay stack.

Production code:
- offline/replay_adapter.py: 9-line timing fix
  (now_s = max(tick.at_s, bridge_now_s)) prevents stale-tick loops
  on sparse traces.

86 testbed tests + 30 scenarios (27 alpha + 3 gamma) ship green at
this tip (1 skipped pending env-var; test_aic_real_data.py is
module-skipped unless AIC_SANDBOX_DIR is set).

Part of the PR #9369 split (PR 4 of 6). See docs/design-docs/pr9369-split-plan.md.

Signed-off-by: Kai Ma <kaim@nvidia.com>
ryanolson pushed a commit that referenced this pull request May 20, 2026
Signed-off-by: michaelfeil <63565275+michaelfeil@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` router Relates to routing, KV-aware routing, etc. size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants