feat: add python binding for rust llm modules by biswapanda · Pull Request #13 · ai-dynamo/dynamo

biswapanda · 2025-03-04T19:06:26Z

What does the PR do?

Adds Python bindings and an example for these LLM modules:

model deployment card
preprocessor
backend

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:

CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

github-actions · 2025-03-04T19:38:30Z

Test Results

2 files 2 suites 52s ⏱️
75 tests 75 ✅ 0 💤 0 ❌
97 runs 96 ✅ 1 💤 0 ❌

Results for commit 4f1861f.

Quick-win review fixes from PR #9131. Heavy-lift items (#9 prompt_token_ids env-gate, #11 update_weights atomicity, #13 per-choice completion_token_ids) tracked separately as follow-ups.

handlers.py
- Catch EngineDeadError before the generic except in all 8 RL handlers (pause/resume/liveness_probe/get_state/flush_cache/update_weights_from_path/load_lora_adapter/unload_lora_adapter): match the existing shutdown pattern in this file so admin calls also surface engine death instead of leaving a broken worker alive.
- get_state: fall back to a no-op collective_rpc when check_health is absent. This is the same fallback liveness_probe already uses; otherwise older engines without check_health always look alive.
- load_lora_adapter hot-swap path: a remove_lora() failure now returns a 400-style error response (was: silent log warn + continue, leaving add_lora to no-op against the still-registered ID); a reset_prefix_cache() failure after add_lora succeeds also returns error (was: log error and continue, leaving stale KV from the old adapter routable).
- unload_lora_adapter: an unregister_model() failure after engine remove_lora succeeds now returns error (was: log warn and report success, leaving model=<lora_name> still routed to this worker even though _resolve_lora_request would now fall back to the base model).

container/deps/vllm/install_vllm.sh
- Pin prime-rl install to an immutable commit SHA (d49f3939e7dca29bceb9ed515cc1782497b67e81 ↔ tag v0.5.1.dev101) so a re-pointed tag upstream can't change what we ship. PRIME_RL_REF kept in build logs for human readability; PRIME_RL_COMMIT is the authoritative pin.
- Replace `echo "\n=== ..."` with `printf '\n=== ...\n'` (shellcheck SC2028).

lib/llm/src/http/service/openai.rs
- Force `request.inner.logprobs = Some(true)` unconditionally in both RL token-id promotion blocks (was: only when None). RL extraction of completion_token_ids depends on logprobs being on at the engine; an explicit logprobs=false would otherwise silently drop them.
- Bound `/v1/rl/ready` per-worker probes with a 5s timeout (override via DYN_RL_LIVENESS_TIMEOUT_MS). Was reusing the shared 600s http_client, so one wedged worker could block readiness for 10 minutes instead of failing fast as 503.
- Tokenize Chat handler: call `request.validate()?` before `merged_chat_template_kwargs()` so the continue_final_message + add_generation_prompt mutual-exclusion constraint is enforced (validate() existed but was never invoked).

lib/llm/src/protocols/openai/chat_completions.rs
- Update stale doc comments on the legacy `tokens` and `return_token_ids` fields: they pointed callers at the now-404 `/v1/chat/completions/tokens` URI. Direct callers to the canonical top-level `prompt_token_ids` extension and `nvext.extra_fields` instead.

cargo check -p dynamo-llm: clean (1 pre-existing benign warning).
cargo test -p dynamo-llm --test test_common_ext: 15 passed.
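The bounded `/v1/rl/ready` probe described in the commit message above could be sketched roughly as follows. This is an illustration in Python, not the Rust code in openai.rs: the helper names (`liveness_timeout_seconds`, `probe_worker`, `ready_status`) and the demo probes are hypothetical, and the rules that every worker must be healthy for a 200 and that malformed or non-positive overrides fall back to the default are assumptions. Only the 5s default, the DYN_RL_LIVENESS_TIMEOUT_MS override, and the fail-fast 503 come from the commit message.

```python
import asyncio
import os

DEFAULT_TIMEOUT_MS = 5000  # 5s default, as stated in the commit message


def liveness_timeout_seconds(env=None) -> float:
    """Read DYN_RL_LIVENESS_TIMEOUT_MS (milliseconds) and return seconds."""
    env = os.environ if env is None else env
    raw = env.get("DYN_RL_LIVENESS_TIMEOUT_MS")
    try:
        ms = int(raw) if raw is not None else DEFAULT_TIMEOUT_MS
    except ValueError:
        ms = DEFAULT_TIMEOUT_MS  # assumption: malformed values use the default
    if ms <= 0:
        ms = DEFAULT_TIMEOUT_MS  # assumption: non-positive values use the default
    return ms / 1000.0


async def probe_worker(probe, timeout_s: float) -> bool:
    """Return True only if the probe finishes in time and reports healthy."""
    try:
        return bool(await asyncio.wait_for(probe(), timeout=timeout_s))
    except asyncio.TimeoutError:
        return False  # a wedged worker counts as not ready


async def ready_status(probes, timeout_s: float) -> int:
    """Probe all workers concurrently and map the outcome to an HTTP status."""
    results = await asyncio.gather(*(probe_worker(p, timeout_s) for p in probes))
    # Assumption: readiness requires every worker (and at least one) to be healthy.
    return 200 if results and all(results) else 503


# Hypothetical probes used only to exercise the sketch.
async def healthy_probe() -> bool:
    return True


async def wedged_probe() -> bool:
    await asyncio.sleep(10)  # simulates a worker that never answers in time
    return True
```

Because each probe carries its own deadline, one stuck worker delays the readiness answer by at most the configured timeout rather than by the shared client's much longer one.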

…mbeddingEvents

The scheduler-authoritative connector (DynamoMultimodalEmbeddingCacheConnector) runs inside the vLLM EngineCore subprocess and had zero observability: the Prometheus counters named dynamo_component_embedding_cache_* observe the *loader-side* MultimodalEmbeddingCacheManager (handlers.py), not the connector that actually decides whether to skip encoder compute. R2 showed this gap: 278x p50 TTFT win on cache hit but 0 hits reported in Prometheus.

Patch:
* connector: monotonic hit/miss/eviction counters + atomic JSON stats snapshot written to $DYN_EC_CONNECTOR_STATS_PATH on every build_connector_meta(). The snapshot also carries a rolling EmbeddingEvent buffer (kind=save|evict) - the substrate for cross-node event-plane work next round.
* prometheus.py: new EcConnectorMetrics enum + register_ec_connector_metrics() that reads the snapshot file on each /metrics scrape and translates it into Counter / Gauge values. Metric names use the ec_connector infix so they do not collide with the loader-side counters.
* worker_factory: register the new callback whenever the connector is enabled (gate mirrors main.py: not route_to_encoder and capacity_gb > 0).

This is the prerequisite instrumentation for cross-node MME cache work (DEP ai-dynamo#13 future work) - the next patch can layer EmbeddingEvent publish/subscribe on top of the same snapshot format.
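The snapshot hand-off described in this commit message (counters written atomically in one process, read at scrape time in another) might look something like the sketch below. The class `ConnectorStats`, the functions `write_snapshot` and `read_snapshot`, and the JSON field names are all hypothetical; the real connector's snapshot schema is not shown in the commit message. The write-to-temp-file-then-rename step is one common way to get an atomic update and is an assumption about how "atomic" is achieved.

```python
import json
import os
import tempfile
from collections import deque


class ConnectorStats:
    """Hypothetical stand-in for the connector-side counters and event buffer."""

    def __init__(self, max_events: int = 128):
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        self.events = deque(maxlen=max_events)  # rolling buffer, oldest dropped first

    def record_hit(self) -> None:
        self.hits += 1

    def record_miss(self) -> None:
        self.misses += 1

    def record_event(self, kind: str, key: str) -> None:
        if kind not in ("save", "evict"):
            raise ValueError(f"unknown event kind: {kind}")
        if kind == "evict":
            self.evictions += 1
        self.events.append({"kind": kind, "key": key})

    def snapshot(self) -> dict:
        return {
            "hits": self.hits,
            "misses": self.misses,
            "evictions": self.evictions,
            "events": list(self.events),
        }


def write_snapshot(stats: ConnectorStats, path: str) -> None:
    """Write the snapshot to a temp file, then rename it over the target path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(stats.snapshot(), handle)
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_snapshot(path: str) -> dict:
    """Scrape-side reader: a missing or unreadable snapshot yields an empty dict."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

A file-based snapshot keeps the metrics reader decoupled from the subprocess that owns the counters, which fits the commit's note that the connector runs inside a separate EngineCore process.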

…auge reset (review ai-dynamo#12/ai-dynamo#13/ai-dynamo#19)

Hot-path quality fixes: none change a scaling decision.

ai-dynamo#12 pipeline durations use Clock.monotonic(), not Clock.now()
Clock's contract reserves now() for wall-clock timestamps and monotonic() for duration measurement. Six duration sites (predict latency, fan-out call latency, whole-tick duration) used now(); under WallClock a backward NTP step mid-tick distorted the latency/duration histograms. Switched all six to monotonic(). VirtualClock.monotonic() is synced to trace time in replay, so replay/test behavior is unchanged.

ai-dynamo#13 ProposeResult derives result_kind + enforces the oneof
ProposeResult carries the same accept/override/reject oneof as the stage responses but, unlike them, had no model_post_init, so building it the natural way (override=...) left result_kind='' and the proto round-trip came back 'override', breaking round-trip equality; a two-payload oneof violation also went unchecked. Extracted the derive+validate logic into a shared _derive_result_kind() helper used by both _StageOneofResponse and ProposeResult. +round-trip test (derive + oneof-violation reject).

ai-dynamo#19 override_active gauge reset covers errored plugins
_emit_override_active reset the gauge only for plugins in plugin_results; a plugin whose call raised is absent from that list, so a 1 it set on a prior tick lingered. Now reset every ATTEMPTED plugin id (triggered + inherited) before setting the contributors.

828 planner tests pass (+1 round-trip test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
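The derive-and-validate step described for ProposeResult in this commit message can be illustrated with a small standalone function. This is only a sketch: the name `derive_result_kind` mirrors the `_derive_result_kind()` helper mentioned above, but its real signature and error type are not shown in the commit message, so both are assumptions here. Returning an empty string when no payload is set is also an assumption, chosen to match the `result_kind=''` default the message mentions.

```python
from typing import Any, Optional


def derive_result_kind(
    accept: Optional[Any] = None,
    override: Optional[Any] = None,
    reject: Optional[Any] = None,
) -> str:
    """Derive the oneof discriminator from whichever payload is populated."""
    populated = [
        name
        for name, value in (("accept", accept), ("override", override), ("reject", reject))
        if value is not None
    ]
    if len(populated) > 1:
        # Enforce the oneof: at most one payload may be set.
        raise ValueError(f"oneof violation, multiple payloads set: {populated}")
    # Assumption: no payload leaves the kind at its empty default.
    return populated[0] if populated else ""
```

Deriving the kind at construction time means an object built with only `override=...` already carries `'override'`, so it compares equal to the copy that comes back from a serialization round-trip.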

biswapanda added 6 commits March 4, 2025 10:42

feat: add pybindings for mdc, backend and preprocessor

2dd42cb

fix: remove pipeline

8d4afeb

fix: add license text

89f121e

style: mypy and cargo fmt

769550c

style: rename dirs

7ab8913

feat: fixes

4f1861f

biswapanda requested review from grahamking, nnshah1, paulhendricks, ptarasiewiczNV and rmccorm4 as code owners March 4, 2025 19:06

biswapanda temporarily deployed to GITLAB March 4, 2025 19:06 — with GitHub Actions Inactive

biswapanda temporarily deployed to GITLAB March 4, 2025 19:10 — with GitHub Actions Inactive

grahamking approved these changes Mar 4, 2025

biswapanda self-assigned this Mar 4, 2025

fix: clippy err

0b62587

biswapanda requested a review from GuanLuo as a code owner March 4, 2025 22:06

biswapanda temporarily deployed to GITLAB March 4, 2025 22:06 — with GitHub Actions Inactive

biswapanda temporarily deployed to GITLAB March 4, 2025 22:15 — with GitHub Actions Inactive

style: cargo fmt

3b96755

biswapanda temporarily deployed to GITLAB March 4, 2025 22:17 — with GitHub Actions Inactive

biswapanda temporarily deployed to GITLAB March 4, 2025 22:25 — with GitHub Actions Inactive

Merge branch 'main' into bis/pybind-rusty-llm

d0e38c9

biswapanda temporarily deployed to GITLAB March 4, 2025 22:32 — with GitHub Actions Inactive

biswapanda temporarily deployed to GITLAB March 4, 2025 22:42 — with GitHub Actions Inactive

biswapanda merged commit 2da0921 into main Mar 4, 2025

biswapanda deleted the bis/pybind-rusty-llm branch March 4, 2025 22:52

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

feat: add python binding for rust llm modules (ai-dynamo#13)

a32cdad

tanmayv25 mentioned this pull request Apr 15, 2026

DEP: Backend Interface -- LLMEngine ABC and Worker #8251

Open

biswapanda merged 9 commits into main from bis/pybind-rusty-llm

2 participants
