feat: OAI compatible endpoints for TRTLLM#14
Merged
Conversation
GuanLuo
approved these changes
Mar 5, 2025
kylehh
pushed a commit
to kylehh/dynamo
that referenced
this pull request
Apr 11, 2025
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com> Co-authored-by: Tanmay Verma <tanmayv@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Tanmay Verma <tanmay2592@gmail.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
kaim-eng
added a commit
that referenced
this pull request
May 12, 2026
Adds power-aware intelligence to the Dynamo planner across three layers, plus tooling that pipecleans the Phase 4 selection path with currently available single-TDP AIC data. Reimplementation of unmerged PR #5280 against post-refactor ToT (planner/utils/ no longer exists; configuration is Pydantic; layout is config/, connectors/, core/, monitoring/). Full design: docs/design-docs/powerplanner-design.md Repro / dev environment: docs/components/planner/dpp-dev-env.md What's in this PR ----------------- Phase 1 — infrastructure - Power Agent DaemonSet: components/power_agent/{power_agent.py,tests/} Watches pod annotations, maps cgroup→pod_uid→GPU via NVML, calls nvmlDeviceSetPowerManagementLimit on a 15 s reconciliation loop. Multi-pod-per-GPU policy, fail-closed safe-default, SIGTERM restore. - Pod annotation actuation: connectors/kubernetes.py writes dynamo.nvidia.com/gpu-power-limit on worker pods. - K8s manifests: deploy/power_agent/{daemonset,rbac}.yaml. - Operator chart fix: ClusterRole now grants `pods` permissions (deploy/helm/charts/platform/components/operator/templates/planner.yaml, chart bumped to 1.2.1). Older clusters can apply deploy/planner-pod-rbac-dev.yaml as a temporary patch. Phase 2 — budget enforcement - _apply_power_budget() in core/state_machine.py clamps replica scaling within a watt budget. New PlannerConfig fields: enable_power_awareness, total_gpu_power_limit (required when awareness enabled — validator enforces, no silent default), prefill/decode_engine_gpu_power_limit, power_agent_safe_default_watts. Phase 3 — AIC optimizer - monitoring/aic_power_optimizer.py: AIConfigurator sweep picks (replica, power) configs that maximise throughput within the watt budget. Per-component EMA correction (c_ttft, c_itl, c_power_p, c_power_d for disagg; c_power_agg for agg) closes the loop on silicon-vs-AIC drift. SLA gate, budget gate, capacity-exceeded re-optimize, auto-disable on consecutive failures. - core/base.py: NativePlannerBase plumbing (admission threshold push, named-port resolution, AIC integration). Phase 4 — pipeclean (using existing single-TDP data) - tools/integrate_aic_power_data.py: copies measured H200/B200 power data into an AIC checkout and reinstalls. - tools/validate_aic_power_integration.py: asserts estimate_perf() returns power_w in [100, 710] W (not 0.0, not 700.0 TDP fallback). - tools/compute_power_budget.py: rack-capacity → safe-budget helper. - examples/deployments/powerplanner/: PIPECLEAN.md runbook, disagg-{power-aware,conservative-cold-start}.yaml, verify_poweraware.bash (8-section health check inc. Phase 4 preview), MULTI_DGD.md operator playbook, README.md. Bug fixes (found by live integration tests, fixed in this patchset) ------------------------------------------------------------------- - Operator chart 1.2.0 ClusterRole had no `pods` rules → planner couldn't list workers or patch annotations. Fixed in chart template; workaround manifest provided for clusters not yet upgraded. - Stale label selectors in connectors/kubernetes.py: `dynamo-graph-deployment` → `dynamo-graph-deployment-name`, `dynamo-service` → `dynamo-component`/`dynamo-component-type`. - DCGM queries used `pod=~` (matched the DCGM exporter pod) and the wrong namespace; corrected to `exported_pod=~` + `exported_namespace=<k8s-namespace>` and pod regex now matches the operator's `<dgd>-<replica-idx>-<service-key>-...` format (monitoring/traffic_metrics.py). - Frontend-source metric methods returned NaN on quiet clusters (0/0 in `increase(_sum)/increase(_count)`); router path had a NaN guard, frontend didn't — added matching `math.isnan` filter. Tests ----- Verified 2026-05-10 on a clean from-scratch repro on a live Azure AKS dev cluster (qwen3-quickstart DGD, single dev namespace): Suite Pass Skip Fail ---------------------------------------------------- ---- ---- ---- planner/tests/unit/ 465 0 0 planner/tests/integration/test_aic_power_e2e_sim 15 0 0 planner/tests/integration/test_aic_power_optimizer 34 0 0 planner/tests/integration/test_metric_paths_live 22-23 3-2 0 planner/tests/integration/test_actuation_knobs_live 10-11 1-0 0 power_agent/tests/ 43 0 0 ---------------------------------------------------- ---- ---- ---- Total (cold deploy / after sanity chat completion) 590-591 4-3 0 Documented skips: - test_frontend_metric_series_exists — passes after any traffic. - TestDirectRouterMetricsClientLive::* — LocalRouter doesn't expose dynamo_frontend_worker_* gauges (open-question #14 in design doc). - TestScalingRealMutation — opt-in disruptive test, gated by RUN_DISRUPTIVE_TESTS=1 (passes when enabled). Repro recipe: docs/components/planner/dpp-dev-env.md "From-Scratch Repro Script" section. Compatibility ------------- Power awareness is opt-in (enable_power_awareness=False by default). AIC optimizer is opt-in (enable_aic_optimizer=False by default). No existing planner behavior changes when both flags are off. Phase 4 (multi-power-level AIC sweep) and Phase 5 (silicon validation) remain follow-up work; this PR delivers Phases 1–3 plus the Phase 4 pipeclean code path so it can be exercised before AIC ships the full sweep API. Signed-off-by: Kai Ma <kaim@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>
kaim-eng
added a commit
that referenced
this pull request
May 12, 2026
Adds power-aware intelligence to the Dynamo planner across three layers, plus tooling that pipecleans the Phase 4 selection path with currently available single-TDP AIC data. Reimplementation of unmerged PR #5280 against post-refactor ToT (planner/utils/ no longer exists; configuration is Pydantic; layout is config/, connectors/, core/, monitoring/). Full design: docs/design-docs/powerplanner-design.md Repro / dev environment: docs/components/planner/dpp-dev-env.md What's in this PR ----------------- Phase 1 — infrastructure - Power Agent DaemonSet: components/power_agent/{power_agent.py,tests/} Watches pod annotations, maps cgroup→pod_uid→GPU via NVML, calls nvmlDeviceSetPowerManagementLimit on a 15 s reconciliation loop. Multi-pod-per-GPU policy, fail-closed safe-default, SIGTERM restore. - Pod annotation actuation: connectors/kubernetes.py writes dynamo.nvidia.com/gpu-power-limit on worker pods. - K8s manifests: deploy/power_agent/{daemonset,rbac}.yaml. - Operator chart fix: ClusterRole now grants `pods` permissions (deploy/helm/charts/platform/components/operator/templates/planner.yaml, chart bumped to 1.2.1). Older clusters can apply deploy/planner-pod-rbac-dev.yaml as a temporary patch. Phase 2 — budget enforcement - _apply_power_budget() in core/state_machine.py clamps replica scaling within a watt budget. New PlannerConfig fields: enable_power_awareness, total_gpu_power_limit (required when awareness enabled — validator enforces, no silent default), prefill/decode_engine_gpu_power_limit, power_agent_safe_default_watts. Phase 3 — AIC optimizer - monitoring/aic_power_optimizer.py: AIConfigurator sweep picks (replica, power) configs that maximise throughput within the watt budget. Per-component EMA correction (c_ttft, c_itl, c_power_p, c_power_d for disagg; c_power_agg for agg) closes the loop on silicon-vs-AIC drift. SLA gate, budget gate, capacity-exceeded re-optimize, auto-disable on consecutive failures. - core/base.py: NativePlannerBase plumbing (admission threshold push, named-port resolution, AIC integration). Phase 4 — pipeclean (using existing single-TDP data) - tools/integrate_aic_power_data.py: copies measured H200/B200 power data into an AIC checkout and reinstalls. - tools/validate_aic_power_integration.py: asserts estimate_perf() returns power_w in [100, 710] W (not 0.0, not 700.0 TDP fallback). - tools/compute_power_budget.py: rack-capacity → safe-budget helper. - examples/deployments/powerplanner/: PIPECLEAN.md runbook, disagg-{power-aware,conservative-cold-start}.yaml, verify_poweraware.bash (8-section health check inc. Phase 4 preview), MULTI_DGD.md operator playbook, README.md. Bug fixes (found by live integration tests, fixed in this patchset) ------------------------------------------------------------------- - Operator chart 1.2.0 ClusterRole had no `pods` rules → planner couldn't list workers or patch annotations. Fixed in chart template; workaround manifest provided for clusters not yet upgraded. - Stale label selectors in connectors/kubernetes.py: `dynamo-graph-deployment` → `dynamo-graph-deployment-name`, `dynamo-service` → `dynamo-component`/`dynamo-component-type`. - DCGM queries used `pod=~` (matched the DCGM exporter pod) and the wrong namespace; corrected to `exported_pod=~` + `exported_namespace=<k8s-namespace>` and pod regex now matches the operator's `<dgd>-<replica-idx>-<service-key>-...` format (monitoring/traffic_metrics.py). - Frontend-source metric methods returned NaN on quiet clusters (0/0 in `increase(_sum)/increase(_count)`); router path had a NaN guard, frontend didn't — added matching `math.isnan` filter. Tests ----- Verified 2026-05-10 on a clean from-scratch repro on a live Azure AKS dev cluster (qwen3-quickstart DGD, single dev namespace): Suite Pass Skip Fail ---------------------------------------------------- ---- ---- ---- planner/tests/unit/ 465 0 0 planner/tests/integration/test_aic_power_e2e_sim 15 0 0 planner/tests/integration/test_aic_power_optimizer 34 0 0 planner/tests/integration/test_metric_paths_live 22-23 3-2 0 planner/tests/integration/test_actuation_knobs_live 10-11 1-0 0 power_agent/tests/ 43 0 0 ---------------------------------------------------- ---- ---- ---- Total (cold deploy / after sanity chat completion) 590-591 4-3 0 Documented skips: - test_frontend_metric_series_exists — passes after any traffic. - TestDirectRouterMetricsClientLive::* — LocalRouter doesn't expose dynamo_frontend_worker_* gauges (open-question #14 in design doc). - TestScalingRealMutation — opt-in disruptive test, gated by RUN_DISRUPTIVE_TESTS=1 (passes when enabled). Repro recipe: docs/components/planner/dpp-dev-env.md "From-Scratch Repro Script" section. Compatibility ------------- Power awareness is opt-in (enable_power_awareness=False by default). AIC optimizer is opt-in (enable_aic_optimizer=False by default). No existing planner behavior changes when both flags are off. Phase 4 (multi-power-level AIC sweep) and Phase 5 (silicon validation) remain follow-up work; this PR delivers Phases 1–3 plus the Phase 4 pipeclean code path so it can be exercised before AIC ships the full sweep API. Signed-off-by: Kai Ma <kaim@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Kai Ma <kaim@nvidia.com>
kaim-eng
added a commit
that referenced
this pull request
May 18, 2026
Adds power-aware intelligence to the Dynamo planner across three layers, plus tooling that pipecleans the Phase 4 selection path with currently available single-TDP AIC data. Reimplementation of unmerged PR #5280 against post-refactor ToT (planner/utils/ no longer exists; configuration is Pydantic; layout is config/, connectors/, core/, monitoring/). Full design: docs/design-docs/powerplanner-design.md Repro / dev environment: docs/components/planner/dpp-dev-env.md What's in this PR ----------------- Phase 1 — infrastructure - Power Agent DaemonSet: components/power_agent/{power_agent.py,tests/} Watches pod annotations, maps cgroup→pod_uid→GPU via NVML, calls nvmlDeviceSetPowerManagementLimit on a 15 s reconciliation loop. Multi-pod-per-GPU policy, fail-closed safe-default, SIGTERM restore. - Pod annotation actuation: connectors/kubernetes.py writes dynamo.nvidia.com/gpu-power-limit on worker pods. - K8s manifests: deploy/power_agent/{daemonset,rbac}.yaml. - Operator chart fix: ClusterRole now grants `pods` permissions (deploy/helm/charts/platform/components/operator/templates/planner.yaml, chart bumped to 1.2.1). Older clusters can apply deploy/planner-pod-rbac-dev.yaml as a temporary patch. Phase 2 — budget enforcement - _apply_power_budget() in core/state_machine.py clamps replica scaling within a watt budget. New PlannerConfig fields: enable_power_awareness, total_gpu_power_limit (required when awareness enabled — validator enforces, no silent default), prefill/decode_engine_gpu_power_limit, power_agent_safe_default_watts. Phase 3 — AIC optimizer - monitoring/aic_power_optimizer.py: AIConfigurator sweep picks (replica, power) configs that maximise throughput within the watt budget. Per-component EMA correction (c_ttft, c_itl, c_power_p, c_power_d for disagg; c_power_agg for agg) closes the loop on silicon-vs-AIC drift. SLA gate, budget gate, capacity-exceeded re-optimize, auto-disable on consecutive failures. - core/base.py: NativePlannerBase plumbing (admission threshold push, named-port resolution, AIC integration). Phase 4 — pipeclean (using existing single-TDP data) - tools/integrate_aic_power_data.py: copies measured H200/B200 power data into an AIC checkout and reinstalls. - tools/validate_aic_power_integration.py: asserts estimate_perf() returns power_w in [100, 710] W (not 0.0, not 700.0 TDP fallback). - tools/compute_power_budget.py: rack-capacity → safe-budget helper. - examples/deployments/powerplanner/: PIPECLEAN.md runbook, disagg-{power-aware,conservative-cold-start}.yaml, verify_poweraware.bash (8-section health check inc. Phase 4 preview), MULTI_DGD.md operator playbook, README.md. Bug fixes (found by live integration tests, fixed in this patchset) ------------------------------------------------------------------- - Operator chart 1.2.0 ClusterRole had no `pods` rules → planner couldn't list workers or patch annotations. Fixed in chart template; workaround manifest provided for clusters not yet upgraded. - Stale label selectors in connectors/kubernetes.py: `dynamo-graph-deployment` → `dynamo-graph-deployment-name`, `dynamo-service` → `dynamo-component`/`dynamo-component-type`. - DCGM queries used `pod=~` (matched the DCGM exporter pod) and the wrong namespace; corrected to `exported_pod=~` + `exported_namespace=<k8s-namespace>` and pod regex now matches the operator's `<dgd>-<replica-idx>-<service-key>-...` format (monitoring/traffic_metrics.py). - Frontend-source metric methods returned NaN on quiet clusters (0/0 in `increase(_sum)/increase(_count)`); router path had a NaN guard, frontend didn't — added matching `math.isnan` filter. Tests ----- Verified 2026-05-10 on a clean from-scratch repro on a live Azure AKS dev cluster (qwen3-quickstart DGD, single dev namespace): Suite Pass Skip Fail ---------------------------------------------------- ---- ---- ---- planner/tests/unit/ 465 0 0 planner/tests/integration/test_aic_power_e2e_sim 15 0 0 planner/tests/integration/test_aic_power_optimizer 34 0 0 planner/tests/integration/test_metric_paths_live 22-23 3-2 0 planner/tests/integration/test_actuation_knobs_live 10-11 1-0 0 power_agent/tests/ 43 0 0 ---------------------------------------------------- ---- ---- ---- Total (cold deploy / after sanity chat completion) 590-591 4-3 0 Documented skips: - test_frontend_metric_series_exists — passes after any traffic. - TestDirectRouterMetricsClientLive::* — LocalRouter doesn't expose dynamo_frontend_worker_* gauges (open-question #14 in design doc). - TestScalingRealMutation — opt-in disruptive test, gated by RUN_DISRUPTIVE_TESTS=1 (passes when enabled). Repro recipe: docs/components/planner/dpp-dev-env.md "From-Scratch Repro Script" section. Compatibility ------------- Power awareness is opt-in (enable_power_awareness=False by default). AIC optimizer is opt-in (enable_aic_optimizer=False by default). No existing planner behavior changes when both flags are off. Phase 4 (multi-power-level AIC sweep) and Phase 5 (silicon validation) remain follow-up work; this PR delivers Phases 1–3 plus the Phase 4 pipeclean code path so it can be exercised before AIC ships the full sweep API. Signed-off-by: Kai Ma <kaim@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Kai Ma <kaim@nvidia.com>
SpencerGarnets
added a commit
to ai-blaise/dynamo-prod-k8s
that referenced
this pull request
May 30, 2026
…verlay
Replaces the 4+4 PD-disaggregated TP=4 DP=4 layout with the
single-role TP=8 EP=8 decode-only topology that 001 actually runs.
Per Spencer 2026-05-30: the infrastructure deploy script should
match what we run on 001 (so production = 001 verbatim), with
SMC-SD enabled as the production default.
Changes vs the prior 4+4 PD manifest:
- Drop prefill service entirely (no disaggregation; NIXL/UCX
transport on 002 was the failure mode anyway)
- Decode: TP=8 EP=8, single 8-GPU role
- moe-runner-backend flashinfer_trtllm, moe-a2a-backend none
- --dsa-indexer-mode indexcache-hisa + --dsa-indexer-quantization
nvfp4_e2m1_ue8m0 + --hisa-min-seq-len 1
- --json-model-override-args (index_topk=1024, freq=4, hisa params)
- --enable-flashinfer-allreduce-fusion (auto on for DSV3.2 +
TP>1 + no DP + single-node + moe_a2a_backend=none)
- --num-continuous-decode-steps 2
- Drop --enable-dp-attention (the forward_idle DP path was the
source of the ai-dynamo#14 launch-config bug chain; TP-only on SM_100
avoids it entirely)
- Drop --enable-hisparse (extra path cost; HISA already in stack)
- Drop nsa-prefill/decode-backend (replaced by dsa-*)
Optimization env vars (top-level spec.envs, fully armed):
SGLANG_NSA_NVFP4_HISA=1
SGLANG_NSA_NVFP4_HISA_CAND_SCORE_WMMA=1
SGLANG_NSA_NVFP4_HISA_MEAN_POOL_PREDECODE=1
SGLANG_NSA_NVFP4_HISA_MEAN_POOL_TRANSP=1
SGLANG_NSA_NVFP4_HISA_MEAN_POOL_FMA2=1
SGLANG_NSA_NVFP4_HISA_COMPRESSION_RATIO=4.0
SGLANG_HIGGS_DSA_TRTLLM_FP8=1 (only fires when HIGGS enabled)
SGLANG_HIGGS_DSA_TRTLLM_DEQUANT_STREAM=1
SGLANG_HIGGS_DSA_TRTLLM_PINGPONG=1
SGLANG_HIGGS_DSA_TRTLLM_DEDICATED_STREAM=1
SGLANG_USE_SGL_NVFP4_FUSED_RMSNORM=1
SGLANG_MOE_NVFP4_DISPATCH=1
SGLANG_ENABLE_JIT_DEEPGEMM=1
SGLANG_ENABLE_SPEC_V2=1
SGLANG_SMC_DRAFT_FP8_GEMM_BACKEND=cutlass
SGLANG_SMC_TARGET_VERIFY_GRAPH=0
(and the NCCL / UCX / HF defaults)
SMC-SD overlay (decode args):
--speculative-algorithm SMC
--speculative-draft-model-path /models/smcsd/GLM-4-9B-0414-FP8-DeepSeekV32-OMP
--speculative-draft-model-quantization fp8
--speculative-draft-attention-backend triton
--smc-draft-kv-cache-dtype fp8_e4m3
--smc-n-particles 4
--smc-gamma 6
--smc-draft-temperature 0.7
--smc-target-temperature 1.0
--smc-resample-threshold 0.5
--smc-resample-method systematic
+ smcsd-draft-model volume + mount
For ablation (002 iteration testing): strip --speculative-* args +
smcsd-draft-model volume/mount. Production default keeps SMC-SD on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tanmayv25
added a commit
that referenced
this pull request
Jun 4, 2026
Comment #14 from peilii: when the unified path moved the `build_sglang_logprob_kwargs` gate check from the legacy handler into the engine's `generate()`, the resulting `ValueError` was no longer guaranteed to surface as a 4xx invalid-request error. Root cause (static + unit-test reproduction): - `lib/bindings/python/rust/backend.rs::py_err_to_dynamo` maps a Python `ValueError`/`TypeError` to `DynamoError { error_type: Backend(BackendError::InvalidArgument) }` — under the `Backend()` wrapper, not the top-level `ErrorType::InvalidArgument`. - `lib/llm/src/http/service/openai.rs::find_dynamo_error_in_chain` used strict-equality matching, so the existing `ErrorType::InvalidArgument` check missed `Backend(InvalidArgument)`. The engine-side ValueError fell through to the catch-all 500 path. Reproduced as a unit test that, against the unpatched mapping, asserted `StatusCode 500` instead of `400`: thread '…test_backend_invalid_argument_surfaces_as_400' panicked assertion `left == right` failed left: 500 right: 400 Fix: add `find_invalid_argument_in_chain` that matches both `ErrorType::InvalidArgument` AND `ErrorType::Backend(BackendError::InvalidArgument)`. Both classes mean "the request was malformed" — whether validation tripped at the frontend or inside the engine is an implementation detail the client shouldn't have to care about. Regression test `test_backend_invalid_argument_surfaces_as_400` locks in the behavior. All 59 existing `http::service::openai` tests still pass; rustfmt + clippy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tanmayv25
added a commit
that referenced
this pull request
Jun 4, 2026
Comment #14 from peilii: when the unified path moved the `build_sglang_logprob_kwargs` gate check from the legacy handler into the engine's `generate()`, the resulting `ValueError` was no longer guaranteed to surface as a 4xx invalid-request error. Root cause (static + unit-test reproduction): - `lib/bindings/python/rust/backend.rs::py_err_to_dynamo` maps a Python `ValueError`/`TypeError` to `DynamoError { error_type: Backend(BackendError::InvalidArgument) }` — under the `Backend()` wrapper, not the top-level `ErrorType::InvalidArgument`. - `lib/llm/src/http/service/openai.rs::find_dynamo_error_in_chain` used strict-equality matching, so the existing `ErrorType::InvalidArgument` check missed `Backend(InvalidArgument)`. The engine-side ValueError fell through to the catch-all 500 path. Reproduced as a unit test that, against the unpatched mapping, asserted `StatusCode 500` instead of `400`: thread '…test_backend_invalid_argument_surfaces_as_400' panicked assertion `left == right` failed left: 500 right: 400 Fix: add `find_invalid_argument_in_chain` that matches both `ErrorType::InvalidArgument` AND `ErrorType::Backend(BackendError::InvalidArgument)`. Both classes mean "the request was malformed" — whether validation tripped at the frontend or inside the engine is an implementation detail the client shouldn't have to care about. Regression test `test_backend_invalid_argument_surfaces_as_400` locks in the behavior. All 59 existing `http::service::openai` tests still pass; rustfmt + clippy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tanmayv25
added a commit
that referenced
this pull request
Jun 4, 2026
Comment #14 from peilii: when the unified path moved the `build_sglang_logprob_kwargs` gate check from the legacy handler into the engine's `generate()`, the resulting `ValueError` was no longer guaranteed to surface as a 4xx invalid-request error. Root cause (static + unit-test reproduction): - `lib/bindings/python/rust/backend.rs::py_err_to_dynamo` maps a Python `ValueError`/`TypeError` to `DynamoError { error_type: Backend(BackendError::InvalidArgument) }` — under the `Backend()` wrapper, not the top-level `ErrorType::InvalidArgument`. - `lib/llm/src/http/service/openai.rs::find_dynamo_error_in_chain` used strict-equality matching, so the existing `ErrorType::InvalidArgument` check missed `Backend(InvalidArgument)`. The engine-side ValueError fell through to the catch-all 500 path. Reproduced as a unit test that, against the unpatched mapping, asserted `StatusCode 500` instead of `400`: thread '…test_backend_invalid_argument_surfaces_as_400' panicked assertion `left == right` failed left: 500 right: 400 Fix: add `find_invalid_argument_in_chain` that matches both `ErrorType::InvalidArgument` AND `ErrorType::Backend(BackendError::InvalidArgument)`. Both classes mean "the request was malformed" — whether validation tripped at the frontend or inside the engine is an implementation detail the client shouldn't have to care about. Regression test `test_backend_invalid_argument_surfaces_as_400` locks in the behavior. All 59 existing `http::service::openai` tests still pass; rustfmt + clippy clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does the PR do?