feat: add AIMD startup ramp#638
Conversation
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
MkDocs preview: https://d7072d26.dd-docs-preview.pages.dev Fern preview: https://nvidia-preview-pr-638.docs.buildwithfern.com/nemo/datadesigner
|
Code Review: PR #638 — feat: add AIMD startup rampSummaryAdds an optional time-based startup ramp to the The change is well-scoped: a single config field, a small ramp helper on FindingsCorrectness
Tests
Style / conventions
Documentation nits
Performance / security
VerdictApprove with minor follow-ups. Implementation is clean, well-tested, and correctly preserves default behavior. Two things worth addressing before merge:
Doc nit on the "On success after a 429" bullet is optional. |
Greptile SummaryThis PR adds an optional startup ramp to the AIMD throttle manager so async inference workloads can ease into peak concurrency instead of immediately bursting. The default
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py | Core AIMD ramp logic added correctly; ceiling recording stays inside the current_limit < prev_limit guard, resolving the previously flagged concern; _apply_startup_ramp is called consistently on both try_acquire and release_success paths under the lock. |
| packages/data-designer-config/src/data_designer/config/run_config.py | Adds rampup_seconds field with ge=0.0 validation, a DEFAULT_RAMPUP_SECONDS=0.0 class var, and docstring; backward-compatible with default 0.0. |
| packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py | Seven new unit tests cover all documented ramp scenarios including 429-abort, minimum-concurrency guard, elapsed-only completion, release_failure preservation, and non-ramp recovery. |
| packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py | Adds a synthetic ColdServerAsyncHTTPClient and two async integration smoke tests that exercise the real OpenAI-compatible client path. |
| packages/data-designer-config/tests/config/test_run_config.py | Two new tests verify rampup_seconds acceptance and rejection of negative values. |
| docs/concepts/architecture-and-performance.md | Adds startup ramp bullet to the AIMD section, updates the example, and adds rampup_seconds row to the config table. |
| docs/devnotes/posts/owning-the-model-stack.md | Adds Startup ramp subsection and updates the AIMD bullet list and parameter table consistently. |
Reviews (5): Last reviewed commit: "Merge branch 'main' into codex/aimd-star..." | Re-trigger Greptile
Scratch validation harness for AIMD startup rampI added a local ignored scratch harness to exercise the new ramp behavior against a synthetic cold async inference backend: Run it from the repo root with: /Users/etramel/.local/bin/uv run --all-packages --group dev python .scratch/aimd_ramp_scratch.pyThe harness uses the real
Observed output locally: This is intentionally not committed because |
|
[P2] Early startup 429 can pin recovery at 2 In |
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
Addressed the review feedback in What changed:
Validated locally: git diff --check
PATH="/Users/etramel/.local/bin:$PATH" make lint-engine
PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py packages/data-designer-config/tests/config/test_run_config.py
PATH="/Users/etramel/.local/bin:$PATH" make test-engine
PATH="/Users/etramel/.local/bin:$PATH" make test-configResults: focused suite |
|
Nice work on this one, @eric-tramel — the startup ramp slots cleanly into the existing AIMD machinery and the test coverage tells a clear story. SummaryThis PR adds an optional FindingsWarnings — Worth addressing
def _get_or_create_domain(
self,
provider_name: str,
model_id: str,
domain: ThrottleDomain,
now: float,
) -> DomainThrottleState:
...The call sites are small fixes:
Suggestions — Take it or leave it
def _apply_startup_ramp(self, state: DomainThrottleState, effective_max: int, now: float) -> None:
"""Advance the startup ramp for this domain, if it's still active."""
if not state.rampup_active:
return
if self._rampup_seconds <= 0 or effective_max <= DEFAULT_MIN_LIMIT:
state.current_limit = min(state.current_limit, effective_max)
state.rampup_active = False
return
elapsed = max(0.0, now - state.rampup_started_at)
if elapsed >= self._rampup_seconds:
state.current_limit = effective_max
state.rampup_active = False
return
fraction = elapsed / self._rampup_seconds
ramp_slots = math.floor((effective_max - DEFAULT_MIN_LIMIT) * fraction)
state.current_limit = min(effective_max, DEFAULT_MIN_LIMIT + ramp_slots)If you'd rather keep the clamp as a belt-and-suspenders guard against a future caller dropping into
What Looks Good
VerdictNeeds changes — the runtime logic itself is solid, but three things are worth resolving before merge: the This review was generated by an AI assistant. |
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
@nabinchha thanks for the detailed pass. I pushed follow-ups through Resolved items:
Validated locally after integrating the worker branches: git diff --check
PATH="/Users/etramel/.local/bin:$PATH" make lint-engine
PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py
PATH="/Users/etramel/.local/bin:$PATH" make test-engine
PATH="/Users/etramel/.local/bin:$PATH" make test-configResults: focused throttle suite |
nabinchha
left a comment
There was a problem hiding this comment.
Thanks for the fast turnaround, @eric-tramel — all six items from the prior pass landed cleanly and the fixes go beyond just "make the test pass." Re-reviewed fa992927..081e431a (5 follow-up commits + a merge from main).
Summary
This round adds the docs sync, makes now a required argument on _get_or_create_domain with all call sites updated, drops the inactive-ramp clamp from _apply_startup_ramp, renames the failure-preserves-ramp test, and rebuilds the integration test's timing model to be deterministic rather than wall-clock-sensitive. Net result: 468 / -21 across 10 files (vs. 447 / -11 last time), and ruff check + format --check are green on the changed set.
Findings
What Got Resolved (vs. prior review)
-
docs/concepts/architecture-and-performance.mddrift (Warning, prior) → fixed ine3fb05f8. All three of the requested additions are there: the "During optional startup ramp" AIMD bullet, the updated worked example ("Iframpup_seconds=30, it starts at one request and climbs linearly toward 32..."), and therampup_seconds=0.0row in both the example code block and the config-table. Source.mdand Fern.mdxnow read identically — pair is back in lockstep, matching the dev-note treatment. -
nowrequired on_get_or_create_domain(Warning, prior) → fixed in446f753e. The signature now requiresnow: float;release_failureresolvesnow = now if now is not None else time.monotonic()at the top like its peers;acquire_sync/acquire_asynceach capturenow = time.monotonic()once and thread it through bothtry_acquireand the bookkeeping_get_or_create_domaincall. Nice touch: the retry loops re-capturenow = time.monotonic()before each subsequenttry_acquire, so a long wait doesn't carry a stale timestamp into the next acquire attempt. The type checker now enforces the contract end-to-end. -
Wall-clock-sensitive integration assertion (Warning, prior) → fixed in
e8c9e496+4dba051c. Rather than just widening margins, the test was redesigned:_started_atis now lazy (set on firstpost()instead of__init__), so the gap between client construction and first request stops counting against the ramp budget; the timing-fragilepeak_in_flight_before(0.05) <= 1assertion got replaced withpeak_allowed > 1(a deterministic property of the sample set, not a wall-clock window); the post-ramp sleep is now derived from the ramp duration (throttle_ramp_seconds + 0.2) instead of a hardcoded0.31. The behavioral check that actually matters —ramped_client.rate_limits == 0— was already there and remains the real proof the throttle ramp matched the server ramp. -
Defensive clamp in
_apply_startup_ramp(Suggestion, prior) → fixed ine8c9e496. The inactive-ramp branch is now a clean earlyreturn—_clamp_domainskeeps owning thecurrent_limit ≤ effective_maxinvariant on the registration path, and the helper does exactly what its name says. The active-ramp branch'smin(effective_max, …)continues to handle mid-ramp clamps. No behavior change, easier to read. -
architecture/models.mdmention (Suggestion, prior) → fixed ine3fb05f8. Added a one-line description under theThrottleManagersection that hits the three load-bearing facts: "starts new domains at one concurrent request, climbs linearly toward the peak, and aborts to normal AIMD behavior on the first 429." Subsystem map now matches the dev-note narrative. -
Test name
test_failure_does_not_abort_startup_ramp(Suggestion, prior) → fixed ine8c9e496. Renamed totest_release_failure_preserves_startup_ramp_and_progress, which now accurately describes both behaviors the test verifies (ramp preserved acrossrelease_failure, and ramp formula still advances correctly afterward).
New Findings
None — re-pass on the modified files surfaces no regressions, no new edge cases, and no leftover docs drift. Ruff check + format --check clean on the changed set.
What Looks Good
nowpropagation is more thorough than the suggestion asked for. Recapturingnowinside theacquire_sync/acquire_asyncretry loops is the right move — without it, a long initial wait would have left subsequenttry_acquirecalls reading a deadline-relative timestamp instead of a fresh wall-clock one. Type checker now enforces the threaded-clock contract everywhere.- The integration test redesign is a real fix, not a workaround. Switching to
peak_allowed > 1plus lazy_started_atremoves the wall-clock dependency entirely — the test now asserts what the system did rather than when it happened to do it. That's the version that'll keep passing on noisy CI. - Docs pair is fully reconciled. Source
.mdand Fern.mdxforarchitecture-and-performanceread identically now; the subsystem map inarchitecture/models.mdwas a freebie that you didn't have to do but did anyway.
Verdict
Ship it. All prior findings are resolved with care, no new findings, lint/format pass. The follow-up commits also raise the test-architecture bar — the integration test is now closer to a property check than a timing race, which is worth keeping as the pattern for future async ramp work.
This review was generated by an AI assistant.
📋 Summary
Adds an optional startup ramp for AIMD throttling so async inference workloads can ease into peak concurrency instead of immediately bursting to
max_parallel_requests. When configured, each throttle domain starts at one concurrent request and linearly ramps toward the effective max; if a 429 arrives during that ramp, the ramp is aborted and normal AIMD decrease/cooldown/recovery takes over.The default
rampup_seconds=0.0preserves the existing immediate-start behavior for current users.🔗 Related Issue
N/A
🔄 Changes
ThrottleConfig.rampup_secondswith validation, docs, and a default of0.0to keep existing behavior unchanged.ThrottleManagerdomain state to track startup ramp timing and apply a linear time-based ramp during acquire/success handling._get_or_create_domain(..., now=...)require an explicit timestamp and updated all callers, includingrelease_failure,acquire_sync, andacquire_async, to pass a single captured clock value._apply_startup_rampso inactive ramps return immediately, leaving registration/clamping paths to own effective-max invariants.release_failurepreserving ramp progress, and non-ramp minimum-concurrency 429 recovery.OpenAICompatibleClient+ThrottledModelClientpath for no-ramp burst behavior, matched ramp warmup, and too-fast ramp abort-to-AIMD behavior.ThrottleManagerstartup ramp behavior.🔍 Attention Areas
ThrottleConfig— adds a public runtime configuration field, so naming/default semantics should be reviewed carefully.ThrottleManager— changes adaptive concurrency startup behavior while preserving default disabled behavior and existing minimum-limit recovery semantics.test_throttled_model_client.py— includes an integration-style smoke test for cold-server ramp behavior, intentionally avoiding precision timing assertions.🧪 Testing
make testpasses (not run locally; package-level config/engine suites and focused throttle/config suites passed)git diff --checkPATH="/Users/etramel/.local/bin:$PATH" make lint-enginePATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py(62 passed)PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py packages/data-designer-config/tests/config/test_run_config.py(66 passed, earlier run before Nabin follow-ups)PATH="/Users/etramel/.local/bin:$PATH" make test-engine(2009 passed)PATH="/Users/etramel/.local/bin:$PATH" make test-config(564 passed, 42 existing warnings)✅ Checklist