Skip to content

feat: add AIMD startup ramp#638

Merged
eric-tramel merged 8 commits into
mainfrom
codex/aimd-startup-ramp
May 13, 2026
Merged

feat: add AIMD startup ramp#638
eric-tramel merged 8 commits into
mainfrom
codex/aimd-startup-ramp

Conversation

@eric-tramel

@eric-tramel eric-tramel commented May 13, 2026

Copy link
Copy Markdown
Contributor

📋 Summary

Adds an optional startup ramp for AIMD throttling so async inference workloads can ease into peak concurrency instead of immediately bursting to max_parallel_requests. When configured, each throttle domain starts at one concurrent request and linearly ramps toward the effective max; if a 429 arrives during that ramp, the ramp is aborted and normal AIMD decrease/cooldown/recovery takes over.

The default rampup_seconds=0.0 preserves the existing immediate-start behavior for current users.

🔗 Related Issue

N/A

🔄 Changes

  • Added ThrottleConfig.rampup_seconds with validation, docs, and a default of 0.0 to keep existing behavior unchanged.
  • Extended ThrottleManager domain state to track startup ramp timing and apply a linear time-based ramp during acquire/success handling.
  • Kept 429 ceiling recording scoped to cases where the concurrency limit actually decreases, so a 429 at minimum concurrency aborts the ramp/cools down without pinning later AIMD recovery to a soft ceiling of 2.
  • Made _get_or_create_domain(..., now=...) require an explicit timestamp and updated all callers, including release_failure, acquire_sync, and acquire_async, to pass a single captured clock value.
  • Hardened the cold-server async integration smoke test by starting the synthetic server clock on first request, removing the sub-50ms assertion, and widening the final ramp-completion timing margin.
  • Simplified _apply_startup_ramp so inactive ramps return immediately, leaving registration/clamping paths to own effective-max invariants.
  • Covered startup ramp behavior with unit tests for linear climb, 429 abort during ramp, 429 at minimum concurrency, ramp disabled at effective max 1, elapsed-only ramp completion, release_failure preserving ramp progress, and non-ramp minimum-concurrency 429 recovery.
  • Added an integration-style async cold-server test that exercises the real OpenAICompatibleClient + ThrottledModelClient path for no-ramp burst behavior, matched ramp warmup, and too-fast ramp abort-to-AIMD behavior.
  • Updated the Architecture & Performance docs plus the model-stack dev note in both Fern and source Markdown, and added a short architecture note for ThrottleManager startup ramp behavior.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • ThrottleConfig — adds a public runtime configuration field, so naming/default semantics should be reviewed carefully.
  • ThrottleManager — changes adaptive concurrency startup behavior while preserving default disabled behavior and existing minimum-limit recovery semantics.
  • test_throttled_model_client.py — includes an integration-style smoke test for cold-server ramp behavior, intentionally avoiding precision timing assertions.

🧪 Testing

  • make test passes (not run locally; package-level config/engine suites and focused throttle/config suites passed)
  • Unit tests added/updated
  • Integration-style async throttled client test added/updated
  • E2E tests added/updated (N/A — no E2E surface for throttle ramp timing)
  • git diff --check
  • PATH="/Users/etramel/.local/bin:$PATH" make lint-engine
  • PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py (62 passed)
  • PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py packages/data-designer-config/tests/config/test_run_config.py (66 passed, earlier run before Nabin follow-ups)
  • PATH="/Users/etramel/.local/bin:$PATH" make test-engine (2009 passed)
  • PATH="/Users/etramel/.local/bin:$PATH" make test-config (564 passed, 42 existing warnings)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel requested a review from a team as a code owner May 13, 2026 00:37
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

MkDocs preview: https://d7072d26.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-638.docs.buildwithfern.com/nemo/datadesigner​

Notebook tutorials are rendered without execution outputs in previews.

@github-actions

Copy link
Copy Markdown
Contributor

Code Review: PR #638 — feat: add AIMD startup ramp

Summary

Adds an optional time-based startup ramp to the ThrottleManager so async inference workloads can ease into peak concurrency instead of bursting to max_parallel_requests from the first request. The new ThrottleConfig.rampup_seconds field (default 0.0) preserves existing behavior; when set > 0, each new throttle domain starts at 1 concurrent request and linearly climbs toward the effective max over the configured duration. A 429 during ramp aborts the ramp and hands off cleanly to the existing AIMD decrease/cooldown/recovery path.

The change is well-scoped: a single config field, a small ramp helper on ThrottleManager, three targeted unit tests, and matching docs in two places (Fern + source Markdown).

Findings

Correctness

  • Ramp math is sound. _apply_startup_ramp (packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py:528-541) computes ramp_slots = floor((effective_max - 1) * elapsed/rampup_seconds) and clamps current_limit = min(effective_max, 1 + ramp_slots). Unit tests cover the start (limit=1), midpoint (5/10s → limit=3 with max=5), and end (limit=5).
  • Pull-based ramp progress is fine. The ramp only advances when try_acquire / release_* is called rather than via a timer. In practice these are called frequently, and the ramp correctly clamps to effective_max on the first call after elapsed >= rampup_seconds. Worth noting in case anyone later adds a code path that holds a slot without polling.
  • release_success short-circuits during ramp. When rampup_active is true, the success branch resets success_streak to 0 and returns. This is the right call — the ramp owns concurrency growth during startup, and AIMD's success-window logic should not double-count progress. Good.
  • Behavior change in release_rate_limited extends beyond the ramp feature. The ceiling assignment was hoisted out of the if state.current_limit < prev_limit: guard:
    # before: ceiling only set when current_limit actually decreased
    # after: ceiling unconditionally set on the first 429 in a cascade
    if state.rate_limit_ceiling == 0:
        state.rate_limit_ceiling = prev_limit
    else:
        state.rate_limit_ceiling = min(state.rate_limit_ceiling, prev_limit)
    This affects existing AIMD behavior even when rampup_seconds=0 — specifically, if a 429 arrives while already at DEFAULT_MIN_LIMIT, the ceiling is now recorded as 1 (capping subsequent additive recovery at the soft ceiling = 2 instead of effective_max). The new "at minimum concurrency" log branch also fires unconditionally there. This is arguably more correct (a 429 is a real ceiling observation regardless of what we could decrease to), but the PR description frames everything as ramp-related, and this is a subtle independent change to existing throttling behavior. Suggest calling it out in the commit/PR description, or splitting it into a separate commit.
  • Defensive clamp in the inactive-ramp branch. _apply_startup_ramp also clamps current_limit > effective_max when ramp is inactive. Looks like belt-and-suspenders for re-register scenarios; harmless but unmotivated by anything else in this PR.

Tests

  • Three new tests cover the three meaningful states: linear climb, 429 mid-ramp, and 429 at the floor. Each asserts both current_limit and rampup_active, which is the right granularity.
  • Missing coverage worth adding:
    • effective_max == DEFAULT_MIN_LIMIT_get_or_create_domain deliberately skips ramp activation in this case (rampup_active = self._rampup_seconds > 0 and effective_max > DEFAULT_MIN_LIMIT). A test pinning that behavior would prevent regressions.
    • Ramp completion via elapsed-only (no calls during ramp) — call try_acquire once at t=0, then again at t > rampup_seconds, and assert current_limit == effective_max and rampup_active is False.
    • Mixing ramp with release_failure (non-rate-limit error). release_failure does not set rampup_active = False — confirm that's the intended behavior (a 500 mid-ramp shouldn't abort the ramp the way a 429 does), or add the abort. Either way, lock it in with a test.

Style / conventions

  • All edits use absolute imports, modern type syntax, from __future__ import annotations, and match the surrounding patterns (dataclass field added on DomainThrottleState, Pydantic Field(ge=0.0, ...) with a ClassVar default to mirror the other knobs). Consistent with STYLEGUIDE.md and AGENTS.md invariants.
  • Docstring on _apply_startup_ramp is one short line — appropriate.
  • Docs updates touch both docs/devnotes/posts/owning-the-model-stack.md and fern/versions/v0.5.8/pages/devnotes/posts/owning-the-model-stack.mdx (plus the Fern architecture page). Content is consistent across all three.

Documentation nits

  • Devnotes bullet: **On success**: after a window… was rewritten to **On success after a 429**: after a window…. The after a 429 qualifier is slightly misleading — release_success increases the limit any time the success window is hit, not only after a 429. Functionally it only matters post-429 (or post-ramp) because the limit is otherwise already at the cap, but readers comparing the bullet to code may find this confusing. Consider keeping the original **On success** framing and letting the new ramp bullet stand on its own.
  • Architecture page (fern/versions/v0.5.8/pages/concepts/architecture-and-performance.mdx:118): "a new throttle domain starts at 1 concurrent request" — tighter to say "starts at one concurrent request" to match the rest of the prose, but this is purely stylistic.
  • The defaults table now lists rampup_seconds | 0.0 | …. Good — consistent with other knobs.

Performance / security

  • All new state lives inside the existing _lock. No new sync primitives, no IO, no allocations in the hot path beyond the existing dataclass. Negligible overhead.
  • No security-relevant surface (no user input parsing, no network).

Verdict

Approve with minor follow-ups. Implementation is clean, well-tested, and correctly preserves default behavior. Two things worth addressing before merge:

  1. Acknowledge the release_rate_limited ceiling-recording change in the PR description (or split it into its own commit) — it's an independent behavior tweak, not part of the ramp.
  2. Add the missing edge-case tests (ramp disabled when effective_max == DEFAULT_MIN_LIMIT; ramp completes purely via elapsed time; release_failure interaction with active ramp).

Doc nit on the "On success after a 429" bullet is optional.

@greptile-apps

greptile-apps Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds an optional startup ramp to the AIMD throttle manager so async inference workloads can ease into peak concurrency instead of immediately bursting. The default rampup_seconds=0.0 preserves existing behavior for all current users.

  • ThrottleConfig.rampup_seconds: new field with ge=0.0 validation, default 0.0, and full docs/test coverage.
  • ThrottleManager ramp logic: _get_or_create_domain initialises new domains at limit=1 when a ramp is active; _apply_startup_ramp linearly advances current_limit toward effective_max on every try_acquire and release_success; a 429 sets rampup_active=False and hands off to normal AIMD; the rate-limit ceiling is correctly kept inside the current_limit < prev_limit guard so a 429 at minimum concurrency cannot pin later recovery.
  • Tests: seven targeted unit tests and two integration-style async smoke tests covering the linear climb, 429-abort, ramp-disabled-at-max-1, elapsed-only completion, release_failure preservation, and non-ramp minimum-concurrency recovery scenarios.

Confidence Score: 5/5

Safe to merge — the ramp is additive-only during the startup window, the fallback to normal AIMD on 429 is unconditional, and the default rampup_seconds=0.0 leaves all existing behavior unchanged.

The rate-limit ceiling is correctly kept inside the current_limit < prev_limit guard (the concern raised in the previous review has been addressed), and the new tests explicitly verify the minimum-concurrency 429 case for both ramp and non-ramp paths. The ramp state is always mutated under the lock, and all callers capture a single now snapshot before entering the lock, eliminating clock-skew inconsistencies. The integration smoke tests exercise the real OpenAICompatibleClient + ThrottledModelClient path end-to-end.

No files require special attention; the core state machine in throttle_manager.py and its test coverage are thorough.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py Core AIMD ramp logic added correctly; ceiling recording stays inside the current_limit < prev_limit guard, resolving the previously flagged concern; _apply_startup_ramp is called consistently on both try_acquire and release_success paths under the lock.
packages/data-designer-config/src/data_designer/config/run_config.py Adds rampup_seconds field with ge=0.0 validation, a DEFAULT_RAMPUP_SECONDS=0.0 class var, and docstring; backward-compatible with default 0.0.
packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py Seven new unit tests cover all documented ramp scenarios including 429-abort, minimum-concurrency guard, elapsed-only completion, release_failure preservation, and non-ramp recovery.
packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py Adds a synthetic ColdServerAsyncHTTPClient and two async integration smoke tests that exercise the real OpenAI-compatible client path.
packages/data-designer-config/tests/config/test_run_config.py Two new tests verify rampup_seconds acceptance and rejection of negative values.
docs/concepts/architecture-and-performance.md Adds startup ramp bullet to the AIMD section, updates the example, and adds rampup_seconds row to the config table.
docs/devnotes/posts/owning-the-model-stack.md Adds Startup ramp subsection and updates the AIMD bullet list and parameter table consistently.

Reviews (5): Last reviewed commit: "Merge branch 'main' into codex/aimd-star..." | Re-trigger Greptile

@eric-tramel

Copy link
Copy Markdown
Contributor Author

Scratch validation harness for AIMD startup ramp

I added a local ignored scratch harness to exercise the new ramp behavior against a synthetic cold async inference backend:

.scratch/aimd_ramp_scratch.py

Run it from the repo root with:

/Users/etramel/.local/bin/uv run --all-packages --group dev python .scratch/aimd_ramp_scratch.py

The harness uses the real ThrottleManager + ThrottledModelClient path and a fake async server that allows more concurrency over wall-clock time. It covers three practical scenarios:

  • no_ramp_cold_server: no client ramp against a cold server; demonstrates the initial burst and synthetic 429s.
  • matched_ramp_cold_server: client ramp matches server warmup; verifies no cold-start 429s and no immediate burst above one request.
  • too_fast_ramp_aborts: client ramp is too aggressive; verifies a synthetic 429 aborts the startup ramp and switches into normal AIMD behavior.

Observed output locally:

scenario                         peak_100ms  peak_250ms  peak_total  429s  attempts  final_limit  ramp_active  seconds
------------------------------------------------------------------------------------------------------------
no_ramp_cold_server                      8           8           8    16        48            5        False     0.94
matched_ramp_cold_server                 1           2           7     0        32            8        False     0.85
too_fast_ramp_aborts                     2           2           3     1        33            3        False     1.41

Scratch assertions passed.

This is intentionally not committed because .scratch/ is gitignored, but it gives reviewers a concrete reproduction for the behavior the production unit tests are modeling.

@eric-tramel

Copy link
Copy Markdown
Contributor Author

[P2] Early startup 429 can pin recovery at 2

In packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py, the new ceiling update records rate_limit_ceiling even when the ramp is still at DEFAULT_MIN_LIMIT. That means a single transient 429 on the first ramped request sets the ceiling to 1; _compute_soft_ceiling() then caps later recovery at 2 even when max_parallel_requests is much higher. For the cold-start case this feature targets, that can abort the ramp and leave the domain stuck at two concurrent requests indefinitely. Consider preserving the previous guard so a ceiling is recorded only when the limit actually drops, or add a way to clear/raise this minimum ceiling after successful recovery.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel

Copy link
Copy Markdown
Contributor Author

Addressed the review feedback in fa992927.

What changed:

  • Restored the key AIMD ceiling guard: rate_limit_ceiling is only recorded when a 429 actually reduces current_limit. A 429 at minimum concurrency still aborts startup ramp and applies cooldown, but it no longer pins recovery to a soft ceiling of 2. This covers both the non-ramp regression and the early-startup 429 concern.
  • Added the missing edge coverage:
    • ramp skipped when effective max is one
    • ramp completion on the first call after elapsed ramp time
    • release_failure does not abort startup ramp
    • non-ramp minimum-concurrency 429 does not pin recovery
  • Promoted the scratch cold-server idea into a committed integration-style async test using the real OpenAICompatibleClient + ThrottledModelClient path. It compares no-ramp cold burst behavior, matched ramp warmup with no 429s, and too-fast ramp aborting into normal AIMD.
  • Cleaned up the docs wording called out in review and clarified that ceiling recording happens when a decrease reveals a higher failed limit.

Validated locally:

git diff --check
PATH="/Users/etramel/.local/bin:$PATH" make lint-engine
PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py packages/data-designer-config/tests/config/test_run_config.py
PATH="/Users/etramel/.local/bin:$PATH" make test-engine
PATH="/Users/etramel/.local/bin:$PATH" make test-config

Results: focused suite 66 passed, engine suite 2009 passed, config suite 564 passed with the existing deprecation warnings.

@eric-tramel eric-tramel self-assigned this May 13, 2026
@eric-tramel eric-tramel requested a review from nabinchha May 13, 2026 15:09
@nabinchha

Copy link
Copy Markdown
Contributor

Nice work on this one, @eric-tramel — the startup ramp slots cleanly into the existing AIMD machinery and the test coverage tells a clear story.

Summary

This PR adds an optional rampup_seconds knob to ThrottleConfig that ramps a new throttle domain linearly from 1 to max_parallel_requests over the configured duration, with a 429 aborting the ramp and handing off to normal AIMD recovery. The default of 0.0 preserves the existing immediate-start behavior, so this is purely additive. The implementation matches the PR description: per-domain ramp state, integration with try_acquire / release_success / release_rate_limited, a fix to scope ceiling recording so a 429 at minimum concurrency doesn't pin recovery, and matching docs in both source markdown and the Fern site.

Findings

Warnings — Worth addressing

docs/concepts/architecture-and-performance.md — source markdown not updated alongside the Fern .mdx

  • What: The Fern page fern/versions/v0.5.8/pages/concepts/architecture-and-performance.mdx got the full rampup_seconds treatment (new AIMD bullet, worked example, config-table row), but its paired source at docs/concepts/architecture-and-performance.md still describes only reduce_factor / additive_increase / success_window / cooldown_seconds / ceiling_overshoot. After this PR, the two copies are out of sync.

  • Why: The dev note in this same PR (docs/devnotes/posts/owning-the-model-stack.mdfern/.../devnotes/posts/owning-the-model-stack.mdx) was updated in lockstep, so the established author-intent here is "keep both copies aligned." Fern is the published source of truth per the docs maintenance skill, but anyone reading the in-repo docs/concepts/architecture-and-performance.md (GitHub browsing, grep, MkDocs render) will see a stale config surface — and the next person editing throttle docs will hit a divergence that wasn't there when they started.

  • Suggestion: Mirror the three additions from the .mdx into docs/concepts/architecture-and-performance.md:

    • The "During optional startup ramp" bullet in the AIMD list (~line 110-ish).
    • The updated example sentence ("With buffer_size=100 and max_parallel_requests=32...") that explains the ramp.
    • The rampup_seconds=0.0 row in the ThrottleConfig example block and the parameter table.

    If the intent is to deprecate docs/concepts/*.md entirely, that's a separate conversation worth raising in the PR description — but for this PR alone, the cleanest move is to keep the pair in sync the same way the dev note was.

packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py:503 — make now required on _get_or_create_domain

  • What: _get_or_create_domain(..., now: float | None = None) silently falls back to time.monotonic() when now is None. This PR threaded now from try_acquire, release_success, and release_rate_limited, but release_failure (which already accepts now: float | None) and acquire_sync / acquire_async still call the helper without it — leaving an implicit wall-clock fallback that's inconsistent with the rest of the call graph.
  • Why: The whole point of threading now is to keep the rampup_started_at clock source under the caller's control (deterministic tests, no skew between time-of-check and time-of-state-update). Once a private helper's state seeding depends on now, leaving the default makes it easy for a future caller to forget to thread it and silently re-introduce a wall-clock race in the ramp init path. test_failure_does_not_abort_startup_ramp already passes now=0.0 into release_failure, so the parameter is part of the test contract — but the value is currently being dropped on the floor by the time it reaches _get_or_create_domain.
  • Suggestion: Make now: float required on _get_or_create_domain and update all callers to pass it explicitly:
def _get_or_create_domain(
    self,
    provider_name: str,
    model_id: str,
    domain: ThrottleDomain,
    now: float,
) -> DomainThrottleState:
    ...

The call sites are small fixes: release_failure resolves now = now if now is not None else time.monotonic() outside the lock like its peers and passes now=now; acquire_sync / acquire_async capture now = time.monotonic() once at the top (they don't strictly need it for ramp purposes since try_acquire runs first, but it's free and makes the dependency explicit — and avoids the trap if anyone ever reorders these methods so _get_or_create_domain becomes the first writer). The type checker then locks the contract across the whole class.

packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py:507 — Wall-clock-sensitive integration assertions

  • What: test_startup_ramp_integration_eases_into_cold_server_without_429s mixes real time.monotonic() (inside ColdServerAsyncHTTPClient) with tight timing thresholds — notably peak_in_flight_before(0.05) <= 1, service_seconds=0.04, and cooldown_seconds=0.02. The whole test runs in ~0.6 s, with sub-50 ms margins between events.
  • Why: On a loaded CI runner, a single GC pause or event-loop hiccup right after the first request fires could push the second sample past the 50 ms mark, or let two requests slip through the ramp window before throttle state updates. The same machinery also depends on await asyncio.sleep(0.31) being "long enough" to cover the 0.3 s ramp — which it is most of the time, but not always.
  • Suggestion: Either widen the timing thresholds (e.g. peak_in_flight_before(0.10) against a throttle_ramp_seconds=0.5 config), or — even better — let ColdServerAsyncHTTPClient accept an injected "clock" so the server-side elapsed time is driven from the same source as the throttle (e.g. an asyncio.Event-based gate, or a counter that advances on each post()). The pure-unit tests in test_throttle_manager.py already give you airtight semantic coverage via now=; the integration test is best treated as a smoke test that "things wire up" rather than as a precision timing assertion.

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/models/clients/throttle_manager.py:524 — give _apply_startup_ramp one job

  • What: The current inactive-ramp branch does two things at once — it both returns early and sneaks in a if state.current_limit > effective_max: state.current_limit = effective_max invariant check. That extra clamp is unreachable in the normal flow (_clamp_domains enforces the same invariant on register(), and the active-ramp branch already does min(effective_max, …)), so the helper reads like it does more than it actually does. It's confusing enough that one can plausibly misread "is this branch supposed to flip rampup_active back on?" — it's not (re-arming would either re-ramp forever after completion or silently undo the 429-abort contract), but the mixed responsibility makes the question reasonable.
  • Why: A reader has to convince themselves that the clamp is dead code in steady state and that the missing rampup_active = True is intentional. That's a lot of reasoning per line for a private helper whose name promises one thing ("apply the startup ramp"). Tightening the helper to exactly that contract also makes the next person extending the ramp logic less likely to confuse "fix the invariant" with "advance the ramp."
  • Suggestion: Drop the inactive-ramp clamp and let _clamp_domains own the current_limit ≤ effective_max invariant on the registration path (it already does). The helper then collapses to "ramp inactive → return; otherwise advance/finalize the ramp":
def _apply_startup_ramp(self, state: DomainThrottleState, effective_max: int, now: float) -> None:
    """Advance the startup ramp for this domain, if it's still active."""
    if not state.rampup_active:
        return

    if self._rampup_seconds <= 0 or effective_max <= DEFAULT_MIN_LIMIT:
        state.current_limit = min(state.current_limit, effective_max)
        state.rampup_active = False
        return

    elapsed = max(0.0, now - state.rampup_started_at)
    if elapsed >= self._rampup_seconds:
        state.current_limit = effective_max
        state.rampup_active = False
        return

    fraction = elapsed / self._rampup_seconds
    ramp_slots = math.floor((effective_max - DEFAULT_MIN_LIMIT) * fraction)
    state.current_limit = min(effective_max, DEFAULT_MIN_LIMIT + ramp_slots)

If you'd rather keep the clamp as a belt-and-suspenders guard against a future caller dropping into _apply_startup_ramp without going through register() first, leave a one-line comment explaining the specific scenario it defends — that turns it from "mystery line" into "documented invariant pin."

architecture/models.md — no mention of startup ramp

  • What: The PR touched the owning-the-model-stack dev note and the Fern Architecture & Performance page, but architecture/models.md (the engine subsystem map) still describes the ThrottleManager purely in AIMD terms.
  • Why: It's a high-level overview and not a config reference, so this is genuinely optional. But contributors land on architecture/ when getting oriented, and a one-liner under "ThrottleManager" keeps the map in sync with the dev notes.
  • Suggestion: Add a sentence like "An optional startup ramp (rampup_seconds) eases new domains from one concurrent request up to the configured peak, aborting to normal AIMD on the first 429." to the existing ThrottleManager section.

packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py:163 — test name vs. assertion mismatch

  • What: test_failure_does_not_abort_startup_ramp asserts (correctly) that release_failure preserves rampup_active, but the closing assertions also re-verify the ramp formula (current_limit == 3 at now=5.0 for a 10 s ramp to max 5). That's two behaviors in one test.
  • Why: Minor — both behaviors are valuable, but the test name suggests a single contract.
  • Suggestion: Either rename to test_release_failure_preserves_startup_ramp_and_progress (so the formula check is part of the contract), or split out the formula assertion into the existing linear-climb test. Take it or leave it.

What Looks Good

  • Backward compatibility is genuinely preserved. DEFAULT_RAMPUP_SECONDS = 0.0 plus the rampup_active = self._rampup_seconds > 0 and effective_max > DEFAULT_MIN_LIMIT guard in _get_or_create_domain means existing users see zero behavior change, and test_startup_ramp_skipped_when_effective_max_is_one locks that in.
  • The minimum-concurrency 429 fix is a sharp catch. Moving the rate_limit_ceiling update inside the state.current_limit < prev_limit branch — and adding a separate log line for the "already at minimum" case — keeps the soft ceiling from being pinned to 2 when AIMD bottoms out, both during a ramp abort and in steady state. test_rate_limit_at_start_of_ramp_does_not_pin_recovery_to_minimum_ceiling and test_non_ramp_rate_limit_at_minimum_does_not_pin_recovery_to_soft_ceiling document the regression nicely.
  • The unit test suite gives ramped/non-ramped semantics airtight now=-driven coverage. Linear climb, 429 abort, ramp-disabled-at-max-1, elapsed-only completion, and release_failure-during-ramp are all there with explicit time arithmetic — no real sleeps, no flakiness.
  • Docs read well. The Architecture & Performance page now leads with the ramp in the AIMD bullet list and follows up with a worked example; the dev-note prose explains the "optimistic but interruptible" intuition without hand-waving. The dev-note source-markdown ↔ Fern mirror is kept in lockstep (modulo the architecture-and-performance pair flagged above).

Verdict

Needs changes — the runtime logic itself is solid, but three things are worth resolving before merge: the docs/concepts/architecture-and-performance.md drift (to match the dev-note mirror this PR already maintains), making now required on _get_or_create_domain so the threaded-clock contract is enforced by the type checker rather than convention, and tightening up the wall-clock-sensitive integration test (or downgrading what it asserts). The remaining suggestions are polish.


This review was generated by an AI assistant.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel

Copy link
Copy Markdown
Contributor Author

@nabinchha thanks for the detailed pass. I pushed follow-ups through 4dba051c addressing all six items you called out.

Resolved items:

  1. Synced docs/concepts/architecture-and-performance.md with the Fern architecture page: startup-ramp bullet, worked example, ThrottleConfig example, and parameter table row.
  2. Made _get_or_create_domain(..., now=...) require an explicit timestamp and updated all callers. release_failure, acquire_sync, and acquire_async now capture/pass explicit time.monotonic() values instead of relying on helper fallback.
  3. Tightened the cold-server integration smoke test: the synthetic server clock starts on first request, the sub-50ms assertion is gone, and the final ramp-completion probe uses a wider explicit margin.
  4. Simplified _apply_startup_ramp by removing the inactive-ramp clamp so the helper only advances/finalizes active ramps.
  5. Added the ThrottleManager startup-ramp one-liner to architecture/models.md.
  6. Renamed the failure/ramp test to test_release_failure_preserves_startup_ramp_and_progress so the name matches the assertions.

Validated locally after integrating the worker branches:

git diff --check
PATH="/Users/etramel/.local/bin:$PATH" make lint-engine
PATH="/Users/etramel/.local/bin:$PATH" uv run --all-packages --group dev pytest packages/data-designer-engine/tests/engine/models/clients/test_throttle_manager.py packages/data-designer-engine/tests/engine/models/clients/test_throttled_model_client.py
PATH="/Users/etramel/.local/bin:$PATH" make test-engine
PATH="/Users/etramel/.local/bin:$PATH" make test-config

Results: focused throttle suite 62 passed, engine suite 2009 passed, config suite 564 passed with the existing deprecation warnings.

@nabinchha nabinchha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fast turnaround, @eric-tramel — all six items from the prior pass landed cleanly and the fixes go beyond just "make the test pass." Re-reviewed fa992927..081e431a (5 follow-up commits + a merge from main).

Summary

This round adds the docs sync, makes now a required argument on _get_or_create_domain with all call sites updated, drops the inactive-ramp clamp from _apply_startup_ramp, renames the failure-preserves-ramp test, and rebuilds the integration test's timing model to be deterministic rather than wall-clock-sensitive. Net result: 468 / -21 across 10 files (vs. 447 / -11 last time), and ruff check + format --check are green on the changed set.

Findings

What Got Resolved (vs. prior review)

  • docs/concepts/architecture-and-performance.md drift (Warning, prior) → fixed in e3fb05f8. All three of the requested additions are there: the "During optional startup ramp" AIMD bullet, the updated worked example ("If rampup_seconds=30, it starts at one request and climbs linearly toward 32..."), and the rampup_seconds=0.0 row in both the example code block and the config-table. Source .md and Fern .mdx now read identically — pair is back in lockstep, matching the dev-note treatment.

  • now required on _get_or_create_domain (Warning, prior) → fixed in 446f753e. The signature now requires now: float; release_failure resolves now = now if now is not None else time.monotonic() at the top like its peers; acquire_sync / acquire_async each capture now = time.monotonic() once and thread it through both try_acquire and the bookkeeping _get_or_create_domain call. Nice touch: the retry loops re-capture now = time.monotonic() before each subsequent try_acquire, so a long wait doesn't carry a stale timestamp into the next acquire attempt. The type checker now enforces the contract end-to-end.

  • Wall-clock-sensitive integration assertion (Warning, prior) → fixed in e8c9e496 + 4dba051c. Rather than just widening margins, the test was redesigned: _started_at is now lazy (set on first post() instead of __init__), so the gap between client construction and first request stops counting against the ramp budget; the timing-fragile peak_in_flight_before(0.05) <= 1 assertion got replaced with peak_allowed > 1 (a deterministic property of the sample set, not a wall-clock window); the post-ramp sleep is now derived from the ramp duration (throttle_ramp_seconds + 0.2) instead of a hardcoded 0.31. The behavioral check that actually matters — ramped_client.rate_limits == 0 — was already there and remains the real proof the throttle ramp matched the server ramp.

  • Defensive clamp in _apply_startup_ramp (Suggestion, prior) → fixed in e8c9e496. The inactive-ramp branch is now a clean early return_clamp_domains keeps owning the current_limit ≤ effective_max invariant on the registration path, and the helper does exactly what its name says. The active-ramp branch's min(effective_max, …) continues to handle mid-ramp clamps. No behavior change, easier to read.

  • architecture/models.md mention (Suggestion, prior) → fixed in e3fb05f8. Added a one-line description under the ThrottleManager section that hits the three load-bearing facts: "starts new domains at one concurrent request, climbs linearly toward the peak, and aborts to normal AIMD behavior on the first 429." Subsystem map now matches the dev-note narrative.

  • Test name test_failure_does_not_abort_startup_ramp (Suggestion, prior) → fixed in e8c9e496. Renamed to test_release_failure_preserves_startup_ramp_and_progress, which now accurately describes both behaviors the test verifies (ramp preserved across release_failure, and ramp formula still advances correctly afterward).

New Findings

None — re-pass on the modified files surfaces no regressions, no new edge cases, and no leftover docs drift. Ruff check + format --check clean on the changed set.

What Looks Good

  • now propagation is more thorough than the suggestion asked for. Recapturing now inside the acquire_sync / acquire_async retry loops is the right move — without it, a long initial wait would have left subsequent try_acquire calls reading a deadline-relative timestamp instead of a fresh wall-clock one. Type checker now enforces the threaded-clock contract everywhere.
  • The integration test redesign is a real fix, not a workaround. Switching to peak_allowed > 1 plus lazy _started_at removes the wall-clock dependency entirely — the test now asserts what the system did rather than when it happened to do it. That's the version that'll keep passing on noisy CI.
  • Docs pair is fully reconciled. Source .md and Fern .mdx for architecture-and-performance read identically now; the subsystem map in architecture/models.md was a freebie that you didn't have to do but did anyway.

Verdict

Ship it. All prior findings are resolved with care, no new findings, lint/format pass. The follow-up commits also raise the test-architecture bar — the integration test is now closer to a property check than a timing race, which is worth keeping as the pattern for future async ramp work.


This review was generated by an AI assistant.

@eric-tramel eric-tramel merged commit a4085c4 into main May 13, 2026
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants