Priority Level
Medium (Annoying but has workaround)
Describe the bug
Setting max_parallel_requests in ModelConfig (or ChatCompletionInferenceParams) has no effect on the underlying HTTP connection pool. The pool is silently capped at 100 concurrent connections regardless of the configured value, because httpx.Client ignores its limits parameter when a custom transport is provided, and RetryTransport creates its internal HTTPTransport with httpx's default limits.
This means a user who sets max_parallel_requests=300 expecting ~300 concurrent LLM requests will observe at most ~100 in practice.
Steps/Code to reproduce bug
import data_designer as dd
# Configure model with high parallelism
model_config = dd.ModelConfig(
    alias="my_model",
    model="your-model-name",
    inference_parameters=dd.ChatCompletionInferenceParams(
        max_parallel_requests=300,
    ),
)
...
# Verify what connection pool actually gets created
from data_designer.engine.models.clients.retry import create_retry_transport
rt = create_retry_transport(config=None, strip_rate_limit_codes=False)
print(rt._sync_transport._pool._max_connections) # prints: 100, NOT 600 (= 2 * 300)
Expected behavior
600 # max(32, 2 * max_parallel_requests) = 2 * 300
Additional context
The bug spans three files and involves a silent parameter drop between layers.
1. http_model_client.py — limits are calculated correctly but passed to the wrong place
# data_designer/engine/models/clients/adapters/http_model_client.py
pool_max = max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)
pool_keepalive = max(_MIN_KEEPALIVE_CONNECTIONS, max_parallel_requests)
self._limits = lazy.httpx.Limits(  # calculated correctly
    max_connections=pool_max,
    max_keepalive_connections=pool_keepalive,
)
# ...later, on first request:
self._transport = create_retry_transport(self._retry_config, strip_rate_limit_codes=False)
self._client = lazy.httpx.Client(
    transport=self._transport,  # ← custom transport provided
    limits=self._limits,  # ← IGNORED by httpx when transport != None
    timeout=lazy.httpx.Timeout(self._timeout_s),
)
2. httpx.Client._init_transport silently ignores limits when a custom transport is provided
This is documented httpx behaviour: when transport is not None, the method returns it directly without applying limits:
# httpx source (v0.28.1)
def _init_transport(self, ..., limits=DEFAULT_LIMITS, transport=None) -> BaseTransport:
    if transport is not None:
        return transport  # limits never used
    return HTTPTransport(..., limits=limits)
3. RetryTransport creates its internal HTTPTransport with default limits
# httpx_retries/transport.py (v0.4.6)
class RetryTransport:
    def __init__(self, transport=None, retry=None):
        if transport is not None:
            self._sync_transport = transport ...
        else:
            self._sync_transport = httpx.HTTPTransport()  # ← no limits argument
            self._async_transport = httpx.AsyncHTTPTransport()  # ← no limits argument
httpx.HTTPTransport() with no arguments creates an httpcore.ConnectionPool with max_connections=100 (httpx 0.28.1 default), regardless of what was configured in ModelConfig.
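The three-layer drop described above can be reproduced without any network libraries. The following is a minimal sketch using stand-in classes: `Limits`, `PooledTransport`, `RetryWrapper`, `Client`, and `pool_size` are hypothetical names that only mimic the shape of the behaviour reported for httpx and httpx_retries. They are not those libraries' actual code.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_CONNECTIONS = 100  # the httpx default cited in this report


@dataclass
class Limits:
    max_connections: int = DEFAULT_MAX_CONNECTIONS


class PooledTransport:
    """Stand-in for httpx.HTTPTransport: the object that owns the pool size."""

    def __init__(self, limits: Optional[Limits] = None) -> None:
        self.max_connections = (limits or Limits()).max_connections


class RetryWrapper:
    """Stand-in for RetryTransport: builds a default inner transport if none is given."""

    def __init__(self, transport: Optional[PooledTransport] = None) -> None:
        self.inner = transport if transport is not None else PooledTransport()


class Client:
    """Stand-in for httpx.Client: limits only matter when no transport is passed."""

    def __init__(
        self,
        transport: Optional[RetryWrapper] = None,
        limits: Optional[Limits] = None,
    ) -> None:
        if transport is not None:
            self.transport = transport  # limits is silently dropped here
        else:
            self.transport = RetryWrapper(PooledTransport(limits))


def pool_size(client: Client) -> int:
    return client.transport.inner.max_connections


# Current wiring: limits handed to the client, default transport inside the wrapper.
buggy = Client(transport=RetryWrapper(), limits=Limits(max_connections=600))

# Proposed wiring: limits handed to the inner transport that owns the pool.
fixed = Client(transport=RetryWrapper(PooledTransport(Limits(max_connections=600))))
```

In this model `pool_size(buggy)` stays at the default while `pool_size(fixed)` reflects the configured value, which mirrors the difference between the current code path and the fix suggested further down.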
Verified empirically
from httpx_retries import RetryTransport, Retry
import httpx
rt = RetryTransport(retry=Retry(total=3))
print(rt._sync_transport._pool._max_connections) # 100
# httpx.Client ignores limits= when transport= is provided:
client = httpx.Client(
    transport=rt,
    limits=httpx.Limits(max_connections=600, max_keepalive_connections=300),
)
print(client._transport._sync_transport._pool._max_connections) # still 100
Expected Behavior
Setting max_parallel_requests=N in ModelConfig should result in a connection pool that allows at least N (ideally 2*N as per the existing _POOL_MAX_MULTIPLIER constant) concurrent connections.
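As a rough sketch of the sizing rule this expectation is based on, the helper below restates the calculation quoted from `http_model_client.py`. The values 32 and 2 come from the `max(32, 2 * max_parallel_requests)` expression in this report. The value of `_MIN_KEEPALIVE_CONNECTIONS` is not given anywhere in the report, so the 16 used here is purely an assumption, and `expected_pool_limits` is a hypothetical helper name.

```python
_MIN_MAX_CONNECTIONS = 32        # from max(32, ...) in this report
_POOL_MAX_MULTIPLIER = 2         # from 2 * max_parallel_requests in this report
_MIN_KEEPALIVE_CONNECTIONS = 16  # ASSUMED value, not stated in the report


def expected_pool_limits(max_parallel_requests: int) -> tuple[int, int]:
    """Return (max_connections, max_keepalive_connections) for a given setting."""
    pool_max = max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)
    pool_keepalive = max(_MIN_KEEPALIVE_CONNECTIONS, max_parallel_requests)
    return pool_max, pool_keepalive
```

For `max_parallel_requests=300` this gives a pool maximum of 600, which is the number the reproduction script expects to see instead of 100.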
Actual Behavior
The connection pool is always limited to 100 concurrent connections (httpx's internal default), making max_parallel_requests values above ~100 have no effect on actual throughput.
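To illustrate why a pool cap below the configured parallelism bounds real concurrency, here is a small simulation using only `asyncio`. It is a sketch under simplifying assumptions: the two semaphores stand in for the application-level request limit and the connection pool, `asyncio.sleep` stands in for a request, and `peak_concurrency` is a hypothetical helper rather than anything in data-designer.

```python
import asyncio


async def peak_concurrency(parallel_requests: int, pool_cap: int, total: int = 400) -> int:
    """Run `total` fake requests and return the highest number in flight at once."""
    app_slots = asyncio.Semaphore(parallel_requests)  # application-level limit
    pool_slots = asyncio.Semaphore(pool_cap)          # connection-pool limit
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with app_slots:       # wait for a request slot
            async with pool_slots:  # then wait for a pooled connection
                in_flight += 1
                peak = max(peak, in_flight)
                await asyncio.sleep(0.05)  # pretend to talk to the server
                in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(total)))
    return peak
```

Calling `asyncio.run(peak_concurrency(300, 100))` should report a peak of 100, while `asyncio.run(peak_concurrency(300, 600))` should report 300: the smaller of the two limits decides how many requests actually overlap.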
Suggested Fix
Pass a pre-configured httpx.HTTPTransport (and AsyncHTTPTransport) into RetryTransport instead of letting it create its own with default limits:
# http_model_client.py — _get_sync_client()
def _get_sync_client(self) -> httpx.Client:
    with self._init_lock:
        if self._client is None:
            if self._transport is None:
                inner = lazy.httpx.HTTPTransport(limits=self._limits)  # ← pass limits here
                self._transport = create_retry_transport(
                    self._retry_config,
                    strip_rate_limit_codes=False,
                    transport=inner,  # ← pass to RetryTransport
                )
            self._client = lazy.httpx.Client(
                transport=self._transport,
                timeout=lazy.httpx.Timeout(self._timeout_s),
            )
    return self._client
This requires create_retry_transport to accept and forward an optional transport argument to RetryTransport(transport=..., retry=...), which httpx_retries already supports.
The same fix should be applied to _get_async_client using httpx.AsyncHTTPTransport.
Workaround
Until fixed, monkey-patch RetryTransport.__init__ before any model client is created:
import httpx
from httpx_retries import RetryTransport
_orig = RetryTransport.__init__
def _fixed(self, transport=None, retry=None):
    _orig(self, transport=transport, retry=retry)
    if transport is None:
        unlimited = httpx.Limits(max_connections=None, max_keepalive_connections=None)
        self._sync_transport = httpx.HTTPTransport(limits=unlimited)
        self._async_transport = httpx.AsyncHTTPTransport(limits=unlimited)

RetryTransport.__init__ = _fixed
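The ordering requirement matters because replacing `__init__` only affects instances constructed afterwards. A minimal sketch with a hypothetical stand-in class (`Transport` here is not `RetryTransport`, and the attribute name is invented for illustration):

```python
class Transport:
    """Hypothetical stand-in for a transport with a default pool cap."""

    def __init__(self) -> None:
        self.max_connections = 100


early = Transport()  # constructed BEFORE the patch is installed

_orig_init = Transport.__init__


def _patched_init(self) -> None:
    _orig_init(self)
    self.max_connections = None  # None models "no cap"


Transport.__init__ = _patched_init

late = Transport()  # constructed AFTER the patch is installed
```

Here `early` keeps its original cap while `late` picks up the patched value, so a model client that already built its transport would presumably be unaffected by a patch applied later.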
Environment
| Component | Version |
|---|---|
| data-designer | 0.5.4 |
| data-designer-engine | 0.5.4 |
| httpx | 0.28.1 |
| httpx-retries | 0.4.6 |
| httpcore | 1.0.9 |
| Python | 3.12.9 |
| Platform | macOS (darwin) |