max_parallel_requests has no effect on actual HTTP connection pool size

### Priority Level

Medium (Annoying but has workaround)

### Describe the bug

Setting `max_parallel_requests` in `ModelConfig` (or `ChatCompletionInferenceParams`) has no effect on the underlying HTTP connection pool. The pool is silently capped at **100 concurrent connections** regardless of the configured value, because `httpx.Client` ignores its `limits` parameter when a custom `transport` is provided, and `RetryTransport` creates its internal `HTTPTransport` with httpx's default limits.

This means a user who sets `max_parallel_requests=300` expecting ~300 concurrent LLM requests will observe at most ~100 in practice.

### Steps/Code to reproduce bug

```python
import data_designer as dd

# Configure model with high parallelism
model_config = dd.ModelConfig(
    alias="my_model",
    model="your-model-name",
    inference_parameters=dd.ChatCompletionInferenceParams(
        max_parallel_requests=300,
    ),
)
...

# Verify what connection pool actually gets created
from data_designer.engine.models.clients.retry import create_retry_transport
rt = create_retry_transport(config=None, strip_rate_limit_codes=False)
print(rt._sync_transport._pool._max_connections)  # prints: 100, NOT 600 (= 2 * 300)
```

### Expected behavior

The pool's `max_connections` should be `600`, i.e. `max(32, 2 * max_parallel_requests) = 2 * 300`.
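The following arithmetic sketch shows the gap between the configured and effective values. The constant names are hypothetical stand-ins for `_MIN_MAX_CONNECTIONS` and `_POOL_MAX_MULTIPLIER`, with values taken from the `max(32, 2 * max_parallel_requests)` formula. It also assumes one HTTP/1.1 connection per in-flight request, so requests beyond the pool cap queue for a free connection.

```python
# Hypothetical stand-ins for the data-designer constants, inferred from max(32, 2 * N).
MIN_MAX_CONNECTIONS = 32
POOL_MAX_MULTIPLIER = 2
# max_connections used by a bare httpx.HTTPTransport() (httpx 0.28.1 default).
HTTPX_DEFAULT_MAX_CONNECTIONS = 100


def expected_pool_max(max_parallel_requests: int) -> int:
    """Pool size data-designer computes (but never applies)."""
    return max(MIN_MAX_CONNECTIONS, POOL_MAX_MULTIPLIER * max_parallel_requests)


def effective_concurrency(max_parallel_requests: int) -> int:
    """Upper bound on in-flight requests when the pool is stuck at the httpx default."""
    return min(max_parallel_requests, HTTPX_DEFAULT_MAX_CONNECTIONS)


for n in (10, 50, 100, 300):
    print(f"max_parallel_requests={n}: expected pool={expected_pool_max(n)}, "
          f"effective concurrency={effective_concurrency(n)}")
```

Below ~100 the bug is invisible. Above it, every additional unit of `max_parallel_requests` only adds requests waiting on the pool.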

### Additional context

The bug spans three files and involves a silent parameter drop between layers.

### 1. `http_model_client.py` — limits are calculated correctly but passed to the wrong place

```python
# data_designer/engine/models/clients/adapters/http_model_client.py

pool_max = max(_MIN_MAX_CONNECTIONS, _POOL_MAX_MULTIPLIER * max_parallel_requests)
pool_keepalive = max(_MIN_KEEPALIVE_CONNECTIONS, max_parallel_requests)
self._limits = lazy.httpx.Limits(          # calculated correctly
    max_connections=pool_max,
    max_keepalive_connections=pool_keepalive,
)

# ...later, on first request:
self._transport = create_retry_transport(self._retry_config, strip_rate_limit_codes=False)
self._client = lazy.httpx.Client(
    transport=self._transport,   # ← custom transport provided
    limits=self._limits,         # ← IGNORED by httpx when transport != None
    timeout=lazy.httpx.Timeout(self._timeout_s),
)
```

### 2. `httpx.Client._init_transport` silently ignores `limits` when a custom transport is provided

This is documented httpx behaviour: when `transport` is not `None`, the method returns it directly without applying `limits`:

```python
# httpx source (v0.28.1), abridged
def _init_transport(self, ..., limits=DEFAULT_LIMITS, transport=None) -> BaseTransport:
    if transport is not None:
        return transport       # limits never used
    return HTTPTransport(..., limits=limits)
```

### 3. `RetryTransport` creates its internal `HTTPTransport` with default limits

```python
# httpx_retries/transport.py (v0.4.6), abridged
class RetryTransport:
    def __init__(self, transport=None, retry=None):
        if transport is not None:
            self._sync_transport = transport ...
        else:
            self._sync_transport = httpx.HTTPTransport()        # ← no limits argument
            self._async_transport = httpx.AsyncHTTPTransport()  # ← no limits argument
```

`httpx.HTTPTransport()` with no arguments creates an `httpcore.ConnectionPool` with **`max_connections=100`** (httpx 0.28.1 default), regardless of what was configured in `ModelConfig`.

### Verified empirically

```python
from httpx_retries import RetryTransport, Retry
import httpx

rt = RetryTransport(retry=Retry(total=3))
print(rt._sync_transport._pool._max_connections)   # 100

# httpx.Client ignores limits= when transport= is provided:
client = httpx.Client(
    transport=rt,
    limits=httpx.Limits(max_connections=600, max_keepalive_connections=300),
)
print(client._transport._sync_transport._pool._max_connections)  # still 100
```

---

## Expected Behavior

Setting `max_parallel_requests=N` in `ModelConfig` should result in a connection pool that allows at least `N` (ideally `2*N` as per the existing `_POOL_MAX_MULTIPLIER` constant) concurrent connections.

---

## Actual Behavior

The connection pool is always capped at **100 concurrent connections** (httpx's default), so raising `max_parallel_requests` above ~100 has no effect on actual throughput.

---

## Suggested Fix

Pass a pre-configured `httpx.HTTPTransport` (and `AsyncHTTPTransport`) into `RetryTransport` instead of letting it create its own with default limits:

```python
# http_model_client.py — _get_sync_client()
def _get_sync_client(self) -> httpx.Client:
    with self._init_lock:
        if self._client is None:
            if self._transport is None:
                inner = lazy.httpx.HTTPTransport(limits=self._limits)  # ← pass limits here
                self._transport = create_retry_transport(
                    self._retry_config,
                    strip_rate_limit_codes=False,
                    transport=inner,                                    # ← pass to RetryTransport
                )
            self._client = lazy.httpx.Client(
                transport=self._transport,
                timeout=lazy.httpx.Timeout(self._timeout_s),
            )
        return self._client
```

This requires `create_retry_transport` to accept and forward an optional `transport` argument to `RetryTransport(transport=..., retry=...)`, which `httpx_retries` already supports.

The same fix should be applied to `_get_async_client` using `httpx.AsyncHTTPTransport`.

---

## Workaround

Until fixed, monkey-patch `RetryTransport.__init__` before any model client is created:

```python
import httpx
from httpx_retries import RetryTransport

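_orig = RetryTransport.__init__

def _fixed(self, transport=None, retry=None):
    # Run the original constructor, which builds transports with httpx's default limits.
    _orig(self, transport=transport, retry=retry)
    if transport is None:
        # Replace the default (max_connections=100) transports with uncapped ones.
        unlimited = httpx.Limits(max_connections=None, max_keepalive_connections=None)
        self._sync_transport = httpx.HTTPTransport(limits=unlimited)
        self._async_transport = httpx.AsyncHTTPTransport(limits=unlimited)

# Patch the class so every RetryTransport created afterwards uses the uncapped transports.
RetryTransport.__init__ = _fixed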
```

---

## Environment

| Component | Version |
|---|---|
| `data-designer` | 0.5.4 |
| `data-designer-engine` | 0.5.4 |
| `httpx` | 0.28.1 |
| `httpx-retries` | 0.4.6 |
| `httpcore` | 1.0.9 |
| Python | 3.12.9 |
| Platform | macOS (darwin) |
