server: proxy timeout in router mode ignores --timeout parameter #18760

@itiu

Bug Description

In multi-model router mode (--models-dir), the internal HTTP proxy client uses hardcoded timeout values instead of respecting the --timeout CLI parameter. This causes "Failed to read connection" errors when processing large prompts with slow models.

Steps to Reproduce

  1. Start the server in router mode:
     llama-server --models-dir ./models --timeout 36000 --port 8087

  2. Send a request to a large model (e.g., 70B+ parameters on CPU)

  3. Observe the error after ~5 minutes:

http client error: Failed to read connection

Expected Behavior

The proxy should respect --timeout parameter, same as single-model mode (-m model.gguf).

Actual Behavior

Proxy uses hardcoded timeout:

  • Connection timeout: 200ms
  • Read/Write timeout: httplib default (~300 seconds)

This makes router mode unusable for large CPU models where a single request can take several hours.

Environment

  • Large models (70B-235B parameters) running on CPU
  • Long context prompts (10k+ tokens)
  • Request processing time: 30 minutes to several hours

Suggested Fix

In tools/server/server-models.cpp, the server_http_proxy constructor should receive and use timeout_read from base_params instead of hardcoded values:

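// Current code: only the connection timeout is set, and it is hardcoded
cli->set_connection_timeout(0, 200000); // 0 s + 200000 us = 200 milliseconds

// Should be (timeout_seconds taken from timeout_read in base_params):
cli->set_connection_timeout(timeout_seconds);
cli->set_read_timeout(timeout_seconds);
cli->set_write_timeout(timeout_seconds);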
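
To make the suggestion above concrete, here is a minimal sketch of how the configured timeout could be passed through to the proxy client. It is not the actual llama.cpp code: `apply_proxy_timeouts` is a hypothetical helper, and `FakeClient` is a stand-in that only mirrors the three timeout setters of `httplib::Client` so the snippet compiles without cpp-httplib. In the real server the value would presumably come from `timeout_read` in `base_params`.

```cpp
#include <ctime>

// Hypothetical stand-in for httplib::Client. It only records the values
// passed to the three timeout setters, which take (seconds, microseconds).
struct FakeClient {
    std::time_t conn_sec = 0, conn_usec = 0;
    std::time_t read_sec = 0, read_usec = 0;
    std::time_t write_sec = 0, write_usec = 0;

    void set_connection_timeout(std::time_t sec, std::time_t usec = 0) { conn_sec = sec; conn_usec = usec; }
    void set_read_timeout(std::time_t sec, std::time_t usec = 0)       { read_sec = sec; read_usec = usec; }
    void set_write_timeout(std::time_t sec, std::time_t usec = 0)      { write_sec = sec; write_usec = usec; }
};

// Hypothetical helper: apply the user-supplied --timeout value (in seconds)
// to all three client timeouts instead of relying on hardcoded values or
// library defaults. Works with any client type exposing these setters.
template <typename Client>
void apply_proxy_timeouts(Client & cli, int timeout_seconds) {
    cli.set_connection_timeout(timeout_seconds);
    cli.set_read_timeout(timeout_seconds);
    cli.set_write_timeout(timeout_seconds);
}
```

With `--timeout 36000`, calling `apply_proxy_timeouts(cli, 36000)` would leave the proxy waiting up to ten hours for the child instance to respond. Whether the connection timeout should also be raised, or kept short because the proxy connects to a local instance, is a separate design decision; the sketch simply follows the fix proposed in this issue.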

Workaround

Use single-model mode (-m model.gguf) instead of router mode (--models-dir).
