server : fix proxy client timeout in router mode #22003

Merged
ServeurpersoCom merged 1 commit into ggml-org:master from xris99:fix/proxy-timeout-router-mode on Apr 21, 2026

@xris99 xris99 commented Apr 16, 2026

Contributor

Problem

In router mode (--models-dir / --models-preset), the internal HTTP proxy client in server_http_proxy had two bugs that caused "http client error: Failed to read connection" errors:

  1. Hardcoded 5-second connection timeout: the --timeout parameter was ignored entirely, causing failures when child model workers take longer than 5 seconds to accept the connection (common for large CPU-bound models).
  2. Swapped read/write timeouts: set_write_timeout was fed timeout_read and set_read_timeout was fed timeout_write. The original comment acknowledged this with "reversed for cli (client) vs srv (server)", but the reasoning was wrong: the names mean the same thing from both sides.

This made router mode unusable for large models (70B+) on CPU where a single request can take 30 minutes or more.

Fix

Three-line change in tools/server/server-models.cpp:

  • Connection timeout now uses timeout_read (same as the read timeout) instead of hardcoded 5 s
  • set_read_timeout now receives timeout_read
  • set_write_timeout now receives timeout_write

This matches exactly how server-http.cpp (lines 115–116) sets timeouts for the non-router path, making --timeout consistent across both modes.
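
The three assignments listed above can be sketched in isolation. This is only an illustration, not the code from server-models.cpp: recording_client is a hypothetical stand-in that just records what it is given, and apply_proxy_timeouts is a made-up helper name. The setter names are the ones mentioned in this description; the (seconds, microseconds) argument shape is assumed from the two-argument calls quoted in the review below.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for the proxy's HTTP client (not the real class).
// It only records the values it receives so the result can be inspected.
struct recording_client {
    time_t connection_s = 5; // models the previously hardcoded 5 s
    time_t read_s       = 0;
    time_t write_s      = 0;

    void set_connection_timeout(time_t sec, time_t /*usec*/ = 0) { connection_s = sec; }
    void set_read_timeout      (time_t sec, time_t /*usec*/ = 0) { read_s       = sec; }
    void set_write_timeout     (time_t sec, time_t /*usec*/ = 0) { write_s      = sec; }
};

// Hypothetical helper: applies the timeouts as proposed in this description.
// Templated so it works with any client type exposing these three setters.
template <typename Client>
void apply_proxy_timeouts(Client & cli, int timeout_read, int timeout_write) {
    cli.set_connection_timeout(timeout_read, 0); // was a fixed 5 s before
    cli.set_read_timeout(timeout_read, 0);       // read  <- timeout_read
    cli.set_write_timeout(timeout_write, 0);     // write <- timeout_write
}
```

Calling apply_proxy_timeouts(cli, 36000, 36000) on a recording_client leaves all three recorded values at 36000 instead of keeping the 5-second connection default.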

Testing

Starting the server in router mode with a large CPU model and --timeout 36000 no longer produces "Failed to read connection" after the default httplib timeout.

Fixes #18760

@xris99 xris99 requested a review from a team as a code owner April 16, 2026 16:02

ggml-gh-bot Bot commented Apr 16, 2026

Hi @xris99, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

Comment on lines -1150 to -1151
cli->set_write_timeout(timeout_read, 0); // reversed for cli (client) vs srv (server)
cli->set_read_timeout(timeout_write, 0);

Collaborator

This is supposed to be reversed, right? See the comment that you deleted.

Contributor Author

The reasoning in the comment is incorrect, and the timeouts are wrongly swapped. From the httplib client's perspective, read = reading the response and write = sending the request body, which maps directly to the server-side meanings. Consequently, the timeouts must not be swapped.

Collaborator

IIRC timeout_read/timeout_write are read from the CLI arg, which is intended to be applied to the server. Have you checked where these params come from?

Contributor Author

Hmm... this actually does not make sense at all and could be significantly simplified.
timeout_read and timeout_write are always identical. The --timeout CLI arg sets both to the same value (arg.cpp l2960–2961), and the defaults are also 600 for both. They cannot be set individually, so swapping them has zero effect and never mattered. I can revert and reinsert the swap, but it changes nothing. I was just trying to clean up a bit.
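
To make the "zero effect" point concrete, here is a small self-contained sketch. The struct and function names are hypothetical and are not taken from arg.cpp or server-models.cpp; they only model one --timeout value being written to both fields, with 600 as the default mentioned above.

```cpp
#include <cassert>

// Hypothetical mirror of the two timeout fields discussed in this thread.
struct timeout_params {
    int timeout_read  = 600; // default stated in the comment above
    int timeout_write = 600;
};

// Models a single --timeout value being written to both fields.
inline timeout_params from_timeout_flag(int seconds) {
    timeout_params p;
    p.timeout_read  = seconds;
    p.timeout_write = seconds;
    return p;
}

// Effective client-side timeouts after the assignment.
struct client_timeouts {
    int read_s;
    int write_s;
    bool operator==(const client_timeouts & o) const {
        return read_s == o.read_s && write_s == o.write_s;
    }
};

// read <- timeout_read, write <- timeout_write
inline client_timeouts assign_direct(const timeout_params & p) {
    return { p.timeout_read, p.timeout_write };
}

// read <- timeout_write, write <- timeout_read (the "reversed" form)
inline client_timeouts assign_swapped(const timeout_params & p) {
    return { p.timeout_write, p.timeout_read };
}
```

With any value produced by from_timeout_flag, assign_direct and assign_swapped compare equal. They would only diverge if the two fields could be set to different values, which the thread says is not currently possible from the CLI.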

Collaborator

You should revert the comment too (in case someone else tries to make the same change here).

Contributor Author

You are right. I reverted the swap completely, including the comment. Sorry for that. Updated with the last commit: it is now a single-line change for the router timeout (the actual topic).

@xris99 xris99 force-pushed the fix/proxy-timeout-router-mode branch from 43c0d1f to fd19ccb Compare April 16, 2026 19:03

xris99 commented Apr 16, 2026

Contributor Author

You are right, I concede the point on the swap. timeout_read and timeout_write are always identical — --timeout sets both to the same value (arg.cpp l2960–2961) and there is no way to set them individually. Swapping them has zero practical effect and never mattered. I tried to clean it up but the reorder is noise. Amended the commit to revert the swap and updated the commit message accordingly — the only real fix is replacing the hardcoded 5-second connection timeout with timeout_read.

@xris99 xris99 force-pushed the fix/proxy-timeout-router-mode branch from fd19ccb to d664234 Compare April 17, 2026 07:33

@ngxson ngxson left a comment

Collaborator

@ServeurpersoCom can you merge this? thanks!

@ServeurpersoCom ServeurpersoCom merged commit ff6b106 into ggml-org:master Apr 21, 2026
1 check passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Development

Successfully merging this pull request may close these issues.

server: proxy timeout in router mode ignores --timeout parameter

3 participants