fix(onboard): increase verification timeout and reduce max_tokens for custom provider probes #27380
Conversation
Greptile Summary
Fixed custom provider verification timeouts for local LLM servers by increasing the timeout from 10s to 30s and reducing verification token generation from 1024 tokens to 1.
The changes are minimal, well-targeted, and properly address the reported issue where users with functional local LLM servers couldn't complete onboarding due to premature timeout. The trade-off (users wait longer for truly unreachable servers) is acknowledged and reasonable.
Confidence Score: 5/5
Last reviewed commit: 83b6ef0
22beb88 to 967d51f
df90a05 to 967d51f
… custom provider probes
The onboard wizard sends a chat-completion request to verify custom providers. With max_tokens: 1024 and a 10 s timeout, large local models (e.g. Qwen3.5-27B on llama.cpp) routinely time out because the server needs to load the model and generate up to 1024 tokens before responding.
Changes:
- Raise VERIFY_TIMEOUT_MS from 10 s to 30 s
- Lower max_tokens from 1024 to 1 (verification only needs a single token to confirm the API is reachable and the model ID is valid)
- Add explicit stream: false to both OpenAI and Anthropic probes
Closes openclaw#27346
Made-with: Cursor
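The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of src/commands/onboard-custom.ts: only VERIFY_TIMEOUT_MS, max_tokens, and stream come from the commit message. The helper names (buildProbeRequest, verifyProvider), the "Hi" prompt, the URL layout, and the result shape are assumptions made for the example.

```typescript
// Hypothetical sketch of a one-token verification probe.
// Only VERIFY_TIMEOUT_MS, max_tokens: 1 and stream: false are taken from the PR text.
const VERIFY_TIMEOUT_MS = 30_000; // raised from 10 s to 30 s

type ProbeKind = "openai" | "anthropic";

interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    max_tokens: number;
    stream: false;
    messages: { role: "user"; content: string }[];
  };
}

// Build the request for either API style. The endpoint paths are assumptions:
// the OpenAI-style base URL is expected to already end in /v1.
function buildProbeRequest(
  kind: ProbeKind,
  baseUrl: string,
  modelId: string,
  apiKey: string,
): ProbeRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const body = {
    model: modelId,
    max_tokens: 1, // one token is enough to prove the API and model ID work
    stream: false as const, // ask for a single JSON response, not an event stream
    messages: [{ role: "user" as const, content: "Hi" }],
  };
  if (kind === "openai") {
    return {
      url: `${trimmed}/chat/completions`,
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body,
    };
  }
  return {
    url: `${trimmed}/v1/messages`,
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body,
  };
}

// Send the probe and abort it once the timeout elapses.
async function verifyProvider(
  req: ProbeRequest,
  timeoutMs: number = VERIFY_TIMEOUT_MS,
): Promise<{ ok: boolean; status?: number; error?: string }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(req.url, {
      method: "POST",
      headers: req.headers,
      body: JSON.stringify(req.body),
      signal: controller.signal,
    });
    return { ok: res.ok, status: res.status };
  } catch (err) {
    // Covers both network failures and the abort triggered by the timer.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

With max_tokens at 1, the time the probe spends waiting is dominated by model loading and prompt processing rather than generation, which is why a 30 s budget is more forgiving for large local models. The cost is that a server that never answers now holds the wizard for up to 30 s instead of 10 s.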
967d51f to 606f9d8
Thanks everyone for the iteration here. This PR now consolidates the onboarding verification reliability fix for custom providers. What’s included:
Validation:
If this looks good, this should be the one we merge. Thank you all for the prior PRs and review cycles.
Merged, thank you everyone. Post-merge notes:
Appreciate all the iterations and review help here.
… custom provider probes (openclaw#27380)
* fix(onboard): increase verification timeout and reduce max_tokens for custom provider probes
The onboard wizard sends a chat-completion request to verify custom providers. With max_tokens: 1024 and a 10 s timeout, large local models (e.g. Qwen3.5-27B on llama.cpp) routinely time out because the server needs to load the model and generate up to 1024 tokens before responding.
Changes:
- Raise VERIFY_TIMEOUT_MS from 10 s to 30 s
- Lower max_tokens from 1024 to 1 (verification only needs a single token to confirm the API is reachable and the model ID is valid)
- Add explicit stream: false to both OpenAI and Anthropic probes
Closes openclaw#27346
Made-with: Cursor
* Changelog: note custom-provider onboarding verification fix
---------
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
… custom provider probes (openclaw#27380)
* fix(onboard): increase verification timeout and reduce max_tokens for custom provider probes
The onboard wizard sends a chat-completion request to verify custom providers. With max_tokens: 1024 and a 10 s timeout, large local models (e.g. Qwen3.5-27B on llama.cpp) routinely time out because the server needs to load the model and generate up to 1024 tokens before responding.
Changes:
- Raise VERIFY_TIMEOUT_MS from 10 s to 30 s
- Lower max_tokens from 1024 to 1 (verification only needs a single token to confirm the API is reachable and the model ID is valid)
- Add explicit stream: false to both OpenAI and Anthropic probes
Closes openclaw#27346
Made-with: Cursor
* Changelog: note custom-provider onboarding verification fix
---------
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
(cherry picked from commit c6cd434)
Summary
src/commands/onboard-custom.ts: raised VERIFY_TIMEOUT_MS from 10s to 30s, reduced max_tokens from 1024 to 1 (verification only needs a single token), added explicit stream: false to both OpenAI and Anthropic probes.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Security Impact (required)
Repro + Verification
Environment
Steps
Expected
Actual
Evidence
Root cause:
All 14 tests pass after updating test expectations.
Human Verification (required)
Compatibility / Migration
Failure Recovery (if this breaks)
Risks and Mitigations