Registry API returns 500 instead of 503 when upstream registry URL is misconfigured #4351
Description
Summary
When the configured registry API URL is wrong (e.g., https://toolhive-registry.stacklok.dev/ instead of https://toolhive-registry.stacklok.dev/registry/toolhive), the GET endpoints for /api/v1beta/registry, /api/v1beta/registry/{name}, and /api/v1beta/registry/{name}/servers return 500 Internal Server Error with a generic plain-text message "Failed to get registry".
This is misleading — it's not an internal server error but an upstream dependency issue. Clients (like ToolHive Studio) cannot distinguish this from a real bug.
Steps to reproduce
- Start thv serve
- Set a wrong registry API URL:

      curl -s -X PUT http://localhost:8181/api/v1beta/registry/default \
        -H 'Content-Type: application/json' \
        -d '{"api_url": "https://toolhive-registry.stacklok.dev/", "allow_private_ip": true}'

- Query the registry:

      curl -s -w '\nHTTP Status: %{http_code}\n' http://localhost:8181/api/v1beta/registry

Actual result: 500 Internal Server Error with plain text "Failed to get registry"
Expected result: 503 Service Unavailable with a structured JSON response containing the upstream error details.
Root cause
In pkg/registry/provider_api.go, GetRegistry() only checks for auth errors (ErrRegistryAuthRequired, ErrRegistryUnauthorized). All other errors from the upstream registry (404, timeout, connection refused, DNS failure) fall through to a generic fmt.Errorf that the HTTP handlers in pkg/api/v1/registry.go translate into a 500.
This also affects NewAPIRegistryProvider when tokenSource == nil — the validation probe failure produces a generic error that getCurrentProvider maps to 500.
Note: when auth is configured (tokenSource != nil), the validation probe is skipped entirely during provider creation, so the error only surfaces at GetRegistry() time.
Proposed fix
- Introduce a RegistryUnavailableError type in pkg/registry/ for non-auth upstream failures
- Wrap non-auth ListServers errors with RegistryUnavailableError in both GetRegistry() and NewAPIRegistryProvider
- In the HTTP handlers, check for RegistryUnavailableError and return 503 with a structured JSON response:
{
"code": "registry_unavailable",
"message": "upstream registry at https://toolhive-registry.stacklok.dev/ is unavailable: registry API returned status 404 for https://toolhive-registry.stacklok.dev/v0.1/servers?limit=100&version=latest: 404 page not found"
}

This follows the same pattern as the existing registry_auth_required 503 response, allowing clients to distinguish between auth issues and upstream availability issues.