
Registry API returns 500 instead of 503 when upstream registry URL is misconfigured #4351

@peppescg

Description


Summary

When the configured registry API URL is wrong (e.g., https://toolhive-registry.stacklok.dev/ instead of https://toolhive-registry.stacklok.dev/registry/toolhive), the GET endpoints for /api/v1beta/registry, /api/v1beta/registry/{name}, and /api/v1beta/registry/{name}/servers return 500 Internal Server Error with a generic plain-text message "Failed to get registry".

This is misleading: the failure is not an internal server error but an upstream dependency issue. Clients (such as ToolHive Studio) cannot distinguish this misconfiguration from a genuine server bug.

Steps to reproduce

  1. Start thv serve.
  2. Set a wrong registry API URL:
     curl -s -X PUT http://localhost:8181/api/v1beta/registry/default \
       -H 'Content-Type: application/json' \
       -d '{"api_url": "https://toolhive-registry.stacklok.dev/", "allow_private_ip": true}'
  3. Query the registry:
     curl -s -w '\nHTTP Status: %{http_code}\n' http://localhost:8181/api/v1beta/registry

Actual result: 500 Internal Server Error with plain text "Failed to get registry"

Expected result: 503 Service Unavailable with a structured JSON response containing the upstream error details.

Root cause

In pkg/registry/provider_api.go, GetRegistry() only checks for auth errors (ErrRegistryAuthRequired, ErrRegistryUnauthorized). All other errors from the upstream registry (404, timeout, connection refused, DNS failure) fall through to a generic fmt.Errorf that the HTTP handlers in pkg/api/v1/registry.go translate into a 500.

This also affects NewAPIRegistryProvider when tokenSource == nil: the validation probe failure produces a generic error that getCurrentProvider maps to a 500.

Note: when auth is configured (tokenSource != nil), the validation probe is skipped entirely during provider creation, so the error only surfaces at GetRegistry() time.

Proposed fix

  • Introduce a RegistryUnavailableError type in pkg/registry/ for non-auth upstream failures
  • Wrap non-auth ListServers errors with RegistryUnavailableError in both GetRegistry() and NewAPIRegistryProvider
  • In the HTTP handlers, check for RegistryUnavailableError and return 503 with a structured JSON response:
{
  "code": "registry_unavailable",
  "message": "upstream registry at https://toolhive-registry.stacklok.dev/ is unavailable: registry API returned status 404 for https://toolhive-registry.stacklok.dev/v0.1/servers?limit=100&version=latest: 404 page not found"
}

This follows the same pattern as the existing registry_auth_required 503 response, allowing clients to distinguish between auth issues and upstream availability issues.

Metadata

Labels: bug (Something isn't working), go (Pull requests that update go code), registry
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests