Wire server discovery protocol into thv serve#4319
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
36bc774 to
7fdf8fa
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4319 +/- ##
==========================================
- Coverage 69.55% 69.46% -0.10%
==========================================
Files 480 483 +3
Lines 48837 49048 +211
==========================================
+ Hits 33971 34070 +99
- Misses 12249 12345 +96
- Partials 2617 2633 +16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
7fdf8fa to
c166dd9
Compare
c166dd9 to
c6e6ece
Compare
c6e6ece to
6948298
Compare
aponcedeleonch
left a comment
There was a problem hiding this comment.
Nice work wiring in the discovery protocol. The migration path is seamless (no server.json existed before, so StateNotFound handles all upgrade scenarios cleanly), and the rollback edge case also works correctly (nonce mismatch triggers overwrite). A few non-blocking observations below, mostly around multi-process safety and context propagation.
6948298 to
e3196be
Compare
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
e3196be to
4ca07b2
Compare
aponcedeleonch
left a comment
There was a problem hiding this comment.
Code reviewed against PR description. The implementation looks solid — discovery protocol is well-structured with good security considerations (symlink rejection, loopback validation, constant-time nonce comparison, directory permission tightening, startup guard).
Mismatches between PR description and actual code:
-
SIGTERM handling not in this PR: The Summary and Changes table claim
cmd/thv/app/server.gowas changed to add SIGTERM signal handling, but this file is not modified in the diff. This change was likely landed separately (possibly in #4327 based on the recent commit history:5769a5d2 Gracefully handle SIGTERM in thv HTTP server). -
Shutdown timeout not added by this PR: The description claims "30-second shutdown timeout (was unbounded)" as a fix, but
shutdownTimeout = 30 * time.Secondappears to be pre-existing code, not introduced here. -
Discovery vs env var priority is described backwards: The Summary says the skills client "auto-discovers the server via the discovery file before falling back to
TOOLHIVE_API_URL", but the actual code inpkg/skills/client/client.gochecksTOOLHIVE_API_URLfirst (explicit override wins), then tries discovery, then falls back tolocalhost:8080. TheNewDefaultClientgodoc and the numbered resolution logic are correct — only the Summary bullet inverts the priority.
These are description-only issues; the code itself is correct and well-implemented.
4ca07b2 to
fc8052c
Compare
blkt
left a comment
There was a problem hiding this comment.
I left a couple questions, but overall looks good to me.
fc8052c to
23ec5ae
Compare
23ec5ae to
bbb5f65
Compare
The discovery package (pkg/server/discovery/) was already implemented and tested but had zero imports in the codebase. This wires it into the serve command so clients (CLI, Studio) can auto-discover a running server without hardcoded ports or environment variables. On startup, thv serve now generates a cryptographic nonce, writes a discovery file to $XDG_STATE_HOME/toolhive/server/server.json with the actual listen URL (supporting port 0 and Unix sockets), and returns the nonce via the X-Toolhive-Nonce health check header. On shutdown the file is removed. The skills client now tries discovery before falling back to the TOOLHIVE_API_URL env var or the default localhost:8080, with loopback and socket-path validation on discovered URLs. Additional fixes: SIGTERM handling in the serve command, a 30-second shutdown timeout (was unbounded), symlink rejection on the discovery file read path, directory permission tightening after MkdirAll, and constant-time nonce comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Juan Antonio Osorio <ozz@stacklok.com>
- Wrap writeDiscoveryFile check-then-write in WithFileLock to prevent TOCTOU race when two servers start simultaneously - Log FindProcess errors at Debug level instead of silently discarding - Consolidate ListenURL tests into a table-driven test - Rename healtcheck_test.go to healthcheck_test.go (fix typo) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Juan Antonio Osorio <ozz@stacklok.com>
The discovery lock file is created in the same directory as server.json, but the directory may not exist on a fresh system. MkdirAll was called inside the lock callback (via WriteServerInfo), but the lock acquisition itself needs the directory to already exist. Create the directory before calling WithFileLock so the lock file can be written. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract shared HTTPClientForURL in the discovery package to deduplicate transport setup between health.go and the skills client - Propagate context.Context through NewDefaultClient and resolveViaDiscovery instead of using context.Background() - Add comment explaining intentional opts-shadowing order so caller-supplied options can override discovery defaults - Use url.JoinPath in buildHealthClient instead of string concatenation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bbb5f65 to
3de135a
Compare
Summary
RFC-0034 proposes a long-running local server architecture where
thv serveadvertises itself via a discovery file so clients (CLI, Studio) can find it without hardcoded ports. The discovery package (pkg/server/discovery/) was already fully implemented and tested but had zero imports anywhere in the codebase. This PR wires it in, giving us a working discovery protocol that immediately benefits Studio and the skills CLI client.$XDG_STATE_HOME/toolhive/server/server.json) on startup with the actual listen URL, PID, and a cryptographic nonce, and removes it on shutdown/healthendpoint returns the nonce viaX-Toolhive-Nonceso clients can verify server instance identity (prevents PID-reuse confusion)TOOLHIVE_API_URLorlocalhost:8080, with loopback and socket-path validation on discovered URLsType of change
Test plan
task test)task lint-fix)Manual testing: run
thv serve, verifyserver.jsonappears with correct URL/PID/nonce,curl -v /healthshowsX-Toolhive-Nonceheader, stop server and verify file is removed, start a second server while first is running and verify it refuses.Changes
pkg/server/discovery/*.gopkg/api/server.goWithNonce()builder,ListenURL(), discovery write/remove in lifecycle, startup guard, shutdown timeoutpkg/api/v1/healthcheck.goX-Toolhive-Nonceheader on 204 responsescmd/thv/app/server.goSIGTERMto signal handlingpkg/skills/client/client.gopkg/api/server_test.gogenerateNonce,ListenURL(TCP + Unix)pkg/api/v1/healtcheck_test.gopkg/server/discovery/*_test.goDoes this introduce a user-facing change?
thv servenow writes a discovery file that allows other tools (Studio, skills CLI) to automatically find the running server without requiringTOOLHIVE_API_URLor knowing the port. This is transparent to users -- existing behavior is preserved as a fallback.Special notes for reviewers
pkg/server/discovery/package was previously implemented and tested in a separate effort but never wired in. This PR includes it as new files since it had no prior commits.thv server start/stop/statussubcommands, CLI auto-start, lock file singleton, SSE events, Studio integration.Large PR Justification
Generated with Claude Code