TL;DR
Gopher's biggest architectural gap isn't tunnel internals — it's that the VPS has no direct channel to client machines for anything beyond bootstrap. Status comes from rathole's own connection, config updates require SSH, logs aren't streamed, and there's no place to put health checks or self-heal. A small gopher-agent running on each client (gRPC) is the natural fix.
This is not a v1.0 deliverable. The actionable part for now is shaping the bootstrap flow + Machine model so an agent can drop in later additively.
What's missing today
The current control path is:
VPS ──SSH (bootstrap only)──► Client
VPS ◄──rathole tunnel data───► Client (no control plane on top)
Concretely that means:
- "Is the client healthy?" → only signal is whether the rathole tunnel is up. CPU/disk/memory pressure on the client is invisible.
- "Push a new rathole config" → requires SSH back into the client. Re-uses the bootstrap key, re-prompts for sudo, brittle.
- "Stream rathole logs to the dashboard" → not possible; logs only exist on the client, dashboard would need to SSH+
journalctl -f.
- "Restart rathole because it crashed" → no remote affordance. User has to log in.
Each of those is its own tracked issue (#12, #53, #42, #79); a gopher-agent is the common substrate that lets all of them be implemented cleanly instead of as four bespoke side-channels.
Proposed shape
gopher-agent is a small daemon that ships alongside rathole-client, registers a gRPC server on a known local port, and is reached from the VPS through the existing rathole tunnel (so no new firewall rules, no public listener on the client). The VPS opens a gRPC connection over the tunnel back-channel.
A compact proto sketch — full schema is implementation work, this is just the shape:
service Agent {
  rpc GetStatus(GetStatusRequest) returns (GetStatusResponse);
  rpc StreamMetrics(StreamMetricsRequest) returns (stream Metric);
  rpc UpdateRatholeConfig(UpdateConfigRequest) returns (UpdateConfigResponse);
  rpc RestartRathole(RestartRequest) returns (RestartResponse);
  rpc GetLogs(GetLogsRequest) returns (stream LogEntry);
  rpc RunDiagnostics(DiagnosticsRequest) returns (DiagnosticsResponse);
}
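To make the back-channel concrete, here is a minimal sketch of the VPS side dialing the agent and asking for status, assuming rathole exposes the agent's local gRPC port as a loopback forward on the VPS. The address, the generated-stub import path, and the plaintext credentials are illustrative assumptions, not decisions:

package control // illustrative package on the VPS side

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	agentpb "example.invalid/gopher/agent/proto" // placeholder path for the generated stubs
)

// dialAgentStatus reaches a client's gopher-agent through the rathole back-channel.
// Assumption: rathole forwards the agent's local gRPC port to a loopback address
// on the VPS (e.g. "127.0.0.1:7100"), so the client never exposes a public listener.
func dialAgentStatus(ctx context.Context, addr string) (*agentpb.GetStatusResponse, error) {
	// Plaintext is tolerable only because this hop never leaves the tunnel;
	// the real thing would likely want mTLS between VPS and agent.
	conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, fmt.Errorf("dial agent: %w", err)
	}
	defer conn.Close()

	agent := agentpb.NewAgentClient(conn)
	return agent.GetStatus(ctx, &agentpb.GetStatusRequest{})
}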
The four things that earn their keep:
- Status — one round-trip returns {cpu, mem, disk, rathole_running, active_tunnels}. Replaces the rathole-up/down approximation we have today.
- Config push — the VPS writes a new client.toml directly via the agent and signals a reload (sketched after this list). Removes SSH from the steady-state hot path; SSH stays only for first-bootstrap.
- Streaming logs / metrics — the dashboard can subscribe instead of polling /api/tunnels every N seconds. Lower server load, real-time UX.
- Diagnostics — a typed RunDiagnostics() returns structured pass/fail across "can reach VPS", "rathole config valid", "ports open" — feeds the "Diagnose" affordance instead of asking users to read logs.
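For flavor, the config-push and log-stream calls sketched against the same generated stubs as the dial example above; the Toml, Follow, Unit, and Line field names are placeholders, not a committed schema:

// Push a new client.toml, ask the agent to reload rathole, then tail its logs.
func pushConfigAndTail(ctx context.Context, agent agentpb.AgentClient, newToml []byte) error {
	// Write the new rathole config via the agent instead of SSH.
	if _, err := agent.UpdateRatholeConfig(ctx, &agentpb.UpdateConfigRequest{Toml: newToml}); err != nil {
		return fmt.Errorf("push config: %w", err)
	}

	// Subscribe to the log stream instead of polling an HTTP endpoint.
	stream, err := agent.GetLogs(ctx, &agentpb.GetLogsRequest{Follow: true})
	if err != nil {
		return fmt.Errorf("open log stream: %w", err)
	}
	for {
		entry, err := stream.Recv()
		if err != nil {
			return err // io.EOF once the agent closes the stream cleanly
		}
		fmt.Printf("[%s] %s\n", entry.GetUnit(), entry.GetLine())
	}
}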
Self-healing falls out for free once status + restart exist:
if !status.Rathole.Running { agent.RestartRathole() }
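Spelled out as a VPS-side loop against the same stubs; the 30-second interval and the rathole_running accessor are illustrative, and a real loop would add backoff, retry caps, and alerting:

// Minimal self-heal built from nothing but GetStatus + RestartRathole.
func selfHeal(ctx context.Context, agent agentpb.AgentClient) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			status, err := agent.GetStatus(ctx, &agentpb.GetStatusRequest{})
			if err != nil {
				continue // agent unreachable through the tunnel; nothing to heal remotely
			}
			if !status.GetRatholeRunning() {
				_, _ = agent.RestartRathole(ctx, &agentpb.RestartRequest{})
			}
		}
	}
}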
What not to lock out today
The only near-term cost of this design note is keeping a few doors open so the agent is additive when it lands:
- Bootstrap script structure — keep templates/bootstrap.sh shaped so adding a "fetch + install gopher-agent" step is a single insertion, not a rewrite. The script already installs rathole-client + a systemd unit; the agent is the same shape.
- Machine model — leave room for an agent_addr / agent_status column. It doesn't need to be added now; just don't paint the schema into a corner that would require a destructive migration to introduce it.
- Status reporting plumbing — when MachineService.Status() is touched, prefer one funnel that returns (rathole_status, optional system_status) rather than scattering "is rathole up" checks across handlers. That makes "now also surface agent-derived data" a one-place change (one possible shape follows this list).
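One possible Go shape for that funnel. MachineService is the existing service named above; the type names, fields, and the ratholeStatus helper are illustrative:

// A single status funnel: carries rathole state today, leaves a slot for
// agent-derived system state later without touching every handler.
type RatholeStatus struct {
	Running       bool
	ActiveTunnels int
}

// SystemStatus stays nil until a gopher-agent exists to report it.
type SystemStatus struct {
	CPUPercent  float64
	MemPercent  float64
	DiskPercent float64
}

type MachineStatus struct {
	Rathole RatholeStatus
	System  *SystemStatus
}

// Status is the one place handlers ask about a machine's health.
func (s *MachineService) Status(ctx context.Context, machineID string) (MachineStatus, error) {
	rathole, err := s.ratholeStatus(ctx, machineID) // the existing "is the tunnel up" check (hypothetical helper)
	if err != nil {
		return MachineStatus{}, err
	}
	// Later: if the machine has an agent, fill System from agent.GetStatus().
	return MachineStatus{Rathole: rathole}, nil
}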
That's it. No protobuf in the tree yet, no agent binary yet.
Out of scope (record + park)
- Splitting the gopher-server binary into microservices (ConfigService / TunnelService / MonitoringService / etc.). Different conversation; only worth having when scale demands it. Mentioning it here so future-me doesn't conflate it with this issue.
- Replacing the public REST API with gRPC. Browsers and integrators expect REST. Public surface stays REST; gRPC is the internal control transport.
- gRPC-web for the dashboard. Live updates from VPS → browser are the natural next layer once the agent → VPS streaming exists, but the transport choice (gRPC-web vs. WebSocket vs. SSE) is its own decision and the simpler answer (SSE/WebSocket from VPS) is probably right.
Phasing
Soft phasing, not commitments:
- Now (v1.x) — keep doing things the current way; design the touch points above so they don't preclude an agent.
- When #12, #53, #79 start coming due — that's the inflection point. Their cleanest implementation is the agent; doing each as a one-off side-channel is throw-away work.
- Later — agent is required, SSH narrows to first-bootstrap only.
Related
#12 — Health monitoring & auto-recovery. Most direct beneficiary; the agent IS the substrate for this.
#53 — Tunnel latency & network diagnostics. RunDiagnostics() covers it.
#42 — Caddy / rathole logs. GetLogs(stream) covers it.
#79 — Performance tracking & resource alerts. StreamMetrics covers it.
#78 — QUIC-based tunneling eval. Adjacent transport-layer rethink; coordinate the two so we don't relitigate the back-channel twice.