
Design note: gopher-agent on client machines (gRPC) — replaces SSH for ongoing ops, post-v1.0 #84

@smalex-z

Description


TL;DR

Gopher's biggest architectural gap isn't tunnel internals — it's that the VPS has no direct channel to client machines for anything beyond bootstrap. Status comes from rathole's own connection, config updates require SSH, logs aren't streamed, and there's no place to put health checks or self-heal. A small gopher-agent running on each client (gRPC) is the natural fix.

This is not a v1.0 deliverable. The actionable part for now is shaping the bootstrap flow + Machine model so an agent can drop in later additively.

What's missing today

The current control path is:

VPS ──SSH (bootstrap only)──► Client
VPS ◄──rathole tunnel data───► Client     (no control plane on top)

Concretely that means:

  • "Is the client healthy?" → only signal is whether the rathole tunnel is up. CPU/disk/memory pressure on the client is invisible.
  • "Push a new rathole config" → requires SSHing back into the client: it reuses the bootstrap key, re-prompts for sudo, and is brittle.
  • "Stream rathole logs to the dashboard" → not possible; logs only exist on the client, dashboard would need to SSH+journalctl -f.
  • "Restart rathole because it crashed" → no remote affordance. User has to log in.

Each of those is its own tracked issue (#12, #53, #42, #79); a gopher-agent is the common substrate that lets all of them be implemented cleanly instead of as four bespoke side-channels.

Proposed shape

gopher-agent is a small daemon that ships alongside rathole-client, exposes a gRPC server on a known local port, and is reached from the VPS through the existing rathole tunnel (so no new firewall rules and no public listener on the client). The VPS opens a gRPC connection over the tunnel back-channel.

A compact proto sketch — full schema is implementation work, this is just the shape:

service Agent {
  rpc GetStatus(GetStatusRequest)              returns (GetStatusResponse);
  rpc StreamMetrics(StreamMetricsRequest)      returns (stream Metric);
  rpc UpdateRatholeConfig(UpdateConfigRequest) returns (UpdateConfigResponse);
  rpc RestartRathole(RestartRequest)           returns (RestartResponse);
  rpc GetLogs(GetLogsRequest)                  returns (stream LogEntry);
  rpc RunDiagnostics(DiagnosticsRequest)       returns (DiagnosticsResponse);
}
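To make the status payload concrete, here is a minimal Go sketch of what the agent could report per GetStatus call. The struct and field names mirror the `{cpu, mem, disk, rathole_running, active_tunnels}` shape described below but are assumptions, not the final proto schema; `Healthy` is a hypothetical helper showing the kind of thresholding the VPS could do once it has real system data.

```go
package main

// Status is a hypothetical mirror of the GetStatusResponse proto message;
// field names and types are assumptions, not the final schema.
type Status struct {
	CPUPercent     float64
	MemPercent     float64
	DiskPercent    float64
	RatholeRunning bool
	ActiveTunnels  int
}

// Healthy applies illustrative thresholds: this is the judgment the VPS
// cannot make today, when the only signal is "is the tunnel up".
func (s Status) Healthy() bool {
	return s.RatholeRunning && s.CPUPercent < 95 && s.MemPercent < 95 && s.DiskPercent < 90
}
```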

The four things that earn their keep:

  1. Status — one round-trip returns {cpu, mem, disk, rathole_running, active_tunnels}. Replaces the rathole-up/down approximation we have today.
  2. Config push — VPS writes a new client.toml directly via the agent + signals reload. Removes SSH from the steady-state hot path; SSH stays only for first-bootstrap.
  3. Streaming logs / metrics — dashboard can subscribe instead of polling /api/tunnels every N seconds. Lower server load, real-time UX.
  4. Diagnostics — typed RunDiagnostics() returns structured pass/fail across "can reach VPS", "rathole config valid", "ports open" — feeds the "Diagnose" affordance instead of asking users to read logs.
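The diagnostics item is the easiest to sketch: a list of named checks, each returning structured pass/fail, which the dashboard renders directly. This is a Go sketch under assumptions — the `Check`/`Result` types and the example check names are illustrative, not a committed API.

```go
package main

import "errors"

// Check is one named diagnostic probe. Real checks would hit the network
// and filesystem; the signature here is an assumption.
type Check struct {
	Name string
	Run  func() error
}

// Result is the structured pass/fail the dashboard can render, instead of
// asking users to read logs.
type Result struct {
	Name string
	Pass bool
	Err  string
}

// RunDiagnostics executes every check in order and never aborts early, so
// the user sees the full picture in one pass.
func RunDiagnostics(checks []Check) []Result {
	out := make([]Result, 0, len(checks))
	for _, c := range checks {
		r := Result{Name: c.Name, Pass: true}
		if err := c.Run(); err != nil {
			r.Pass = false
			r.Err = err.Error()
		}
		out = append(out, r)
	}
	return out
}

// Example checks following the names above; the failure is simulated.
var exampleChecks = []Check{
	{Name: "can reach VPS", Run: func() error { return nil }},
	{Name: "ports open", Run: func() error { return errors.New("port 2333 closed") }},
}
```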

Self-healing falls out for free once status + restart exist:

if !status.Rathole.Running { agent.RestartRathole() }

What not to lock out today

The only near-term cost of this design note is keeping a few doors open so the agent is additive when it lands:

  • Bootstrap script structure — keep templates/bootstrap.sh shaped so adding a "fetch + install gopher-agent" step is a single insertion, not a rewrite. The script already installs rathole-client + a systemd unit; the agent is the same shape.
  • Machine model — leave room for an agent_addr / agent_status column. Don't need to add it now; just don't paint into a corner that would require a destructive migration to introduce it.
  • Status reporting plumbing — when MachineService.Status() is touched, prefer one funnel that returns (rathole_status, optional system_status) rather than scattering "is rathole up" checks across handlers. Makes "now also surface agent-derived data" a one-place change.
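The status-funnel point can be sketched concretely: one return shape where rathole state is always present and agent-derived system data is optional until the agent exists. Type and field names are assumptions, not the actual MachineService code.

```go
package main

import "fmt"

// SystemStatus is the agent-derived half; nil until gopher-agent reports in.
type SystemStatus struct {
	CPUPercent, MemPercent, DiskPercent float64
}

// MachineStatus is the single funnel: handlers consume this instead of
// scattering "is rathole up" checks.
type MachineStatus struct {
	RatholeUp bool
	System    *SystemStatus
}

// Summary renders one line whether or not agent data is available yet, so
// surfacing agent-derived data later is a one-place change.
func (m MachineStatus) Summary() string {
	s := "tunnel down"
	if m.RatholeUp {
		s = "tunnel up"
	}
	if m.System != nil {
		s += fmt.Sprintf(" (cpu %.0f%%, mem %.0f%%, disk %.0f%%)",
			m.System.CPUPercent, m.System.MemPercent, m.System.DiskPercent)
	}
	return s
}
```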

That's it. No protobuf in the tree yet, no agent binary yet.

Out of scope (record + park)

  • Splitting the gopher-server binary into microservices (ConfigService / TunnelService / MonitoringService / etc.). Different conversation; only worth having when scale demands it. Mentioning so future-me doesn't conflate it with this issue.
  • Replacing the public REST API with gRPC. Browsers and integrators expect REST. Public surface stays REST; gRPC is the internal control transport.
  • gRPC-web for the dashboard. Live updates from VPS → browser are the natural next layer once the agent → VPS streaming exists, but the transport choice (gRPC-web vs. WebSocket vs. SSE) is its own decision and the simpler answer (SSE/WebSocket from VPS) is probably right.

Phasing

Soft phasing, not commitments:

  • Now (v1.x) — keep doing things the current way; design the touch points above so they don't preclude an agent.
  • When #12, #53, #79 start coming due — that's the inflection point. Their cleanest implementation is the agent; doing each as a one-off side-channel is throw-away work.
  • Later — agent is required, SSH narrows to first-bootstrap only.

Related

  • #12 — Health monitoring & auto-recovery. Most direct beneficiary; the agent IS the substrate for this.
  • #53 — Tunnel latency & network diagnostics. RunDiagnostics() covers it.
  • #42 — Caddy / rathole logs. GetLogs(stream) covers it.
  • #79 — Performance tracking & resource alerts. StreamMetrics covers it.
  • #78 — QUIC-based tunneling eval. Adjacent transport-layer rethink; coordinate the two so we don't relitigate the back-channel twice.
