The agent control plane runs as a private rathole service entry per machine: same shape as the existing machine-ssh tunnel, but bound to 127.0.0.1 on both sides. Until now it was hidden from the dashboard, which was confusing when operators noticed an extra port on the VPS. This change synthesizes a Tunnel row with Kind: "machine-agent" alongside the existing machine-ssh entry whenever the machine has agent fields populated. Status comes from AgentLastSeen freshness (seen within two missed health polls = active, otherwise offline; pending before the first poll lands). The Tunnels page sort now pins management entries (SSH, then agent) to the top of each machine's group so user tunnels stay together below them. Managed=true is already in place, so the existing protected-tunnel guard prevents accidental deletion.
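The freshness rule for the synthesized machine-agent row can be sketched as below. This is a minimal illustration, not the project's code: `agentTunnelStatus` is a hypothetical name, and reading "two missed health polls" as twice the 60s health poll interval is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// healthPollInterval matches the 60s health-service cadence described in this PR.
const healthPollInterval = 60 * time.Second

// agentTunnelStatus is a hypothetical helper. It derives the synthesized
// machine-agent row status from AgentLastSeen freshness. The 2x threshold
// is an assumed reading of "two missed health polls".
func agentTunnelStatus(lastSeen *time.Time, now time.Time) string {
	if lastSeen == nil {
		return "pending" // the first health poll has not landed yet
	}
	if now.Sub(*lastSeen) <= 2*healthPollInterval {
		return "active"
	}
	return "offline"
}

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	fresh := demoNow.Add(-healthPollInterval)
	stale := demoNow.Add(-3 * healthPollInterval)

	fmt.Println(agentTunnelStatus(nil, demoNow))    // pending
	fmt.Println(agentTunnelStatus(&fresh, demoNow)) // active
	fmt.Println(agentTunnelStatus(&stale, demoNow)) // offline
}
```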
…up + notifies server

Three things I missed in the agent rollout:

1. bootstrap.sh was still installing the agent as User=$SSH_USER with no gopher system user and no /etc/sudoers.d/gopher entry. migrate.sh on existing machines created the gopher user with NOPASSWD: ALL, but fresh bootstraps ended up in a different, inconsistent shape. Bootstrap now mirrors migrate.sh exactly: it creates the gopher user, writes the sudoers rule, chowns /etc/rathole/client.toml and the agent config to gopher, and installs the systemd unit with User=gopher.
2. gopher-uninstall.sh had leftover "user-mode" cleanup (kill ~/bin/gopher-agent, strip user crontab) from an abandoned design path, AND it never removed the gopher system user. Stripped the dead code; added userdel gopher AFTER the agent service is stopped; sudoers cleanup runs last so prior steps can still use sudo.
3. When an operator runs gopher-uninstall locally on a client, the dashboard's machine list went stale because there was no callback to delete the server-side record. Now the script POSTs to /api/machines/self-delete with its agent token before tearing down. The endpoint resolves the token to a machine via db.GetMachineByAgentToken and calls MachineService.DeleteFromClient, a new variant of Delete that skips the remote-uninstall step (we're already running it locally) but still does server-side cleanup (tunnels, Caddy, rathole reconcile, machine row). The notification is best-effort: on failure (no curl, expired DNS, server unreachable) the local cleanup proceeds normally, and the operator can still delete from the dashboard manually.
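The self-delete flow in item 3 could look roughly like the handler below. It is a sketch under stated assumptions: the two interfaces are hypothetical stand-ins for db.GetMachineByAgentToken and MachineService.DeleteFromClient (their real signatures are not shown in this PR text), and carrying the agent token in a Bearer Authorization header is an assumption, as are the response codes.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical stand-ins for db.GetMachineByAgentToken and
// MachineService.DeleteFromClient; the real signatures may differ.
type machineStore interface {
	GetMachineByAgentToken(token string) (machineID string, err error)
}
type machineDeleter interface {
	DeleteFromClient(machineID string) error
}

// selfDeleteHandler resolves the agent token to a machine and runs the
// server-side cleanup only (no remote uninstall).
func selfDeleteHandler(store machineStore, svc machineDeleter) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Assumption: the token travels as "Authorization: Bearer <token>".
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		id, err := store.GetMachineByAgentToken(token)
		if token == "" || err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if err := svc.DeleteFromClient(id); err != nil {
			http.Error(w, "cleanup failed", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

// fakeBackend is an in-memory implementation of both interfaces for the demo.
type fakeBackend struct {
	tokens  map[string]string
	deleted []string
}

func (f *fakeBackend) GetMachineByAgentToken(token string) (string, error) {
	id, ok := f.tokens[token]
	if !ok {
		return "", errors.New("unknown agent token")
	}
	return id, nil
}

func (f *fakeBackend) DeleteFromClient(id string) error {
	f.deleted = append(f.deleted, id)
	return nil
}

// postSelfDelete drives the handler in-process and returns the status code.
func postSelfDelete(b *fakeBackend, token string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/machines/self-delete", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	selfDeleteHandler(b, b).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	b := &fakeBackend{tokens: map[string]string{"tok-1": "machine-a"}}
	fmt.Println(postSelfDelete(b, "tok-1"), b.deleted) // 204 [machine-a]
	fmt.Println(postSelfDelete(b, "bogus"))            // 401
}
```

Because the client treats the call as best-effort, any non-success response here only means the server-side record has to be deleted from the dashboard instead.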
You're right: the dashboard kept reporting machines as "connected" even after rathole-client was removed, as long as the gopher binary was alive. The agent's status endpoint already reports rathole.Active separately, but checkViaAgent's only response to "agent up, rathole down" was to log a failed health check and try recovery — it never updated machine.Status. Combined with monitor.go skipping agent-installed machines, nothing flipped the status off "connected" from the last good poll. Now when the agent answers but reports rathole inactive, we explicitly flip the machine to offline via a new SetMachineAgentDegraded helper. AgentLastSeen still updates (the back-channel works), but Status reflects the actual tunnel-serving capability. Synthesized SSH tunnel rows derive from machine.Status and will correctly show offline once this lands. The tunnel-status path (monitor.go's checkTunnels) already TCP-probes rathole bind ports independently, so per-tunnel statuses were correct — this fix is specifically about machine.Status.
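The "agent up, rathole down" handling can be illustrated with a small sketch. The type and function names here are hypothetical (the actual change goes through the SetMachineAgentDegraded helper, whose signature is not shown), and setting the status to "connected" on a healthy report is an assumption about the existing happy path.

```go
package main

import (
	"fmt"
	"time"
)

// Machine and AgentReport are simplified, hypothetical shapes for illustration.
type Machine struct {
	Status        string
	AgentLastSeen time.Time
}

type AgentReport struct {
	RatholeActive bool // mirrors the agent's rathole.Active field
}

// applyAgentReport records that the back-channel answered, then sets Status
// from the tunnel-serving capability rather than from agent liveness.
func applyAgentReport(m *Machine, r AgentReport, now time.Time) {
	m.AgentLastSeen = now // the agent answered, so the back-channel works
	if r.RatholeActive {
		m.Status = "connected" // assumed happy-path value
		return
	}
	m.Status = "offline" // agent alive, but rathole is not serving tunnels
}

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	m := &Machine{Status: "connected"}
	applyAgentReport(m, AgentReport{RatholeActive: false}, demoNow)
	fmt.Println(m.Status, m.AgentLastSeen.Equal(demoNow)) // offline true
}
```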
…ootstrap grace window

Three real bugs from the same agent rollout:

1. The agent's systemd unit defaulted to KillMode=control-group, which means systemctl-stopping gopher-agent kills its entire cgroup, including the detached gopher-uninstall worker spawned from POST /uninstall. The script gets murdered partway through cleanup. Both bootstrap.sh and migrate.sh now set KillMode=process so only the main agent dies and children continue. This also explains why "delete machine" on the dashboard appeared to do nothing on the client side: the cleanup STARTED but got killed before it could finish (or before it got to the self-rm line, which is why gopher-uninstall didn't delete itself either).
2. gopher-uninstall.sh's self-destruct line used plain `rm -f` instead of `$SUDO rm -f`. When invoked as root via `sudo gopher-uninstall` that worked, but if the script ever ran without sudo elevation (or got partially killed before reaching it) the binary survived. Added $SUDO and moved it BEFORE the sudoers cleanup so the privilege is still in scope.
3. The migration banner showed "agent isn't set up" for machines that were freshly bootstrapped, because agent_installed=false until the first successful health poll (~60s after bootstrap). MachinesWithoutAgent now excludes machines under 10 minutes old. Bootstrap inline-installs the agent and the health service polls every 60s, so any machine still missing the agent flag after 10 minutes is a real problem; before that, it's just installation latency.
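The grace window in item 3 amounts to a simple age filter. The sketch below assumes a simplified Machine shape and does the filtering in memory; the real MachinesWithoutAgent may well apply the same condition in a database query instead.

```go
package main

import (
	"fmt"
	"time"
)

// agentGraceWindow is the 10-minute window during which a missing agent
// flag is treated as installation latency rather than a problem.
const agentGraceWindow = 10 * time.Minute

// Machine is a simplified, hypothetical shape for illustration.
type Machine struct {
	Name           string
	AgentInstalled bool
	CreatedAt      time.Time
}

// machinesWithoutAgent returns machines that lack the agent flag and are
// older than the grace window.
func machinesWithoutAgent(all []Machine, now time.Time) []Machine {
	var out []Machine
	for _, m := range all {
		if m.AgentInstalled {
			continue
		}
		if now.Sub(m.CreatedAt) < agentGraceWindow {
			continue // freshly bootstrapped: first health poll may not have landed
		}
		out = append(out, m)
	}
	return out
}

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	machines := []Machine{
		{Name: "old-no-agent", CreatedAt: demoNow.Add(-2 * agentGraceWindow)},
		{Name: "new-no-agent", CreatedAt: demoNow.Add(-agentGraceWindow / 2)},
		{Name: "old-with-agent", AgentInstalled: true, CreatedAt: demoNow.Add(-2 * agentGraceWindow)},
	}
	for _, m := range machinesWithoutAgent(machines, demoNow) {
		fmt.Println(m.Name) // old-no-agent
	}
}
```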
NextSSHTunnelPort() and NextRatholePort() were line-for-line identical (both walked allUsedPorts() from 1024 looking for the first gap), and bootstrap.go called them back-to-back with no DB write in between — so they returned the same port. The Machine row ended up with TunnelPort == AgentRemotePort, rathole-server tried to bind two services to the same address, and the back-channel was permanently broken on every freshly bootstrapped machine. NextRatholePort now takes a variadic excluding list. Bootstrap passes the SSH tunnel port to the second call so it can't be reused for the agent. NextSSHTunnelPort is removed — it was a duplicate name for the same function, and consolidating prevents this footgun from coming back later. Added a regression test that fails on the old behavior.
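A minimal sketch of the variadic-exclusion fix follows. The signature is an assumption: here the used-port set is passed in as a map so the example is self-contained, whereas the real NextRatholePort walks allUsedPorts() itself.

```go
package main

import (
	"errors"
	"fmt"
)

// nextRatholePort returns the first port >= 1024 that is neither already in
// use nor in the caller-supplied exclusion list. Hypothetical signature: the
// real function reads the used ports itself instead of taking a map.
func nextRatholePort(used map[int]bool, excluding ...int) (int, error) {
	skip := make(map[int]bool, len(excluding))
	for _, p := range excluding {
		skip[p] = true
	}
	for port := 1024; port <= 65535; port++ {
		if !used[port] && !skip[port] {
			return port, nil
		}
	}
	return 0, errors.New("no free port available")
}

func main() {
	used := map[int]bool{1024: true, 1025: true}

	// Two back-to-back allocations with no write in between: the second call
	// must be told about the first result, or it returns the same port.
	sshPort, _ := nextRatholePort(used)
	agentPort, _ := nextRatholePort(used, sshPort)

	fmt.Println(sshPort, agentPort) // 1026 1027
}
```

Passing the first result into the second call keeps the allocator stateless while still ruling out TunnelPort == AgentRemotePort.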
Both pages had refetchInterval set to false, so changes driven by the health service (60s poll loop) and monitor (30s TCP probes) only became visible after a manual refresh. Network Map already polled at 30s; now the rest of the dashboard matches. Machines page keeps its 3s burst-refresh during bootstrap-waiting so the "machine registered!" success state still flips fast — only the steady-state behavior changes from "static" to "30s".
Bumped steady-state refresh on Machines + Tunnels pages from 30s to 15s per request, plus a 5s middle tier on the Machines page while any machine is fresh (created < 5 min ago) or still in "pending" status. The post-bootstrap window is exactly when status flips happen fastest — rathole connecting, agent installing inline, first health poll landing — so 5s polling there means the operator sees the machine go pending → connected → agent-installed without manual refresh. refetchInterval is computed via the function form of react-query so the cadence self-adjusts: once every machine has settled and aged past 5 minutes, polling drops back to 15s automatically. No timers, no state, just a derived rate from the current data.
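The tiered cadence is a pure function of the current machine list. The actual implementation lives in the dashboard's react-query refetchInterval callback; the sketch below only restates the rate logic in Go for consistency with the other examples, with hypothetical names and the assumption that the 3s bootstrap-waiting burst takes priority over the other tiers.

```go
package main

import (
	"fmt"
	"time"
)

const (
	burstInterval  = 3 * time.Second  // while waiting for a bootstrap to register
	freshInterval  = 5 * time.Second  // any machine fresh or still pending
	steadyInterval = 15 * time.Second // everything settled
	freshWindow    = 5 * time.Minute
)

// Machine is a simplified, hypothetical shape for illustration.
type Machine struct {
	Status    string
	CreatedAt time.Time
}

// refreshInterval derives the polling rate from the current data, so the
// cadence drops back on its own once every machine has settled.
func refreshInterval(machines []Machine, now time.Time, bootstrapWaiting bool) time.Duration {
	if bootstrapWaiting {
		return burstInterval
	}
	for _, m := range machines {
		if m.Status == "pending" || now.Sub(m.CreatedAt) < freshWindow {
			return freshInterval
		}
	}
	return steadyInterval
}

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)

func main() {
	settled := []Machine{{Status: "connected", CreatedAt: demoNow.Add(-2 * freshWindow)}}
	fresh := []Machine{{Status: "connected", CreatedAt: demoNow.Add(-freshWindow / 2)}}

	fmt.Println(refreshInterval(settled, demoNow, false)) // 15s
	fmt.Println(refreshInterval(fresh, demoNow, false))   // 5s
	fmt.Println(refreshInterval(settled, demoNow, true))  // 3s
}
```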
Unit tests run: 275
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 32478099 | Triggered | Generic Password | 89b8243 | internal/api/handlers/templates/bootstrap.sh | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Description
Resolves #__
Supports #84
Changes Made
Testing
Screenshots (if applicable)