Description
What is the issue?
One of the primary things my organization relies on Tailscale for is connecting to our production Postgres database from employee laptops. We commonly use psql and a macOS GUI named TablePlus. Our database runs inside Render, where we run a subnet router (currently version 1.22.2), I believe in userspace mode (though I might be misremembering; the deployment is based on this).
Twice now, we've run into Tailscale-mediated connections to the database piling up until we saturate the maximum number of connections the database allows. The number of "extra" connections is far higher than we expect, and the connection count shows the classic "sawtooth" pattern of a resource leak.
Our Postgres server is configured to use TCP keepalives to detect idle clients that have disappeared. I forget the exact settings, but they're pretty reasonable (roughly 10 minutes of inactivity before the first probe, then 5 probes one minute apart, or similar). The leaked connections are all in state idle (not idle in transaction or similar), so on an "ordinary" network I'd expect TCP keepalives to clean them up effectively.
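To make that expectation concrete, here is a minimal sketch of the arithmetic, assuming standard Linux keepalive semantics (first probe after the idle time, retries every interval, reset after the given number of unanswered probes) and assuming the approximate values above map onto Postgres's `tcp_keepalives_idle`, `tcp_keepalives_interval`, and `tcp_keepalives_count` parameters. The function name `keepalive_detection_seconds` is hypothetical, for illustration only.

```python
def keepalive_detection_seconds(idle_s: int, interval_s: int, probes: int) -> int:
    """Approximate worst-case time for TCP keepalives to declare a silent peer dead.

    After `idle_s` seconds with no traffic the kernel sends the first probe;
    unanswered probes are retried every `interval_s` seconds, and the connection
    is reset once `probes` consecutive probes go unanswered.
    """
    if min(idle_s, interval_s, probes) < 0:
        raise ValueError("keepalive parameters must be non-negative")
    return idle_s + interval_s * probes


# Approximate values from this report (the exact settings are unknown);
# assumed to be set via the standard Postgres keepalive parameters.
settings = {
    "tcp_keepalives_idle": 600,     # ~10 minutes of inactivity
    "tcp_keepalives_interval": 60,  # one probe per minute
    "tcp_keepalives_count": 5,      # 5 unanswered probes
}

total = keepalive_detection_seconds(
    settings["tcp_keepalives_idle"],
    settings["tcp_keepalives_interval"],
    settings["tcp_keepalives_count"],
)
print(f"dead clients should be reaped after ~{total // 60} minutes")
```

Under these assumptions a vanished client should be reset after about 15 minutes, so idle connections that survive far longer than that are what make the keepalives look ineffective here.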
Frustratingly for both of us, I'm not certain this is a Tailscale bug! But given how central networking is to my understanding of the problem, I figured I'd open a ticket anyway and see if you folks have any ideas (or have heard similar rumblings from other customers). As always, I'm happy to provide more detail if needed. Given how slowly the leak progresses, I don't have much in the way of reproducible test cases or smoking guns, but I'm happy to do some legwork to collect data if that would help.
Steps to reproduce
No response
Are there any recent changes that introduced the issue?
No response
OS
Linux, macOS
OS version
No response
Tailscale version
1.22.2
Bug report
BUG-1b1b6d797befc704c22048932ffcb72ce2677f8c2c08667f3bf69a4a4ee1ae83-20220425231410Z-d0c5b4f2ec891f08
