Review Summary by Qodo

Improve Grid Router WebSocket handling with TCP tunneling, keepalive pings, and idle detection
Walkthrough

• Implement direct TCP tunneling for WebSocket connections to eliminate message parsing overhead
• Add application-level WebSocket pings every 30 s to prevent cloud load balancer idle timeouts
• Handle node-initiated close frames and upstream errors with proper client channel closure
• Install idle-state detection on tunnel channels to detect silently dropped connections
• Add a configurable --tcp-tunnel flag to disable direct tunneling for restricted network topologies

File Changes

1. java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java
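The idle-state detection in the walkthrough can be sketched in plain Java. This is a sketch only: the actual change lives in ProxyWebsocketsIntoGrid.java and presumably hooks channel-level idle events; the `IdleWatch` class, its method names, and the pluggable millisecond clock are hypothetical illustrations of the idea.

```java
// Hypothetical sketch of idle detection on a tunnel: if no bytes have
// moved in either direction for longer than the timeout, assume an
// intermediary silently dropped the connection.
public class IdleWatch {
  private final long timeoutMillis;
  private volatile long lastActivity;

  public IdleWatch(long timeoutMillis, long nowMillis) {
    this.timeoutMillis = timeoutMillis;
    this.lastActivity = nowMillis;
  }

  // Record traffic on the tunnel (called whenever bytes are relayed).
  public void touch(long nowMillis) {
    lastActivity = nowMillis;
  }

  // True once the tunnel has been quiet for longer than the timeout,
  // at which point the proxy should tear down both channel ends.
  public boolean isIdle(long nowMillis) {
    return nowMillis - lastActivity > timeoutMillis;
  }
}
```

Passing the clock in as a parameter keeps the check deterministic and testable; a real implementation would use the event loop's notion of time.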
Code Review by Qodo
1. Tunnel hard idle timeout
CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
…, high-latency proxying (#17197) Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
🔗 Related Issues
💥 What does this PR do?
This PR fixes three user-visible problems with WebSocket connections (BiDi, CDP) routed through Selenium Grid:
BiDi/CDP sessions hang after the browser crashes or the Node kills the session. When the Node closed the WebSocket, the client was never notified; the connection would sit open until the next keepalive cycle (up to 30 s) instead of receiving a clean close. Fixed by forwarding node-initiated close frames and upstream errors to the client as soon as they occur.
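The close-propagation behaviour described above can be sketched as follows. This is a hedged illustration, not the PR's code: the real Router operates on channels in ProxyWebsocketsIntoGrid.java, and the `ClientChannel` interface and `ClosePropagation` class here are hypothetical stand-ins.

```java
// Hypothetical sketch: when the Node's end closes (or the upstream
// connection errors out), relay the close to the client immediately
// rather than letting the client sit until the next keepalive cycle.
public class ClosePropagation {
  // Stand-in for the client-facing channel; invented for this sketch.
  interface ClientChannel {
    void close(int code, String reason);
    boolean isOpen();
  }

  // Invoked on a node-initiated close frame or an upstream error.
  public static void onUpstreamClose(ClientChannel client, int code, String reason) {
    if (client.isOpen()) {
      // Propagate the same close code and reason so the client sees a
      // clean, informative shutdown instead of a silent hang.
      client.close(code, reason);
    }
  }
}
```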
Cloud load balancers (AWS ALB, GCP, k8s ingress-nginx) silently drop idle WebSocket connections mid-session. These LBs have a 60 s idle timeout and ignore OS-level TCP keepalives. Long-running Playwright or BiDi tests that go quiet for more than 60 s would have their connection dropped without any signal. Fixed by sending application-level WebSocket pings every 30 s.
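The keepalive fix can be sketched with a scheduled executor. This is a simplified model, assuming a 30 s cadence as described above; the `PingSender` interface and `KeepAlive` class are hypothetical, and the real code would send the ping frame on the Netty channel directly.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class KeepAlive {
  // Hypothetical hook; the real Router writes a WebSocket ping frame
  // to the channel.
  interface PingSender {
    void sendPing(ByteBuffer payload);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Fire an application-level WebSocket ping at a fixed interval
  // (30 s in the PR) so intermediaries like ALB or ingress-nginx see
  // traffic before their 60 s idle timeout drops the connection.
  // OS-level TCP keepalives do not help here because the LBs ignore them.
  public ScheduledFuture<?> start(PingSender sender, long interval, TimeUnit unit) {
    return scheduler.scheduleAtFixedRate(
        () -> sender.sendPing(ByteBuffer.allocate(0)), interval, interval, unit);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
```

Using application-level pings (rather than TCP keepalives) matters because the idle timeout on these load balancers counts WebSocket frames, not TCP segments.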
Every BiDi/CDP message was parsed and repackaged by the Router even though the Router has nothing to do with the message content. This added latency and CPU overhead proportional to message rate. The Router now bridges the connection at the TCP level, removing itself from the data path entirely after the initial handshake. Falls back to the old message-proxying path for network topologies where a direct TCP connection to the Node is not possible (e.g. Kubernetes port-forward).
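The tunneling idea reduces to copying raw bytes between the two sockets without ever decoding WebSocket frames. A minimal blocking-stream sketch follows; it is an assumption-laden illustration (the PR's implementation bridges Netty channels asynchronously, and `TcpBridge.pump` is an invented helper), but it shows why the Router drops out of the data path: after the handshake, each direction is just a byte copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TcpBridge {
  // Copy bytes from `in` to `out` until EOF, without parsing or
  // re-framing anything. Run once per direction (client->node and
  // node->client) to form a full-duplex tunnel. Returns the byte count.
  public static long pump(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
      total += n;
    }
    out.flush();
    return total;
  }
}
```

Because no frame is ever materialized as a message object, per-message CPU and latency overhead disappears, which is exactly the cost the old parse-and-repackage path paid on every BiDi/CDP message.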
🔧 Implementation Notes
💡 Additional Considerations
🔄 Types of changes