[BUG] Unable to connect to cluster if first machine is down #119

@luislavena

Description

Describe the bug

Running machine ls or service ls fails when the first node listed in the cluster configuration is down:

$ uc --uncloud-config ./blatta.yaml service ls
Error: connect to cluster: connect to cluster (context 'default'): connect to machine: SSH login to provision@192.168.1.203:22: connect using SSH agent: dial tcp 192.168.1.203:22: i/o timeout

How to reproduce

Local configuration:

current_context: default
contexts:
  default:
    connections:
      # blatta1
      - ssh: provision@192.168.1.203
      # blatta2
      - ssh: provision@192.168.1.194
      # blatta3
      - ssh: provision@192.168.1.246
1. Set up a cluster of 2 or 3 nodes and ensure all nodes are listed under connections in the Uncloud configuration
2. Verify that services and machines are listed and running
3. Power off the first node in the connections list
4. Repeat step 2 and observe the timeout failure
5. Comment out the first connection and try again; this time it works

Expected behavior

I would expect uc to retry by connecting to any of the other servers listed under connections instead of failing on the first unreachable one.

Environment:

  • Uncloud versions:
    • Control (client) node (uc --version): 0.12.1 (darwin/arm64)
    • Uncloud daemon (from the server) (uncloudd --version): 0.12.1 (linux/amd64)
  • OS version (uname -a):
    • Client (control node): Darwin Kernel Version 24.6.0
    • Server: Debian 6.12.43-1 (2025-08-27) x86_64 GNU/Linux

Thank you
❤️ ❤️ ❤️

Labels

bug (Something isn't working)