feat: Add uc ps command to list all containers in the cluster (#165)
Conversation
Force-pushed from 54c3b4f to 4805bb4.
psviderski left a comment:
Overall looks good! Please see a few comments on the use of the custom bubble tea model, optimising the queries, and table layout.
cmd/uncloud/ps.go (Outdated)

        err error
    }

    func newSpinnerModel(client *client.Client, message string) spinnerModel {
Do we need a custom spinner model, or can we just use the spinner with one action? https://github.com/charmbracelet/huh?tab=readme-ov-file#bonus-spinner
The reason we have a custom connectModel is that we change the spinner text to the connection we're currently trying while iterating over all of them.
For this command, it looks like we're only printing "Collecting container info..." until we get all the containers.
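For reference, a rough sketch of what the one-shot huh spinner from that README link could look like here; containerInfo and collectContainers below are placeholders, not this PR's actual code:

```go
package main

import (
	"context"
	"fmt"

	"github.com/charmbracelet/huh/spinner"
)

// containerInfo and collectContainers are hypothetical stand-ins for the real
// types and cluster query in cmd/uncloud/ps.go.
type containerInfo struct{ ID, Name string }

func collectContainers(ctx context.Context) ([]containerInfo, error) {
	return nil, nil // placeholder for the real cluster-wide container listing
}

func listWithSpinner(ctx context.Context) ([]containerInfo, error) {
	var (
		containers []containerInfo
		listErr    error
	)
	// One fixed title and one action: no custom bubbletea model is needed when
	// the spinner text never changes.
	if err := spinner.New().
		Title("Collecting container info...").
		Action(func() { containers, listErr = collectContainers(ctx) }).
		Run(); err != nil {
		return nil, fmt.Errorf("run spinner: %w", err)
	}
	return containers, listErr
}

func main() {
	containers, err := listWithSpinner(context.Background())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("found %d containers\n", len(containers))
}
```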
Updated.
cmd/uncloud/ps.go (Outdated)

    func newSpinnerModel(client *client.Client, message string) spinnerModel {
        s := spinner.New()
        s.Spinner = spinner.Jump
        s.Style = lipgloss.NewStyle().Foreground(lipgloss.Color("205"))
For the connection spinner, I also tried to use the same spinner type and colour used by the compose progress library (uc deploy output) to have a consistent UI language. It would be nice to maintain that consistency here as well.
Fixed.
cmd/uncloud/ps.go (Outdated)

    var containers []containerInfo
    for _, s := range services {
        service, err := m.client.InspectService(context.Background(), s.ID)
The client calls that are broadcast to all machines are quite expensive, which is why it takes a while to list all containers at the moment.
You can try using only one call to client.Docker.ListServiceContainers:
uncloud/internal/machine/docker/client.go, lines 479 to 481 in a30ed60:

    // ListServiceContainers returns all containers on requested machines that belong to the service with the given
    // name or ID. If serviceNameOrID is empty, all service containers are returned.
    func (c *Client) ListServiceContainers(

See the ListServices implementation for details (line 277 in 458d282):

    func (cli *Client) ListServices(ctx context.Context) ([]api.Service, error) {
Done.
    machinesNamesByID := make(map[string]string)
    for _, m := range machines {
        machinesNamesByID[m.Machine.Id] = m.Machine.Name
    }
The machines are always the same, so they could be fetched and machinesNamesByID computed only once rather than for every service, to optimise the response time.
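Roughly the shape of that suggestion, as a sketch pieced together from the snippets in this thread (the surrounding loop and any names not shown above are assumptions):

```go
// Fetch the machines once, before iterating over services/containers
// (sketch only; cli and the loop variables are assumptions).
machines, err := cli.ListMachines(ctx, nil)
if err != nil {
	return fmt.Errorf("list machines: %w", err)
}
machinesNamesByID := make(map[string]string, len(machines))
for _, m := range machines {
	machinesNamesByID[m.Machine.Id] = m.Machine.Name
}
// Inside the per-service/per-container loop, only the map lookup remains,
// e.g. machineName := machinesNamesByID[containerMachineID].
```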
Done.
cmd/uncloud/ps.go (Outdated)

    if len(id) > 12 {
        id = id[:12]
    }
    name := strings.TrimPrefix(ctr.name, "/")
I don't think this is needed; we trim it when deserialising into our api.Container.
Not a big deal, but stringid.TruncateID(c.ID) from the docker package could also be used so we don't hardcode the logic for shortening container IDs.
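For example, a standalone sketch of that helper (the ID value is just a placeholder):

```go
package main

import (
	"fmt"

	"github.com/docker/docker/pkg/stringid"
)

func main() {
	// Placeholder full container ID; TruncateID returns the standard 12-character
	// short form, the same format `docker ps` prints.
	fullID := "4f1c2e7b9a0d5c6e8f1a2b3c4d5e6f708192a3b4c5d6e7f8091a2b3c4d5e6f7"
	fmt.Println(stringid.TruncateID(fullID))
}
```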
Fixed.
cmd/uncloud/ps.go (Outdated)

        header = "MACHINE\tSERVICE\tCONTAINER ID\tNAME\tIMAGE\tSTATUS"
    } else {
        header = "SERVICE\tMACHINE\tCONTAINER ID\tNAME\tIMAGE\tSTATUS"
    }
I quite like the layout of the pod listing in k9s (similar to kubectl):
[screenshot: k9s pod listing with the NODE column last]
It prints the node/machine column last. We do something similar for uc inspect SERVICE.
I think we need to decide on the layout and stick to it for all relevant commands to decrease the cognitive load on users. I might be wrong, but I also feel that changing the column order when grouping by another column might not be desirable, as it again requires additional effort to read the headers and understand where the data has been relocated.
k9s, for example, supports sorting by any column and doesn't change the layout. It's like tables on a website: you can click on a header to change the sorting order, but the columns don't swap. So I believe the --group-by option isn't much different from this. Potentially the flag could even be called --sort. I'm not sure which one is more intuitive though.
To me, it seems natural to present data ordered from the largest dimension to the smallest:
service > container > container properties
If we have a namespace/stack concept, then:
namespace > service > container > container properties
The machine a container runs on is somewhat of an odd one. It's not a one-to-many relationship like service-container; it's many-to-many. So it's hard to find a good place for it. Putting it as the first column somewhat breaks the logical hierarchy. Putting it as the last one still allows finding it quickly while remaining less intrusive. That's my current reasoning.
I've moved MACHINE to the last column.
One odd thing I noticed: it seems the lipgloss color styling of some cells in the second-to-last column can cause tabwriter to lay things out incorrectly.
Have you encountered that issue?
Ah yeah, I think I did, and that was the reason I used the lipgloss table instead of tabwriter for formatting the uc image ls output (uncloud/cmd/uncloud/image/ls.go, lines 261 to 304 in 939645e):
    t := table.New().
        // Remove the default border.
        Border(lipgloss.Border{}).
        BorderTop(false).
        BorderBottom(false).
        BorderLeft(false).
        BorderRight(false).
        BorderHeader(false).
        BorderColumn(false).
        StyleFunc(func(row, col int) lipgloss.Style {
            if row == table.HeaderRow {
                return lipgloss.NewStyle().Bold(true).PaddingRight(3)
            }
            // Regular style for data rows with padding.
            return lipgloss.NewStyle().PaddingRight(3)
        })
    var headers []string
    for _, col := range columns {
        if !col.hide {
            headers = append(headers, col.name)
        }
    }
    t.Headers(headers...)
    for _, row := range rows {
        values := []string{
            row.id,
            row.name,
            row.platforms,
            row.createdHuman,
            row.size,
            row.inUse,
            row.store,
            row.machine,
        }
        var filteredValues []string
        for i, v := range values {
            if !columns[i].hide {
                filteredValues = append(filteredValues, v)
            }
        }
        t.Row(filteredValues...)
    }
I think we need to use lipgloss table or something similar from the bubble tea world to correctly print styled data. Perhaps create our own convenient table abstraction wrapping one of those and use it for all CLI commands to make the formatting consistent.
> I think we need to use lipgloss table or something similar from the bubble tea world to correctly print styled data.

Yeah, that fixed it. I just re-used the setup from image ls for this PR, but...

> Perhaps create our own convenient table abstraction wrapping one of those and use it for all CLI commands to make the formatting consistent.

I'll do this as a follow-up PR with lipgloss.table. I was also thinking the abstraction might give us a good opportunity to add JSON output support too, since that should be a simple second "formatter" for the table data these commands all generate.
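A hypothetical sketch of what such an abstraction could look like, reusing the lipgloss table setup quoted above; all names here are made up and not from the follow-up PR:

```go
package cli

import (
	"encoding/json"

	"github.com/charmbracelet/lipgloss"
	"github.com/charmbracelet/lipgloss/table"
)

// Output holds headers and rows once; each formatter decides the presentation.
type Output struct {
	Headers []string
	Rows    [][]string
}

// Table renders with the same borderless lipgloss style used by image ls above.
func (o Output) Table() string {
	t := table.New().
		Border(lipgloss.Border{}).
		BorderTop(false).BorderBottom(false).
		BorderLeft(false).BorderRight(false).
		BorderHeader(false).BorderColumn(false).
		StyleFunc(func(row, col int) lipgloss.Style {
			if row == table.HeaderRow {
				return lipgloss.NewStyle().Bold(true).PaddingRight(3)
			}
			return lipgloss.NewStyle().PaddingRight(3)
		}).
		Headers(o.Headers...)
	for _, r := range o.Rows {
		t.Row(r...)
	}
	return t.Render()
}

// JSON renders the same data as a list of header->value objects.
func (o Output) JSON() (string, error) {
	objs := make([]map[string]string, 0, len(o.Rows))
	for _, r := range o.Rows {
		obj := make(map[string]string, len(o.Headers))
		for i, h := range o.Headers {
			if i < len(r) {
				obj[h] = r[i]
			}
		}
		objs = append(objs, obj)
	}
	b, err := json.MarshalIndent(objs, "", "  ")
	return string(b), err
}
```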
btw, a draft of the "table abstraction" idea (though it ended up more of a JSON output option with some table helpers): #191
Force-pushed from 5018b5a to 890cb99.
cmd/uncloud/ps.go (Outdated)

            md.Append("machines", machineIP)
        }
    }
    listCtx := metadata.NewOutgoingContext(ctx, md)
JFYI, you can also use this helper to create a context for broadcasting (lines 62 to 98 in abea3e2):
    // ProxyMachinesContext returns a new context that proxies gRPC requests to the specified machines.
    // If namesOrIDs is nil, all machines are included.
    func ProxyMachinesContext(
        ctx context.Context, cli MachineClient, namesOrIDs []string,
    ) (context.Context, MachineMembersList, error) {
        // TODO: move the machine IP resolution to the proxy router to allow setting machine names and IDs in the metadata.
        machines, err := cli.ListMachines(ctx, nil)
        if err != nil {
            return nil, nil, fmt.Errorf("list machines: %w", err)
        }
        var proxiedMachines MachineMembersList
        var notFound []string
        for _, nameOrID := range namesOrIDs {
            if m := machines.FindByNameOrID(nameOrID); m != nil {
                proxiedMachines = append(proxiedMachines, m)
            } else {
                notFound = append(notFound, nameOrID)
            }
        }
        if len(notFound) > 0 {
            return nil, nil, fmt.Errorf("machines not found: %s", strings.Join(notFound, ", "))
        }
        if len(namesOrIDs) == 0 {
            proxiedMachines = machines
        }
        md := metadata.New(nil)
        for _, m := range proxiedMachines {
            machineIP, _ := m.Machine.Network.ManagementIp.ToAddr()
            md.Append("machines", machineIP.String())
        }
        return metadata.NewOutgoingContext(ctx, md), proxiedMachines, nil
    }
Thanks for pointing that out! I've updated this to use ProxyMachinesContext.
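Roughly the shape of the call site after that change (a hedged sketch, not the actual diff; the api package path and the surrounding variables are assumptions):

```go
// nil namesOrIDs means the request is proxied to every machine in the cluster.
proxyCtx, proxiedMachines, err := api.ProxyMachinesContext(ctx, cli, nil)
if err != nil {
	return err
}
// proxyCtx carries the "machines" metadata, so a single gRPC call made with it
// is broadcast to all of them; proxiedMachines can still be used to map machine
// IDs/IPs to names for the MACHINE column.
_ = proxiedMachines
```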
cmd/uncloud/ps.go (Outdated)

            continue
        }
        if msc.Metadata.Error != "" {
            return nil, fmt.Errorf("list containers on machine %s: %s", msc.Metadata.Machine, msc.Metadata.Error)
I think we shouldn't fail here, as that would make the uc ps command useless if a machine is down or temporarily unavailable. What we can do instead is print a warning for each unavailable machine but still list the containers on the available machines.
I've changed this to a PrintWarning (like the volume command uses) and continue.
Force-pushed from dc014dd to 4e6c3b4.
psviderski left a comment:
Looks great! Just one comment to address an edge case when the cluster consists of only one machine 👍
cmd/uncloud/ps.go (Outdated)

    var containers []containerInfo
    for _, msc := range machineContainers {
        if msc.Metadata == nil {
            continue
I agree this handling of broadcasted requests is very confusing, and I have plans to rework it to simplify it significantly. But for now, we still need to handle a nil Metadata, which can be a valid case when the cluster consists of only one machine. See lines 123 to 140 in 458d282:
    // Metadata can be nil if the request was broadcasted to only one machine.
    if mc.Metadata == nil && len(machineContainers) > 1 {
        return svc, errors.New("something went wrong with gRPC proxy: metadata is missing for a machine response")
    }
    if mc.Metadata != nil && mc.Metadata.Error != "" {
        // TODO: return failed machines in the response.
        fmt.Printf("WARNING: failed to list containers on machine '%s': %s\n",
            mc.Metadata.Machine, mc.Metadata.Error)
        continue
    }
    machineID := ""
    if mc.Metadata == nil {
        // ListServiceContainers was proxied to only one machine.
        for _, v := range machineIDByManagementIP {
            machineID = v
            break
        }
Please test that it works with a one-machine cluster.
I see. I've added similar logic from service.go to the usage in ps.go now.
I tested it against a single-machine ucind cluster locally.
I also added some unit tests for collectContainers in ps.go to cover these various edge cases. They rely heavily on mocking the data returned to it, though, and I'm not sure how you feel about that testing approach. Let me know.
Perfect, thank you so much for updating the Metadata handling! 🙏
Regarding these mocked tests, I'm not a big fan of them, as it's hard to replicate all the intricacies of the dependent components and keep the mocks up to date with changes in those components. For example, I'm going to change how the gRPC router works, which will slightly change the Metadata values. Then we will need to update all the mocks to reflect the change. Hopefully LLMs will help do that correctly.
That's why I've been implementing mainly e2e tests so far, to test the real workflows. They are the slowest tests, though, and they've started to become a bit flaky recently. We will likely need to do something about that soon.
And it's fine to keep the tests you added, appreciate your effort!
> Regarding these mocked tests, I'm not a big fan of them as it's hard to replicate all the intricacies of the dependent components and keep them up to date with the changes in those components.

Absolutely agree.

> That's why I've been implementing mainly e2e tests so far to test the real workflows. Although, these are the slowest tests and they started to become a bit flaky recently. We will likely need to do something about it soon.

Yeah, these really should be e2e tests, but I recalled you mentioning not really wanting to add more cluster setups to the e2e tests either.
I looked at adding it to an existing e2e test, but those are oriented more around the functionality being tested than the cluster configuration needed, and it felt wrong to tack these onto, say, the exec e2e tests...
Maybe we just need to change how the e2e tests are structured/grouped: start with the "context", then the "functionality"? For example:
- in a one machine cluster
- `exec`
- does foo
- `ps`
- does bar
- in a two machine cluster
- `dns`
- ...
- with a suspect node
- `cmd`
- ...
Something like that to let us reuse the heavy, time-consuming cluster setup across as many different tests as possible. Once they are up, the individual test cases can be very quick.
> Yeah, these really should be e2e tests, but I recalled you mentioning not really wanting to add more cluster setups in the e2e tests, too.

That's a tradeoff between performance and good structure. The fewer clusters we create when running the entire test suite, the faster it will run. And we want it to be faster because that will let us run it more often when developing locally and catch bugs earlier. Also, I'm not too sure how close we are to reaching the memory limit on the free GitHub CI agents we're running it on now, as every cluster consumes a few hundred MB of memory.
So I was postponing any substantial work on the e2e tests until they break, and they still work; not perfectly, but they work 😄
> Maybe we just need to change how the e2e tests are structured/grouped. Start with "context" then "functionality"?

Yeah, that's a good option! I don't remember if we have any test suites that use different clusters or if it's always 1-to-1. It would be great if we don't need to split the tests for one logical feature into multiple different files/suites.
I remember that we have a few cases that can't be run in parallel (e.g. they depend on setting env vars), so this is something to keep in mind when restructuring the test suites.
I've just asked Claude if it's possible to have something like pytest fixtures (Python) in Go to inject a cluster as a dependency. It suggested an interesting approach with a lazy cluster pool. It would allow us to structure tests based on the feature/concept they test while also sharing clusters of the required configuration between them. We could even split the current long TestXXX suites into more scoped ones for easier management.
Plan: Shared Cluster Fixtures for E2E Tests
Goal
Reduce ucind cluster creation from 10 clusters to 2 shared clusters (one 3-machine, one 1-machine) while maintaining the ability to run individual tests in isolation when needed.
Recommended Approach: TestMain with Lazy Cluster Pools
Use Go's TestMain to manage shared cluster fixtures at the package level, with lazy initialization and fallback support for isolated test runs.
Architecture
test/e2e/
├── main_test.go # NEW: TestMain + cluster pool management
├── cluster_test.go # Modified: use shared cluster
├── service_test.go # Modified: use shared cluster
├── ...
Key Design Decisions
- Two shared cluster pools:
- shared3Machine - for tests requiring 3 machines (8 tests currently)
- shared1Machine - for tests requiring 1 machine (2 tests currently)
- Lazy initialization with sync.Once: Clusters are created on first request, not upfront. This means running a single test still works (see the sketch after this plan).
- Cleanup at package exit: TestMain handles cluster teardown after m.Run() completes.
- Fallback support: If TEST_CLUSTER_NAME env var is set, use that cluster (existing behavior preserved).
Trade-offs
Pros:
- Significant reduction in test setup time and resources
- Tests still runnable in isolation (lazy init creates cluster on demand)
- Backwards compatible with TEST_CLUSTER_NAME env var
Cons:
- Tests must be careful about cleanup (already mostly true)
- Test failures could potentially affect other tests sharing the cluster
- Slightly more complex test infrastructure
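A hypothetical sketch of that lazy cluster pool (the Cluster type and the create/destroy helpers are placeholders, not the project's actual ucind test API):

```go
package e2e

import (
	"os"
	"sync"
	"testing"
)

// Cluster is a placeholder for the real ucind cluster handle.
type Cluster struct{ Name string }

type clusterPool struct {
	machines int
	once     sync.Once
	cluster  *Cluster
	err      error
}

var (
	pools = map[int]*clusterPool{
		1: {machines: 1}, // shared1Machine
		3: {machines: 3}, // shared3Machine
	}
	cleanupMu sync.Mutex
	cleanups  []func()
)

// sharedCluster returns a cluster with the requested machine count, creating it at
// most once per `go test` run. TEST_CLUSTER_NAME keeps the existing fallback behaviour.
func sharedCluster(t *testing.T, machines int) *Cluster {
	t.Helper()
	if name := os.Getenv("TEST_CLUSTER_NAME"); name != "" {
		return &Cluster{Name: name}
	}
	p, ok := pools[machines]
	if !ok {
		t.Fatalf("no shared cluster pool for %d machines", machines)
	}
	p.once.Do(func() {
		p.cluster, p.err = createUcindCluster(machines) // placeholder for the real setup
		if p.err == nil {
			cleanupMu.Lock()
			cleanups = append(cleanups, func() { destroyUcindCluster(p.cluster) })
			cleanupMu.Unlock()
		}
	})
	if p.err != nil {
		t.Fatalf("create %d-machine cluster: %v", machines, p.err)
	}
	return p.cluster
}

// Placeholders standing in for the real ucind create/remove helpers.
func createUcindCluster(machines int) (*Cluster, error) { return &Cluster{}, nil }
func destroyUcindCluster(c *Cluster)                    {}

// TestMain runs the suite, then tears down any clusters that were actually created.
func TestMain(m *testing.M) {
	code := m.Run()
	for _, cleanup := range cleanups {
		cleanup()
	}
	os.Exit(code)
}
```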
… for service containers
… error immediately, in `uc ps`
Force-pushed from 9a7b410 to 86a8928.
Implementation for #56.