
Conversation

Contributor

@jabr jabr commented Nov 4, 2025

Implementation for #56.

  • Sorts by service, machine, or "health".
  • Color-highlights broad categories of status/state (healthy=green, running=normal, unhealthy=red, other=yellow).
  • Uses a "working" spinner while collecting container info.

Owner

@psviderski psviderski left a comment

Overall looks good! Please see a few comments on the use of the custom Bubble Tea model, optimising the queries, and the table layout.

    err error
}

func newSpinnerModel(client *client.Client, message string) spinnerModel {
Owner

Do we need a custom spinner model, or can we just use the spinner with one action? https://github.com/charmbracelet/huh?tab=readme-ov-file#bonus-spinner

The reason we have a custom connectModel is that we change the spinner text to the connection we're currently trying while iterating over all of them.

For this command, it looks like we're only printing "Collecting container info..." until we get all the containers.
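For reference, a minimal sketch of the one-action approach with the huh spinner; collectContainers here is a hypothetical stand-in for the actual collection logic, and the error plumbing is illustrative only:

import (
    "fmt"

    "github.com/charmbracelet/huh/spinner"
)

func collectWithSpinner() ([]containerInfo, error) {
    var (
        containers []containerInfo
        collectErr error
    )
    // One static message while the whole collection runs; no custom Bubble Tea model needed.
    if err := spinner.New().
        Title("Collecting container info...").
        Action(func() {
            containers, collectErr = collectContainers() // hypothetical collection function
        }).
        Run(); err != nil {
        return nil, fmt.Errorf("run spinner: %w", err)
    }
    return containers, collectErr
}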

Contributor Author

Updated.

func newSpinnerModel(client *client.Client, message string) spinnerModel {
    s := spinner.New()
    s.Spinner = spinner.Jump
    s.Style = lipgloss.NewStyle().Foreground(lipgloss.Color("205"))
Owner

For the connection spinner, I also tried to use the same spinner type and colour used by the compose progress library (uc deploy output) to have a consistent UI language. It would be nice to maintain that consistency here as well.

Contributor Author

Fixed.


var containers []containerInfo
for _, s := range services {
    service, err := m.client.InspectService(context.Background(), s.ID)
Owner

The client calls that are broadcasted to all machines are quite expensive, which is why it takes a while to list all containers at the moment.

You can try using only one call, client.Docker.ListServiceContainers:

// ListServiceContainers returns all containers on requested machines that belong to the service with the given
// name or ID. If serviceNameOrID is empty, all service containers are returned.
func (c *Client) ListServiceContainers(

See the ListServices implementation for details:

func (cli *Client) ListServices(ctx context.Context) ([]api.Service, error) {
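For illustration, a hedged sketch of the single-call shape; the exact parameter list of ListServiceContainers is truncated above, so the call and the response fields below are assumptions:

// One broadcast call replaces the per-service InspectService loop.
machineContainers, err := m.client.Docker.ListServiceContainers(ctx, "") // "" assumed to mean all services; trailing params omitted
if err != nil {
    return nil, fmt.Errorf("list service containers: %w", err)
}
var containers []containerInfo
for _, msc := range machineContainers {
    // Each element is assumed to carry per-machine Metadata plus the containers found
    // on that machine, matching the msc.Metadata handling discussed later in this review.
    _ = msc // convert msc's containers into containerInfo rows here
}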

Contributor Author

Done.

machinesNamesByID := make(map[string]string)
for _, m := range machines {
    machinesNamesByID[m.Machine.Id] = m.Machine.Name
}
Owner

The machines are always the same; they could be fetched and machinesNamesByID computed only once, rather than for every service, to optimise the response time.
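A sketch of the hoisting; the ListMachines call shape is assumed from the ProxyMachinesContext helper quoted later in this thread:

// Fetch machines once and build the lookup a single time, outside the per-service loop.
machines, err := m.client.ListMachines(ctx, nil)
if err != nil {
    return nil, fmt.Errorf("list machines: %w", err)
}
machinesNamesByID := make(map[string]string, len(machines))
for _, mm := range machines {
    machinesNamesByID[mm.Machine.Id] = mm.Machine.Name
}
for _, s := range services {
    // Per-service work reuses machinesNamesByID without refetching the machines.
    _ = s
}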

Contributor Author

Done.

if len(id) > 12 {
    id = id[:12]
}
name := strings.TrimPrefix(ctr.name, "/")
Owner

I don't think this is needed; we trim it when deserialising in our api.Container.

Not a big deal, but stringid.TruncateID(c.ID) from the docker package can also be used to avoid hardcoding the logic for shortening container IDs.
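For reference, the suggested helper lives in the docker stringid package:

import "github.com/docker/docker/pkg/stringid"

// Equivalent to the manual `if len(id) > 12 { id = id[:12] }` above, without hardcoding the length.
id := stringid.TruncateID(c.ID)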

Contributor Author

Fixed.

header = "MACHINE\tSERVICE\tCONTAINER ID\tNAME\tIMAGE\tSTATUS"
} else {
header = "SERVICE\tMACHINE\tCONTAINER ID\tNAME\tIMAGE\tSTATUS"
}
Owner

I quite like the layout of the pod listing in k9s (similar to kubectl):
[image: k9s pod listing]

It prints the node/machine column last. We do something similar for uc inspect SERVICE:

[image: uc inspect SERVICE output]

I think we need to decide on the layout and stick to it for all relevant commands to decrease the cognitive load on users. I might be wrong, but I also feel that changing the column order when grouping by another column might not be desirable, as it again requires additional effort to read the headers and understand where the data has been relocated.

k9s, for example, supports sorting by any column and doesn't change the layout. It's like tables on a website: you can click on a header to change the sorting order, but the columns don't swap. So I believe the --group-by option isn't much different from this. Potentially the flag could even be called --sort. I'm not sure which one is more intuitive, though.

To me, it seems natural to present data ordered from the largest dimension to smallest.

service > container > container properties

If we have a namespace/stack concept, then

namespace > service > container > container properties

The machine a container runs on is somewhat an odd one. It's not a one-to-many relationship like service-container; it's many-to-many. So it's hard to find a good place for it. Putting it as the first column somewhat breaks the logical hierarchy. Putting it as the last one still lets you find it quickly while remaining less intrusive. That's my current reasoning.

Contributor Author

I've moved MACHINE to the last column.

One odd thing I noticed: it seems the lipgloss color styling of some cells in the second-to-last column causes tabwriter to lay things out incorrectly.

Have you encountered that issue?

Owner

@psviderski psviderski Nov 21, 2025

Ah yeah, I think I did, and that was the reason I used the lipgloss table instead of tabwriter for formatting the uc image ls output:

t := table.New().
    // Remove the default border.
    Border(lipgloss.Border{}).
    BorderTop(false).
    BorderBottom(false).
    BorderLeft(false).
    BorderRight(false).
    BorderHeader(false).
    BorderColumn(false).
    StyleFunc(func(row, col int) lipgloss.Style {
        if row == table.HeaderRow {
            return lipgloss.NewStyle().Bold(true).PaddingRight(3)
        }
        // Regular style for data rows with padding.
        return lipgloss.NewStyle().PaddingRight(3)
    })
var headers []string
for _, col := range columns {
    if !col.hide {
        headers = append(headers, col.name)
    }
}
t.Headers(headers...)
for _, row := range rows {
    values := []string{
        row.id,
        row.name,
        row.platforms,
        row.createdHuman,
        row.size,
        row.inUse,
        row.store,
        row.machine,
    }
    var filteredValues []string
    for i, v := range values {
        if !columns[i].hide {
            filteredValues = append(filteredValues, v)
        }
    }
    t.Row(filteredValues...)
}

I think we need to use lipgloss table or something similar from the bubble tea world to correctly print styled data. Perhaps create our own convenient table abstraction wrapping one of those and use it for all CLI commands to make the formatting consistent.

Contributor Author

I think we need to use lipgloss table or something similar from the bubble tea world to correctly print styled data.

Yeah, that fixed it. I just re-used the setup from image ls for this PR, but...

Perhaps create our own convenient table abstraction wrapping one of those and use it for all CLI commands to make the formatting consistent.

I'll do this as a follow-up PR with lipgloss.table. I was also thinking the abstraction might give us a good opportunity to add JSON output support, too, since that should be a simple second "formatter" for the table data these all generate.
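A rough sketch of what such an abstraction could look like: one set of rows with a lipgloss-table text formatter and a JSON formatter. The names and shape here are purely illustrative, not what #191 actually implements:

import (
    "encoding/json"
    "fmt"
    "io"

    "github.com/charmbracelet/lipgloss"
    "github.com/charmbracelet/lipgloss/table"
)

// Table is a hypothetical format-agnostic container for CLI tabular output.
type Table struct {
    Columns []string
    Rows    [][]string
}

// RenderText prints the table with the same borderless lipgloss style used by uc image ls.
func (t *Table) RenderText(w io.Writer) {
    lt := table.New().
        Border(lipgloss.Border{}).
        BorderTop(false).BorderBottom(false).BorderLeft(false).BorderRight(false).
        BorderHeader(false).BorderColumn(false).
        StyleFunc(func(row, _ int) lipgloss.Style {
            if row == table.HeaderRow {
                return lipgloss.NewStyle().Bold(true).PaddingRight(3)
            }
            return lipgloss.NewStyle().PaddingRight(3)
        }).
        Headers(t.Columns...)
    for _, r := range t.Rows {
        lt.Row(r...)
    }
    fmt.Fprintln(w, lt)
}

// RenderJSON emits the same rows as a JSON array of objects keyed by column name.
func (t *Table) RenderJSON(w io.Writer) error {
    out := make([]map[string]string, 0, len(t.Rows))
    for _, r := range t.Rows {
        obj := make(map[string]string, len(t.Columns))
        for i, col := range t.Columns {
            if i < len(r) {
                obj[col] = r[i]
            }
        }
        out = append(out, obj)
    }
    enc := json.NewEncoder(w)
    enc.SetIndent("", "  ")
    return enc.Encode(out)
}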

Contributor Author

btw, a draft of the "table abstraction" idea (though it ended up more of a JSON output option with some table helpers): #191

md.Append("machines", machineIP)
}
}
listCtx := metadata.NewOutgoingContext(ctx, md)
Owner

JFYI you can also use this helper to create a context for broadcasting:

// ProxyMachinesContext returns a new context that proxies gRPC requests to the specified machines.
// If namesOrIDs is nil, all machines are included.
func ProxyMachinesContext(
    ctx context.Context, cli MachineClient, namesOrIDs []string,
) (context.Context, MachineMembersList, error) {
    // TODO: move the machine IP resolution to the proxy router to allow setting machine names and IDs in the metadata.
    machines, err := cli.ListMachines(ctx, nil)
    if err != nil {
        return nil, nil, fmt.Errorf("list machines: %w", err)
    }
    var proxiedMachines MachineMembersList
    var notFound []string
    for _, nameOrID := range namesOrIDs {
        if m := machines.FindByNameOrID(nameOrID); m != nil {
            proxiedMachines = append(proxiedMachines, m)
        } else {
            notFound = append(notFound, nameOrID)
        }
    }
    if len(notFound) > 0 {
        return nil, nil, fmt.Errorf("machines not found: %s", strings.Join(notFound, ", "))
    }
    if len(namesOrIDs) == 0 {
        proxiedMachines = machines
    }
    md := metadata.New(nil)
    for _, m := range proxiedMachines {
        machineIP, _ := m.Machine.Network.ManagementIp.ToAddr()
        md.Append("machines", machineIP.String())
    }
    return metadata.NewOutgoingContext(ctx, md), proxiedMachines, nil
}
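Usage would then collapse to something like the following; whether the client type here satisfies MachineClient, and the package the helper lives in, are assumptions:

// nil machine names/IDs = broadcast to all machines.
listCtx, proxiedMachines, err := ProxyMachinesContext(ctx, m.client, nil)
if err != nil {
    return nil, fmt.Errorf("proxy machines context: %w", err)
}
_ = proxiedMachines // also handy for mapping machine IDs to names later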

Contributor Author

Thanks for pointing that out! I've updated this to use ProxyMachinesContext.

        continue
    }
    if msc.Metadata.Error != "" {
        return nil, fmt.Errorf("list containers on machine %s: %s", msc.Metadata.Machine, msc.Metadata.Error)
Owner

I think we shouldn't fail here, as this will make the uc ps command useless if there is a machine that is down or temporarily unavailable. What we can do instead is print a warning about the unavailable machine(s) but still list the containers on the available machines.
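A sketch of the warn-and-continue shape; PrintWarning is a stand-in for whatever warning helper the CLI already uses (the reply below mentions the one the volume command uses):

if msc.Metadata != nil && msc.Metadata.Error != "" {
    // Surface the failure but keep listing containers from the reachable machines.
    PrintWarning(fmt.Sprintf("failed to list containers on machine '%s': %s",
        msc.Metadata.Machine, msc.Metadata.Error))
    continue
}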

Contributor Author

I've changed this to a PrintWarning (like the volume command uses) and continue.

Owner

@psviderski psviderski left a comment

Looks great! Just one comment to address an edge case when the cluster consists of only one machine 👍

var containers []containerInfo
for _, msc := range machineContainers {
    if msc.Metadata == nil {
        continue
Owner

I agree this handling of broadcasted requests is very confusing, and I have plans to rework it to simplify it significantly. But for now, we still need to handle a Metadata that is nil, which can be a valid case when the cluster consists of only one machine. See

// Metadata can be nil if the request was broadcasted to only one machine.
if mc.Metadata == nil && len(machineContainers) > 1 {
    return svc, errors.New("something went wrong with gRPC proxy: metadata is missing for a machine response")
}
if mc.Metadata != nil && mc.Metadata.Error != "" {
    // TODO: return failed machines in the response.
    fmt.Printf("WARNING: failed to list containers on machine '%s': %s\n",
        mc.Metadata.Machine, mc.Metadata.Error)
    continue
}
machineID := ""
if mc.Metadata == nil {
    // ListServiceContainers was proxied to only one machine.
    for _, v := range machineIDByManagementIP {
        machineID = v
        break
    }

Please test that it works with a one-machine cluster.
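Adapted to the ps loop, the nil-Metadata handling might look roughly like this; the single-machine name fallback via machinesNamesByID is an assumption:

// Metadata can be nil when the request was proxied to only one machine.
if msc.Metadata == nil && len(machineContainers) > 1 {
    return nil, errors.New("gRPC proxy: metadata is missing for a machine response")
}
machineName := ""
if msc.Metadata != nil {
    machineName = msc.Metadata.Machine
} else {
    // Single-machine cluster: fall back to the only machine we know about.
    for _, name := range machinesNamesByID {
        machineName = name
    }
}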

Contributor Author

I see. I've added similar logic from service.go to the usage in ps.go now.

I tested it against a single machine ucind cluster locally.

I also added some unit tests for ps.go#collectContainers covering these various edge cases. They rely heavily on mocking the data returned to it, though, and I'm not sure how you feel about that testing approach. Let me know.

Owner

Perfect, thank you so much for updating the Metadata handling! 🙏

Regarding these mocked tests, I'm not a big fan of them as it's hard to replicate all the intricacies of the dependent components and keep them up to date with the changes in those components. For example, I'm going to change how the gRPC router works, which will slightly change the Metadata values. Then we will need to update all the mocks to reflect the change. Hopefully LLMs will help to do it correctly.

That's why I've been implementing mainly e2e tests so far to test the real workflows. These are the slowest tests, though, and they've started to become a bit flaky recently. We will likely need to do something about it soon.

Owner

And it's fine to keep the tests you added, appreciate your effort!

Contributor Author

Regarding these mocked tests, I'm not a big fan of them as it's hard to replicate all the intricacies of the dependent components and keep them up to date with the changes in those components.

Absolutely agree.

That's why I've been implementing mainly e2e tests so far to test the real workflows. These are the slowest tests, though, and they've started to become a bit flaky recently. We will likely need to do something about it soon.

Yeah, these really should be e2e tests, but I recalled you mentioning not really wanting to add more cluster setups in the e2e tests, too.

I looked at adding it to an existing e2e test, but they're oriented more around the functionality being tested rather than the cluster configuration needed, and it felt wrong to add these to e.g. the exec e2e tests...

Maybe we just need to change how the e2e tests are structured/grouped. Start with the "context", then the "functionality"? For example:

- in a one machine cluster
  - `exec`
    - does foo
  - `ps`
    - does bar
- in a two machine cluster
  - `dns`
    -  ... 
  - with a suspect node
    - `cmd`
      - ...

Something like that to let us reuse the heavy, time-consuming cluster setup across as many different tests as possible. Once they are up, the individual test cases can be very quick.

Owner

@psviderski psviderski Nov 28, 2025

Yeah, these really should be e2e tests, but I recalled you mentioning not really wanting to add more cluster setups in the e2e tests, too.

That's a tradeoff between performance and good structure. The fewer clusters we create when running the entire test suite, the faster it will run. And we want it to be faster, because that will let us run it more often when developing locally and catch bugs earlier. Also, I'm not too sure how close we are to reaching the memory limit on the free GitHub CI agents we're running it on now, as every cluster consumes a few hundred MB of memory.
So I was postponing any substantial work on the e2e tests until they break, and they still work: not perfectly, but they work 😄

Maybe we just need to change how the e2e tests are structured/grouped. Start with the "context", then the "functionality"?

Yeah, that's a good option! I don't remember if we have any test suites that use different clusters or if they're always 1-to-1. It would be great if we don't need to split tests for one logical feature into multiple different files/suites.

I remember that we have a few cases that can't be run in parallel (e.g. ones depending on setting env vars), so this is something to keep in mind when restructuring the test suites.

I've just asked Claude if it's possible to have something like pytest fixtures (Python) in Go to inject a cluster as a dependency. It suggested an interesting approach with a lazy cluster pool. It would allow us to structure tests based on the feature/concept they test while also sharing clusters of the required configuration between them. We can even split the current long TestXXX suites into more scoped ones for easier management.

Plan: Shared Cluster Fixtures for E2E Tests

Goal

Reduce ucind cluster creation from 10 clusters to 2 shared clusters (one 3-machine, one 1-machine) while maintaining the ability to run individual tests in isolation when needed.

Recommended Approach: TestMain with Lazy Cluster Pools

Use Go's TestMain to manage shared cluster fixtures at the package level, with lazy initialization and fallback support for isolated test runs.

Architecture

 test/e2e/
 ├── main_test.go      # NEW: TestMain + cluster pool management
 ├── cluster_test.go   # Modified: use shared cluster
 ├── service_test.go   # Modified: use shared cluster
 ├── ...

Key Design Decisions

  1. Two shared cluster pools:
  • shared3Machine - for tests requiring 3 machines (8 tests currently)
  • shared1Machine - for tests requiring 1 machine (2 tests currently)
  2. Lazy initialization with sync.Once: Clusters are created on first request, not upfront. This means running a single test still works.
  3. Cleanup at package exit: TestMain handles cluster teardown after m.Run() completes.
  4. Fallback support: If TEST_CLUSTER_NAME env var is set, use that cluster (existing behavior preserved).

Trade-offs

Pros:

  • Significant reduction in test setup time and resources
  • Tests still runnable in isolation (lazy init creates cluster on demand)
  • Backwards compatible with TEST_CLUSTER_NAME env var

Cons:

  • Tests must be careful about cleanup (already mostly true)
  • Test failures could potentially affect other tests sharing the cluster
  • Slightly more complex test infrastructure
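
For illustration, a minimal Go sketch of the lazy pool idea described above; the Cluster type and the attachToCluster / createUcindCluster helpers are hypothetical stand-ins:

// main_test.go (sketch): lazy shared cluster fixtures for the e2e package.
package e2e

import (
    "os"
    "sync"
    "testing"
)

// clusterPool lazily creates one shared cluster of a given size on first use.
type clusterPool struct {
    once     sync.Once
    machines int
    cluster  *Cluster // hypothetical cluster handle type
    err      error
}

// get returns the shared cluster, creating it on first request so that running a
// single test in isolation still works.
func (p *clusterPool) get(t *testing.T) *Cluster {
    t.Helper()
    p.once.Do(func() {
        // Preserve the existing TEST_CLUSTER_NAME behaviour as a fallback.
        if name := os.Getenv("TEST_CLUSTER_NAME"); name != "" {
            p.cluster, p.err = attachToCluster(name) // hypothetical helper
            return
        }
        p.cluster, p.err = createUcindCluster(p.machines) // hypothetical helper
    })
    if p.err != nil {
        t.Fatalf("create shared %d-machine cluster: %v", p.machines, p.err)
    }
    return p.cluster
}

var (
    shared3Machine = &clusterPool{machines: 3}
    shared1Machine = &clusterPool{machines: 1}
)

func TestMain(m *testing.M) {
    code := m.Run()
    // Tear down only the clusters that were actually created.
    for _, p := range []*clusterPool{shared3Machine, shared1Machine} {
        if p.cluster != nil {
            p.cluster.Destroy() // hypothetical cleanup
        }
    }
    os.Exit(code)
}

A test would then grab c := shared3Machine.get(t) instead of provisioning its own cluster.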

@psviderski psviderski merged commit d89bfb8 into psviderski:main Nov 27, 2025
4 checks passed
@jabr jabr deleted the list-containers branch November 28, 2025 01:17