Service: Wait for LXD members to be ready after join by roosterfish · Pull Request #1246 · canonical/microcloud

roosterfish · 2026-03-02T09:27:24Z

We sometimes observe the following error in the pipeline, and it appears to be a race.
This change prevents it from occurring when resources are deployed right after creating the MicroCloud:

Error: Failed instance creation: Fetch project database object: Failed to fetch from projects table: Failed to fetch from projects table: Failed to fetch from projects table: sql: transaction has already been committed or rolled back

An equivalent fix was once added to lxd-ci, see https://github.com/canonical/lxd-ci/pull/577/files.

In addition, the force start of LXD in the pipeline is moved from reset_system into restore_system to ensure it always runs.

Copilot

Pull request overview

This PR reduces post-cluster-join race conditions by ensuring LXD is responsive before returning from the join workflow, and adjusts the test harness so LXD is force-started after snapshot restores.

Changes:

- Add a post-Join readiness wait (via internal/ready) before returning from LXDService.Join.
- Move the CI “force LXD to start” step from reset_system to restore_system so it runs after snapshot restore.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `test/includes/microcloud.sh` | Moves the “force LXD startup” step to the snapshot restore path to avoid missing it when `SNAPSHOT_RESTORE=1`. |
| `service/lxd.go` | Adds a bounded wait for LXD readiness after joining a cluster to avoid early follow-up operations hitting an unready member. |

This prevents the following error from occurring when resources are deployed right after creating the MicroCloud:

Error: Failed instance creation: Fetch project database object: Failed to fetch from projects table: Failed to fetch from projects table: Failed to fetch from projects table: sql: transaction has already been committed or rolled back

Signed-off-by: Julian Pelizäus <julian.pelizaeus@canonical.com>

When SNAPSHOT_RESTORE=1, the reset_systems function returns early and does not run reset_system, so the force start of LXD is never triggered. Instead, perform this action in restore_system so that it always runs, regardless of whether SNAPSHOT_RESTORE is set.

Signed-off-by: Julian Pelizäus <julian.pelizaeus@canonical.com>

simondeziel

LGTM, thanks

roosterfish requested a review from Copilot March 9, 2026 13:55

Copilot started reviewing on behalf of roosterfish March 9, 2026 13:56

Copilot AI reviewed Mar 9, 2026

roosterfish added 3 commits March 9, 2026 15:53

service: Move LXDInitializationTimeout closer to its source

7d13f44

This allows reusing the same const inside the service package too.

Signed-off-by: Julian Pelizäus <julian.pelizaeus@canonical.com>

roosterfish force-pushed the prevent_sql_race branch from a558c50 to a026e75 Compare March 9, 2026 14:53

roosterfish marked this pull request as ready for review March 10, 2026 08:34

roosterfish requested review from simondeziel and tugbataluy March 10, 2026 08:35

simondeziel approved these changes Mar 10, 2026

roosterfish merged commit b2058bd into canonical:main Mar 10, 2026
56 of 57 checks passed

roosterfish deleted the prevent_sql_race branch March 10, 2026 12:48

roosterfish mentioned this pull request Mar 11, 2026

Sometimes reoccuring pipeline failures #653

Open

13 tasks

roosterfish merged 3 commits into canonical:main from roosterfish:prevent_sql_race
