Skip to content

[nexus] Add tests for instance_create unwind safety#1843

Merged
smklein merged 25 commits into
mainfrom
instance-create-saga-tests
Nov 2, 2022
Merged

[nexus] Add tests for instance_create unwind safety#1843
smklein merged 25 commits into
mainfrom
instance-create-saga-tests

Conversation

@smklein

@smklein smklein commented Oct 20, 2022

Copy link
Copy Markdown
Collaborator

Builds on #1835 and oxidecomputer/steno#67

Adds a test for the "instance create" saga's unwind safety by:

  • Using a saga which attempts to create a network interface, allocate an external IP, and attach a disk
  • Failing after every single intermediate node in the saga
  • Verifying that no intermediate resources remain after any of the attempted executions

Comment thread common/Cargo.toml Outdated
slog = { version = "2.5", features = [ "max_level_trace", "release_max_level_debug" ] }
smf = "0.2"
steno = "0.2"
steno = { git = "https://github.com/oxidecomputer/steno", branch = "node-access" }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I'd use a new rev of steno instead before merging. Same elsewhere in this PR

Comment thread nexus/src/app/sagas/instance_create.rs Outdated
let params = new_test_params(&opctx, project_id);
let dag = create_saga_dag::<SagaInstanceCreate>(params).unwrap();

for i in 0..dag.get_node_count() {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "more automated" version of what I tried doing in #1835 - I can definitely consolidate the testing here, and create a "saga unwind tester", if that would be useful.

I basically see three phases:

  1. Setup the saga. Create any orgs / projects / disks / etc needed to run the saga.
  2. Iterate over every single node in the DAG, injecting failures at each one. Log which ones we're working on, to make it clear what went wrong if an error occurs. Run the saga.
  3. Assert that no detritus remains, and that the system is in the same state before / after saga execution.

@davepacheco davepacheco left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@leftwo leftwo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks great. I look forward to this going back so I can do the same things
with the volume sagas.

Not actually. There is definitely still a dependency, but the "test
setup" part can be re-used with Nexus tests without causing any real
circular dependency issues.

See #1835 (comment)
for an example of one such issue.

Frankly the rest of nexus-test-utils probably *should* continue to be
refactored such that there is no real dependency on Nexus; doing so
will reduce the risk of "mis-matched versions of Nexus" classes of
errors in the future.

This required the following changes:

- Move `nexus`'s config information into `common/src/nexus_config.rs`,
  so that it can be used by `nexus-test-utils`.
- Make `nexus-test-utils` operate on a generic version of `NexusServer`,
  exposed by a new crate named `nexus-test-interface`. This crate can
  contain any information used to operate Nexus during shared test
  setup, without actually requiring a full dependency on `nexus`.
- Note, adding this generic required making `ControlPlaneTestContext`
  generic, which ended up bubbling up to a lot of test setup functions,
  including the one used in the `#[nexus_test]` macro...
- ... So, make it possible to customize which version of `NexusServer`
  is used by `#[nexus_test]`. This means callers can supply their *own*
  version of `NexusServer`, either from their own crate (see: Nexus
  unit tests) or the version from the `omicron_nexus` crate (see: Nexus
  integration tests).
Base automatically changed from disk-create-idempotency to main November 2, 2022 17:21
@smklein smklein merged commit 25ffded into main Nov 2, 2022
@smklein smklein deleted the instance-create-saga-tests branch November 2, 2022 17:38
@leftwo leftwo mentioned this pull request Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants