Skip to content

supply real volume data to propolis#771

Merged
jmpesp merged 18 commits into
oxidecomputer:mainfrom
jmpesp:volumes
Mar 25, 2022
Merged

supply real volume data to propolis#771
jmpesp merged 18 commits into
oxidecomputer:mainfrom
jmpesp:volumes

Conversation

@jmpesp

@jmpesp jmpesp commented Mar 15, 2022

Copy link
Copy Markdown
Contributor

Instance creation now can specify that existing disks be attached to
the instance (disk creation during instance creation is not yet
implemented).

There's a few important changes here.

Nexus will instruct the sled agent to send volume construction requests
when it ensures the instance, and therefore has to build these requests.
This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance,
but the instance's name.

Importantly, the instance creation saga now calls Nexus'
instance_set_runtime instead of duplicating the logic to call
instance_put. Nexus' instance_set_runtime will populate the
InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now optionally calls the sled agent's
disk_put if the instance is running already, otherwise the disk will be
attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume
requests could be arbitrarily large (and eventually, the cloud init
bytes will add to the body size too).

Note: spawning an instance with this will not work unless the propolis
zone has a valid static IPv6 address. This PR has grown enough that I
think that work should be separate.

@jmpesp

jmpesp commented Mar 15, 2022

Copy link
Copy Markdown
Contributor Author

This is in no way ready to go in, but opening this for early reviews

disks: vec![],
disks: self.requested_disks.clone(),
migrate,
cloud_init_bytes: None,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @iliana

@jmpesp

jmpesp commented Mar 15, 2022

Copy link
Copy Markdown
Contributor Author

Note you'll need oxidecomputer/typify#41 for this to compile

@jmpesp

jmpesp commented Mar 16, 2022

Copy link
Copy Markdown
Contributor Author

Note I have to do a rebase on top of the "Remove 'sled-agent' API parameters from 'common'" PR, so expect a force push.

@jmpesp

jmpesp commented Mar 16, 2022

Copy link
Copy Markdown
Contributor Author

@smklein fyi rebased

@bnaecker this commit impacts how nics are attached a bit, please have a look - I tested by booting the regular alpine ISO via the instance_create_demo and saw eth0, so I don't think I screwed up the refactor :)

Comment thread nexus/src/sagas.rs
.windows(2)
.all(|w| w[0].1.block_size == w[1].1.block_size);

if !all_region_have_same_block_size {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, I had some bad experience panicking during a saga. The recovery mechanisms will basically just retry this node indefinitely, not unwind the whole thing. I'd consider returning an error instead.

@jmpesp

jmpesp commented Mar 18, 2022

Copy link
Copy Markdown
Contributor Author

Rebasing again, will comment when it's done.

@jmpesp jmpesp force-pushed the volumes branch 2 times, most recently from c833176 to 35b5a38 Compare March 18, 2022 19:56
Instance creation now can specify that existing disks be attached to
the instance (disk creation during instance creation is not yet
implemented).

There's a few important changes here.

Nexus will instruct the sled agent to send volume construction requests
when it ensures the instance, and therefore has to build these requests.
This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance,
but the instance's name.

Importantly, the instance creation saga now calls Nexus'
instance_set_runtime instead of duplicating the logic to call
instance_put. Nexus' instance_set_runtime will populate the
InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now *optionally* calls the sled agent's
disk_put if the instance is running already, otherwise the disk will be
attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume
requests could be arbitrarily large (and eventually, the cloud init
bytes will add to the body size too).

Note: spawning an instance with this *will not work* unless the propolis
zone has a valid static IPv6 address. This PR has grown enough that I
think that work should be separate.
@jmpesp

jmpesp commented Mar 18, 2022

Copy link
Copy Markdown
Contributor Author

Rebase #N completed, but oxidecomputer/crucible#243 is required to test.

@jmpesp

jmpesp commented Mar 18, 2022

Copy link
Copy Markdown
Contributor Author

Note I've updated this PR's first comment with the text of the commit.

@jmpesp

jmpesp commented Mar 18, 2022

Copy link
Copy Markdown
Contributor Author

This is ready for review now

@jmpesp jmpesp marked this pull request as ready for review March 18, 2022 20:29
@jmpesp jmpesp changed the title [DRAFT] supply real volume data to propolis supply real volume data to propolis Mar 18, 2022

@smklein smklein left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of comments, this looks awesome. So sick to see this coming together.

Comment thread common/src/api/external/mod.rs Outdated
Comment thread nexus/src/db/model.rs Outdated
Comment thread nexus/src/nexus.rs Outdated
Comment thread nexus/tests/integration_tests/instances.rs
Comment thread nexus/src/nexus.rs Outdated
Comment thread nexus/src/nexus.rs Outdated
Comment thread nexus/src/nexus.rs Outdated
Comment thread nexus/src/sagas.rs Outdated
Comment thread nexus/src/sagas.rs Outdated

if attached {
osagactx
.nexus()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each step in the saga must be idempotent - namely, repeatedly calling ensure_instance_disk_attach_state, either as part of the action or undo-action must:

  • Result in the same effective state
  • Return the same result

Is this currently the case?

(I think it is - especially since we're operating at exclusively a DB layer here - but wanted to check that assumption)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is true for the same reason but I'm not sure. At the very least, repeated calls to instance_attach_disk and instance_detach_disk is tested by some of the disk related integration tests.

What isn't tested is doing that through the sled agent for a running instance but that code path isn't hit here (?).

Comment thread nexus/src/sagas.rs Outdated
saga_params.organization_name.clone().into();
let project_name: db::model::Name = saga_params.project_name.clone().into();

for disk in saga_disks {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not explicitly a request, more just out-loud thinking)

To clarify the path, from a high-level:

  • sic_attach_disks_to_instance happens right before actually sending the "instance ensure" request
  • It calls Nexus::instance_attach_disk, which calls Nexus::disk_set_runtime
  • If the instance state is InstanceState::Creating and the requested disk state is DiskStateRequested::Attached, we write the requested disk state ("attached") to the database, but don't send any requests to the sled agent (yet).

I wonder if we want to bifurcate disk_set_runtime into two different functions, within Nexus. The fact that it "may or may not make a request to sled agent" feels a little unintuitive to me - I 100% understand why it's acting that way, and I think deferring the request until actually calling "instance ensure" makes sense, but understanding when it would / would not make a call to sled agent feels a little non-obvious.

WDYT? I think this is subjective, but just wanna make sure the flows are as straightforward as they can be.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true (though the actual path traversed through the code has changed as part of 581bdee), with the idea that attaching disks to the instance only in the DB is preparing for the logic in instance_set_runtime that queries the disks attached to the instance.

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

Comment thread nexus/src/nexus.rs Outdated
Comment thread nexus/src/nexus.rs
Comment on lines +1435 to +1436
// TODO this will probably involve volume construction requests as
// well!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this remarking that the API to disk_set_runtime is insufficient? I know most of this hot-plug path isn't implemented (which is 100% fine) but I just wanna make sure we're very explicit about what we expect to work / not work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propolis needs to be handed the volume construction request, so sled agent's disk_put is currently insufficient, yeah, but what do you think about

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change sounds good to me!

Comment thread nexus/src/sagas.rs Outdated
Comment thread nexus/src/sagas.rs Outdated
Comment thread nexus/src/sagas.rs Outdated
jmpesp added 2 commits March 24, 2022 13:30
I had an incorrect mental model of how distributed sagas work and this
caused problems - disks would not be detached when instance creation
failed.

I tried to correct the code in a slightly different way than was done
for NICs. Instead of pre-allocating IDs and having a previous saga node
delete based on these IDs, I chose to split the attach disks to instance
node into 8, each with its own undo.

This is now tested by trying to attach 8 disks, one of which is marked
as faulted. The whole saga should unwind and the test checks for
semantic equivalence.

Added some tests for the new arbitrary 8 disk limit. This required
bumping the request max body size because of the larger instance ensure
bodies.

@smklein smklein left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only remaining comments are basically nits

Comment thread nexus/src/sagas.rs Outdated
Comment thread nexus/src/sagas.rs
// conditional logic depending on if that disk index is going to be used.
// Steno does not currently support the saga node graph changing shape.
for i in 0..8 {
template_builder.append(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about calling append_parallel here, and running these concurrently? Same for attach?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't, unfortunately, unless I store the strings somewhere persistent. append_parallel wants tuples that look like (&str, &str, Arc<dyn steno::Action<SagaInstanceCreate>>).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that not be possible through:

template_builder.append_parallel(
  (0..MAX_DISKS).map(|i| {
    (
      &format!("create_disks{}", i),
      "CreateDisksForInstance",
      AsyncFunc::new_action(....),
    )
  }).collect::<Vec<_>>()
);

(same for "AttachDisks" below)

If Rust gives you issues with lifetimes, then maybe:

let names = (0...MAX_DISKS).map(|i| format!("create_disks{}", i)).collect::<Vec<String>>();
template_builder.append_parallel(
  (0..MAX_DISKS).map(|i| {
    (
      names[i],
      "CreateDisksForInstance",
      AsyncFunc::new_action(....),
    )
  }).collect::<Vec<_>>()
);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust does give you issues with lifetimes for the first one, and for the second I'm pretty sure names has to live longer than the template builder function.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure names doesn't need to last longer than the function - the signature doesn't imply that, and the implementation actually copies whatever string is provided by calling .to_string() : https://github.com/oxidecomputer/steno/blob/578498bbb43f9061c636b938dff98e7ece0b0efc/src/saga_template.rs#L315-L359

Comment thread nexus/src/sagas.rs
@jmpesp

jmpesp commented Mar 25, 2022

Copy link
Copy Markdown
Contributor Author

:( Looks like this PR is hitting an issue due to zeroize being pinned by aes-gcm-siv:

cargo build
    Updating git repository `https://github.com/oxidecomputer/crucible`
error: failed to select a version for `zeroize`.
    ... required by package `aes-gcm-siv v0.10.3`
    ... which satisfies dependency `aes-gcm-siv = "^0.10.3"` (locked to 0.10.3) of package `crucible v0.0.1 (https://github.com/oxidecomputer/crucible?rev=4090a023b77dcab7a5000057cf2c96cdbb0469b6#4090a023)`
    ... which satisfies git dependency `crucible` (locked to 0.0.1) of package `propolis-client v0.1.0 (https://github.com/oxidecomputer/propolis?rev=832a86afce308d3210685af987ace1ba74c2ecd6#832a86af)`
    ... which satisfies git dependency `propolis-client` (locked to 0.1.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`
versions that meet the requirements `>=1, <1.4` are: 1.3.0, 1.2.0, 1.1.1, 1.1.0, 1.0.0

all possible versions conflict with previously selected packages.

  previously selected package `zeroize v1.4.3`
    ... which satisfies dependency `zeroize = ">=1, <1.5"` (locked to 1.4.3) of package `crypto-bigint v0.2.11`
    ... which satisfies dependency `crypto-bigint = "^0.2.4"` (locked to 0.2.11) of package `elliptic-curve v0.10.6`
    ... which satisfies dependency `elliptic-curve = "^0.10.4"` (locked to 0.10.6) of package `ecdsa v0.12.4`
    ... which satisfies dependency `ecdsa-core = "^0.12"` (locked to 0.12.4) of package `p256 v0.9.0`
    ... which satisfies dependency `p256 = "^0.9.0"` (locked to 0.9.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`

failed to select a version for `zeroize` which could resolve this conflict

the linked issue mentions that four days ago there was a prereleases with unpinned zeroize, hopefully that gets pushed soon.

@jmpesp

jmpesp commented Mar 25, 2022

Copy link
Copy Markdown
Contributor Author

All TODOs are now tracked with items.

@smklein

smklein commented Mar 25, 2022

Copy link
Copy Markdown
Collaborator

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

@jmpesp

jmpesp commented Mar 25, 2022

Copy link
Copy Markdown
Contributor Author

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

This didn't work for me?

@jmpesp jmpesp merged commit 3424aeb into oxidecomputer:main Mar 25, 2022
@jmpesp jmpesp deleted the volumes branch March 25, 2022 20:57
@iliana iliana mentioned this pull request Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants