Issue extent flushes in parallel#801
Conversation
First, move the concept of whether an Extent is opened or closed *out*
of Extent by removing the Option around Mutex<Inner>:
- inner: Option<Mutex<Inner>>,
+ pub inner: Mutex<Inner>,
This moves up to Region as a new enum:
- pub extents: Vec<Extent>,
+ pub extents: Vec<Arc<Mutex<ExtentState>>>,
where ExtentState uses the variant to store if the extent is opened or
closed:
+pub enum ExtentState {
+ Opened(Arc<Extent>),
+ Closed,
+}
This is all in service of `fn region_flush` issuing a `tokio::spawn` for
each `extent.flush` call, and waiting on each join handle that is
returned. This is possible when an `Arc<Extent>` is used because a clone
can be passed to each task.
Using an enum in this way is also a little safer: `Region::close_extent`
actually consumes the old extent while closing it, meaning that the old
Extent no longer hangs around. It does however mean that the downstairs
will panic if the ExtentState is not expected: both `get_opened_extent`
and `close_extent` panic if the extent in question is closed.
`close_extent` also panics if you're trying to close an extent where
there's an Arc clone somewhere as part of doing some other work.
Note this work required updating to Rust 1.70 (for `Arc::into_inner`).
There's a few unrelated changes due to this.
leftwo
left a comment
There was a problem hiding this comment.
We should run the ./tools/test_repair.sh and ./tools/test_restart_repair.sh tests, and possibly bump their internal loop counters and let them run for longer.
| @@ -1,3 +1,3 @@ | |||
| [toolchain] | |||
| channel = "1.66" | |||
| channel = "1.70" | |||
There was a problem hiding this comment.
If we bump this, do we need it bumped anywhere else in Omicron/Propolis?
I don't think so, as the downstairs itself is standalone, but not sure about any
other consumers of the upstairs
There was a problem hiding this comment.
I'm not sure, leaning towards yes - I can't see how (for example) Propolis would be able to compile this in as a library without support for Arc::into_inner. We'll see when that rev gets bumped I guess :)
There was a problem hiding this comment.
We might want to try to bump propolis and see if it compiles. If there is some entanglement
and it turns out to be a huge amount of work, we might consider holding this fix till we can
get propolis ready for us. I'm sure we will have other changes we want in Crucible and I want
to be able to update omicron/propolis frequently over the next few weeks.
There was a problem hiding this comment.
i was compiling latest propolis against this commit to run my perf tests fwiw and didnt run into any issues. though i did not like, run the propolis test suite or anything like that. just launched an instance and ran fio in it.
| assert_eq!(res, ReplaceResult::Started); | ||
|
|
||
| eprintln!("Wait for some repair work to proceed"); | ||
| tokio::time::sleep(tokio::time::Duration::from_secs(4)).await; |
There was a problem hiding this comment.
I've been pulling out hair tracking down failing CI tests, and the sleep here makes me
nervous. Is there any way to make this deterministic? Like checking for some job progress but
not completed repair yet? On a busy system or a fast system, could this sleep be too little
or too much?
There was a problem hiding this comment.
Good point, I changed this to check for the StartedAlready and panic if the repair completed in 9994e79
|
I ran All passed. I upped the loop count for each to get some extended runtime. |
| for join_handle in join_handles { | ||
| join_handle | ||
| .await | ||
| .map_err(|e| CrucibleError::GenericError(e.to_string()))??; |
There was a problem hiding this comment.
If we error here, where does this bubble out to?
There was a problem hiding this comment.
if any of the flushes fails, then it will get returned as a failure here for region_flush. Though, see 53c45fb for the fix for an issue you brought up during a chat: we should wait for all results before returning, instead of (potentially) cancelling these flush tasks in the middle of execution if one of them errors.
Bump crucible rev to pick up: - Self assembling zones must set their own MTU - fix build break caused by merging oxidecomputer/crucible#801 - Issue extent flushes in parallel - Add Logger Option to Volume construct method - Update Rust crate libc to 0.2.147 - Update Rust crate percent-encoding to 2.3 - Retry jobs until they succeed - Reorder select arms so pings can't be starved out - Treat a skipped IO like an error IO for ACK results. - Retry pantry requests - Remove panics and asserts in dummy tests - Update Rust crate csv to 1.2.2 - Update Rust crate reedline to 0.21.0 - Set open file resource limit to the max - Update Rust crate ringbuffer to 0.14 - DTrace meet cmon - Widen assert values to u128 to deal with u64::MAX - Change size_to_validate from usize to u64 - Turn on live-repair test in CI - Increase flush_timeout for some tests, collect cores - Update to latest dropshot Bump propolis rev to pick up: - Self assembling zones must set their own MTU - Bump crucible rev to latest - Make the propolis zone self-assembling - Flesh out more PIIX3-PM to suppress log gripes - Bump crucible rev to latest - Restructure PM-timer under PIIX3 device - Fix inventory handling for nested child entities - Centralize vmm-data interface into bhyve_api - Clean up PCI device classes - Update openssl dep to 0.10.55 - Allow propolis-standalone to use VMM reservoir - only allow one request to reboot to be enqueued at a time Nexus is currently setting the MTU inside self-assembling zones, but this goes against the idea that self-assembling zones perform their own configuration. Remove the code in `RunningZone::boot` that performs commands on self-assembling zones, and set the MTU of $DATALINK in the Clickhouse and CockroachDB method scripts. Fixes oxidecomputer#3512
Bump crucible rev to pick up: - Self assembling zones must set their own MTU - fix build break caused by merging oxidecomputer/crucible#801 - Issue extent flushes in parallel - Add Logger Option to Volume construct method - Update Rust crate libc to 0.2.147 - Update Rust crate percent-encoding to 2.3 - Retry jobs until they succeed - Reorder select arms so pings can't be starved out - Treat a skipped IO like an error IO for ACK results. - Retry pantry requests - Remove panics and asserts in dummy tests - Update Rust crate csv to 1.2.2 - Update Rust crate reedline to 0.21.0 - Set open file resource limit to the max - Update Rust crate ringbuffer to 0.14 - DTrace meet cmon - Widen assert values to u128 to deal with u64::MAX - Change size_to_validate from usize to u64 - Turn on live-repair test in CI - Increase flush_timeout for some tests, collect cores - Update to latest dropshot Bump propolis rev to pick up: - Self assembling zones must set their own MTU - Bump crucible rev to latest - Make the propolis zone self-assembling - Flesh out more PIIX3-PM to suppress log gripes - Bump crucible rev to latest - Restructure PM-timer under PIIX3 device - Fix inventory handling for nested child entities - Centralize vmm-data interface into bhyve_api - Clean up PCI device classes - Update openssl dep to 0.10.55 - Allow propolis-standalone to use VMM reservoir - only allow one request to reboot to be enqueued at a time Nexus is currently setting the MTU inside self-assembling zones, but this goes against the idea that self-assembling zones perform their own configuration. Remove the code in `RunningZone::boot` that performs commands on self-assembling zones, and set the MTU of $DATALINK in the Clickhouse and CockroachDB method scripts. Fixes oxidecomputer#3512
Bump crucible rev to pick up: - Self assembling zones must set their own MTU - fix build break caused by merging oxidecomputer/crucible#801 - Issue extent flushes in parallel - Add Logger Option to Volume construct method - Update Rust crate libc to 0.2.147 - Update Rust crate percent-encoding to 2.3 - Retry jobs until they succeed - Reorder select arms so pings can't be starved out - Treat a skipped IO like an error IO for ACK results. - Retry pantry requests - Remove panics and asserts in dummy tests - Update Rust crate csv to 1.2.2 - Update Rust crate reedline to 0.21.0 - Set open file resource limit to the max - Update Rust crate ringbuffer to 0.14 - DTrace meet cmon - Widen assert values to u128 to deal with u64::MAX - Change size_to_validate from usize to u64 - Turn on live-repair test in CI - Increase flush_timeout for some tests, collect cores - Update to latest dropshot Bump propolis rev to pick up: - Self assembling zones must set their own MTU - Bump crucible rev to latest - Make the propolis zone self-assembling - Flesh out more PIIX3-PM to suppress log gripes - Bump crucible rev to latest - Restructure PM-timer under PIIX3 device - Fix inventory handling for nested child entities - Centralize vmm-data interface into bhyve_api - Clean up PCI device classes - Update openssl dep to 0.10.55 - Allow propolis-standalone to use VMM reservoir - only allow one request to reboot to be enqueued at a time Nexus is currently setting the MTU inside self-assembling zones, but this goes against the idea that self-assembling zones perform their own configuration. Remove the code in `RunningZone::boot` that performs commands on self-assembling zones, and set the MTU of $DATALINK in the Clickhouse and CockroachDB method scripts. Fixes #3512
First, move the concept of whether an Extent is opened or closed out of Extent by removing the Option around Mutex:
This moves up to Region as a new enum:
where ExtentState uses the variant to store if the extent is opened or closed:
This is all in service of
fn region_flushissuing atokio::spawnfor eachextent.flushcall, and waiting on each join handle that is returned. This is possible when anArc<Extent>is used because a clone can be passed to each task.Using an enum in this way is also a little safer:
Region::close_extentactually consumes the old extent while closing it, meaning that the old Extent no longer hangs around. It does however mean that the downstairs will panic if the ExtentState is not expected: bothget_opened_extentandclose_extentpanic if the extent in question is closed.close_extentalso panics if you're trying to close an extent where there's an Arc clone somewhere as part of doing some other work.Note this work required updating to Rust 1.70 (for
Arc::into_inner). There's a few unrelated changes due to this.