
prevent accidentally destroying shared machines #977

Merged
jclulow merged 1 commit into main from key-switch-cover on Apr 27, 2022


jclulow commented Apr 26, 2022


Some of the scripts in tools/ are increasingly destructive and can do
real damage when run on a shared build machine where the user has
sufficient privileges. To avoid this, the scripts will look for a
marker file; if it is present, they will refuse to do anything
(especially as root) that would mutate the machine. To mark a build
machine as off-limits:

    # mkdir -p /etc/opt/oxide
    # touch /etc/opt/oxide/NO_INSTALL
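The check itself is simple. Below is a minimal sketch of such a guard, written in Rust purely for illustration; only the marker path comes from this PR, and the function name `ensure_install_allowed` and its error message are hypothetical, not the code the tools/ scripts actually use.

```rust
use std::path::Path;

/// Marker path from this PR; its presence marks the machine as off-limits.
const NO_INSTALL_MARKER: &str = "/etc/opt/oxide/NO_INSTALL";

/// Hypothetical guard: returns Err if the marker file exists, so a caller
/// can bail out before doing anything that mutates the machine.
fn ensure_install_allowed(marker: &Path) -> Result<(), String> {
    if marker.exists() {
        Err(format!(
            "{} exists; refusing to modify this shared machine",
            marker.display()
        ))
    } else {
        Ok(())
    }
}

fn main() {
    match ensure_install_allowed(Path::new(NO_INSTALL_MARKER)) {
        Ok(()) => println!("no marker found; proceeding"),
        Err(e) => {
            // Exit non-zero so any calling script stops as well.
            eprintln!("error: {e}");
            std::process::exit(1);
        }
    }
}
```

Taking the path as a parameter keeps the guard testable against a temporary file instead of the real /etc location.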

jclulow enabled auto-merge (squash) on April 26, 2022 23:39
jclulow merged commit af29b7a into main on Apr 27, 2022
jclulow deleted the key-switch-cover branch on April 27, 2022 00:02
leftwo pushed a commit that referenced this pull request Oct 16, 2023
Propolis changes:
PHD: refactor & add support Propolis server "environments" (#547)
Begin making Accessor interface more robust
Update Crucible and Omicron deps for Hakari fixes
Add cloud-init volume generation to standalone
Use specified toolchain version for all GHA checks
Use params to configure rust-toolchain in GHA
Update and lock GHA dependencies

Crucible changes:
Use regions_dataset path for apply_smf (#1000)
Don't unwrap when we can't create a dataset (#992)
Fix tests and update log messages. (#995)
Better backpressure (#990)
Update Rust crate proptest to 1.3.1 (#977)
Read only downstairs can skip Live Repair (#984)
Update Rust crate expectorate to 1.1.0 (#975)
Add trait for `ExtentInner` (#982)
report backpressure in upstairs_info dtrace probe (#987)
Support multiple downstairs operations in GtoS (#985)
leftwo added a commit that referenced this pull request Oct 16, 2023

---------

Co-authored-by: Alan Hanson <alan@oxide.computer>
iximeow added a commit that referenced this pull request Jan 8, 2026
These Propolis changes are pretty important, and it's unfortunate that
I'd overlooked driving them through to Omicron. The full list of changes:

* Do not lose wakeups for block device workers (#973)
* fix arg for dtrace script (#978)
  - no changes to Propolis
* Add support for programmable SMBIOS Type 1 table (#977)
  - unused in Nexus
* NVMe reset can discard request Permits early (#983)
* Wire up viona notify fast path (#754)
* distinguish probes across NVMe devices (#993)
* update dropshot to 0.16.6, dropshot-api-manager to 0.3.0 (#992)
* distinguish file backend thread names across backend instances (#997)
* bump softnpu (#1000)
* block/file: do pread/pwrite from Propolis heap instead of VM memory (#985)
* bins: initial CPU binding support (#991)

At least some of these fix bugs we've seen in the last month or two
(#983 made for a very coreful dogfood update as Propolises were
stopped). Many of the others are either unused by Nexus (#977, #987,
#993, #997) or just not reachable from a Nexus-configured VM *yet*
(#973, #985).

Both #991 and #754 are effectful today.

With the initial CPU binding support in Propolis, if a VM would use more
than half of a sled's CPUs, we explicitly bind its vCPU threads 1:1 to
the uppermost CPUs. This helps keep VM exit/reentry quick, which matters
mostly under high I/O load. Ultimately this binding should be chosen by
Nexus, and it should probably be applied to all VMs.
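The policy above can be sketched as a small function. This is a hypothetical illustration, not Propolis's implementation: it assumes host CPU IDs are contiguous from 0, and the name `vcpu_binding` and the in-order vCPU-to-CPU assignment are assumptions.

```rust
/// Hypothetical sketch of the binding policy: if a VM's vCPU count is more
/// than half of the host's CPUs, pin vCPU i to one of the uppermost host
/// CPUs, one-to-one. Returns None when no binding is applied.
/// Assumes host CPU IDs are 0..host_cpus with no gaps.
fn vcpu_binding(vcpus: u32, host_cpus: u32) -> Option<Vec<u32>> {
    // No binding if the VM uses at most half the sled (vcpus <= host/2 is
    // equivalent to 2 * vcpus <= host, without risking overflow), or if
    // there are too few host CPUs for a 1:1 mapping.
    if vcpus <= host_cpus / 2 || vcpus > host_cpus {
        return None;
    }
    // The uppermost `vcpus` host CPUs; vCPU i maps to element i.
    let first = host_cpus - vcpus;
    Some((first..host_cpus).collect())
}

fn main() {
    // On a hypothetical 64-CPU sled, a 48-vCPU VM is more than half: bind it.
    match vcpu_binding(48, 64) {
        Some(cpus) => println!(
            "bind vCPUs to host CPUs {}..={}",
            cpus[0],
            cpus[cpus.len() - 1]
        ),
        None => println!("no binding"),
    }
    // A 16-vCPU VM uses at most half the sled, so it is left unbound.
    assert_eq!(vcpu_binding(16, 64), None);
}
```

Binding only the large VMs to the top of the CPU range leaves the lower CPUs free for everything else on the sled.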

Additionally, #754 affects all VM NICs. It just avoids a round trip out
to Propolis and back for notifications that can really be handled
in-kernel; there should be no behavioral change here.

Finally, while there *is* a new `propolis-server` API version, and I
have sled-agent using it, I'm intentionally not plumbing it through to
Nexus. Nexus has nothing useful to do with it, so plumbing it through
would mean a new sled-agent API revision just to include the new
Propolis API type, for no benefit. Instead, I've opted to have
sled-agent losslessly convert the older Propolis spec provided by Nexus
into the newer Propolis spec.
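A minimal sketch of what such a conversion can look like, assuming the newer spec is a superset of the older one. `SpecV1`, `SpecV2`, and their fields are hypothetical, heavily simplified stand-ins, not Propolis's actual spec types.

```rust
/// Hypothetical stand-in for the older spec that Nexus provides.
#[derive(Debug, Clone, PartialEq)]
struct SpecV1 {
    vcpus: u8,
    memory_mb: u64,
}

/// Hypothetical newer spec: the same fields, plus an optional addition
/// that the older spec cannot express.
#[derive(Debug, Clone, PartialEq)]
struct SpecV2 {
    vcpus: u8,
    memory_mb: u64,
    cpu_binding: Option<Vec<u32>>,
}

impl From<SpecV1> for SpecV2 {
    fn from(old: SpecV1) -> Self {
        // Every old field carries over unchanged and new fields take their
        // "not specified" value, so no information from Nexus is lost.
        SpecV2 {
            vcpus: old.vcpus,
            memory_mb: old.memory_mb,
            cpu_binding: None,
        }
    }
}

fn main() {
    let from_nexus = SpecV1 { vcpus: 4, memory_mb: 8192 };
    // sled-agent would perform this step before calling propolis-server.
    let for_propolis: SpecV2 = from_nexus.into();
    println!("{for_propolis:?}");
}
```

Because the mapping is infallible, a plain `From` impl is enough; the sled-agent API seen by Nexus does not need to change.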

We'll definitely have more Propolis API changes that need to go through
to Nexus, such as the aforementioned CPU binding assignments, so I expect
this to get normalized sooner rather than later.
askfongjojo pushed a commit that referenced this pull request Jan 9, 2026
3 participants