Skip to content

Capacity and utilization#4696

Merged
just-be-dev merged 73 commits into
mainfrom
capacity-and-utilization
Dec 15, 2023
Merged

Capacity and utilization#4696
just-be-dev merged 73 commits into
mainfrom
capacity-and-utilization

Conversation

@just-be-dev

@just-be-dev just-be-dev commented Dec 14, 2023

Copy link
Copy Markdown
Contributor

This PR is a follow up to #4605 which adds views into capacity and utilization both at the silo and system level.

API:

op method url
silo_utilization_list GET /v1/system/utilization/silos
silo_utilization_view GET /v1/system/utilization/silos/{silo}
utilization_view GET /v1/utilization

I'm not entirely satisfied w/ the silo utilization endpoints. They could be this instead:

op method url
silo_utilization_list GET /v1/system/silos-utilization
silo_utilization_view GET /v1/system/silos/{silo}/utilization

Also take special note of the views

// For the eyes of end users
/// View of the current silo's resource utilization and capacity
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct Utilization {
    /// Accounts for resources allocated to running instances or storage allocated via disks or snapshots
    /// Note that CPU and memory resources associated with a stopped instances are not counted here
    /// whereas associated disks will still be counted
    pub provisioned: VirtualResourceCounts,
    /// The total amount of resources that can be provisioned in this silo
    /// Actions that would exceed this limit will fail
    pub capacity: VirtualResourceCounts,
}

// For the eyes of an operator
/// View of a silo's resource utilization and capacity
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct SiloUtilization {
    pub silo_id: Uuid,
    pub silo_name: Name,
    /// Accounts for resources allocated by in silos like CPU or memory for running instances and storage for disks and snapshots
    /// Note that CPU and memory resources associated with a stopped instances are not counted here
    pub provisioned: VirtualResourceCounts,
    /// Accounts for the total amount of resources reserved for silos via their quotas
    pub allocated: VirtualResourceCounts,
}

For users in the silo I use provisioned and capacity as the language. Their capacity is represented by the quota set by an operator. For the operator provisioned is the same but allocated is used to denote the amount of resources allotted via quotas.


Note: I had planned to add a full system utilization endpoint to this PR but that would increase the scope. Instead will ship that API as a part of the next release. We can calculate some version of the full system utilization on the client by listing all the silos and their utilization.

just-be-dev and others added 30 commits November 20, 2023 23:05
In RFD-427 we specify that for the initial implementation we want to require
that all silos have a quota. In considering the API and implementaiton more carefully
I decided that at least temporarily we should drop create/delete given that deletions should
never happen and creations should only happen when the silo is created (or via some internal process
which creates quotas for pre-existing silos)
There will be a follow up PR to add capacity/utilization to the API both
at the silo and system levels. Given that, and the general lack of actionability
of quotas to silo users, I've just dropped the quota view from the API.
&self.identity().name
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sneaky

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the constraint of implementing ObjectIdentity felt a little unnecessary. We have several of these not-really-resource resources and we should still be able to treat as if they were (even when they don't have a resource/asset identity)

Comment thread nexus/db-model/src/quota.rs Outdated

@karencfv karencfv left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! Thanks for pushing through with this :)

Comment thread openapi/nexus.json
}
}
},
"/v1/utilization": {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be /v1/system/utilization? It also has the tag "silos" like the other endpoints. Conversely, if we want to keep this one as /v1/utilization, perhaps we could change the other endpoints to be /v1/system/utilization. Otherwise it feels like one set of endpoints is very explicit and the other is not. WDYT?

@just-be-dev just-be-dev Dec 14, 2023

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the silo admin's view of the utilization inside the silo. So this is a developers way of seeing how many resources their silo can provision. I dropped /v1/system/utilization (which summarizes the entirety of the system utilization) from this PR as it was more complicated to deliver.

In my mind /v1/system/... means operator whereas /v1/... is inner silo / developer stuff

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped /v1/system/utilization (which summarizes the entirety of the system utilization) from this PR as it was more complicated to deliver.

Ah! I think this was the piece of information I was missing. If we want to have this endpoint in the future then the current implementation makes sense, thanks!

Comment thread schema/crdb/dbinit.sql
q.storage_bytes AS storage_allocated
FROM
omicron.public.virtual_provisioning_collection AS c
RIGHT JOIN omicron.public.silo_quotas AS q

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a right join because a silo which has never had a provisioning even would have no entry in virtual_provisioning_collection?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A provisioning collection can be the fleet, a silo, or a project. I want all silo_quotas and whatever entries in the virtual_provisioning_collection as match.

@david-crespo david-crespo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean! Would like to see integration tests on the new endpoints. I figure you have those on the way.

#[endpoint {
method = GET,
path = "/v1/system/utilization/silos/{silo}",
tags = ["silos"],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be system/silos instead of silos

/// Accounts for resources allocated to running instances or storage allocated via disks or snapshots
/// Note that CPU and memory resources associated with a stopped instances are not counted here
/// whereas associated disks will still be counted
pub provisioned: VirtualResourceCounts,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these supposed to have #[serde(flatten)] on them too? I think this is why VirtualResourceCounts is showing up in the spec. On the other hand it's kind of cool to not flatten it and share the structure, but it's possible the other language clients might not like it as much.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I see now why these are not. They're supposed to be under these keys. In any case this is still probably why VirtualResourceCounts is making it into the spec.

@just-be-dev just-be-dev marked this pull request as ready for review December 14, 2023 22:36
@morlandi7 morlandi7 linked an issue Dec 14, 2023 that may be closed by this pull request
@just-be-dev just-be-dev enabled auto-merge (squash) December 15, 2023 00:57
@just-be-dev just-be-dev merged commit 5827188 into main Dec 15, 2023
@just-be-dev just-be-dev deleted the capacity-and-utilization branch December 15, 2023 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tracking Issue: Resource Allocation (Silo)

5 participants