Capacity and utilization#4696
Conversation
In RFD-427 we specify that for the initial implementation we want to require that all silos have a quota. In considering the API and implementaiton more carefully I decided that at least temporarily we should drop create/delete given that deletions should never happen and creations should only happen when the silo is created (or via some internal process which creates quotas for pre-existing silos)
There will be a follow up PR to add capacity/utilization to the API both at the silo and system levels. Given that, and the general lack of actionability of quotas to silo users, I've just dropped the quota view from the API.
…nt objectidentity
| &self.identity().name | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Yeah, the constraint of implementing ObjectIdentity felt a little unnecessary. We have several of these not-really-resource resources and we should still be able to treat as if they were (even when they don't have a resource/asset identity)
karencfv
left a comment
There was a problem hiding this comment.
Almost there! Thanks for pushing through with this :)
| } | ||
| } | ||
| }, | ||
| "/v1/utilization": { |
There was a problem hiding this comment.
Should this not be /v1/system/utilization? It also has the tag "silos" like the other endpoints. Conversely, if we want to keep this one as /v1/utilization, perhaps we could change the other endpoints to be /v1/system/utilization. Otherwise it feels like one set of endpoints is very explicit and the other is not. WDYT?
There was a problem hiding this comment.
This is the silo admin's view of the utilization inside the silo. So this is a developers way of seeing how many resources their silo can provision. I dropped /v1/system/utilization (which summarizes the entirety of the system utilization) from this PR as it was more complicated to deliver.
In my mind /v1/system/... means operator whereas /v1/... is inner silo / developer stuff
There was a problem hiding this comment.
I dropped /v1/system/utilization (which summarizes the entirety of the system utilization) from this PR as it was more complicated to deliver.
Ah! I think this was the piece of information I was missing. If we want to have this endpoint in the future then the current implementation makes sense, thanks!
| q.storage_bytes AS storage_allocated | ||
| FROM | ||
| omicron.public.virtual_provisioning_collection AS c | ||
| RIGHT JOIN omicron.public.silo_quotas AS q |
There was a problem hiding this comment.
Is this a right join because a silo which has never had a provisioning even would have no entry in virtual_provisioning_collection?
There was a problem hiding this comment.
A provisioning collection can be the fleet, a silo, or a project. I want all silo_quotas and whatever entries in the virtual_provisioning_collection as match.
david-crespo
left a comment
There was a problem hiding this comment.
Very clean! Would like to see integration tests on the new endpoints. I figure you have those on the way.
| #[endpoint { | ||
| method = GET, | ||
| path = "/v1/system/utilization/silos/{silo}", | ||
| tags = ["silos"], |
There was a problem hiding this comment.
I think these should be system/silos instead of silos
| /// Accounts for resources allocated to running instances or storage allocated via disks or snapshots | ||
| /// Note that CPU and memory resources associated with a stopped instances are not counted here | ||
| /// whereas associated disks will still be counted | ||
| pub provisioned: VirtualResourceCounts, |
There was a problem hiding this comment.
Are these supposed to have #[serde(flatten)] on them too? I think this is why VirtualResourceCounts is showing up in the spec. On the other hand it's kind of cool to not flatten it and share the structure, but it's possible the other language clients might not like it as much.
There was a problem hiding this comment.
Whoops, I see now why these are not. They're supposed to be under these keys. In any case this is still probably why VirtualResourceCounts is making it into the spec.
This PR is a follow up to #4605 which adds views into capacity and utilization both at the silo and system level.
API:
I'm not entirely satisfied w/ the silo utilization endpoints. They could be this instead:
Also take special note of the views
For users in the silo I use
provisionedandcapacityas the language. Theircapacityis represented by the quota set by an operator. For the operatorprovisionedis the same butallocatedis used to denote the amount of resources allotted via quotas.Note: I had planned to add a full system utilization endpoint to this PR but that would increase the scope. Instead will ship that API as a part of the next release. We can calculate some version of the full system utilization on the client by listing all the silos and their utilization.