decide whether each API should be server- or client-side-versioned#7138
Conversation
sunshowers
left a comment
There was a problem hiding this comment.
Looks great! Just a few comments, nothing major.
| ptime -m cargo xtask ls-apis deployment-units && | ||
| ptime -m cargo xtask ls-apis servers | ||
| ptime -m cargo xtask ls-apis servers && | ||
| ptime -m cargo xtask ls-apis dag-check |
There was a problem hiding this comment.
Super nit:
cargo xtask ls-apis dag-check
Could this just be check, or check-dag? I personally like anything that does a check have its name start with check.
There was a problem hiding this comment.
Okay, I clearly overthought this. I'm trying to resolve this (useful!) guideline with a couple of other rules/guidelines I use:
- I usually use the
subject-verb[-object]naming convention (hence "dag-check"), both in code and shell command names. This is pretty arbitrary but I do find it results in more discoverable code/commands because related things are grouped together. This would be a bigger deal if we ever add otherdagsubcommands (i.e., I thinkdag-checkanddag-displayis much better thancheck-daganddisplay-dag, and also lends itself todag check/dag display). - I follow clap's default of having the Rust struct/function names match the command names so it's easy to find the source for a command.
So if we change this to check-dag: (1) this gets awkward if we add other dag commands later, and (2) either we change all the Rust names also or we don't, but either way, they diverge from some of these conventions.
Like I said, I'm overthinking this.
But what if we just call this cargo xtask ls-apis check on the outside? I'd have the command (the names in src/bin/ls-apis.rs) call it "check" but the library interface would still call it dag_check()/DagCheck. If we add other checks in the future, we'd probably want them done by ls-apis check anyway.
|
I neglected to actually write down the most important results here: the sequence. Running this tool now (using This shows that:
|
|
wow, that is really some of the graphviz of all time... |

This PR:
cargo xtask ls-apis dag-checkcargo xtask ls-apis checktool that:cargo xtask ls-apis adoccommand to include this information in the summary table. The result is that the table in our wiki will include this information. See oxidecomputer/meta#524 for what this looks like. (I will land that PR after this one lands.)(note: this was called
dag-checkwhen I wrote the rest of this but it's now calledcheck)Copying a summary from the README, after this PR:
If I haven't messed this up too badly, this is a pretty big step because:
I hope this will help keep us from accidentally introducing new circular dependencies.
Goals
My main goals here, in order:
How I made the choice for each API
Where possible, we want APIs to be server-side-versioned only so that clients don't have to think about this. See the deck I linked above for more on this.
Initially, I had all APIs marked as
versioned_how = "unknown", built the check indag-checkthat the "server-side" graph be acyclic, and then built some facilities indag-checkfor it to propose that an API be client- or server-managed based on heuristics like "if this API has no deployed consumers, then it can be server-side only" and "if two components depend directly on each other, one of them must be client-side-managed".I also started with the assumption that we'll want APIs that Nexus talks to to be server-side-versioned. That's because control generally flows from Nexus to the other services. It'd be pretty painful if Nexus had to have multiple client versions for every service it talked to. Relatedly, the assumption is that the APIs from Nexus -> other services are generally more complex and volatile than in the other direction.
With this, I iteratively looked at the proposals, checked the visual DAG (from
cargo xtask ls-apis servers --output-format=dot) and made some judgment calls.The results
On success, the tool prints out a summary of how our APIs are versioned:
In terms of the client-side-managed APIs:
So all of that seems pretty reasonable and everything else was able to be server-side-managed.
Out of curiosity, I did this quick check to see how volatile each API was: in each of the Omicron, Dendrite, Maghemite, Propolis, and Crucible repos, I ran this command to count how many times each of the APIs has been changed in any way in 2024 (almost 11 months):
Putting the results from all repos together and sorting them:
The volatility of the Nexus-internal API, which is unfortunately client-side-managed, makes one wonder if we can rework things so that fewer things need to invoke it (e.g., use long polls in the other direction, or use much simpler APIs that just cause Nexus to reach back out to the other service for details? or in the case of Oximeter, eliminate the use of the API altogether as we've discussed?).
Caveats
This is still a little messy. For example, the metadata has ways to mark both an edge (dependency) and a particular API as client-side or server-side managed.
That's because the tool already supported the edge annotations, but while doing this, I decided it would be way too tedious to figure this out if I had to annotate every edge, when for the vast majority of APIs it's going to be one or the other. We should clean that up in the long term but I think it's a fair bit of work and doesn't help de-risk this plan.
I think there are a few APIs where we might want this choice to be per-edge, not per-API. For example: DPD's API is marked client-managed because of its circular dependency with Sled Agent. That may be unavoidable between those two because there's no way to update DPD before all sled agents, nor to update all sled agents before updating DPD (since both are contained in the host OS deployment unit). However, it should be easy to make sure Nexus (and maybe other DPD consumers) are updated after DPD. And that'd be very valuable. So we may want to say that some of these edges are client-side-managed while some of the others are server-side-managed.