IAM: support assigning roles on the Fleet#1051
Conversation
plotnick
left a comment
There was a problem hiding this comment.
The churn is no problem - it lead to cleaner code. I like the new trait much more than the magic argument.
|
While self-reviewing this, I saw the comment about disallowing people from assigning the "external-authenticator" role. I thought I'd add a test for that and discovered that it works (which is bad). I'm not sure how yet, given the I think the next step will be to create a minimal dropshot example and dig in. Even so, the serde annotation does prevent this variant from showing up in the OpenAPI spec. That's a nice plus. Anyway, fixing this won't be easy: the fleet-wide policy comes with an assignment of "external-authenticator" (for internal use). So if we disallow the update-fleet-policy endpoint from accepting that role, then you won't be able to change the policy except by removing this role -- which, come to think of it, you shouldn't be able to do anyway. That'd break the system badly. It feels like we should hide assignments like this from the "fetch policy" endpoint and leave them alone in the "update policy" endpoint. There are a few ways to do this:
By "hide assignments", I mean that the fetch-policy API endpoint would explicitly ignore these role assignments and the update-policy API endpoint would not invalidate them when it replaces the policy. Having done that, we could remove the enum variant for this role from I think hiding assignments to built-in users is probably the right direction, but it creates another problem, which is that we're still using built-in users to test all this, so the whole test would have to go. So I think my plan is:
|
This problem appears to result from DetailsI modified the dropshot `basic.rs` example to test this. Here's the meat:use dropshot::endpoint;
use dropshot::ApiDescription;
use dropshot::ConfigDropshot;
use dropshot::ConfigLogging;
use dropshot::ConfigLoggingLevel;
use dropshot::HttpError;
use dropshot::HttpResponseOk;
use dropshot::HttpServerStarter;
use dropshot::RequestContext;
use dropshot::TypedBody;
use parse_display::Display;
use parse_display::FromStr;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;
use slog::info;
use std::sync::Arc;
#[tokio::main]
async fn main() -> Result<(), String> {
let config_dropshot: ConfigDropshot = Default::default();
let config_logging =
ConfigLogging::StderrTerminal { level: ConfigLoggingLevel::Info };
let log = config_logging
.to_logger("test-skip-deserializing")
.map_err(|error| format!("failed to create logger: {}", error))?;
let mut api = ApiDescription::new();
api.register(example_api_echo_1).unwrap();
api.register(example_api_echo_2).unwrap();
api.openapi("test-deserialize", "0.0.0").write(&mut std::io::stdout()).unwrap();
let api_context = ();
let server =
HttpServerStarter::new(&config_dropshot, api, api_context, &log)
.map_err(|error| format!("failed to create server: {}", error))?
.start();
server.await
}
#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing1 {
kind: MyThingKind1,
}
#[derive(Serialize, Deserialize, Debug, JsonSchema)]
enum MyThingKind1 {
Variant1,
Variant2,
#[serde(skip_deserializing)]
Variant3,
}
#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing2 {
kind: MyThingKind2,
}
#[derive(
Display, FromStr, SerializeDisplay, DeserializeFromStr, Debug, JsonSchema,
)]
enum MyThingKind2 {
Variant1,
Variant2,
#[serde(skip_deserializing)]
Variant3,
}
type ExampleContext = ();
/** Return the thing we were given */
#[endpoint {
method = PUT,
path = "/thing1",
}]
async fn example_api_echo_1(
rqctx: Arc<RequestContext<ExampleContext>>,
given: TypedBody<MyThing1>,
) -> Result<HttpResponseOk<MyThing1>, HttpError> {
info!(rqctx.log, "got"; "given" => ?given);
Ok(HttpResponseOk(given.into_inner()))
}
/** Return the thing we were given */
#[endpoint {
method = PUT,
path = "/thing2",
}]
async fn example_api_echo_2(
rqctx: Arc<RequestContext<ExampleContext>>,
given: TypedBody<MyThing2>,
) -> Result<HttpResponseOk<MyThing2>, HttpError> {
info!(rqctx.log, "got"; "given" => ?given);
Ok(HttpResponseOk(given.into_inner()))
}I also needed: With this server, the The In the generated OpenAPI spec, both enums don't have |
The problem at this point is that the tests fail because: - we can only see/modify role assignments for silo users - we can only authenticate as a built-in user so there's no way to actually test the enforcement.
|
The latest round of commits makes the change I proposed: we now only fetch and update role assignments on non-builtin users. This created a complexity for the tests because currently we have no way to authenticate silo users. The solution here was to extend the oxide-spoof authenticator to authenticate silo users by default, with a way to specify you want a built-in user instead. This will hopefully get cleaner once we have more first-class silo users because we'll be able to create the "privileged" and "unprivileged" test users in the test suite (instead of assuming they're built-in) and use those. |
|
@plotnick do you want to take a look at the latest round of changes? |
plotnick
left a comment
There was a problem hiding this comment.
I think that this is more than fine for now. I'm not wild about introducing more complexity to the spoof authn, but I also don't see any other way before non-spoof authn is available. I imagine that we'll want to revisit a lot of this code (especially the tests) when that point comes, but in the meantime I think this gets us where we want to be. And as always I appreciate the thoroughness of your analysis and testing; thank you.
| )] | ||
| #[serde(rename_all = "snake_case")] | ||
| pub enum IdentityType { | ||
| UserBuiltin, |
There was a problem hiding this comment.
Nice. Not exposing built-in users publicly seems like a big win.
This change adds support for assigning global roles.
The biggest change was reworking the role_assignment test. Sorry for all the churn here today. The test feels a little overcomplicated now, but it's cleaner in that we're properly abstracting the differences between the test at the Fleet level vs. Silo level vs. for other resources. And we're still sharing the code that implements the guts of the test. I like this because there are a bunch of pieces to it and I'd rather we not copy/paste those for each kind of resource that supports role assignments.
Also worth noting: the authz subsystem uses "the Fleet" (from RFD 24) to mean "global". To be clear, the roles being assigned here are global to "the rack" in our MVP. When we grow to support multi-rack deployments, they will probably be scoped to the availability zone or region, not Fleet in the way that RFD 24 defines it. But I'm not yet sure which, and I'm not sure we want to commit to that in the MVP. We should resolve this, but I'd like to defer it for now so I've stuck with "Fleet" here. I've added an item to #849 to revisit this.