Skip to content

IAM: support assigning roles on the Fleet#1051

Merged
davepacheco merged 15 commits into
mainfrom
role-assignments-global
May 17, 2022
Merged

IAM: support assigning roles on the Fleet#1051
davepacheco merged 15 commits into
mainfrom
role-assignments-global

Conversation

@davepacheco

Copy link
Copy Markdown
Collaborator

This change adds support for assigning global roles.

The biggest change was reworking the role_assignment test. Sorry for all the churn here today. The test feels a little overcomplicated now, but it's cleaner in that we're properly abstracting the differences between the test at the Fleet level vs. Silo level vs. for other resources. And we're still sharing the code that implements the guts of the test. I like this because there are a bunch of pieces to it and I'd rather we not copy/paste those for each kind of resource that supports role assignments.

Also worth noting: the authz subsystem uses "the Fleet" (from RFD 24) to mean "global". To be clear, the roles being assigned here are global to "the rack" in our MVP. When we grow to support multi-rack deployments, they will probably be scoped to the availability zone or region, not Fleet in the way that RFD 24 defines it. But I'm not yet sure which, and I'm not sure we want to commit to that in the MVP. We should resolve this, but I'd like to defer it for now so I've stuck with "Fleet" here. I've added an item to #849 to revisit this.

@davepacheco davepacheco requested a review from plotnick May 9, 2022 22:51
@davepacheco davepacheco mentioned this pull request May 9, 2022
69 tasks

@plotnick plotnick left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The churn is no problem - it lead to cleaner code. I like the new trait much more than the magic argument.

@davepacheco

Copy link
Copy Markdown
Collaborator Author

While self-reviewing this, I saw the comment about disallowing people from assigning the "external-authenticator" role. I thought I'd add a test for that and discovered that it works (which is bad). I'm not sure how yet, given the #[serde(skip_deserializing)] on that variant. I confirmed that this annotation normally does prevent you from deserializing a JSON value with such a variant:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a83f66d46e0ebea5107bdac3b3b328b4

I think the next step will be to create a minimal dropshot example and dig in. Even so, the serde annotation does prevent this variant from showing up in the OpenAPI spec. That's a nice plus.

Anyway, fixing this won't be easy: the fleet-wide policy comes with an assignment of "external-authenticator" (for internal use). So if we disallow the update-fleet-policy endpoint from accepting that role, then you won't be able to change the policy except by removing this role -- which, come to think of it, you shouldn't be able to do anyway. That'd break the system badly.

It feels like we should hide assignments like this from the "fetch policy" endpoint and leave them alone in the "update policy" endpoint. There are a few ways to do this:

  • Hide any assignments to a built-in user. Built-in users are total implementation details anyway. I initially left these in because it was easier and more transparent to include them, but this problem changes things. This might be kind of an invasive change but I think it's the cleanest of these options.
  • Add a boolean on the role_builtin table indicating that a role is internal-only and hide assignments to internal roles. In this world, any assignment to "external-authenticator" would be hidden. In practice, this shouldn't be any different from the first case because it shouldn't be possible to assign this role to anything other than the one built-in user that already gets it. This approach could be better if we want to still be able to report things like "this part of Nexus can read anything in the Fleet". But that doesn't feel very principled: why report that and not the external-authenticator thing? Plus, again, these internal users are all meaningless to customers.
  • Add a boolean on the role_assignment_builtin table indicating the assignment is internal-only. In this world, you could have another built-in user with the "external-authenticator" role on the Fleet that we would report. I can't see why this would be useful.

By "hide assignments", I mean that the fetch-policy API endpoint would explicitly ignore these role assignments and the update-policy API endpoint would not invalidate them when it replaces the policy. Having done that, we could remove the enum variant for this role from authz::FleetRoles. These roles would still exist in the database and internally, but you couldn't see them in the API.

I think hiding assignments to built-in users is probably the right direction, but it creates another problem, which is that we're still using built-in users to test all this, so the whole test would have to go.

So I think my plan is:

  • Figure out why this works at all. This doesn't really matter given the solution above but I want to understand if there's a dropshot bug here or something that could affect us elsewhere.
  • Change the policy stuff to only report and update roles for silo users.
  • See if there's some way to preserve the test (e.g., is there some internal interface for creating a silo user that we can use?)

@davepacheco

Copy link
Copy Markdown
Collaborator Author

I saw the comment about disallowing people from assigning the "external-authenticator" role. I thought I'd add a test for that and discovered that it works (which is bad). I'm not sure how yet, given the #[serde(skip_deserializing)] on that variant.

This problem appears to result from derive(FromStr) and derive(DeserializeFromStr). That is, we're using parse_display's ability to derive FromStr, and then using serde_with's ability to derive Deserialize from the FromStr impl. It makes sense that parse_display doesn't know anything about the serde(skip_deserializing) annotation, so the generated FromStr impl allows that variant, and so the generated Deserialize impl allows it. I haven't dug very deep into this, but I gather the JsonSchema impl does see the skip_deserializing annotation and that's why the generated openapi schema leaves out include the annotated variant.

Details I modified the dropshot `basic.rs` example to test this. Here's the meat:
use dropshot::endpoint;
use dropshot::ApiDescription;
use dropshot::ConfigDropshot;
use dropshot::ConfigLogging;
use dropshot::ConfigLoggingLevel;
use dropshot::HttpError;
use dropshot::HttpResponseOk;
use dropshot::HttpServerStarter;
use dropshot::RequestContext;
use dropshot::TypedBody;
use parse_display::Display;
use parse_display::FromStr;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;
use slog::info;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), String> {
    let config_dropshot: ConfigDropshot = Default::default();
    let config_logging =
        ConfigLogging::StderrTerminal { level: ConfigLoggingLevel::Info };
    let log = config_logging
        .to_logger("test-skip-deserializing")
        .map_err(|error| format!("failed to create logger: {}", error))?;
    let mut api = ApiDescription::new();
    api.register(example_api_echo_1).unwrap();
    api.register(example_api_echo_2).unwrap();
    api.openapi("test-deserialize", "0.0.0").write(&mut std::io::stdout()).unwrap();
    let api_context = ();
    let server =
        HttpServerStarter::new(&config_dropshot, api, api_context, &log)
            .map_err(|error| format!("failed to create server: {}", error))?
            .start();
    server.await
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing1 {
    kind: MyThingKind1,
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
enum MyThingKind1 {
    Variant1,
    Variant2,
    #[serde(skip_deserializing)]
    Variant3,
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing2 {
    kind: MyThingKind2,
}

#[derive(
    Display, FromStr, SerializeDisplay, DeserializeFromStr, Debug, JsonSchema,
)]
enum MyThingKind2 {
    Variant1,
    Variant2,
    #[serde(skip_deserializing)]
    Variant3,
}

type ExampleContext = ();

/** Return the thing we were given */
#[endpoint {
    method = PUT,
    path = "/thing1",
}]
async fn example_api_echo_1(
    rqctx: Arc<RequestContext<ExampleContext>>,
    given: TypedBody<MyThing1>,
) -> Result<HttpResponseOk<MyThing1>, HttpError> {
    info!(rqctx.log, "got"; "given" => ?given);
    Ok(HttpResponseOk(given.into_inner()))
}

/** Return the thing we were given */
#[endpoint {
    method = PUT,
    path = "/thing2",
}]
async fn example_api_echo_2(
    rqctx: Arc<RequestContext<ExampleContext>>,
    given: TypedBody<MyThing2>,
) -> Result<HttpResponseOk<MyThing2>, HttpError> {
    info!(rqctx.log, "got"; "given" => ?given);
    Ok(HttpResponseOk(given.into_inner()))
}

I also needed:

--- a/dropshot/Cargo.toml
+++ b/dropshot/Cargo.toml
@@ -77,6 +77,8 @@ hyper-staticfile = "0.8"
 lazy_static = "1.4.0"
 libc = "0.2.125"
 mime_guess = "2.0.4"
+parse-display = "0.5"
+serde_with = "1.13"
 subprocess = "0.2.8"
 tempfile = "3.3"
 trybuild = "1.0.53"

With this server, the thing1 endpoint (which directly derives Serialize/Deserialize) works as expected:

$ echo '{"kind": "Variant3"}' | curl -s -T- -i 127.0.0.1:50457/thing1 -X PUT 
HTTP/1.1 100 Continue

HTTP/1.1 400 Bad Request
content-type: application/json
x-request-id: 5adfb428-6560-4569-8f43-c0b6b488e1b9
content-length: 178
date: Mon, 16 May 2022 20:09:42 GMT

{
  "request_id": "5adfb428-6560-4569-8f43-c0b6b488e1b9",
  "message": "unable to parse body: unknown variant `Variant3`, expected `Variant1` or `Variant2` at line 1 column 19"
}

The thing2 endpoint (which uses FromStr/DeserializeFromStr, like the original case in omicron) has the same problem we saw in omicron:

$ echo '{"kind": "Variant3"}' | curl -s -T- -i 127.0.0.1:50457/thing2 -X PUT 
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
content-type: application/json
x-request-id: edb7d4ce-4583-49ac-9d79-f5da332d7943
content-length: 19
date: Mon, 16 May 2022 20:10:31 GMT

{"kind":"Variant3"}

In the generated OpenAPI spec, both enums don't have Variant3:

    "schemas": {
...
      "MyThingKind1": {
        "type": "string",
        "enum": [
          "Variant1",
          "Variant2"
        ]
      },
      "MyThingKind2": {
        "type": "string",
        "enum": [
          "Variant1",
          "Variant2"
        ]
      }
    }
  }

@davepacheco

Copy link
Copy Markdown
Collaborator Author

The latest round of commits makes the change I proposed: we now only fetch and update role assignments on non-builtin users. This created a complexity for the tests because currently we have no way to authenticate silo users. The solution here was to extend the oxide-spoof authenticator to authenticate silo users by default, with a way to specify you want a built-in user instead. This will hopefully get cleaner once we have more first-class silo users because we'll be able to create the "privileged" and "unprivileged" test users in the test suite (instead of assuming they're built-in) and use those.

@davepacheco davepacheco marked this pull request as ready for review May 17, 2022 00:44
@davepacheco

Copy link
Copy Markdown
Collaborator Author

@plotnick do you want to take a look at the latest round of changes?

@plotnick plotnick left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is more than fine for now. I'm not wild about introducing more complexity to the spoof authn, but I also don't see any other way before non-spoof authn is available. I imagine that we'll want to revisit a lot of this code (especially the tests) when that point comes, but in the meantime I think this gets us where we want to be. And as always I appreciate the thoroughness of your analysis and testing; thank you.

)]
#[serde(rename_all = "snake_case")]
pub enum IdentityType {
UserBuiltin,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Not exposing built-in users publicly seems like a big win.

@davepacheco davepacheco enabled auto-merge (squash) May 17, 2022 21:50
@davepacheco davepacheco merged commit 5b45003 into main May 17, 2022
@davepacheco davepacheco deleted the role-assignments-global branch May 17, 2022 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants