IAM: support assigning roles on the Fleet by davepacheco · Pull Request #1051 · oxidecomputer/omicron

davepacheco · 2022-05-09T22:51:47Z

This change adds support for assigning global roles.

The biggest change was reworking the role_assignment test. Sorry for all the churn here today. The test feels a little overcomplicated now, but it's cleaner in that we're properly abstracting the differences between the test at the Fleet level vs. Silo level vs. for other resources. And we're still sharing the code that implements the guts of the test. I like this because there are a bunch of pieces to it and I'd rather we not copy/paste those for each kind of resource that supports role assignments.

Also worth noting: the authz subsystem uses "the Fleet" (from RFD 24) to mean "global". To be clear, the roles being assigned here are global to "the rack" in our MVP. When we grow to support multi-rack deployments, they will probably be scoped to the availability zone or region, not Fleet in the way that RFD 24 defines it. But I'm not yet sure which, and I'm not sure we want to commit to that in the MVP. We should resolve this, but I'd like to defer it for now so I've stuck with "Fleet" here. I've added an item to #849 to revisit this.

plotnick

The churn is no problem - it lead to cleaner code. I like the new trait much more than the magic argument.

davepacheco · 2022-05-09T23:52:22Z

While self-reviewing this, I saw the comment about disallowing people from assigning the "external-authenticator" role. I thought I'd add a test for that and discovered that it works (which is bad). I'm not sure how yet, given the #[serde(skip_deserializing)] on that variant. I confirmed that this annotation normally does prevent you from deserializing a JSON value with such a variant:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a83f66d46e0ebea5107bdac3b3b328b4

I think the next step will be to create a minimal dropshot example and dig in. Even so, the serde annotation does prevent this variant from showing up in the OpenAPI spec. That's a nice plus.

Anyway, fixing this won't be easy: the fleet-wide policy comes with an assignment of "external-authenticator" (for internal use). So if we disallow the update-fleet-policy endpoint from accepting that role, then you won't be able to change the policy except by removing this role -- which, come to think of it, you shouldn't be able to do anyway. That'd break the system badly.

It feels like we should hide assignments like this from the "fetch policy" endpoint and leave them alone in the "update policy" endpoint. There are a few ways to do this:

Hide any assignments to a built-in user. Built-in users are total implementation details anyway. I initially left these in because it was easier and more transparent to include them, but this problem changes things. This might be kind of an invasive change but I think it's the cleanest of these options.
Add a boolean on the role_builtin table indicating that a role is internal-only and hide assignments to internal roles. In this world, any assignment to "external-authenticator" would be hidden. In practice, this shouldn't be any different from the first case because it shouldn't be possible to assign this role to anything other than the one built-in user that already gets it. This approach could be better if we want to still be able to report things like "this part of Nexus can read anything in the Fleet". But that doesn't feel very principled: why report that and not the external-authenticator thing? Plus, again, these internal users are all meaningless to customers.
Add a boolean on the role_assignment_builtin table indicating the assignment is internal-only. In this world, you could have another built-in user with the "external-authenticator" role on the Fleet that we would report. I can't see why this would be useful.

By "hide assignments", I mean that the fetch-policy API endpoint would explicitly ignore these role assignments and the update-policy API endpoint would not invalidate them when it replaces the policy. Having done that, we could remove the enum variant for this role from authz::FleetRoles. These roles would still exist in the database and internally, but you couldn't see them in the API.

I think hiding assignments to built-in users is probably the right direction, but it creates another problem, which is that we're still using built-in users to test all this, so the whole test would have to go.

So I think my plan is:

Figure out why this works at all. This doesn't really matter given the solution above but I want to understand if there's a dropshot bug here or something that could affect us elsewhere.
Change the policy stuff to only report and update roles for silo users.
See if there's some way to preserve the test (e.g., is there some internal interface for creating a silo user that we can use?)

davepacheco · 2022-05-16T20:18:27Z

I saw the comment about disallowing people from assigning the "external-authenticator" role. I thought I'd add a test for that and discovered that it works (which is bad). I'm not sure how yet, given the #[serde(skip_deserializing)] on that variant.

This problem appears to result from derive(FromStr) and derive(DeserializeFromStr). That is, we're using parse_display's ability to derive FromStr, and then using serde_with's ability to derive Deserialize from the FromStr impl. It makes sense that parse_display doesn't know anything about the serde(skip_deserializing) annotation, so the generated FromStr impl allows that variant, and so the generated Deserialize impl allows it. I haven't dug very deep into this, but I gather the JsonSchema impl does see the skip_deserializing annotation and that's why the generated openapi schema leaves out include the annotated variant.

Details

I modified the dropshot `basic.rs` example to test this. Here's the meat:

use dropshot::endpoint;
use dropshot::ApiDescription;
use dropshot::ConfigDropshot;
use dropshot::ConfigLogging;
use dropshot::ConfigLoggingLevel;
use dropshot::HttpError;
use dropshot::HttpResponseOk;
use dropshot::HttpServerStarter;
use dropshot::RequestContext;
use dropshot::TypedBody;
use parse_display::Display;
use parse_display::FromStr;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;
use slog::info;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), String> {
    let config_dropshot: ConfigDropshot = Default::default();
    let config_logging =
        ConfigLogging::StderrTerminal { level: ConfigLoggingLevel::Info };
    let log = config_logging
        .to_logger("test-skip-deserializing")
        .map_err(|error| format!("failed to create logger: {}", error))?;
    let mut api = ApiDescription::new();
    api.register(example_api_echo_1).unwrap();
    api.register(example_api_echo_2).unwrap();
    api.openapi("test-deserialize", "0.0.0").write(&mut std::io::stdout()).unwrap();
    let api_context = ();
    let server =
        HttpServerStarter::new(&config_dropshot, api, api_context, &log)
            .map_err(|error| format!("failed to create server: {}", error))?
            .start();
    server.await
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing1 {
    kind: MyThingKind1,
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
enum MyThingKind1 {
    Variant1,
    Variant2,
    #[serde(skip_deserializing)]
    Variant3,
}

#[derive(Serialize, Deserialize, Debug, JsonSchema)]
struct MyThing2 {
    kind: MyThingKind2,
}

#[derive(
    Display, FromStr, SerializeDisplay, DeserializeFromStr, Debug, JsonSchema,
)]
enum MyThingKind2 {
    Variant1,
    Variant2,
    #[serde(skip_deserializing)]
    Variant3,
}

type ExampleContext = ();

/** Return the thing we were given */
#[endpoint {
    method = PUT,
    path = "/thing1",
}]
async fn example_api_echo_1(
    rqctx: Arc<RequestContext<ExampleContext>>,
    given: TypedBody<MyThing1>,
) -> Result<HttpResponseOk<MyThing1>, HttpError> {
    info!(rqctx.log, "got"; "given" => ?given);
    Ok(HttpResponseOk(given.into_inner()))
}

/** Return the thing we were given */
#[endpoint {
    method = PUT,
    path = "/thing2",
}]
async fn example_api_echo_2(
    rqctx: Arc<RequestContext<ExampleContext>>,
    given: TypedBody<MyThing2>,
) -> Result<HttpResponseOk<MyThing2>, HttpError> {
    info!(rqctx.log, "got"; "given" => ?given);
    Ok(HttpResponseOk(given.into_inner()))
}

I also needed:

--- a/dropshot/Cargo.toml
+++ b/dropshot/Cargo.toml
@@ -77,6 +77,8 @@ hyper-staticfile = "0.8"
 lazy_static = "1.4.0"
 libc = "0.2.125"
 mime_guess = "2.0.4"
+parse-display = "0.5"
+serde_with = "1.13"
 subprocess = "0.2.8"
 tempfile = "3.3"
 trybuild = "1.0.53"

With this server, the thing1 endpoint (which directly derives Serialize/Deserialize) works as expected:

$ echo '{"kind": "Variant3"}' | curl -s -T- -i 127.0.0.1:50457/thing1 -X PUT 
HTTP/1.1 100 Continue

HTTP/1.1 400 Bad Request
content-type: application/json
x-request-id: 5adfb428-6560-4569-8f43-c0b6b488e1b9
content-length: 178
date: Mon, 16 May 2022 20:09:42 GMT

{
  "request_id": "5adfb428-6560-4569-8f43-c0b6b488e1b9",
  "message": "unable to parse body: unknown variant `Variant3`, expected `Variant1` or `Variant2` at line 1 column 19"
}

The thing2 endpoint (which uses FromStr/DeserializeFromStr, like the original case in omicron) has the same problem we saw in omicron:

$ echo '{"kind": "Variant3"}' | curl -s -T- -i 127.0.0.1:50457/thing2 -X PUT 
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
content-type: application/json
x-request-id: edb7d4ce-4583-49ac-9d79-f5da332d7943
content-length: 19
date: Mon, 16 May 2022 20:10:31 GMT

{"kind":"Variant3"}

In the generated OpenAPI spec, both enums don't have Variant3:

    "schemas": {
...
      "MyThingKind1": {
        "type": "string",
        "enum": [
          "Variant1",
          "Variant2"
        ]
      },
      "MyThingKind2": {
        "type": "string",
        "enum": [
          "Variant1",
          "Variant2"
        ]
      }
    }
  }

The problem at this point is that the tests fail because: - we can only see/modify role assignments for silo users - we can only authenticate as a built-in user so there's no way to actually test the enforcement.

davepacheco · 2022-05-17T00:44:16Z

The latest round of commits makes the change I proposed: we now only fetch and update role assignments on non-builtin users. This created a complexity for the tests because currently we have no way to authenticate silo users. The solution here was to extend the oxide-spoof authenticator to authenticate silo users by default, with a way to specify you want a built-in user instead. This will hopefully get cleaner once we have more first-class silo users because we'll be able to create the "privileged" and "unprivileged" test users in the test suite (instead of assuming they're built-in) and use those.

davepacheco · 2022-05-17T00:45:12Z

@plotnick do you want to take a look at the latest round of changes?

plotnick

I think that this is more than fine for now. I'm not wild about introducing more complexity to the spoof authn, but I also don't see any other way before non-spoof authn is available. I imagine that we'll want to revisit a lot of this code (especially the tests) when that point comes, but in the meantime I think this gets us where we want to be. And as always I appreciate the thoroughness of your analysis and testing; thank you.

plotnick · 2022-05-17T16:09:57Z

 )]
 #[serde(rename_all = "snake_case")]
 pub enum IdentityType {
-    UserBuiltin,


Nice. Not exposing built-in users publicly seems like a big win.

IAM: support assigning roles on the Fleet

30e06a0

davepacheco requested a review from plotnick May 9, 2022 22:51

davepacheco mentioned this pull request May 9, 2022

tracking issue for MVP IAM work #849

Closed

69 tasks

plotnick approved these changes May 9, 2022

View reviewed changes

davepacheco added 10 commits May 16, 2022 13:24

nits (will need test update)

c3f8de8

add test

9f34eb0

built-in-user role assignments should not be visible

224aa4f

The problem at this point is that the tests fail because: - we can only see/modify role assignments for silo users - we can only authenticate as a built-in user so there's no way to actually test the enforcement.

support silo user in spoof authn scheme

ba0b143

fix one xxx

d888391

flip the default for compat; document and test

6706666

more docs

b2a15a0

fix tests

9e7b851

Merge remote-tracking branch 'origin/main' into role-assignments-global

96ab4c5

fix style

a8bc5b3

davepacheco marked this pull request as ready for review May 17, 2022 00:44

fix clippy

ee1f463

plotnick approved these changes May 17, 2022

View reviewed changes

davepacheco added 2 commits May 17, 2022 14:42

Merge remote-tracking branch 'origin/main' into role-assignments-global

99aa530

update for consistency with #1024

a90d529

davepacheco enabled auto-merge (squash) May 17, 2022 21:50

fix test

1db340b

davepacheco merged commit 5b45003 into main May 17, 2022

davepacheco deleted the role-assignments-global branch May 17, 2022 22:54

davepacheco mentioned this pull request May 19, 2022

clean up "test-privileged", "test-unprivileged" users #1087

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IAM: support assigning roles on the Fleet#1051

IAM: support assigning roles on the Fleet#1051
davepacheco merged 15 commits into
mainfrom
role-assignments-global

davepacheco commented May 9, 2022

Uh oh!

plotnick left a comment

Uh oh!

davepacheco commented May 9, 2022

Uh oh!

davepacheco commented May 16, 2022

Uh oh!

davepacheco commented May 17, 2022

Uh oh!

davepacheco commented May 17, 2022

Uh oh!

plotnick left a comment

Uh oh!

plotnick May 17, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

davepacheco commented May 9, 2022

Uh oh!

plotnick left a comment

Choose a reason for hiding this comment

Uh oh!

davepacheco commented May 9, 2022

Uh oh!

davepacheco commented May 16, 2022

Uh oh!

davepacheco commented May 17, 2022

Uh oh!

davepacheco commented May 17, 2022

Uh oh!

plotnick left a comment

Choose a reason for hiding this comment

Uh oh!

plotnick May 17, 2022

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants