Skip to content

Session cookie auth scheme for console#326

Merged
david-crespo merged 42 commits into
mainfrom
cookie-auth
Nov 2, 2021
Merged

Session cookie auth scheme for console#326
david-crespo merged 42 commits into
mainfrom
cookie-auth

Conversation

@david-crespo

@david-crespo david-crespo commented Oct 20, 2021

Copy link
Copy Markdown
Contributor

If console is going to remain client-side only, then Nexus likely needs to be able to accept cookie-based auth. What I envision is very basic and standard: on login (by whatever mechanism), Nexus generates a random session token, stores it in a table with a TTL and an associated user, and sends it down to the console in a cookie. On subsequent requests, if the cookie is present, we look it up in the database to tell whether the request is authenticated.

This PR handles the part where we auth subsequent requests. I will do a provisional login endpoint that sets the cookie in a separate PR.

A lot of the decisions here make reference to the OWASP Session Management Cheat Sheet.

Follow-up work

See #345 v1 session management cleanup

How it works

When a request comes in, we look at the cookie with the key session (a name we choose; a generic name is recommended). If the cookie isn't there, we consider this a NotRequested (a pass or continue or next if you think of this as a middleware).

If it is there, the value of the cookie is a random token that may point to a row in the ConsoleSession table. We look up the row, and if it exists, we have to decide whether the session is active or expired. There are two limits on session length: an idle TTL and an absolute TTL (see the relevant section in the OWASP guide). The absolute TTL is a maximum lifetime on the token regardless of extension. Say the idle TTL is 1 hour and the absolute TTL is 8 hours. As long as the user pokes the site and makes API requests every 59 minutes, their session will keep getting extended, up until 8 hours, at which point it will get expired no matter what and they will have to reauthenticate.

Expiration and extension logic

If a session cookie is present on the request:

  • If the session doesn't exist in the DB, fail.
  • If time_last_used is more than IDLE_TTL ago, the session is expired, so fail.
  • If time_created is more than ABSOLUTE_TTL ago, the session is expired, so fail.
  • If none of those is true, session is active, so user is authenticated.
    • Extend session by updating the last_used timestamp to now.

Manual expiration, i.e., logout

Not implemented in this PR. There will be a logout endpoint that does this.

When the user logs out we have to expire their session. This means clearing the cookie in their browser, but more importantly marking the session expired in the database. Right now this is assumed to mean a hard delete, but I'm still thinking about that and it could easily be changed. This is also relevant to the question of what we do when we get a request with an expired token. If expiring means deleting, we should probably also delete expired sessions when requests for them come in.

Components

Session token

Relevant OWASP quotes

The session ID content (or value) must be meaningless to prevent information disclosure attacks, where an attacker is able to decode the contents of the ID and extract details of the user, the session, or the inner workings of the web application.

If you need to create your own sessionID, use a cryptographically secure pseudorandom number generator (CSPRNG) with a size of at least 128 bits and ensure that each sessionID is unique.

The session ID must be unpredictable (random enough) to prevent guessing attacks, where an attacker is able to guess or predict the ID of a valid session through statistical analysis techniques. For this purpose, a good CSPRNG (Cryptographically Secure Pseudorandom Number Generator) must be used.

The session ID value must provide at least 64 bits of entropy.

Session tokens are random strings that point to a row in the sessions table. OWASP says you need at least 64 bits of entropy. I went with 16 bytes = 128 bits of random data and hex encoded it, which makes the session token a 32 character string. I did this because Thomas Ptacek said it's "hard to beat."

Cookie

The cookie name is session. We will set Secure and HttpOnly attributes on it (probably a separate PR with provisional login endpoints). TTL is TBD.

Comment thread nexus/src/authn/external/cookie.rs Outdated
}
}
Ok(cookies)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on how Rocket does it — note that the Rocket dev, @SergioBenitez, is also the dev of the cookie crate.

Dropshot currently does not provide this functionality but it easily could. Or not. Doing it here seems fine too.

Comment thread nexus/src/authn/external/cookie.rs Outdated
// Some(exp) => {
// // continue
// }
// }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure we should check both cookie expiration time and database? or we could just ignore the one on the cookie since the one we have stored is really the source of truth

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expires set on a cookie is intended for the client. After the expires date lapses from the client's perspective the cookie will be removed. I'd say if we get a cookie we should try to verify it. The expiration we really care about should be managed server side. Typically you might have something like Redis or some other datastore with a TTL setting on a row that'd remove it after it expires. Cockroach doesn't have that though (see bulk deleting expired data).

@davepacheco davepacheco left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is still draft so these comments might be getting ahead of things.

As I look at this I'm finding myself lacking a lot of context about how this typically works. Is there an RFD (or RFC or anything else you'd recommend) describing exactly how Cookies are securely used for session authn (like the format of the cookie, what fields are expected to be present, how replay attacks are avoided, are cookies always bearer tokens, etc.).

Comment thread nexus/src/authn/external/cookie.rs Outdated
Comment thread nexus/src/authn/external/cookie.rs Outdated
Ok(string) => string,
Err(_) => continue,
};
for chunk in raw_str.split(';').map(|s| s.trim()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this matters, but according to MDN, the key-value pairs in the Cookie header are separated by ; , not just ;.

Along these lines, is there something out there that's already vetted that parses the key-value pairs out of the Cookie header? It looks like Cookie::parse already does this, but we're pre-splitting the header and using it here to parse just a chunk?

Sorry if a lot of this is understood. It's obviously outside my wheelhouse. I'm just nervous because this seems like pretty important code to get right and I'm trying to really convince myself it's correct.

@david-crespo david-crespo Oct 20, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following this comment: rwf2/cookie-rs#137 (comment)

I admit I did not look into the reasoning.

I agree that it's odd that the cookie lib doesn't have something that does this, and so do others. @SergioBenitez seems to think it's too simple to warrant a utility for it, or rather any appropriately general utility would be too complex to be worth it. See also https://github.com/SergioBenitez/Rocket/blob/7ffe3a73601aabdfe3190a809e660538acd5bdce/core/lib/src/request/request.rs#L995-L1007.

@david-crespo

Copy link
Copy Markdown
Contributor Author

I appreciate the early feedback. I'll put together some stuff on how this is supposed to work. In the CP tactical you missed, we discussed Justin and I sprucing up RFD 169 and probably splitting it into two RFDs: one for Node vs. Rust, and one for console auth, and we could cover that in the latter.

@david-crespo

Copy link
Copy Markdown
Contributor Author

In the meantime this is a good resource that addresses almost everything in your list:

https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html

When I polish up this PR, I’ll write up how it achieves the various recommendations.

expected_status,
)
.await
}

This comment was marked as resolved.

Comment thread nexus/src/nexus.rs
// TODO: the 16 should be a constant somewhere, maybe even in config?
let mut random_bytes: [u8; 16] = [0; 16];
rng.fill_bytes(&mut random_bytes);
hex::encode(random_bytes)

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

Comment thread nexus/src/http_entrypoints_external.rs Outdated

let authn = apictx.external_authn.authn_request(&rqctx).await?;
println!("authn_request(): {:?}", authn);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davepacheco unlike Spoof, this authn method depends on the rest of Nexus because it depends on the sessions table, so in order to test it I had to add this call to a real Nexus endpoint. Ideally I could test by adding a special test-only endpoint that I can add the auth call to and leave the real endpoints alone, but I'm not sure how to do that. Ideas?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question after some sleep, as it often is: the way these often work when they're pluggable middleware type things (kind of like what we have here) is to factor out the session lookup function into something that can be passed in. It would probably have to take a generic argument referring to the server context type, which in the real Nexus case would look like

async fn lookup_session(ctx: &Arc<ServerContext>, token: &str) -> Result<Session, Error> {
    ctx.nexus.session_fetch(token.to_string()).await
}

and in the test case it would ignore the argument and hard code in a map:

async fn lookup_session(_ctx: <something>, token: &str) -> Result<Session, Error> {
    match token {
        "good" => Ok(<good_token>), 
        "expired" => Ok(<expired_token>),
        _ => Err(<not found>),
    }
}

I will try to figure out how to do this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part are you talking about testing? For "spoof", I separated out the guts (simple as it is) so that I could test it with a unit test without a whole HTTP server. I also have an integration test that looks at the external HTTP response for various authn results. That one should cover you here, too.

For what it's worth, while working on "spoof" I considered adding a separate "test" dropshot server with its own API that we could use for stuff like this. But I found I could do it entirely outside with a custom Dropshot server in the integration test rather than Nexus, and that seemed better.

@david-crespo david-crespo Oct 25, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean helped me with this (change here: 54282ad )— the scheme is now generic so it can take any kind of context that implements SessionBackend, which just means it has a session lookup function. I need to make the type of Session generic and write some tests and you'll see what I mean

Comment thread common/src/sql/dbinit.sql Outdated
Comment on lines +477 to +479
-- TODO: let's stay agnostic about what this means for now, but obviously the
-- simplest interpretation is that it points to a row in the User table
user_id UUID NOT NULL

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw in #299 - and the discussion in the product chat room - that we might be punting on having a user table? FYI @teisenbe , not sure if the alternative will have a UUID

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to store a record of users existing somehow because we need something to reference in the ACLs. I understood the conclusion of those discussions to be about how much additional info to store, besides the basic requirement of an internal UUID plus an external ID representing the user in the customer's auth system.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the conclusion from the product chat to be only that we wouldn't support POST /users and DELETE /users/:user. I think it's an important open question exactly what data is stored and how it's kept in sync with the directory, but I expect it'll be more than nothing.

Comment thread nexus/src/authn/external/session_cookie.rs Outdated
Comment on lines +51 to +54
let token = match cookies.get(SESSION_COOKIE_COOKIE_NAME) {
Some(cookie) => cookie.value(),
None => return SchemeResult::NotRequested,
};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're aiming to cut down on the control flow paths, this is a decent candidate for ok_or_else, to convert the option to a Result (and to use the ? operator more):

let token = cookies.get(SESSION_COOKIE_NAME)
  .map(|c| c.value())
  .ok_or_else(|| SchemeResult::NotRequested)?;

Ultimately it's not too different from what you're already doing, but I've seen some minor preference for "reduce the number of explicit return calls" in Rust functions to make following control flow easier.

@davepacheco

Copy link
Copy Markdown
Collaborator

In the meantime this is a good resource that addresses almost everything in your list:

https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html

When I polish up this PR, I’ll write up how it achieves the various recommendations.

Thanks. That link is very helpful!

@davepacheco davepacheco left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. It's starting to feel a lot more real.

Comment thread common/src/sql/dbinit.sql Outdated
time_deleted TIMESTAMPTZ,

token STRING(32) PRIMARY KEY,
time_expires TIMESTAMPTZ NOT NULL,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: store "last_used" instead of "time_expires"? That's just as useful for expiring it, gives the client more flexibility in interpreting it, and is also a potentially useful fact for us to record.

I haven't gotten to where we delete records for expired sessions, but I suspect we'll want an index on whichever field we use for this so that we can quickly enumerate the oldest sessions.

Comment thread common/src/sql/dbinit.sql Outdated
/*
* Sessions for use by web console.
*/
CREATE TABLE omicron.public.Session (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to punt this to later, but a relatively easy thing I saw in the cheat sheet is binding the session to certain properties of the client, like the source IP and user agent headers. We could add those here and check them on fetch.

@david-crespo david-crespo Oct 25, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. Your IP would change if you went from wi-fi to LTE or something, but that's rare enough that I wouldn't feel bad. Will probably make an issue or add to the tracking list rather than implementing here. (Edit: I see it's already in #341)

Comment thread common/src/sql/dbinit.sql Outdated
/*
* Sessions for use by web console.
*/
CREATE TABLE omicron.public.Session (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Call this ConsoleSession? I suspect we're going to wind up with a lot of things called sessions (e.g., TLS session, cloud shell session, VNC session, serial console session, etc.)

Alternatively, maybe better: we could try to keep the console stuff in a separate CockroachDB schema or database (not cluster). That might be good for improved security and smaller blast radius.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change the name. I'd want to wait to see how much more stuff we have. It might only be this table and maybe some user preferences.

Comment thread common/src/api/external/mod.rs Outdated
DiskAttachment,
Instance,
Rack,
Session,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is used for cases where we're reporting something like "object not found: session "foo"". For the usual security reasons, I wonder if we don't want to leak that information, and so maybe don't need this. (If we do, thoughts on "console session"?)

Comment thread nexus/src/context.rs Outdated
config::SchemeName::Spoof => Box::new(HttpAuthnSpoof),
config::SchemeName::SessionCookie => {
Box::new(HttpAuthnSessionCookie)
// TODO: gross

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why this is needed here but not for the the Spoof one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean helped me fix this by moving the type annotation to map: c2a9329#diff-5a2c045a788ce52d2612b2ffa2e163fe66cbee0fb72776cef9df4e9695262213R49

The reason it wasn't a problem with Spoof alone is that the error was about the arms of the match not matching. When there was only Spoof, there was only one arm.

Comment thread nexus/src/http_entrypoints_external.rs Outdated

let authn = apictx.external_authn.authn_request(&rqctx).await?;
println!("authn_request(): {:?}", authn);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part are you talking about testing? For "spoof", I separated out the guts (simple as it is) so that I could test it with a unit test without a whole HTTP server. I also have an integration test that looks at the external HTTP response for various authn results. That one should cover you here, too.

For what it's worth, while working on "spoof" I considered adding a separate "test" dropshot server with its own API that we could use for stuff like this. But I found I could do it entirely outside with a custom Dropshot server in the integration test rather than Nexus, and that seemed better.

Comment thread nexus/src/db/datastore.rs Outdated
.get_result_async(self.pool())
.await
.map_err(|e| {
public_error_from_diesel_pool_create(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will this return if the 'id' is in use? That seems almost certainly a server bug (500 error).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns one of these. I think I'd consider it the job of the nexus wrapper or even the endpoint calling that to turn it into a 500. I just added an entry to #341 to make sure we're not leaking info about the nature of the error when we respond.

Err(ObjectAlreadyExists { type_name: ConsoleSession, object_name: "a_token" })

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's right. ObjectAlreadyExists is already an api::external::Error, right? That's the type we turn directly into an HTTP response and the usual pattern is to just ? and bail early when you get one. It feels pretty error prone to have cases where consumers have to check which one they got and potentially translate it to a different variant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you’re right. Now I understand what @smklein meant when he was helping me with this PR and said it wasn’t ideal that we’re using these external errors basically everywhere. The DB Result types should not be using the public-facing error type. I’ll look into it but it probably goes beyond this PR. For now I’ll fix this one so it doesn’t leak info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make them all return 500s on errors for now, which will get ignored by the session cookie scheme either way because I'm collapsing Result to Option in the calling code for the moment. Or I could put in TODOs, add it to the list in #341, and address it after #347.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may as well just do the better thing, as long we're going to do #347.

if session.time_expires < Utc::now() {
return SchemeResult::Failed(Reason::BadCredentials {
actor,
source: anyhow!("expired session"),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest putting the expire time in here, and maybe even the time we checked against.

Comment thread nexus/src/nexus.rs
// TODO: the 16 should be a constant somewhere, maybe even in config?
let mut random_bytes: [u8; 16] = [0; 16];
rng.fill_bytes(&mut random_bytes);
hex::encode(random_bytes)

This comment was marked as resolved.

// request with expired session should 401
let _ = get_projects_with_cookie(
&client,
Some("session=expired"),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about trying with one that's actually expired (i.e., use session_create_with with a time in the past)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what I'm doing! these could be more nicely grouped visually, I did think about that

let ago_5_minutes = Utc::now() - Duration::seconds(300);
let _ =
    nexus.session_create_with("expired".into(), user2, ago_5_minutes).await;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! My mistake. I saw "expired" was the session token itself and thought it was a magic value.

@david-crespo david-crespo marked this pull request as ready for review October 28, 2021 15:41
@david-crespo

david-crespo commented Oct 28, 2021

Copy link
Copy Markdown
Contributor Author

This is close enough to start looking at, though I still have a couple of things left to address in this PR that I don't think belong in followups:

  • Don't use the orgs endpoint for my integration tests, use a custom server
  • Pull idle and absolute timeouts from config
  • Maybe remove ResourceType::ConsoleSession or at least make sure we're not leaking too much info to the client in error messages
  • Think about whether hard delete is appropriate or if we want to keep expired sessions around for some strange reason
  • Proper logging — how do I do that? just drop some info!() around?
  • Maybe change the methods on SessionStore back to returning Result<> instead of Option<>. It's a lot more annoying to have to make the trait aware of the error types though

Comment thread nexus/src/context.rs

fn absolute_timeout(&self) -> Duration {
Duration::hours(8)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should probably be in the config — @davepacheco what's a nice way to plumb through the values here? I almost want to stick the entire config on server context so I can do something like

fn absolute_timeout(&self) -> Duration {
    Duration::minutes(self.config.session_absolute_ttl_minutes)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I think it's reasonable to expose simple tunables like this on the Context. The thing I worry about is exposing more complicated things that ought to be better encapsulated. It would be easy for someone to mistakenly think they can figure out the server's bound address by grabbing config.dropshot_external.bind_address for example. As long as this is behind a method, that doesn't seem like a problem. We could even put this in a tunables section of the config so that and only store that in the Context.

I'd just make the name more descriptive -- absolute_timeout doesn't say anything to me about being config-related or session-related.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. I suppose the fact that's it's a method on the SessionStore trait doesn't help when you're just doing context.absolute_timeout() or you see it in an autocomplete menu.

// without passing TestServerContext around as mutable
struct TestServerContext {
sessions: Mutex<HashMap<String, FakeSession>>,
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was an interesting learning experience. because the logic for session expiration requires making multiple calls to the session store, I couldn't extract out a pure function of the request that's easily unit testable, like the Spoof scheme has. Instead I end up testing the full authn method with a fake context that implements SessionStore with a hashmap as the store.

Comment thread nexus/src/http_entrypoints_external.rs Outdated

let authn = apictx.external_authn.authn_request(&rqctx).await?;
println!("authn_request(): {:?}", authn);

@david-crespo david-crespo Oct 28, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not going to stay in, I just need to figure out out to use a custom server for integration tests like WhoamiServer is being used to test the spoof scheme. There are some annoying parts but they're not at all insurmountable.

The main annoying thing is that in order to implement SessionStore on Arc<WhoamiServerState> (source) I have to wrap it in a newtype because I don't own SessionStore or Arc.

@david-crespo david-crespo Oct 28, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it turns out the server state didn't need to be wrapped in Arc anyway, so getting rid of that solves my orphan rule problem. The other problem is that the diff is gnarly as hell and the test is getting messy. So I think what I'll do is still basically this, but move it into its own file so it's not mixed up with all the spoof tests. Unfortunately there's a lot of similar boilerplate between that we'll want to think about cleaning up later.

cookie-auth...cookie-whoami done properly in bcd45d3

Comment thread nexus/src/lib.rs Outdated

pub mod authn; // Public only for testing
mod config;
pub mod config; // Public only for testing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can go away once I stop using the real nexus for the integration tests

let authn = omicron_nexus::authn::external::Authenticator::new(
authn_schemes_configured,
);
Arc::new(WhoamiServerState { authn })

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to make this stop being Arc so I could implement SessionStore directly on WhoamiServerState

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that worked but that's no problem.

I would think that if you impl'd SessionStore on WhoamiServerState, you could still use Arc<WhoamiServerState> where you need a something that impl'd SessionStore, as long as you took the value by reference and didn't need it to be mutable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it that way and it complained that Arc<WhoamiServerState> didn't have SessionStore impl'd, but it's very possible I wasn't doing it right.

Comment thread nexus/src/db/datastore.rs
let delete_again = datastore.session_hard_delete(token.clone()).await;
assert_eq!(delete_again, Ok(()));

let _ = db.cleanup().await;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed Tess's lead here adding tests directly to datastore. I find these tests pretty reassuring. Too bad they're slow to spin up, at least on Mac (see #53).

Comment thread common/src/sql/dbinit.sql

-- to be used for cleaning up old tokens
CREATE INDEX ON omicron.public.ConsoleSession (
time_created

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seems like this should be time_last_used right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked @teisenbe's idea that if this was time_created, a) the index would change less often, and b) it's basically just as good for the purpose of running a cleanup query.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why (b) would be true...but we can see how it goes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me (b) is motivated by these assumptions:

  • both TTLs are pretty short (less than a day)
  • the max number of sessions per day for the foreseeable future (say 50k = 10k users x 5 sessions per day) would mean we probably don't need to run a cleanup job more than daily.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we're using about 100 bytes per row (not as familiar with the CockroachDB internals, so this is a ballpark), so even if we had 1M sessions per day and the absolute limit were the original 1 week, the datastore is still not particularly large. Indexing on time_last_used seems like a risk for creating database contention from all API requests that renew the session due to the roughly sequential index (cautioned against in the cockroach docs). time_created also has this problem, but it'll only be on every session creation instead of every renewal, which should reduce the risk. Though, it sounds like Cockroach has a database-side fix for this in the works, with their experimental hash-sharded indexes.

authn_schemes_external = []
[authn]
schemes_external = []
session_idle_timeout_minutes = 60

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting the units in this name :)

let authn = omicron_nexus::authn::external::Authenticator::new(
authn_schemes_configured,
);
Arc::new(WhoamiServerState { authn })

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that worked but that's no problem.

I would think that if you impl'd SessionStore on WhoamiServerState, you could still use Arc<WhoamiServerState> where you need a something that impl'd SessionStore, as long as you took the value by reference and didn't need it to be mutable.

match ctx.session_update_last_used(token.clone()).await {
Some(_) => {}
None => {
// couldn't renew session wtf

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TODO or 500 or something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll fix it. I don't think I want it to 500 because at this point the request is successfully authenticated, it's just that the next one might fail because we didn't move the active window forward.

// request.
let updated_session = ctx.session_update_last_used(token.clone()).await;
if updated_session.is_none() {
debug!(log, "failed to extend session")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll definitely want to have a counter for this that we raise an alarm on, I'd say.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

console Web console

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants