-
Notifications
You must be signed in to change notification settings - Fork 4k
Expose OIDC tokens #12044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose OIDC tokens #12044
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
|
@vdonato Thank you for getting the logout PR across the finish line yesterday :) |
|
Hey @velochy, happy to assist with this PR as well! Heads up that I do expect to be relatively busy for the next week or so, so it might take me a bit of time to get around to giving this a review. Feel free to ping me to remind me if you haven't heard back in a week's time. |
|
No prob @vdonato . I'll also be taking a trip and will be away from the computer for a few days. |
|
Hey, I'm OOO for the next 1.5 weeks but will take a look afterwards! |
|
@sfc-gh-jrieke gentle reminder now that you are hopefully back. While you were away, #12182 got reverted, as users were experiencing issues and for at least one user, it would seem the only way to fix it requires keeping track of id_token, which is provided by this PR. So it would be really nice to get this PR done soon-ish so I can fix the logout one and get that re-merged as well. |
|
Hey @velochy, thanks for the wait! First of all, let me say these are things we really want (and we know many users ask for them, especially the access token). Unfortunately, the engineer who originally worked on authentication is no longer with Streamlit, so we currently have a pretty limited understanding of all the details in the team, which will make things slower. Plus, changes to this feature will need to go through an internal review with our security team, which again makes it a bit more complicated. So please bear with us! :D I read up on your proposed changes and have a few questions for understanding – would be great if you could try to explain them in simple terms since I'm not an expert on OIDC:
|
No worries. Summer is vacations time, and I was also largely away myself :)
I'll do my best. I currently have a very hacky workaround for this with two otherwise equivalent oauth confs, one with prompt="login" and one without, that in a roundabout way almost achieves the desired behavior. But I would really like to clean up my code and the only way to do that is... well to get this and the login PR merged to Streamlit. So I'm moderately incentivized, but it's not urgent.
Technically we can leave id_token behind the scenes, but as APIs are different, I can easily see a scenario of some users needing that instead/alongside their access_token. So at least to me it makes sense to expose both in the same place. Id token is relatively safe to expose, as it is basically the user object (current user object is basically extracted from it), just signed by the oauth provider - so it adds less risk than exposing the access token as most of the info contained in it is already in the object.
I will defer to your security team on this as I'm not a cybersecurity expert (fun fact: I do have a PhD in cryptography, but a) it was 15 years ago and b) I was heavily on the theory side, so my practical security knowledge was always flaky at best). Basic googling told me that storing the token in an encrypted and only server-accessible cookie is pretty standard practice though, so as best I can tell, it should be ok to do for all users.
The way I see it: we still need to store it, even if we don't expose it. And as long as we store it, anyone writing a streamlit app can still access the cookie that stores it. So the only thing exposing it changes is making it slightly more likely to leak by accident (i.e. someone posting the user object somewhere without filtering for fields). Which, now that I think about it, is not an insignificant risk. So yes. You are right in that refresh_token should probably not be exposed in the user object. Again - if someone really wants or needs it, they can read the cookie and access it, but it would require very deliberate effort an control of the streamlit application code.
As you correctly noted, for my use case, i.e. refreshing the user object after the user has changed their settings like language (which I don't think is an uncommon use case), a manual refresh is definitely needed. But it can easily be moved out of sight, for instance to st.user.refresh() instead of the current st.refresh_user(). As for automatic refreshing of access tokens, we could start every session with a refresh if we have a cookie present (which is basically what I plan to do in my app), but there are legitimate cases where people might not want that (i.e. to keep login periods short and force a relogin every day or so for security reasons), so we would then need to add a configuration variable to control this behavior. And I think we would both prefer to not create another rabbithole on an already complex PR ;) |
On this proposal, it is worth noting that reacting to st.login parameters is very cumbersome, as they would need to be passed through the OIDC authentication process via ?state parameter which is currently only used to avoid replay attacks. A far easier solution (at least on the technical side) would be to have a secrets.toml flag controlling this behavior, as that object is easily accessible everywhere and usually already accessed in the relevant code parts. So having something like |
|
To add some concrete context, especially around the need for the In the application I'm developing, we use Keycloak as our identity provider (IdP). A key feature we want to offer is the ability for certain privileged users (e.g., team managers) to create and manage other users directly from within the Streamlit application's UI. Without APIThere’s also a big administrative overhead. For the app to function, a user record must exist both:
Currently, this means:
This dual-entry process is tedious, error-prone, and does not scale. The Ideal (and Secure) Workflow This PR Enables
Why this is secure:
The Current (Insecure and Cumbersome) WorkaroundWithout access to the logged-in user’s token, the process looks like this:
Problems with this approach:
Why This PR MattersBy exposing the
|
|
Building on the valid security concerns raised, perhaps there's a middle ground that provides both developer flexibility and enhanced, user-centric security. What if accessing the token required explicit user consent via a pop-up? Instead of the token being directly available in the # The developer requests the token
access_token = st.user.get_access_token()When this function is called for the first time in a session, Streamlit could present a modal dialog to the end-user:
The Advantages of this approach:
|
|
@jrieke have you had a chance to think on it, and does anything else need clarifying? Based on your (very good) questions and the technical limitations, my proposal would be to not expose any tokens by default, but to add an "expose" flag to secrets.toml that allows user to specify what they need (access, id) and then those get added to the st.user. In terms of refresh, moving it from st.refresh_user() to st.user.refresh() seems pretty sensible and straightforward, as UserInfoProxy is already a class due to needing to override default setters on dict. If this makes sense to you as well, I'd be happy to make the changes. |
|
Sorry, didn't get back to this yet since we have annual planning going on. Will have a look in the next few days! |
This violates the ruff recommendation, so I think we should choose one, but will keep exception for now. |
|
Finished and merged the spec now! @velochy as you pointed out, this should already conform with your implementation, but just for safety would be good if you could give this a final look and double check :) Two more points that came to my mind while finishing the spec:
|
|
I believe we can just annotate the |
485d0c9 to
bbec13a
Compare
|
Confirmed that the changes conform to the spec. This currently will not work on community cloud. This is due to security and filtering out unexpected cookies. I have a change under review, and we can deploy the latest changes to work in production. |
| - expose_tokens = ["id", "access"] -> ["id", "access"] | ||
| """ | ||
| auth_section = get_secrets_auth_section() | ||
| expose_tokens = auth_section.get("expose_tokens") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: If I'm reading this correctly, this only reads from the top-level [auth] config section. However, the docs in TokensProxy mentions that it can be set from [auth] and [auth.<provider>] (eg: [auth.google]) config blocks. Which one should be correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe having it for the provider does not make as much sense here (at least for now), but I will confirm with @jrieke We load the user info and token info on browser connection, and we don't store any context on the provider, so I expect that the expose_tokens to be a global. I think we can consider storing the provider in the cookie as well in order to determine which provider section to look into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provider is indeed useful to store (logout needs it, for instance) but it makes sense to put that into userinfo rather than under tokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both fine to me! Might be a tiny bit nicer to be able to set it per provider, but if that's hard, totally fine to just keep it as a global setting and wait until someone creates an issue. I don't think that's a super common use case.
lib/streamlit/auth_util.py
Outdated
| The main cookie always exists and contains the first chunk. | ||
| Additional chunks are stored as cookie_name_1, cookie_name_2, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This docstring seems to be out of line with what is actually implemented. It looks like if there are multiple cookies then the first cookie is just the count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct.
| set_cookie_with_chunks( | ||
| self._set_single_cookie, | ||
| self._create_signed_value, | ||
| TOKENS_COOKIE_NAME, | ||
| tokens, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: It seems like we're always setting this now. Shouldn't we only set this if expose_tokens is configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We will need the id token and refresh token internally for proper logout/refresh so the cookie should be set irrespective of expose_tokens.
See the two other PRs linked above that are waiting for this to get merged.
lib/streamlit/auth_util.py
Outdated
|
|
||
| # Maximum size for a single cookie chunk (conservatively set to allow for overhead) | ||
| # Overhead seems to be around 110 bytes, but depends on Tornado internals, so having a safe 3x margin | ||
| cookie_space = 3800 - len( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: How sure are we about this 3800 magic number? Would it make more sense to actually compute this instead so that we are both more efficient and never going over the max size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a 3X margin, but I think I have an idea to make this cleaner.
f51bf69 to
8d4a512
Compare
| available_for_base64_value = available_for_signed_value - signing_overhead | ||
|
|
||
| # Convert from base64 space to raw value space (base64 has 4/3 expansion ratio) | ||
| chunk_size = (available_for_base64_value * 3) // 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do we need guard this potentially being a negative number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You thinking just raising an error here? I figure if it's negative that means there's an error in the server signing (or our logic of sending the right function). Happy to add an error, but it might be a little overkill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suppose raising an error makes most sense. I believe that it would output malformed tokens if we didn't raise here, which seems like a worse failure state.
8d4a512 to
ec9fdd7
Compare
d075a3a to
74b31c2
Compare
|
@velochy I know it's been a while! But everything got approved and I successfully merged it in! Thank you again for your patience here! |
|
Amazing, thank you so much everyone and especially @velochy! This is (I think) the most complex external contribution we ever got, so huge kudos to you. We'd love to send you some swag as a thank you, feel free to fill out the form here, and we'll get something nice on the way: https://forms.gle/RZCUTKCrg7aHyqH78 |
Wow. Would not have thought this would be the case as I don't feel the contribution was particularly big. I'll try to find some time over the holidays to rebase the other two PR-s and review them in light of the comments here. I really do feel both are important for the OAuth functionality to be fully usable. But yes, right now, the ball is back in my court :) Meanwhile, happy holidays for everyone involved, and thank you all - @jrieke @kmcgrady @sfc-gh-bnisco @lukasmasuch @pallagj |
…2) (#12693) ## Describe your changes Currently, st.logout ignores end_session_endpoint (ESE) in the OIDC configuration. This can cause issues, as some OIDC providers keep their own cookies and skip the login screen if they have an active session. This change: a) Writes the name of provider into the session cookie (so it would be clear which provider the user is currently logged into) b) Makes st.logout redirect to ESE if one is provided This is a revised version of #11901 to adress the issues raised in #12144 (comment) Most notably: 1) client_id is added to ESE request (original issue, with attempted fix at #12182) 2) id_token_hint is also added (addressing #12144 (comment)) 3) ESE request redirect target it changed to /oauth2callback (adressing #12144 (comment)) This is a follow-up to #12044 ## GitHub Issue Link (if applicable) Closes #11900 ## Testing Plan - unit tests added both for a case where ESE is provided as well as when not --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license. --------- Co-authored-by: Vincent Donato <vincent@streamlit.io> Co-authored-by: Ken McGrady <ken.mcgrady@gmail.com>
Describe your changes
Store the OIDC auth response in a cookie and expose them to users for API calls
Currently, from user perspective,
st.usergets an extra field called "tokens" that is a dict of all the tokens that were made available - most often the triplet of acces_token, id_token, refresh_tokenThis is useful for multiple use cases (see the issues below).
To achieve (a) reliably, the tokens are stored in a separate cookie from user info. This is because cookies are limited to be <4096 bytes and I ran into that limit in my own testing already if I combined the two dictionaries. Additionally, both cookies can now be chunked so as a by-product, this now supports login systems with larger userdata.
I have tested it to work in my own use case and it seems to be performing as intended. But it is just one use case and I may well be missing some others. so any proposals to improve this are very welcome.
Originally, also contained st.user_refresh command, but that got it's own PR: #12696
GitHub Issue Link (if applicable)
Closes #10378
Exposing access_token: #10378
Too large usedata for current login system: #12518
Need to provide id token hint at logout: #12144 (comment) -> PR #12693
Need for user refresh: #12043 -> PR #12696
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.