Skip to content

Conversation

@velochy
Copy link
Contributor

@velochy velochy commented Jul 26, 2025

Describe your changes

Store the OIDC auth response in a cookie and expose them to users for API calls

Currently, from user perspective, st.user gets an extra field called "tokens" that is a dict of all the tokens that were made available - most often the triplet of acces_token, id_token, refresh_token

This is useful for multiple use cases (see the issues below).

To achieve (a) reliably, the tokens are stored in a separate cookie from user info. This is because cookies are limited to be <4096 bytes and I ran into that limit in my own testing already if I combined the two dictionaries. Additionally, both cookies can now be chunked so as a by-product, this now supports login systems with larger userdata.

I have tested it to work in my own use case and it seems to be performing as intended. But it is just one use case and I may well be missing some others. so any proposals to improve this are very welcome.

Originally, also contained st.user_refresh command, but that got it's own PR: #12696

GitHub Issue Link (if applicable)

Closes #10378

Exposing access_token: #10378
Too large usedata for current login system: #12518

Need to provide id token hint at logout: #12144 (comment) -> PR #12693
Need for user refresh: #12043 -> PR #12696

Testing Plan

  • Unit tests present
  • Manually tested to work for my own use case

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@velochy velochy requested a review from a team as a code owner July 26, 2025 08:47
@snyk-io
Copy link
Contributor

snyk-io bot commented Jul 26, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@velochy
Copy link
Contributor Author

velochy commented Jul 26, 2025

@vdonato Thank you for getting the logout PR across the finish line yesterday :)
I'm hoping you'll help me with this one as well, as it is kind of a follow-up.

@vdonato
Copy link
Collaborator

vdonato commented Jul 27, 2025

Hey @velochy, happy to assist with this PR as well!

Heads up that I do expect to be relatively busy for the next week or so, so it might take me a bit of time to get around to giving this a review. Feel free to ping me to remind me if you haven't heard back in a week's time.

@sfc-gh-lwilby sfc-gh-lwilby added change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users labels Jul 27, 2025
@velochy
Copy link
Contributor Author

velochy commented Jul 27, 2025

No prob @vdonato . I'll also be taking a trip and will be away from the computer for a few days.
And while this would be a useful feature for my company, it's not a blocker for anything.

@lukasmasuch lukasmasuch added the status:needs-product-approval PR requires product approval before merging label Jul 28, 2025
@lukasmasuch
Copy link
Collaborator

I added the status:needs-product-approval since adding a new command - st.refresh_user - will need product approval. (cc @jrieke). Also, I think @jrieke already wrote a spec some time ago related to exposing the access token.

@lukasmasuch lukasmasuch requested a review from jrieke July 28, 2025 13:22
@velochy
Copy link
Contributor Author

velochy commented Aug 4, 2025

@jrieke would love to see the spec and build to match it if you had a plan already.

Also @vdonato this is the gentle reminder nudge you asked me to provide :)

@sfc-gh-jrieke
Copy link
Contributor

Hey, I'm OOO for the next 1.5 weeks but will take a look afterwards!

@velochy
Copy link
Contributor Author

velochy commented Aug 18, 2025

@sfc-gh-jrieke gentle reminder now that you are hopefully back.

While you were away, #12182 got reverted, as users were experiencing issues and for at least one user, it would seem the only way to fix it requires keeping track of id_token, which is provided by this PR. So it would be really nice to get this PR done soon-ish so I can fix the logout one and get that re-merged as well.

@jrieke
Copy link
Collaborator

jrieke commented Aug 18, 2025

Hey @velochy, thanks for the wait!

First of all, let me say these are things we really want (and we know many users ask for them, especially the access token). Unfortunately, the engineer who originally worked on authentication is no longer with Streamlit, so we currently have a pretty limited understanding of all the details in the team, which will make things slower. Plus, changes to this feature will need to go through an internal review with our security team, which again makes it a bit more complicated. So please bear with us! :D

I read up on your proposed changes and have a few questions for understanding – would be great if you could try to explain them in simple terms since I'm not an expert on OIDC:

  1. Why do we need to expose the id_token in st.user? I understand there's this issue with having to provide the ID token at logout for specific providers (st.logout() doesn't redirect to correct Cognito ESE #12144 (comment)) but isn't that something we'd need to do behind the scenes in st.logout instead of the developer manually doing that?

  2. Exposing the access_token in st.user is something we probably want to do but I think it might make sense to gate that behind a parameter (e.g. st.login(..., expose_access_token: bool = False)) to make it more secure (given that most users will probably not need the access token). Do you have any opinion on that? Also noting this will definitely need a review from our security team.

  3. Why do we need to expose the refresh_token in st.user? Isn't that something that would anyway be handled behind the scenes, even with your st.refresh_user function? As far as I understand it, exposing the refresh token poses a significant security risk since you can use it to get new access tokens, right? (Of course, if devs 100% understand all these tokens and handle them responsibly, there's no risk; but given that our users are mostly data-focused personas who don't have a great understanding of auth, I'd lean towards only exposing what's absolutely necessary.)

  4. Is there a way we can refresh the user info/access token automatically behind the scenes instead of having to manually call st.refresh_user? I read your issue Option to refresh user info for user authenticated via st.login #12043 and see that you want to specifically update the user info manually but I think for most use cases, devs will probably just want to have a fresh access token when it expires, right? I think eventually I could imagine both an automatic refresh + a manual command to refresh whenever you need, just trying to understand what's possible and what the priorities are. Also, just want to note already that I'm not sure how well st.refresh_user fits into our API, so if we do want a manual command for this, we'd definitely need to think through naming and whether we make this a top-level command or can somehow integrate it more nicely with st.user or st.login.

@velochy
Copy link
Contributor Author

velochy commented Aug 18, 2025

Hey @velochy, thanks for the wait!

No worries. Summer is vacations time, and I was also largely away myself :)

First of all, let me say these are things we really want (and we know many users ask for them, especially the access token). Unfortunately, the engineer who originally worked on authentication is no longer with Streamlit, so we currently have a pretty limited understanding of all the details in the team, which will make things slower. Plus, changes to this feature will need to go through an internal review with our security team, which again makes it a bit more complicated. So please bear with us! :D

I'll do my best. I currently have a very hacky workaround for this with two otherwise equivalent oauth confs, one with prompt="login" and one without, that in a roundabout way almost achieves the desired behavior. But I would really like to clean up my code and the only way to do that is... well to get this and the login PR merged to Streamlit. So I'm moderately incentivized, but it's not urgent.

I read up on your proposed changes and have a few questions for understanding – would be great if you could try to explain them in simple terms since I'm not an expert on OIDC:

  1. Why do we need to expose the id_token in st.user? I understand there's this issue with having to provide the ID token at logout for specific providers (st.logout() doesn't redirect to correct Cognito ESE #12144 (comment)) but isn't that something we'd need to do behind the scenes in st.logout instead of the developer manually doing that?

Technically we can leave id_token behind the scenes, but as APIs are different, I can easily see a scenario of some users needing that instead/alongside their access_token. So at least to me it makes sense to expose both in the same place. Id token is relatively safe to expose, as it is basically the user object (current user object is basically extracted from it), just signed by the oauth provider - so it adds less risk than exposing the access token as most of the info contained in it is already in the object.

  1. Exposing the access_token in st.user is something we probably want to do but I think it might make sense to gate that behind a parameter (e.g. st.login(..., expose_access_token: bool = False)) to make it more secure (given that most users will probably not need the access token). Do you have any opinion on that? Also noting this will definitely need a review from our security team.

I will defer to your security team on this as I'm not a cybersecurity expert (fun fact: I do have a PhD in cryptography, but a) it was 15 years ago and b) I was heavily on the theory side, so my practical security knowledge was always flaky at best). Basic googling told me that storing the token in an encrypted and only server-accessible cookie is pretty standard practice though, so as best I can tell, it should be ok to do for all users.

  1. Why do we need to expose the refresh_token in st.user? Isn't that something that would anyway be handled behind the scenes, even with your st.refresh_user function? As far as I understand it, exposing the refresh token poses a significant security risk since you can use it to get new access tokens, right? (Of course, if devs 100% understand all these tokens and handle them responsibly, there's no risk; but given that our users are mostly data-focused personas who don't have a great understanding of auth, I'd lean towards only exposing what's absolutely necessary.)

The way I see it: we still need to store it, even if we don't expose it. And as long as we store it, anyone writing a streamlit app can still access the cookie that stores it. So the only thing exposing it changes is making it slightly more likely to leak by accident (i.e. someone posting the user object somewhere without filtering for fields).

Which, now that I think about it, is not an insignificant risk. So yes. You are right in that refresh_token should probably not be exposed in the user object. Again - if someone really wants or needs it, they can read the cookie and access it, but it would require very deliberate effort an control of the streamlit application code.

  1. Is there a way we can refresh the user info/access token automatically behind the scenes instead of having to manually call st.refresh_user? I read your issue Option to refresh user info for user authenticated via st.login #12043 and see that you want to specifically update the user info manually but I think for most use cases, devs will probably just want to have a fresh access token when it expires, right? I think eventually I could imagine both an automatic refresh + a manual command to refresh whenever you need, just trying to understand what's possible and what the priorities are. Also, just want to note already that I'm not sure how well st.refresh_user fits into our API, so if we do want a manual command for this, we'd definitely need to think through naming and whether we make this a top-level command or can somehow integrate it more nicely with st.user or st.login.

As you correctly noted, for my use case, i.e. refreshing the user object after the user has changed their settings like language (which I don't think is an uncommon use case), a manual refresh is definitely needed. But it can easily be moved out of sight, for instance to st.user.refresh() instead of the current st.refresh_user().

As for automatic refreshing of access tokens, we could start every session with a refresh if we have a cookie present (which is basically what I plan to do in my app), but there are legitimate cases where people might not want that (i.e. to keep login periods short and force a relogin every day or so for security reasons), so we would then need to add a configuration variable to control this behavior. And I think we would both prefer to not create another rabbithole on an already complex PR ;)

@velochy
Copy link
Contributor Author

velochy commented Aug 18, 2025

(e.g. st.login(..., expose_access_token: bool = False))

On this proposal, it is worth noting that reacting to st.login parameters is very cumbersome, as they would need to be passed through the OIDC authentication process via ?state parameter which is currently only used to avoid replay attacks.
So while doable, it would add A LOT of additional complexity, and considering you told me the person in charge of this part of code is no longer with you, this is something you are likely looking to avoid.

A far easier solution (at least on the technical side) would be to have a secrets.toml flag controlling this behavior, as that object is easily accessible everywhere and usually already accessed in the relevant code parts. So having something like
expose = "acess,id"
that controls which tokens are exposed with a default value of an empty string might be the most reasonable solution

@pallagj
Copy link

pallagj commented Aug 19, 2025

To add some concrete context, especially around the need for the access_token, here’s my primary use case. This should help clarify why exposing it is not just a convenience but a critical feature for building more integrated and secure applications with Streamlit.

In the application I'm developing, we use Keycloak as our identity provider (IdP). A key feature we want to offer is the ability for certain privileged users (e.g., team managers) to create and manage other users directly from within the Streamlit application's UI.


Without API

There’s also a big administrative overhead. For the app to function, a user record must exist both:

  1. In Keycloak (for authentication).
  2. In the Streamlit app’s database (for app-specific data and settings).

Currently, this means:

  • Create the user in Keycloak.
  • Copy the unique Keycloak user ID.
  • Manually create a corresponding user in the Streamlit app’s DB and paste in the Keycloak ID.

This dual-entry process is tedious, error-prone, and does not scale.


The Ideal (and Secure) Workflow This PR Enables

  1. A manager logs into the Streamlit app via OIDC with their Keycloak account.
  2. The Streamlit backend receives and securely stores the user’s access_token (as proposed in this PR). This token contains the permissions and roles assigned to that specific manager in Keycloak.
  3. Inside the app, the manager navigates to a user management page and fills out a form to create a new user.
  4. On submission, the Streamlit backend takes the manager’s own access_token and uses it to make a secure API call to the Keycloak Admin API to create the new user.

Why this is secure:

  • Operates under the Principle of Least Privilege.
  • Success depends entirely on the permissions granted to the logged-in manager.
  • If they have the right role, Keycloak allows the action; if not, it’s denied.
  • No additional or long-lived credentials are required.

The Current (Insecure and Cumbersome) Workaround

Without access to the logged-in user’s token, the process looks like this:

  • I must create a powerful “service account” or admin user in Keycloak with broad permissions.
  • Store the credentials (e.g., client ID and secret, or username/password) for this admin user in secrets.toml.
  • When a manager tries to create a user, the backend authenticates with Keycloak in the background using these stored admin credentials.
  • It then uses this all-powerful admin token to call the Keycloak Admin API.

Problems with this approach:

  • Requires storing long-lived, high-privilege credentials in app configuration.
  • Bypasses the granular, role-based permissions of the actual user.
  • Increases attack surface and creates unnecessary risk.

Why This PR Matters

By exposing the access_token, this PR allows developers to:

  • Eliminate the insecure service-account workaround.
  • Build integrations authenticated by the actual end-user’s identity and permissions.
  • Automate user provisioning, avoiding manual syncing between systems.
  • Dramatically improve the security posture and reduce admin overhead.

@pallagj
Copy link

pallagj commented Aug 19, 2025

Building on the valid security concerns raised, perhaps there's a middle ground that provides both developer flexibility and enhanced, user-centric security.

What if accessing the token required explicit user consent via a pop-up?

Instead of the token being directly available in the st.user object, a developer would have to call a function to retrieve it. For example:

# The developer requests the token
access_token = st.user.get_access_token()

When this function is called for the first time in a session, Streamlit could present a modal dialog to the end-user:

"This application is requesting permission to use your authentication token to perform actions on your behalf. Do you approve?"

[Allow] [Deny]

The get_access_token() function would only return the token if the user clicks "Allow." This function could also handle the automatic token refreshing in the background.

Advantages of this approach:

  • Puts the User in Control: It makes token access an explicit, user-approved action. This prevents an application from silently using the token in the background and ensures the user is aware of what permissions they are granting.

  • Maintains Flexibility: Unlike a fully abstracted st.authenticated_request() function, this approach still gives the developer the actual token. This is crucial for using complex third-party SDKs or libraries that require the token to be passed directly.

  • Balances Security and Functionality: It directly addresses the risk of accidental token exposure by making the access intentional and auditable from the user's perspective, while still enabling the powerful integration use cases that motivated this pull request.

@velochy
Copy link
Contributor Author

velochy commented Aug 21, 2025

@jrieke have you had a chance to think on it, and does anything else need clarifying?

Based on your (very good) questions and the technical limitations, my proposal would be to not expose any tokens by default, but to add an "expose" flag to secrets.toml that allows user to specify what they need (access, id) and then those get added to the st.user.

In terms of refresh, moving it from st.refresh_user() to st.user.refresh() seems pretty sensible and straightforward, as UserInfoProxy is already a class due to needing to override default setters on dict.

If this makes sense to you as well, I'd be happy to make the changes.

@velochy
Copy link
Contributor Author

velochy commented Aug 29, 2025

@jrieke @vdonato there has been no reply here over a week, so just a gentle reminder - I'm happy to make any changes as long as I know which ones :)

@jrieke
Copy link
Collaborator

jrieke commented Sep 4, 2025

Sorry, didn't get back to this yet since we have annual planning going on. Will have a look in the next few days!

@kmcgrady kmcgrady removed the do-not-merge PR is blocked from merging label Dec 11, 2025
@kmcgrady
Copy link
Collaborator

However, the incorrect use of _LOGGER.exception() should be fixed before merging as it will produce confusing log output without actual exception tracebacks.

This violates the ruff recommendation, so I think we should choose one, but will keep exception for now.

@jrieke
Copy link
Collaborator

jrieke commented Dec 11, 2025

Finished and merged the spec now! @velochy as you pointed out, this should already conform with your implementation, but just for safety would be good if you could give this a final look and double check :)

Two more points that came to my mind while finishing the spec:

  • Did you test that this works on Community Cloud? I guess it should given that normal auth works there as well, but would be good to do a quick smoke test to see that nothing changed.
  • Are we collecting telemetry metrics for this in some way? Would be good to either track how often the expose_tokens parameter is used (even though not sure whether we want to track things from secrets.toml) or how often st.user.tokens is accessed. This helps us figure out how often this feature is used in practice and how we should prioritize furutre improvements. @kmcgrady and @lukasmasuch can you help with this point?

@lukasmasuch
Copy link
Collaborator

I believe we can just annotate the token property function on st.user with the metrics decorator

@lukasmasuch lukasmasuch added the status:product-approved Community PR is approved by product team label Dec 15, 2025
@kmcgrady
Copy link
Collaborator

Confirmed that the changes conform to the spec. This currently will not work on community cloud. This is due to security and filtering out unexpected cookies. I have a change under review, and we can deploy the latest changes to work in production.

- expose_tokens = ["id", "access"] -> ["id", "access"]
"""
auth_section = get_secrets_auth_section()
expose_tokens = auth_section.get("expose_tokens")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: If I'm reading this correctly, this only reads from the top-level [auth] config section. However, the docs in TokensProxy mentions that it can be set from [auth] and [auth.<provider>] (eg: [auth.google]) config blocks. Which one should be correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe having it for the provider does not make as much sense here (at least for now), but I will confirm with @jrieke We load the user info and token info on browser connection, and we don't store any context on the provider, so I expect that the expose_tokens to be a global. I think we can consider storing the provider in the cookie as well in order to determine which provider section to look into.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provider is indeed useful to store (logout needs it, for instance) but it makes sense to put that into userinfo rather than under tokens

Copy link
Collaborator

@jrieke jrieke Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fine to me! Might be a tiny bit nicer to be able to set it per provider, but if that's hard, totally fine to just keep it as a global setting and wait until someone creates an issue. I don't think that's a super common use case.

Comment on lines 221 to 222
The main cookie always exists and contains the first chunk.
Additional chunks are stored as cookie_name_1, cookie_name_2, etc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: This docstring seems to be out of line with what is actually implemented. It looks like if there are multiple cookies then the first cookie is just the count.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will correct.

Comment on lines +85 to +90
set_cookie_with_chunks(
self._set_single_cookie,
self._create_signed_value,
TOKENS_COOKIE_NAME,
tokens,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: It seems like we're always setting this now. Shouldn't we only set this if expose_tokens is configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We will need the id token and refresh token internally for proper logout/refresh so the cookie should be set irrespective of expose_tokens.

See the two other PRs linked above that are waiting for this to get merged.


# Maximum size for a single cookie chunk (conservatively set to allow for overhead)
# Overhead seems to be around 110 bytes, but depends on Tornado internals, so having a safe 3x margin
cookie_space = 3800 - len(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: How sure are we about this 3800 magic number? Would it make more sense to actually compute this instead so that we are both more efficient and never going over the max size?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a 3X margin, but I think I have an idea to make this cleaner.

kmcgrady pushed a commit to kmcgrady/streamlit that referenced this pull request Dec 19, 2025
available_for_base64_value = available_for_signed_value - signing_overhead

# Convert from base64 space to raw value space (base64 has 4/3 expansion ratio)
chunk_size = (available_for_base64_value * 3) // 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do we need guard this potentially being a negative number?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You thinking just raising an error here? I figure if it's negative that means there's an error in the server signing (or our logic of sending the right function). Happy to add an error, but it might be a little overkill

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suppose raising an error makes most sense. I believe that it would output malformed tokens if we didn't raise here, which seems like a worse failure state.

@sfc-gh-kmcgrady sfc-gh-kmcgrady merged commit 4d13e83 into streamlit:develop Dec 22, 2025
38 of 40 checks passed
@sfc-gh-kmcgrady
Copy link
Contributor

@velochy I know it's been a while! But everything got approved and I successfully merged it in! Thank you again for your patience here!

@jrieke
Copy link
Collaborator

jrieke commented Dec 24, 2025

Amazing, thank you so much everyone and especially @velochy! This is (I think) the most complex external contribution we ever got, so huge kudos to you. We'd love to send you some swag as a thank you, feel free to fill out the form here, and we'll get something nice on the way: https://forms.gle/RZCUTKCrg7aHyqH78

@velochy
Copy link
Contributor Author

velochy commented Dec 26, 2025

Amazing, thank you so much everyone and especially @velochy! This is (I think) the most complex external contribution we ever got, so huge kudos to you. We'd love to send you some swag as a thank you, feel free to fill out the form here, and we'll get something nice on the way: https://forms.gle/RZCUTKCrg7aHyqH78

Wow. Would not have thought this would be the case as I don't feel the contribution was particularly big.
Thank you to @kmcgrady and @jrieke for getting it over the finish line. I see a fair bit of work still went into it after I thought I was done :)

I'll try to find some time over the holidays to rebase the other two PR-s and review them in light of the comments here. I really do feel both are important for the OAuth functionality to be fully usable. But yes, right now, the ball is back in my court :)

Meanwhile, happy holidays for everyone involved, and thank you all - @jrieke @kmcgrady @sfc-gh-bnisco @lukasmasuch @pallagj

kmcgrady added a commit that referenced this pull request Jan 12, 2026
…2) (#12693)

## Describe your changes

Currently, st.logout ignores end_session_endpoint (ESE) in the OIDC
configuration.
This can cause issues, as some OIDC providers keep their own cookies and
skip the login screen if they have an active session.

This change:
a) Writes the name of provider into the session cookie (so it would be
clear which provider the user is currently logged into)
b) Makes st.logout redirect to ESE if one is provided

This is a revised version of
#11901 to adress the issues
raised in
#12144 (comment)
Most notably:
1) client_id is added to ESE request (original issue, with attempted fix
at #12182)
2) id_token_hint is also added (addressing
#12144 (comment))
3) ESE request redirect target it changed to /oauth2callback (adressing
#12144 (comment))

This is a follow-up to #12044

## GitHub Issue Link (if applicable)

Closes #11900

## Testing Plan

- unit tests added both for a case where ESE is provided as well as when
not

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.

---------

Co-authored-by: Vincent Donato <vincent@streamlit.io>
Co-authored-by: Ken McGrady <ken.mcgrady@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users security-assessment-completed Security assessment has been completed for PR status:product-approved Community PR is approved by product team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Option to Return Access Token in st.login