-
Notifications
You must be signed in to change notification settings - Fork 4k
Add session scoping to caches. #13482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add a "scope" parameter through the caching stack. Create session-scoped caches as appropriate. Add methods to clear caches by session ID. Call those methods when sessions expire.
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ PR preview is ready!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds session-scoped caching to st.cache_data and st.cache_resource, allowing cache entries to be scoped either globally (default) or per-session. Session-scoped caches are automatically cleared when sessions disconnect or shut down, enabling resource cleanup and per-session initialization patterns.
Key changes:
- Adds a new
scopeparameter ("global" or "session") to both caching decorators - Refactors cache storage from flat dictionaries to nested session-to-function-key mappings
- Implements
clear_session()methods to clean up session-specific caches - Integrates cache clearing into the session disconnect and shutdown lifecycle
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/streamlit/runtime/caching/cache_utils.py | Adds CacheScope type alias and get_session_id_or_throw() helper function; updates CachedFuncInfo to accept scope parameter |
| lib/streamlit/runtime/caching/cache_data_api.py | Refactors DataCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/caching/cache_resource_api.py | Refactors ResourceCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/app_session.py | Adds clear_session_caches() method and integrates it into shutdown and disconnect flows |
| lib/streamlit/runtime/websocket_session_manager.py | Calls clear_session_caches() when sessions disconnect |
| lib/tests/streamlit/runtime/caching/common_cache_test.py | Adds comprehensive tests for session-scoped cache lookup, clearing, and invalid scope handling |
| lib/tests/streamlit/runtime/caching/cache_utils_test.py | New test file for get_session_id_or_throw() utility function |
| lib/tests/streamlit/runtime/app_session_test.py | Updates shutdown tests to verify cache clearing and adds test for clear_session_caches() method |
SummaryThis PR adds a new
This feature addresses several GitHub issues (#8545, #6703) by enabling session-scoped resource management and disconnect hooks. Code QualityThe implementation is clean and follows existing codebase patterns well: Strengths:
Minor Issues:
Test CoverageThe test coverage is comprehensive and well-structured: Covered scenarios:
Tests follow best practices:
Potential additions (optional):
Backwards Compatibility✅ Fully backwards compatible:
Security & RiskNo security concerns identified:
Low regression risk:
Note on documented behavior: Recommendations
VerdictAPPROVED: This is a well-implemented feature that addresses a real user need for session-scoped caching. The code quality is high, test coverage is comprehensive, and the implementation follows existing patterns in the codebase. The minor issues noted (unused test function, missing return type annotation) are not blockers and can be addressed in a follow-up if desired. This is an automated AI review. Please verify the feedback and use your judgment. |
|
|
||
| if session_caches is not None: | ||
| for cache in session_caches.values(): | ||
| cache.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storage not closed when session caches are cleared
The clear_session method in DataCaches calls cache.clear() but does not call cache.storage.close() afterwards. This is inconsistent with clear_all() (which calls both clear() and storage.close() in its fallback path) and get_cache() (which calls storage.close() when replacing a cache). For disk-persisted session-scoped caches, this could result in file handles not being released when sessions are disconnected, leading to a resource leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Updating.
| """Clear all cache_resource caches.""" | ||
| _resource_caches.clear_all() | ||
|
|
||
| def clear_session(self, session_id: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is clear_session supposed to be exposed to users via the public API? I think there isn't even an official API to retrieve the session ID. In case it should be exposed, it would be good to add a gather_metrics decorator to track its usage. cc @jrieke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its also cleaner to support this via a flag on the .clear method, e.g. clear(scope="session")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd casually considered this suitable for a public API - but I hadn't really thought through the fact that session ID isn't public info in an app. I don't think this is especially useful for a user - even clear has pretty limited utility - so it should likely just be made private.
I'll pull this into a helpful function at the module level instead, so that it's not an exported symbol by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API lifted to a non-public namespace.
This actually matches the scope of this function better, since it's operating on a module-level object ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, if we wanted to give users the ability to clear caches, it could just be done through the AppSession method.
Clear backing store for session cache for the data cache.
sfc-gh-jkinkead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL!
|
|
||
| if session_caches is not None: | ||
| for cache in session_caches.values(): | ||
| cache.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Updating.
| """Clear all cache_resource caches.""" | ||
| _resource_caches.clear_all() | ||
|
|
||
| def clear_session(self, session_id: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API lifted to a non-public namespace.
This actually matches the scope of this function better, since it's operating on a module-level object ...
lukasmasuch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
## Describe your changes Add session-level scoping to `st.cache_data` and `st.cache_resource`. Add the "scope" parameter through the caching stack. Create session-scoped caches as appropriate. Add methods to clear caches by session ID. Call those methods when sessions are disconnected. **Note**: This clears caches on disconnect _and_ shutdown. In the current websocket session manager, sessions only appear to be shut down when the backend process terminates - and so disconnection is the only hook that's actually invoked in the typical session lifecycle. I'm not sure if this is a bug or a design choice. Either way, like the docs note, this means that session caches might populate multiple times for a single user session in some edge-cases. I think this is fine. ## GitHub Issue Link (if applicable) - Fix for streamlit#8545. An `on_release` hook for a session-scoped resource can be used for disconnect hooks. - Implements one of the suggested fixes for streamlit#6703. ## Testing Plan See unit tests. --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.
Describe your changes
Add session-level scoping to
st.cache_dataandst.cache_resource.Add the "scope" parameter through the caching stack.
Create session-scoped caches as appropriate.
Add methods to clear caches by session ID. Call those methods when sessions are disconnected.
Note: This clears caches on disconnect and shutdown. In the current websocket session manager, sessions only appear to be shut down when the backend process terminates - and so disconnection is the only hook that's actually invoked in the typical session lifecycle. I'm not sure if this is a bug or a design choice. Either way, like the docs note, this means that session caches might populate multiple times for a single user session in some edge-cases. I think this is fine.
GitHub Issue Link (if applicable)
on_releasehook for a session-scoped resource can be used for disconnect hooks.Can be used to implement Session State convenience function for initialization #10089. A session-scopedReading this more closely, the issue is for a helper function to make this easy, not for means to init at session start.@st.cache_dataor@st.cache_resourcefunction invoked at the start of a script will work as an init function.Testing Plan
See unit tests.
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.