@sfc-gh-jkinkead
Contributor

@sfc-gh-jkinkead sfc-gh-jkinkead commented Dec 29, 2025

Describe your changes

Add session-level scoping to st.cache_data and st.cache_resource.

Add the "scope" parameter through the caching stack.

Create session-scoped caches as appropriate.

Add methods to clear caches by session ID. Call those methods when sessions are disconnected.

Note: This clears caches on disconnect and shutdown. In the current websocket session manager, sessions appear to be shut down only when the backend process terminates, so disconnection is the only hook that's actually invoked in the typical session lifecycle. I'm not sure if this is a bug or a design choice. Either way, as the docs note, this means that session caches might populate multiple times for a single user session in some edge cases. I think this is fine.
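
The lifecycle described in the note can be illustrated with a toy model. This is a minimal sketch, not Streamlit's implementation: `ToySessionCache`, `on_disconnect`, and `load_profile` are hypothetical names introduced for illustration. It only shows why a value can be computed twice within one logical user session when a disconnect clears the cache and the same session later reconnects.

```python
from typing import Any, Callable, Dict


class ToySessionCache:
    """Hypothetical stand-in for a per-session cache (not Streamlit code)."""

    def __init__(self) -> None:
        # session_id -> {key -> cached value}
        self._entries: Dict[str, Dict[str, Any]] = {}

    def get_or_compute(
        self, session_id: str, key: str, compute: Callable[[], Any]
    ) -> Any:
        session_entries = self._entries.setdefault(session_id, {})
        if key not in session_entries:
            session_entries[key] = compute()
        return session_entries[key]

    def on_disconnect(self, session_id: str) -> None:
        # Assumption: disconnect drops every entry owned by the session.
        self._entries.pop(session_id, None)


compute_calls = 0


def load_profile() -> str:
    """Hypothetical expensive function; counts how often it really runs."""
    global compute_calls
    compute_calls += 1
    return "profile"


cache = ToySessionCache()
cache.get_or_compute("session-a", "profile", load_profile)  # computed
cache.get_or_compute("session-a", "profile", load_profile)  # served from cache
cache.on_disconnect("session-a")                            # cache cleared
cache.get_or_compute("session-a", "profile", load_profile)  # recomputed after reconnect
```

In this model the function runs once before the disconnect and once after the reconnect, which is the edge case the note accepts.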

GitHub Issue Link (if applicable)

Testing Plan

See unit tests.


Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

Add a "scope" parameter through the caching stack.

Create session-scoped caches as appropriate.

Add methods to clear caches by session ID. Call those methods when sessions expire.
Copilot AI review requested due to automatic review settings December 29, 2025 20:50
@snyk-io
Contributor

snyk-io bot commented Dec 29, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Contributor

github-actions bot commented Dec 29, 2025

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-13482/streamlit-1.52.2-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-13482.streamlit.app (☁️ Deploy here if not accessible)

Contributor

Copilot AI left a comment


Pull request overview

This PR adds session-scoped caching to st.cache_data and st.cache_resource, allowing cache entries to be scoped either globally (default) or per-session. Session-scoped caches are automatically cleared when sessions disconnect or shut down, enabling resource cleanup and per-session initialization patterns.

Key changes:

  • Adds a new scope parameter ("global" or "session") to both caching decorators
  • Refactors cache storage from flat dictionaries to nested session-to-function-key mappings
  • Implements clear_session() methods to clean up session-specific caches
  • Integrates cache clearing into the session disconnect and shutdown lifecycle
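
The nested session-to-function-key layout listed above can be sketched as follows. This is an illustrative model under stated assumptions, not the code from the PR: `ToyCaches` is a hypothetical class, and using `None` as the key for the global scope is an assumption made for the example.

```python
import threading
from typing import Any, Dict, Optional


class ToyCaches:
    """Nested mapping: session ID (None = global, by assumption) -> function key -> cache."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._caches: Dict[Optional[str], Dict[str, Dict[Any, Any]]] = {}

    def get_cache(
        self, function_key: str, session_id: Optional[str] = None
    ) -> Dict[Any, Any]:
        # Create the per-session and per-function levels lazily.
        with self._lock:
            by_function = self._caches.setdefault(session_id, {})
            return by_function.setdefault(function_key, {})

    def clear_session(self, session_id: str) -> None:
        # Detach the session's caches under the lock, then clear them.
        with self._lock:
            session_caches = self._caches.pop(session_id, None)
        if session_caches is not None:
            for cache in session_caches.values():
                cache.clear()


caches = ToyCaches()
caches.get_cache("fn", "session-1")["x"] = 1
caches.get_cache("fn", "session-2")["x"] = 2
caches.get_cache("fn")["x"] = 0  # global scope
caches.clear_session("session-1")
```

Clearing one session leaves the other session's entries and the global entries untouched, which is the isolation property the tests in this PR are described as checking.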

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| lib/streamlit/runtime/caching/cache_utils.py | Adds CacheScope type alias and get_session_id_or_throw() helper function; updates CachedFuncInfo to accept scope parameter |
| lib/streamlit/runtime/caching/cache_data_api.py | Refactors DataCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/caching/cache_resource_api.py | Refactors ResourceCaches to use nested dictionaries for session scoping; adds scope parameter to decorator and clear_session() method |
| lib/streamlit/runtime/app_session.py | Adds clear_session_caches() method and integrates it into shutdown and disconnect flows |
| lib/streamlit/runtime/websocket_session_manager.py | Calls clear_session_caches() when sessions disconnect |
| lib/tests/streamlit/runtime/caching/common_cache_test.py | Adds comprehensive tests for session-scoped cache lookup, clearing, and invalid scope handling |
| lib/tests/streamlit/runtime/caching/cache_utils_test.py | New test file for get_session_id_or_throw() utility function |
| lib/tests/streamlit/runtime/app_session_test.py | Updates shutdown tests to verify cache clearing and adds test for clear_session_caches() method |

@lukasmasuch lukasmasuch added the ai-review, security-assessment-completed, change:feature, and impact:users labels Dec 31, 2025
@github-actions github-actions bot removed the ai-review label Dec 31, 2025
@github-actions
Contributor

Summary

This PR adds a new scope parameter to st.cache_data and st.cache_resource decorators, enabling session-level caching alongside the existing global caching. Key changes include:

  • New scope parameter: Accepts "global" (default) or "session" values
  • Session-scoped cache storage: Session caches are stored separately using the session ID as a key dimension
  • Automatic cleanup: Session caches are cleared when sessions disconnect or shut down, with on_release callbacks invoked
  • New clear_session() API method: Added to both cache APIs for programmatic session cache clearing

This feature addresses two GitHub issues (#8545, #6703) by enabling session-scoped resource management and disconnect hooks.
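
The "automatic cleanup with on_release callbacks" behavior summarized above can be modeled in a few lines. This is a sketch under assumptions, not the PR's code: `ToyResourceCache` and `FakeConnection` are hypothetical, and only the idea that a release callback runs for each cached value when the cache is cleared comes from the summary.

```python
from typing import Any, Callable, Dict, Optional


class ToyResourceCache:
    """Hypothetical resource cache that releases values when cleared."""

    def __init__(self, on_release: Optional[Callable[[Any], None]] = None) -> None:
        self._values: Dict[str, Any] = {}
        self._on_release = on_release

    def put(self, key: str, value: Any) -> None:
        self._values[key] = value

    def clear(self) -> None:
        # Pop entries one at a time so each value is released exactly once.
        while self._values:
            _, value = self._values.popitem()
            if self._on_release is not None:
                self._on_release(value)


class FakeConnection:
    """Stand-in for a resource that needs explicit cleanup."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


conn = FakeConnection()
resource_cache = ToyResourceCache(on_release=lambda c: c.close())
resource_cache.put("db", conn)
resource_cache.clear()  # e.g. triggered when the owning session disconnects
```

This is what makes a session-scoped resource usable as a disconnect hook: the callback fires as a side effect of the session's cache being cleared.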

Code Quality

The implementation is clean and follows existing codebase patterns well:

Strengths:

  • Consistent implementation across both cache_data and cache_resource APIs
  • Proper type annotations using CacheScope type alias
  • Thread-safe implementation with appropriate locking in DataCaches and ResourceCaches
  • Clear error messages for invalid scope values
  • Well-documented with comprehensive docstrings explaining behavior and edge cases

Minor Issues:

  1. Unused function in test file (lib/tests/streamlit/runtime/caching/cache_utils_test.py, lines 26-29):

    def function_for_testing(
        pos_one: int, pos_two: int, _scope: str, pos_three: int
    ) -> str:
        """Dummy function for testing function caches."""
        return f"{pos_one}-{pos_two}-{_scope}-{pos_three}"

    This function is defined but never used in the test file.

  2. Missing return type annotation on clear_session methods (cache_data_api.py line 693, cache_resource_api.py line 603):
    For consistency with other methods, consider adding explicit -> None return type annotation.

  3. Double cache clearing on shutdown: clear_session_caches() is called both in app_session.py:shutdown() (line 277) and in _handle_scriptrunner_event_on_event_loop() (line 706) when shutdown is requested. While this is idempotent and safe, it may cause redundant work. This appears intentional to ensure cleanup in all scenarios (sync and async paths).
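
The idempotency claim in item 3 can be illustrated with a toy model. The names `toy_clear_session_caches`, `toy_session_caches`, and `released` are hypothetical; the sketch assumes only that clearing removes the session's entries, so a second call finds nothing left to release.

```python
from typing import Dict, List

released: List[str] = []
toy_session_caches: Dict[str, Dict[str, str]] = {
    "session-a": {"conn": "connection-1"},
}


def toy_clear_session_caches(session_id: str) -> None:
    """Hypothetical idempotent clear: a repeated call is a no-op."""
    entries = toy_session_caches.pop(session_id, None)
    if entries is None:
        return
    released.extend(entries.values())


toy_clear_session_caches("session-a")  # e.g. from the shutdown path
toy_clear_session_caches("session-a")  # e.g. from the event-loop path; does nothing
```

Each resource is released once even though the clear runs twice, so the redundant call costs only a dictionary lookup in this model.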

Test Coverage

The test coverage is comprehensive and well-structured:

Covered scenarios:

  • ✅ Session scope cache lookups with multiple sessions (test_session_scope_handles_lookup)
  • ✅ Clearing individual session caches without affecting others (test_session_scope_handles_clear)
  • ✅ Bad scope parameter validation (test_bad_scope_raises_exception)
  • ✅ get_session_id_or_throw function behavior (GetSessionIdOrThrowTest)
  • ✅ clear_session_caches method in AppSession (test_clear_session_caches)
  • ✅ Shutdown behavior calling clear_session (test_shutdown)
  • ✅ Parameterized tests covering both cache_data and cache_resource

Tests follow best practices:

  • Uses @parameterized.expand to reduce code duplication
  • Uses proper mocking with patch.object
  • Tests include docstrings describing their purpose
  • Tests verify cache isolation between sessions

Potential additions (optional):

  • Test for on_release callback being invoked when session cache is cleared (for cache_resource)
  • Integration test for interaction between session-scoped and global-scoped caches with the same function key
  • Test for behavior when persist="disk" is combined with scope="session" (edge case)

Backwards Compatibility

Fully backwards compatible:

  • The scope parameter defaults to "global", preserving existing behavior
  • No breaking changes to existing APIs or method signatures
  • Existing user code will continue to work without modification
  • Global cache behavior is unchanged

Security & Risk

No security concerns identified:

  • Memory is properly managed with cache cleanup on session disconnect
  • Thread safety is maintained with appropriate locking
  • No new attack vectors introduced

Low regression risk:

  • Changes are additive and well-isolated
  • The default behavior (scope="global") is unchanged
  • Edge cases are documented (e.g., reconnected sessions may repopulate cache)

Note on documented behavior:
The docstring correctly notes that "disconnected sessions can reconnect - so it is possible for the cache to populate multiple times in a single session for the same key." This is appropriate documentation of expected behavior.

Recommendations

  1. Remove unused test function: Delete function_for_testing from cache_utils_test.py (lines 26-29) as it serves no purpose.

  2. Add return type annotation (optional): Add -> None to clear_session methods in both API classes for consistency:

    def clear_session(self, session_id: str) -> None:

  3. Consider adding typing test (optional): Add a test case in lib/tests/streamlit/typing/cache_types.py to verify the scope parameter is properly typed:

    @st.cache_data(scope="session")
    def cached_data_fn_with_scope(arg1: int) -> bool:
        return True

  4. Consider adding example (optional): Adding a usage example for session-scoped caching in the docstring would help users understand the feature:

    >>> @st.cache_resource(scope="session", on_release=lambda x: x.close())
    ... def get_session_connection():
    ...     return create_connection()

Verdict

APPROVED: This is a well-implemented feature that addresses a real user need for session-scoped caching. The code quality is high, test coverage is comprehensive, and the implementation follows existing patterns in the codebase. The minor issues noted (unused test function, missing return type annotation) are not blockers and can be addressed in a follow-up if desired.


This is an automated AI review. Please verify the feedback and use your judgment.


if session_caches is not None:
    for cache in session_caches.values():
        cache.clear()

Storage not closed when session caches are cleared

The clear_session method in DataCaches calls cache.clear() but does not call cache.storage.close() afterwards. This is inconsistent with clear_all() (which calls both clear() and storage.close() in its fallback path) and get_cache() (which calls storage.close() when replacing a cache). For disk-persisted session-scoped caches, this could result in file handles not being released when sessions are disconnected, leading to a resource leak.
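
A minimal sketch of the fix this comment asks for, using hypothetical stand-ins (`ToyStorage`, `ToyDataCache`, `clear_session_caches_and_close`) rather than the real classes. The only detail taken from the comment is the pairing of `cache.clear()` with `cache.storage.close()`; everything else is assumed for illustration.

```python
from typing import Dict, Optional


class ToyStorage:
    """Hypothetical storage backend that tracks whether it was cleared and closed."""

    def __init__(self) -> None:
        self.cleared = False
        self.closed = False

    def clear(self) -> None:
        self.cleared = True

    def close(self) -> None:
        # A real disk-backed storage would release file handles here.
        self.closed = True


class ToyDataCache:
    """Hypothetical cache wrapper that exposes its storage."""

    def __init__(self, storage: ToyStorage) -> None:
        self.storage = storage

    def clear(self) -> None:
        self.storage.clear()


def clear_session_caches_and_close(
    session_caches: Optional[Dict[str, ToyDataCache]],
) -> None:
    if session_caches is not None:
        for cache in session_caches.values():
            cache.clear()
            # The step the review says was missing: release the backing storage.
            cache.storage.close()


storage = ToyStorage()
clear_session_caches_and_close({"function-key": ToyDataCache(storage)})
```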

Contributor Author

Good catch! Updating.

    """Clear all cache_resource caches."""
    _resource_caches.clear_all()

def clear_session(self, session_id: str) -> None:
Collaborator

question: is clear_session supposed to be exposed to users via the public API? I think there isn't even an official API to retrieve the session ID. In case it should be exposed, it would be good to add a gather_metrics decorator to track its usage. cc @jrieke

Collaborator

Maybe it's also cleaner to support this via a flag on the .clear method, e.g. clear(scope="session")

Contributor Author

I'd casually considered this suitable for a public API - but I hadn't really thought through the fact that session ID isn't public info in an app. I don't think this is especially useful for a user - even clear has pretty limited utility - so it should likely just be made private.

I'll pull this into a helper function at the module level instead, so that it's not an exported symbol by default.

Contributor Author

API lifted to a non-public namespace.

This actually matches the scope of this function better, since it's operating on a module-level object ...

Contributor Author

FWIW, if we wanted to give users the ability to clear caches, it could just be done through the AppSession method.

Clear backing store for session cache for the data cache.
Contributor Author

@sfc-gh-jkinkead sfc-gh-jkinkead left a comment


PTAL!

Collaborator

@lukasmasuch lukasmasuch left a comment


LGTM 👍

@sfc-gh-jkinkead sfc-gh-jkinkead merged commit 9b42523 into develop Jan 5, 2026
42 checks passed
@sfc-gh-jkinkead sfc-gh-jkinkead deleted the jkinkead-session-scoped-caches branch January 5, 2026 21:30
majiayu000 pushed a commit to majiayu000/streamlit that referenced this pull request Jan 9, 2026
## Describe your changes

Add session-level scoping to `st.cache_data` and `st.cache_resource`.

Add the "scope" parameter through the caching stack.

Create session-scoped caches as appropriate.

Add methods to clear caches by session ID. Call those methods when
sessions are disconnected.

**Note**: This clears caches on disconnect _and_ shutdown. In the
current websocket session manager, sessions only appear to be shut down
when the backend process terminates - and so disconnection is the only
hook that's actually invoked in the typical session lifecycle. I'm not
sure if this is a bug or a design choice. Either way, like the docs
note, this means that session caches might populate multiple times for a
single user session in some edge-cases. I think this is fine.

## GitHub Issue Link (if applicable)

- Fix for streamlit#8545. An
`on_release` hook for a session-scoped resource can be used for
disconnect hooks.
- Implements one of the suggested fixes for
streamlit#6703.

## Testing Plan

See unit tests.

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
Labels

  • change:feature (PR contains new feature or enhancement implementation)
  • impact:users (PR changes affect end users)
  • security-assessment-completed (Security assessment has been completed for PR)

3 participants