st.cache_resource alternative for non-threadsafe object (that can't be serialized / used with st.cache_data) #6703

@JacobHayes

Checklist

  • I have searched the existing issues for similar issues.
  • I added a very descriptive title to this issue.
  • I have provided sufficient information below to help reproduce this issue.

Summary

I'm trying to use the Google Drive API, which has a rather ugly runtime-generated client. When I wrap the client in st.cache_resource, I get a variety of SSL errors (e.g. DECRYPTION_FAILED_OR_BAD_RECORD_MAC, BLOCK_CIPHER_PAD_IS_WRONG, BAD_RECORD_TYPE) whenever an input change triggers a page rerender or when multiple tabs are open. I think the client keeps a connection (or something similar) open that is not thread-safe or multiprocess-safe, similar to what is described here or here. Given this lack of thread safety, st.cache_resource is not the right fit.

I think this same connection (or whatever) also prevents the client from being pickled (TypeError: cannot pickle '_cffi_backend.FFI' object), which means I also can't use it with st.cache_data.


What I'd like is something between the two: an object cached once per thread/process (or per "session"?). The Streamlit caching docs highlight mutation and concurrency issues, but don't really give any guidance for thread-unsafe objects.
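For illustration, one way to get "once per session" behaviour is to keep the object in a per-session, dict-like store instead of a global cache. The sketch below is only an assumption about what such a helper could look like: `get_or_create` is a hypothetical name, not a Streamlit API, and it works with any object that supports `in` and item assignment.

```python
from typing import Any, Callable, MutableMapping, TypeVar

T = TypeVar("T")


def get_or_create(
    store: MutableMapping[str, Any], key: str, factory: Callable[[], T]
) -> T:
    """Return store[key], building it with factory() on first use.

    Hypothetical helper: `store` is meant to be a per-session mapping,
    so each session ends up with its own independently built object.
    """
    if key not in store:
        # First access for this store: build the object and remember it.
        store[key] = factory()
    return store[key]
```

In a Streamlit app, `st.session_state` could presumably play the role of `store`, e.g. `gdrive = get_or_create(st.session_state, "gdrive", get_google_drive_service)`, so the client is never shared between sessions and never needs to be pickled.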

I would have expected this to be a more common issue (perhaps things like the built-in connection pooling in SQLAlchemy's engine cover a lot of ground), but I couldn't find any similar issues.

Reproducible Code Example

import streamlit as st
from google.oauth2 import service_account
from googleapiclient.discovery import Resource, build


def get_google_drive_service() -> Resource:
    scopes = [
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/spreadsheets",
        "https://www.googleapis.com/auth/forms",
    ]
    parent_creds = service_account.Credentials.from_service_account_info(
        {"service account info": "private"}, scopes=scopes
    )
    return build("drive", "v3", credentials=parent_creds.with_subject("some.email@example.com"), cache_discovery=False)


# Pass the function itself to st.cache_resource, then call the cached wrapper.
gdrive = st.cache_resource(get_google_drive_service)()

Steps To Reproduce

Get some credentials to create a client as described here and fill them into the snippet above. Then open a couple of different tabs (perhaps add some inputs to trigger rerendering; I can update the repro if there's no obvious solution and it's actually worth looking into).

Expected Behavior

An argument to st.cache_resource, or an alternative function, that supports thread-unsafe objects.
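As a rough sketch of what such an alternative could look like, the decorator below caches a zero-argument factory's result once per thread using `threading.local`. `cache_per_thread` is a hypothetical name, not part of Streamlit, and how often the object gets rebuilt depends on how the host framework reuses threads, which this sketch does not assume.

```python
import functools
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def cache_per_thread(func: Callable[[], T]) -> Callable[[], T]:
    """Cache the result of a zero-argument factory separately for each thread."""
    local = threading.local()  # attributes set on this object are per-thread

    @functools.wraps(func)
    def wrapper() -> T:
        if not hasattr(local, "value"):
            # First call on this thread: build a fresh, unshared object.
            local.value = func()
        return local.value

    return wrapper
```

Applied to the example above, this would read `gdrive = cache_per_thread(get_google_drive_service)()`. Each thread then gets its own client, at the cost of rebuilding it whenever a new thread handles the script.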

Current Behavior

No response

Is this a regression?

  • Yes, this used to work in a previous version.

Debug info

  • Streamlit version: 1.20.0
  • Python version: 3.10.9
  • Operating System: macOS 13.3.1 (arm64)
  • Browser: Chrome
  • Virtual environment:

Additional Information

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

    feature:cache: Related to `st.cache_data` and `st.cache_resource`
    type:enhancement: Requests for feature enhancements or new features
