Conversation

@sfc-gh-tszerszen
Contributor

📚 Context

This PR raises a custom st.experimental_memo exception for snowflake.snowpark.table.Table and pyspark.sql.dataframe.DataFrame, in addition to snowflake.snowpark.dataframe.DataFrame.
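
For illustration, a minimal sketch of the user-side pattern this change affects (the session argument and table name here are hypothetical):

import streamlit as st

@st.experimental_memo
def load_data(_session):
    # Returning _session.table("mytable") directly would hand the memo an
    # unevaluated snowflake.snowpark.table.Table, which cannot be pickled;
    # this PR surfaces that case with a clear exception. Evaluating the
    # result first yields a cacheable pandas DataFrame:
    return _session.table("mytable").to_pandas()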

  • What kind of change does this PR introduce?

    • Bugfix
    • Feature
    • Refactoring
    • Other, please describe:

🧠 Description of Changes

  • This is a breaking API change
  • This is a visible (user-facing) change

🧪 Testing Done

  • Screenshots included
  • Added/Updated unit tests
  • Added/Updated e2e tests

🌐 References

Does this depend on other work, documents, or tickets?

  • Issue: Closes #XXXX

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@sfc-gh-tszerszen sfc-gh-tszerszen added the security-assessment-completed Security assessment has been completed for PR label Nov 30, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen self-assigned this Nov 30, 2022
@sfc-gh-tszerszen sfc-gh-tszerszen force-pushed the improve-exception-on-snowpark-pyspark-integration branch from 329ad8c to 9d07678 Compare November 30, 2022 11:55
@tconkling
Contributor

Looks great, but needs a test. (Looks like we didn't already have a test for the snowflake.snowpark.dataframe.DataFrame case, but we should have).

It might be useful to pull this set of type names into a tuple at the top of the file, e.g.:

# We show a special "UnevaluatedDataFrame" warning for cached funcs
# that attempt to return one of these unserializable types:
UNEVALUATED_DATAFRAME_TYPES = (
    "snowflake.snowpark.table.Table",
    "snowflake.snowpark.dataframe.DataFrame",
    "pyspark.sql.dataframe.DataFrame",
)

So that we can just iterate them here, and also in the test.
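
A sketch of how the shared tuple might be used (the helper and the stand-in exception class below are illustrative, not the actual Streamlit internals):

UNEVALUATED_DATAFRAME_TYPES = (
    "snowflake.snowpark.table.Table",
    "snowflake.snowpark.dataframe.DataFrame",
    "pyspark.sql.dataframe.DataFrame",
)

class UnevaluatedDataFrameError(Exception):
    # Stand-in; in the PR this subclasses StreamlitAPIException.
    pass

def _type_fqn(obj) -> str:
    # Fully-qualified type name, e.g. "pyspark.sql.dataframe.DataFrame".
    return f"{type(obj).__module__}.{type(obj).__qualname__}"

def check_cacheable(return_value) -> None:
    # One membership test instead of a branch per dataframe type; the
    # unit test can iterate the same tuple so coverage stays in sync.
    if _type_fqn(return_value) in UNEVALUATED_DATAFRAME_TYPES:
        raise UnevaluatedDataFrameError(
            f"Cannot cache unevaluated {_type_fqn(return_value)}; "
            "evaluate it (e.g. with .collect() or .to_pandas()) first."
        )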

Contributor

I know this isn't part of the PR, but can we move this class declaration outside the function - to the top of the file, or wherever?

Contributor Author

Sure 👍 It's now moved into the cache_errors.py file 👍

@sfc-gh-tszerszen sfc-gh-tszerszen force-pushed the improve-exception-on-snowpark-pyspark-integration branch from 9d07678 to e261a1a Compare December 3, 2022 10:06
@sfc-gh-tszerszen
Contributor Author

@tconkling thank you for your review 🥇 This PR is now updated; can you please re-review it? 🙇

@sfc-gh-tszerszen sfc-gh-tszerszen force-pushed the improve-exception-on-snowpark-pyspark-integration branch from e261a1a to 18f5a19 Compare December 3, 2022 10:16
@tconkling
Contributor

Great, thanks for the tests!

Before merging, can we get some clarity on which exception types should actually be caught to handle the UnevaluatedDataFrameError?

class UnevaluatedDataFrameError(StreamlitAPIException):
    pass

except (TypeError, RuntimeError, CacheError):
Contributor

This seems overly broad - is it actually possible for any of these errors to be raised from an uncollected dataframe? (My assumption is that it's really just a CacheError that could be raised from a pickling failure?)

@sfc-gh-tszerszen
Contributor Author (Dec 6, 2022)

The reason I've introduced 3 types of exceptions here:

  • Snowpark triggers TypeError
  • Spark triggers RunTime error in the Testing environment, and CacheError when run on normal build

Is that ok?

@tconkling
Contributor (Dec 6, 2022)

I'd love to narrow this down - what are the operations that cause the TypeError and RuntimeError errors?

In get_or_create_cached_value, we're catching exceptions thrown by cache.write_result(value_key, return_value, messages). The st.experimental_memo implementation of cache.write_result uses the pickle utility to get the bytes to write to the cache, and pickling errors get re-raised as CacheError:

try:
    pickled_entry = pickle.dumps(multi_cache_results)
except pickle.PicklingError as exc:
    raise CacheError(f"Failed to pickle {key}") from exc

If we get TypeError and RuntimeError thrown from write_result, then it seems like it must be happening elsewhere in the function?
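
If so, one option (a sketch, reusing the hypothetical _type_fqn helper and UNEVALUATED_DATAFRAME_TYPES tuple from the earlier sketch) would be to catch only CacheError and add the targeted hint there:

def write_result_with_hint(cache, value_key, return_value, messages):
    try:
        cache.write_result(value_key, return_value, messages)
    except CacheError:
        if _type_fqn(return_value) in UNEVALUATED_DATAFRAME_TYPES:
            # Translate the generic pickling failure into the targeted
            # "evaluate your dataframe first" message.
            raise UnevaluatedDataFrameError(
                f"{_type_fqn(return_value)} must be evaluated before caching."
            ) from None
        raise  # Unrelated pickling failure: keep the standard cache error.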

@sfc-gh-tszerszen sfc-gh-tszerszen merged commit e5bbc80 into develop Dec 6, 2022
tconkling added a commit that referenced this pull request Dec 7, 2022
* develop:
  Singleton: `ttl` and `max_entries` support (#5821)
  Bump express from 4.17.1 to 4.18.2 in /frontend (#5823)
  Bump qs from 6.5.2 to 6.5.3 in /frontend (#5822)
  Bump decode-uri-component from 0.2.0 to 0.2.2 in /frontend (#5802)
  Improve st.experimental_memo exception on Snowpark/PySpark integration (#5791)
@sfc-gh-kmcgrady sfc-gh-kmcgrady deleted the improve-exception-on-snowpark-pyspark-integration branch October 5, 2023 19:29