Performance issues for queries returning larger arrays #310
Description
We recently upgraded the google-cloud-spanner library to the latest version (1.17.1 -> 3.3.0) because it includes a fix for a bug we were hitting. Unfortunately, we now see a significant performance regression when executing queries that return rows containing large arrays: one specific query went from 4 to 17 seconds. This lengthens the calculation times in our pipelines.
We investigated the issue and found that the performance drop comes from `_merge_values` taking longer in the latest version of the library. This function parses the protobuf values using `_parse_value_pb`. Timing the `_parse_value_pb` calls made while running the query mentioned above, on both versions of the library, gives the following results:
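A minimal sketch of the kind of per-type timing described above, assuming nothing about the library internals: `timed_by_key`, `summarize` and the toy `parse_value` are hypothetical stand-ins, not google-cloud-spanner APIs. The wrapper records the call count and average wall-clock time per call, grouped by a key extracted from the arguments (here the type code).

```python
import functools
import time
from collections import defaultdict


def timed_by_key(key_fn):
    """Decorator factory: time every call of the wrapped function and group
    the measurements under key_fn(*args, **kwargs), e.g. the type code."""
    stats = defaultdict(lambda: [0, 0.0])  # key -> [call count, total seconds]

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                entry = stats[key_fn(*args, **kwargs)]
                entry[0] += 1
                entry[1] += elapsed

        wrapper.stats = stats
        return wrapper

    return decorator


def summarize(stats):
    """Return {key: (call count, average seconds per call)}."""
    return {key: (calls, total / calls) for key, (calls, total) in stats.items()}


# Hypothetical stand-in for a value parser taking (value, type_code).
@timed_by_key(lambda value, type_code: type_code)
def parse_value(value, type_code):
    if type_code == "ARRAY":
        # Elements go through the wrapped name, so they are timed as well.
        return [parse_value(element, "INT64") for element in value]
    return int(value)


parse_value(["1", "2", "3"], "ARRAY")
parse_value("7", "INT64")
for type_code, (calls, avg) in summarize(parse_value.stats).items():
    print(f"{type_code}: {calls} calls, avg {avg:.2e} s")
```

Because nested calls go through the wrapped name, each ARRAY element is also counted under its own type, and the ARRAY average includes the time spent on its elements. Applying the same wrapper to `_parse_value_pb` would mean replacing the name the caller looks up with the wrapped version; which module holds that name in a given release is an assumption to check in the library source.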
| proto type | # calls | avg execution time v1.17.1 (s) | avg execution time v3.3.0 (s) |
|---|---|---|---|
| BOOL | 672 | 4.00 x 10^-6 | 3.87 x 10^-6 |
| INT64 | 384 | 7.28 x 10^-6 | 7.14 x 10^-6 |
| FLOAT64 | 960 | 5.11 x 10^-6 | 4.40 x 10^-6 |
| TIMESTAMP | 480 | 8.12 x 10^-5 | 8.70 x 10^-5 |
| DATE | 672 | 4.57 x 10^-5 | 4.21 x 10^-5 |
| STRING | 2112 | 4.46 x 10^-6 | 3.93 x 10^-6 |
| ARRAY | 1920 | 4.43 x 10^-3 | 3.96 x 10^-2 |
As the table shows, parsing an ARRAY value has become roughly 9× slower per call (4.43 x 10^-3 s vs. 3.96 x 10^-2 s), which becomes a problem when it is called often, as it is for these queries. This seems similar to another issue in this repo, but the performance fixes mentioned there have already been merged.
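The per-type slowdown can be computed directly from the averages in the table. A small sketch; the numbers are copied from the table and nothing else is assumed:

```python
# Average seconds per call, copied from the table: (v1.17.1, v3.3.0).
AVG_SECONDS = {
    "BOOL": (4.00e-6, 3.87e-6),
    "INT64": (7.28e-6, 7.14e-6),
    "FLOAT64": (5.11e-6, 4.40e-6),
    "TIMESTAMP": (8.12e-5, 8.70e-5),
    "DATE": (4.57e-5, 4.21e-5),
    "STRING": (4.46e-6, 3.93e-6),
    "ARRAY": (4.43e-3, 3.96e-2),
}


def slowdown(avg_seconds):
    """Ratio of new to old average time per call (> 1 means slower)."""
    return {name: new / old for name, (old, new) in avg_seconds.items()}


ratios = slowdown(AVG_SECONDS)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<9} {ratio:5.2f}x")
```

Only ARRAY stands out: v3.3.0 takes about 8.9 times as long per call, while every other type stays within roughly 15% of v1.17.1.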
Environment details
- OS type and version: macOS Big Sur (version 11.2.3)
- Python version: Python 3.7.5
- pip version: pip 21.0.1
- google-cloud-spanner version: 3.3.0
Steps to reproduce
Execute a query that returns one or more ARRAY columns in which the arrays contain a large number of elements (for example, 999).
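A minimal reproduction sketch, assuming access to a Spanner database: the instance and database IDs are placeholders, and the query uses GoogleSQL `GENERATE_ARRAY` to produce rows with 999-element INT64 arrays instead of reading a real table. `build_query`, `time_query` and `main` are hypothetical helpers, not part of the library. Iterating the result set inside the snapshot is what drives the row merging and value parsing discussed above.

```python
import time
from typing import Tuple


def build_query(num_rows: int, array_len: int) -> str:
    """GoogleSQL returning num_rows rows, each with one ARRAY<INT64> of array_len elements."""
    return (
        f"SELECT GENERATE_ARRAY(1, {array_len}) AS arr "
        f"FROM UNNEST(GENERATE_ARRAY(1, {num_rows})) AS n"
    )


def time_query(database, sql: str) -> Tuple[float, int]:
    """Run sql in a read-only snapshot and return (elapsed seconds, row count).

    Rows are consumed inside the snapshot block, so the elapsed time covers
    streaming, merging and parsing every value.
    """
    start = time.perf_counter()
    with database.snapshot() as snapshot:
        rows = list(snapshot.execute_sql(sql))
    return time.perf_counter() - start, len(rows)


def main() -> None:
    # Imported here so the helpers above can be reused without the package.
    from google.cloud import spanner

    client = spanner.Client()
    # Placeholder IDs: replace with a real instance and database.
    database = client.instance("my-instance").database("my-database")
    elapsed, count = time_query(database, build_query(num_rows=1000, array_len=999))
    print(f"{count} rows in {elapsed:.2f} s")
```

Calling `main()` once with google-cloud-spanner 1.17.1 installed and once with 3.3.0 should show whether the gap reproduces outside our pipelines; the row count of 1000 is an arbitrary choice.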