
insert_rows_from_dataframe fails when NaN values are present in DataFrame #169

@tswast

Description

Environment details

  • OS type and version: macOS Catalina (10.15.5)
  • Python version: python --version: Python 3.7.3
  • pip version: pip --version: pip 20.0.2
  • google-cloud-bigquery version: pip show google-cloud-bigquery
Name: google-cloud-bigquery
Version: 1.24.0
Summary: Google BigQuery API client library
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: /Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages
Requires: google-auth, six,

Steps to reproduce

  1. Create a dataframe containing NaN values.

    Pandas often uses NaN as a NULL indicator, for example in the result of an outer join where a row is missing from either the left or right dataframe.

  2. Attempt to upload this dataframe using the streaming API insert_rows_from_dataframe.
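For context on why this fails: Python's standard json module serializes NaN as the bare token NaN by default, which is not valid strict JSON, so a strict server-side parser such as BigQuery's rejects the payload. A minimal illustration (standalone, not using the client library):

```python
import json

# json.dumps emits the non-standard token NaN by default (allow_nan=True),
# which strict JSON parsers reject.
payload = json.dumps({"int_col": float("nan")})
print(payload)  # {"int_col": NaN}

# Requesting strict output raises instead of emitting the invalid token.
try:
    json.dumps({"int_col": float("nan")}, allow_nan=False)
except ValueError as exc:
    print(exc)
```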

Code example

from google.cloud import bigquery
import pandas


client = bigquery.Client()

table = bigquery.Table("swast-scratch.my_dataset.nan_test")
table.schema = [
    bigquery.SchemaField("grp_col", "STRING"),
    bigquery.SchemaField("str_col", "STRING"),
    bigquery.SchemaField("int_col", "INTEGER"),
    bigquery.SchemaField("float_col", "FLOAT"),
]
client.create_table(table, exists_ok=True)

df1 = pandas.DataFrame({
        "grp_col": ["a", "b"],
        "str_col": ["a string", "b string"],
    }
)
df2 = pandas.DataFrame({
        "grp_col": ["b", "c"],
        "int_col": [1, 2],
        "float_col": [0.25, 0.5],
    }
)
merged = df1.merge(df2, how="outer", on="grp_col")
print(merged)

errors = client.insert_rows_from_dataframe(table, merged)
print(errors)

Stack trace

$ python upload_df_with_nan.py
  grp_col   str_col  int_col  float_col
0       a  a string      NaN        NaN
1       b  b string      1.0       0.25
2       c       NaN      2.0       0.50
Traceback (most recent call last):
  File "upload_df_with_nan.py", line 30, in <module>
    errors = client.insert_rows_from_dataframe(table, merged)
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2471, in insert_rows_from_dataframe
    result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2425, in insert_rows
    return self.insert_rows_json(table, json_rows, **kwargs)
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2569, in insert_rows_json
    timeout=timeout,
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 556, in _call_api
    return call()
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Users/swast/miniconda3/envs/scratch/lib/python3.7/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/swast-scratch/datasets/my_dataset/tables/nan_test/insertAll: Invalid JSON payload received. Unexpected token.
string", "int_col": NaN, "float_col": Na
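Until the client handles this itself, one possible client-side workaround is to convert NaN entries to None (which serializes to JSON null) before calling insert_rows_from_dataframe. A sketch, not part of the google-cloud-bigquery API:

```python
import pandas

# Cast to object dtype, then replace NaN with None so each row serializes
# to a valid JSON null instead of the bare NaN token.
merged = pandas.DataFrame({
    "grp_col": ["a", "b", "c"],
    "int_col": [float("nan"), 1.0, 2.0],
})
cleaned = merged.astype(object).where(pandas.notnull(merged), None)
print(cleaned.to_dict(orient="records"))
```

The resulting dataframe can then be passed to insert_rows_from_dataframe in place of the original.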

Labels

api: bigquery - Issues related to the googleapis/python-bigquery API.
priority: p2 - Moderately-important priority. Fix may not be included in next release.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
