Closed
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
Environment details
Using the google-cloud-bigquery client with version 1.23.1
Python 3.7 (on linux and macos)
Steps to reproduce
- Calling client.list_rows with both max_results and start_index returns the wrong data when
the client needs to fetch more than one page.
After the first page, the client issues follow-up requests that send both 'pageToken' and 'startIndex', which seem to be incompatible.
Code example
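import logging
from typing import Iterator

from pandas import DataFrame

# get_client() and BATCH_SIZE_ROWS are defined elsewhere in the reporter's code.
def table_to_df_iterator(project_id, dataset_id, table_id) -> Iterator[DataFrame]:
    table_full_id = project_id + "." + dataset_id + "." + table_id
    client = get_client()
    index = 0
    while True:
        # Read the table in batches of BATCH_SIZE_ROWS rows, one offset at a time.
        offset = BATCH_SIZE_ROWS * index
        df = client.list_rows(table_full_id, max_results=BATCH_SIZE_ROWS,
                              start_index=offset).to_dataframe()
        if df.empty:
            break
        logging.info(f"Offset is at {offset} got a dataframe of size {len(df.index)}")
        yield df
        index += 1
Trace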
DEBUG:google.cloud.bigquery.table:Started reading table 'samsung-global-dashboard.1_Raw.Facebook_SEUK_VIDEO_20190101' with tabledata.list.
DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/samsung-global-dashboard/datasets/1_Raw/tables/Facebook_SEUK_VIDEO_20190101/data?maxResults=100000&startIndex=100000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/samsung-global-dashboard/datasets/1_Raw/tables/Facebook_SEUK_VIDEO_20190101/data?pageToken=BEP6ZNORN4AQAAASAUIIBAEAAUNAQCEG6ADBBIENAYQP777777777777P4VAA%3D%3D%3D&maxResults=87354&startIndex=100000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/samsung-global-dashboard/datasets/1_Raw/tables/Facebook_SEUK_VIDEO_20190101/data?pageToken=BEP6ZNORN4AQAAASAUIIBAEAAUNAQCEG6ADBBOVKAUQP777777777777P4VAA%3D%3D%3D&maxResults=74708&startIndex=100000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/samsung-global-dashboard/datasets/1_Raw/tables/Facebook_SEUK_VIDEO_20190101/data?pageToken=BEP6ZNORN4AQAAASAUIIBAEAAUNAQCEG6ADBBVGHAQQP777777777777P4VAA%3D%3D%3D&maxResults=62062&startIndex=100000 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/samsung-global-dashboard/datasets/1_Raw/tables/Facebook_SEUK_VIDEO_20190101/data?pageToken=BEP6ZNORN4AQAAASAUIIBAEAAUNAQCEG6ADBB3XEAMQP777777777777P4VAA%3D%3D%3D&maxResults=49416&startIndex=100000 HTTP/1.1" 200 None
Idea to fix
Make the follow-up requests use an updated startIndex instead of the page token.
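One way to read this suggestion, as a rough sketch rather than the library's actual implementation: after each response, advance the offset by the number of rows received and shrink the remaining row budget by the same amount, so no page token is needed. Here `fetch_page` and `fake_fetch` are hypothetical stand-ins for a single tabledata.list request; they are not part of google-cloud-bigquery.

```python
from typing import Callable, Iterator, Sequence


def iter_rows_by_offset(
    fetch_page: Callable[[int, int], Sequence],
    start_index: int,
    max_results: int,
) -> Iterator:
    """Yield up to max_results rows beginning at start_index.

    fetch_page(start_index, max_results) is a hypothetical callable that
    returns at most max_results rows starting at start_index. It may return
    fewer rows than requested, as a paged API typically does.
    """
    remaining = max_results
    offset = start_index
    while remaining > 0:
        page = fetch_page(offset, remaining)
        if not page:
            break  # reached the end of the table
        yield from page
        # Move the offset forward instead of relying on a page token.
        offset += len(page)
        remaining -= len(page)


# Stand-in for the server: a 250-row table that returns at most 30 rows per call.
table = list(range(250))


def fake_fetch(start: int, limit: int, page_cap: int = 30) -> Sequence:
    return table[start:start + min(limit, page_cap)]


rows = list(iter_rows_by_offset(fake_fetch, start_index=100, max_results=100))
```

With the fake table, the call collects rows 100 through 199 across several short pages. Whether offset-based paging is preferable to page tokens inside the client is a design question for the maintainers.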
Thanks!