26,301 questions
Advice · 0 votes · 2 replies · 73 views
Replace 0 with NULL in a FLOAT64 column during a SELECT statement
I am trying to clean a dataset in BigQuery where missing entries were uploaded as 0. I want these to be NULL so they don't affect my AVG() calculations.
My attempt:
SELECT
Date,
NULLIF(Sales, 0) ...
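The attempt is on the right track: a common answer sketches the full query, since `NULLIF(x, y)` returns NULL when `x = y` and `AVG()` skips NULL rows. A minimal sketch, assuming an illustrative table `my_dataset.sales_table` with `Date` and `Sales` columns:

```sql
-- Convert zeroes to NULL on read; the stored data is unchanged.
SELECT
  Date,
  NULLIF(Sales, 0) AS Sales
FROM
  `my_dataset.sales_table`;

-- The same idea applied directly inside the aggregate,
-- so zero-as-missing rows do not drag the average down:
SELECT
  AVG(NULLIF(Sales, 0)) AS avg_sales
FROM
  `my_dataset.sales_table`;
```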
Best practices · 0 votes · 5 replies · 123 views
How to use concat
I'm analyzing wind speed and visibility data from 2023 in BigQuery, but the dataset wasn’t cleaned and missing values were entered as zeroes.
I'm trying to update those zeroes to NULL before running ...
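Replies to questions like this usually distinguish converting on read (with `NULLIF` in a SELECT) from rewriting the stored rows. A sketch of the latter, assuming a hypothetical table and column name (`weather_2023`, `wind_speed`); note that BigQuery UPDATE statements require a WHERE clause:

```sql
-- Permanently replace zeroes with NULL in the stored table.
UPDATE
  `my_dataset.weather_2023`
SET
  wind_speed = NULL
WHERE
  wind_speed = 0;
```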
Advice · 0 votes · 5 replies · 135 views
Best SQL Function for Text Filtering
I am examining a large dataset on sports activities amongst a population of people and need to find all entries of the word "soccer."
What is the best SQL function to filter all entries ...
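Answers to this kind of question typically compare `LIKE` (simple substring match) with `REGEXP_CONTAINS` (regular expressions, easy to make case-insensitive). A sketch with assumed names `sports_activities` and `activity`:

```sql
-- Simple, case-sensitive substring filter:
SELECT *
FROM `my_dataset.sports_activities`
WHERE activity LIKE '%soccer%';

-- Case-insensitive variant using a regular expression:
SELECT *
FROM `my_dataset.sports_activities`
WHERE REGEXP_CONTAINS(LOWER(activity), r'soccer');
```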
Advice · 0 votes · 2 replies · 111 views
Explain this SQL Query
I am learning SQL for my data analytics course and I have come across a query that I have questions about. Can someone explain how the query works?
SELECT
usertype,
CONCAT (...
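The preview cuts off at `CONCAT(`, but queries of this shape usually build a per-row label and then aggregate over it. A hedged reconstruction (the station columns, table name, and grouping are assumptions, not the asker's actual query):

```sql
-- CONCAT joins several strings into one; here it builds a
-- 'route' label, and GROUP BY then counts trips per route.
SELECT
  usertype,
  CONCAT(start_station_name, ' to ', end_station_name) AS route,
  COUNT(*) AS num_trips
FROM
  `my_dataset.trips`
GROUP BY
  usertype, route;
```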
Best practices · 0 votes · 1 reply · 53 views
Minimum IAM role on target GCP project for GA4 Analytics Admin API CreateBigQueryLink?
We are building a B2B SaaS platform that programmatically creates BigQuery export links from customer GA4 properties to our GCP project using the Analytics Admin API.
API call:
POST https://...
1 vote · 1 answer · 181 views
BigQuery Storage API `to_arrow_iterable` returns only 8 rows at a time
I have this code to retrieve millions of rows from my BigQuery query results:
query_job = client.query(
    query,
)
storage_client = bigquery_storage....
Advice · 0 votes · 3 replies · 90 views
How to remember all SQL codes effectively?
I'm currently taking the Google Data Analytics certificate, and the amount of SQL code is a bit overwhelming. How can I remember it all, so that whenever I need it I can do the job? Any advice or tips ...
0 votes · 0 answers · 108 views
BigQuery Storage Write API: "context deadline exceeded" only on low-frequency table
Problem
I'm using the BigQuery Storage Write API (Go managedwriter package) to upload data to three tables with very different ingestion rates:
Table | Frequency | Record Size
A | ~10 records/sec | Several KB
...
0 votes · 0 answers · 87 views
GA4 BigQuery: events_intraday_YYYYMMDD returns ~2x more distinct user IDs than finalized events_YYYYMMDD partition
Some context:
I'm connecting mobile game data to BQ using Firebase/GA4 with BigQuery export. Every session has a USER_ID set in user_properties at client load. I have pipelines that run daily at 6 AM ...
-7 votes · 1 answer · 155 views
Sort data in BigQuery using order [closed]
I am learning BigQuery on Coursera. I am trying to sort a table but I am confused about how to use ORDER BY.
My table has columns like name and age. I want to sort the age from highest to lowest. What ...
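The usual answer: ORDER BY takes a column name and an optional direction, and DESC sorts highest to lowest. A minimal sketch using the name and age columns mentioned (the table name is illustrative):

```sql
SELECT
  name,
  age
FROM
  `my_dataset.people`
ORDER BY
  age DESC;  -- DESC = highest first; ASC (the default) = lowest first
```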
Best practices · 1 vote · 3 replies · 103 views
Difference between CONCAT with || and CONCAT with +
What is the difference between CONCAT with || and CONCAT with + in SQL?
Which should be used, and when?
If possible, can anyone please explain with an example?
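In BigQuery Standard SQL the `||` operator and the `CONCAT()` function do the same thing on strings, while `+` is not valid for strings there (it is a SQL Server habit). A quick illustration:

```sql
-- Both expressions produce the same string.
SELECT
  CONCAT('data', ' ', 'analytics') AS with_concat,
  'data' || ' ' || 'analytics'     AS with_pipes;
```

One caveat worth knowing: in BigQuery both forms return NULL if any input is NULL, whereas some other dialects treat NULL as an empty string in `CONCAT`.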
Advice · 1 vote · 1 reply · 111 views
Efficiently processing 1 year of daily historical files using dataflow
I have an Apache Beam pipeline (running on Dataflow) that normally performs a daily batch load from Cloud Storage to BigQuery. The source team has provided 1 year of historical data that needs to be ...
Best practices · 0 votes · 4 replies · 118 views
SQL Query as a table
How do I save a query as a table in SQL on BigQuery?
I am doing data analysis and want to save a query as a table so I can reuse it and save time.
I was running a query in the BigQuery platform for practice and ...
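The standard answer is CREATE TABLE ... AS SELECT (the console's "Save results" and destination-table options do the same job). A sketch with illustrative project, dataset, and table names:

```sql
-- Materialize the query result as a reusable table.
CREATE TABLE `my_project.my_dataset.saved_results` AS
SELECT
  *
FROM
  `my_project.my_dataset.source_table`
WHERE
  year = 2023;
```

Using `CREATE OR REPLACE TABLE` instead lets the same statement be re-run to refresh the saved result.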
Tooling · 0 votes · 3 replies · 80 views
I would like to change values that were entered as zero (because they were missing) into NULL values, to avoid errors
However, you’re working with the newest data and it hasn’t been cleaned yet. Missing values were incorrectly entered as zeroes, and you need to change them to null values before you look for trends. ...
1 vote · 0 answers · 75 views
Why is clustering on _id not reducing bytes scanned in BigQuery for a table synced via Datastream?
I have a BigQuery table (approx. 140 GB) that is synchronized from MongoDB via Google Cloud Datastream. I have set the _id column as the clustering column. However, when I run a query ...