Problem
When using the BigQuery source with AI agents, there is no way to enforce a per-query scan limit at the config level. The existing governance options (writeMode, allowedDatasets, maxQueryResultRows) control what the agent can touch and how many result rows it gets back, but not how expensive a query can be.
An agent can accidentally run SELECT * against a multi-TB table, and nothing in the toolbox config prevents it. The dry_run parameter on bigquery-execute-sql is opt-in: the agent has to remember to use it. This is guidance, not enforcement.
BigQuery's native maximumBytesBilled already solves this at the job level — if a query would scan more than the limit, it fails before executing and costs nothing.
Proposed Solution
Add an optional maximumBytesBilled field to the BigQuery source config:
sources:
  my-bigquery-source:
    kind: bigquery
    project: my-project-id
    writeMode: blocked
    allowedDatasets:
      - my_dataset
    maxQueryResultRows: 50
    maximumBytesBilled: 10737418240 # 10 GiB (10 * 1024^3 bytes)
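For reference, the value in the example config is 10 × 1024³ bytes, which is 10 GiB in binary units (about 10.74 decimal GB). The short Python sketch below shows the arithmetic. It assumes the field takes a plain integer byte count, as in the example, and the helper name is hypothetical.

```python
# Assumption: maximumBytesBilled is configured as a plain integer byte count.
GIB = 1024 ** 3  # bytes in one gibibyte (binary unit)
GB = 1000 ** 3   # bytes in one gigabyte (decimal unit)


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to a byte count (hypothetical helper)."""
    return gib * GIB


limit = gib_to_bytes(10)
print(limit)       # 10737418240
print(limit / GB)  # 10.73741824
```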
When set, the toolbox should pass this value as the job-level maximumBytesBilled setting (exposed as QueryJobConfig.maximum_bytes_billed in the Python client) on every query job submitted through this source. Queries that exceed the limit fail with a clear error before scanning any data.
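To make the intended behavior concrete, here is a minimal Python sketch of how a source-level limit could be attached to every query job. It builds a request body in the shape used by the BigQuery REST API (configuration.query.maximumBytesBilled, which the API encodes as a string). The function name and its parameter are hypothetical, and this is not the toolbox's actual implementation.

```python
from typing import Any, Dict, Optional


def build_query_job_body(
    sql: str, maximum_bytes_billed: Optional[int] = None
) -> Dict[str, Any]:
    """Build a jobs.insert-style request body for a query job.

    maximum_bytes_billed stands in for the proposed source-level setting
    (hypothetical). None means "no limit", matching current behavior.
    """
    if maximum_bytes_billed is not None and maximum_bytes_billed <= 0:
        raise ValueError("maximumBytesBilled must be a positive number of bytes")

    query: Dict[str, Any] = {"query": sql, "useLegacySql": False}
    if maximum_bytes_billed is not None:
        # The REST API represents int64 fields as strings in JSON.
        query["maximumBytesBilled"] = str(maximum_bytes_billed)
    return {"configuration": {"query": query}}


body = build_query_job_body("SELECT 1", maximum_bytes_billed=10737418240)
print(body["configuration"]["query"]["maximumBytesBilled"])  # 10737418240
```

Because the limit is applied where the job is built rather than in the tool description, the agent cannot skip it. Leaving the field unset produces the same request as today.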
Why This Matters
- Cost governance for AI agents: Agents make autonomous query decisions. Config-level enforcement is the only reliable safeguard — description-level guidance is best-effort.
- Consistency with existing governance model: writeMode enforces write safety at config level. allowedDatasets enforces data access at config level. maximumBytesBilled would enforce cost safety at config level, completing the governance trifecta.
- Zero behavior change for existing users: The field is optional and defaults to no limit (current behavior).
Alternatives Considered
- BigQuery custom daily quotas: Project-level or per-user daily byte limits via Cloud Console. Works but is a blunt instrument — one expensive query burns the quota for all subsequent queries that day.
- Agent instructions in tool descriptions: Telling the agent to use dry_run=true first. Unreliable, since it depends on the agent following instructions correctly every time.