feat(bigquery): add maximumBytesBilled to source config for query cost protection #2719

@paulodearaujo

Problem

When using the BigQuery source with AI agents, there is no way to enforce a per-query scan limit at the config level. The existing governance options (writeMode, allowedDatasets, maxQueryResultRows) control which operations and datasets the agent can use and how many result rows it gets back, but not how expensive a query can be.

An agent can accidentally run SELECT * on a multi-TB table, and nothing in the toolbox config prevents it. The dry_run parameter on bigquery-execute-sql is opt-in: the agent has to remember to use it. This is guidance, not enforcement.

BigQuery's native maximumBytesBilled already solves this at the job level — if a query would scan more than the limit, it fails before executing and costs nothing.

Proposed Solution

Add an optional maximumBytesBilled field to the BigQuery source config:

sources:
  my-bigquery-source:
    kind: bigquery
    project: my-project-id
    writeMode: blocked
    allowedDatasets:
      - my_dataset
    maxQueryResultRows: 50
    maximumBytesBilled: 10737418240  # 10 GiB (10 * 1024^3 bytes)
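
The byte value in the example is 10 * 1024^3. A small helper along these lines could compute such limits without hand-counting digits. This is only a sketch: `to_bytes` and `_BINARY_UNITS` are hypothetical names, not part of the toolbox.

```python
# Hypothetical helper for computing maximumBytesBilled values; not part of the toolbox.
_BINARY_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(amount: int, unit: str) -> int:
    """Convert a size given in binary units to a whole number of bytes."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if unit not in _BINARY_UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return amount * _BINARY_UNITS[unit]


limit = to_bytes(10, "GiB")
print(limit)  # 10737418240, the value used in the config above
```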

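When set, the toolbox should pass this value to the query job's maximumBytesBilled setting (QueryJobConfig.maximum_bytes_billed in the Python client, or the equivalent field in whichever client library the toolbox uses) on every query job submitted through this source. Queries that exceed the limit fail with a clear error before scanning any data.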
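
A minimal sketch of the intended behavior, written in Python purely for illustration (this issue does not show the toolbox's real code, and `BigQuerySourceConfig`, `parse_source_config`, and `build_query_job_body` are hypothetical names). It assumes the job is submitted through the BigQuery REST jobs resource, where the limit lives at `configuration.query.maximumBytesBilled` and int64 values are encoded as strings. Rejecting non-positive values is a choice made for this sketch, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class BigQuerySourceConfig:
    """Hypothetical in-memory form of the source config; only relevant fields shown."""
    project: str
    maximum_bytes_billed: Optional[int] = None  # None means no limit (current behavior)


def parse_source_config(raw: Dict[str, Any]) -> BigQuerySourceConfig:
    """Read the optional maximumBytesBilled field and validate it."""
    limit = raw.get("maximumBytesBilled")
    if limit is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
            raise ValueError("maximumBytesBilled must be a positive integer number of bytes")
    return BigQuerySourceConfig(project=raw["project"], maximum_bytes_billed=limit)


def build_query_job_body(cfg: BigQuerySourceConfig, sql: str) -> Dict[str, Any]:
    """Build a jobs.insert request body, attaching the limit only when configured."""
    query: Dict[str, Any] = {"query": sql, "useLegacySql": False}
    if cfg.maximum_bytes_billed is not None:
        # The REST API encodes int64 fields as decimal strings.
        query["maximumBytesBilled"] = str(cfg.maximum_bytes_billed)
    return {"configuration": {"query": query}}


cfg = parse_source_config(
    {"kind": "bigquery", "project": "my-project-id", "maximumBytesBilled": 10737418240}
)
body = build_query_job_body(cfg, "SELECT 1")
```

Because the limit is attached where the job body is built, every tool that goes through this source inherits it, and a config without the field produces the same request as today.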

Why This Matters

  • Cost governance for AI agents: Agents make autonomous query decisions. Config-level enforcement is the only reliable safeguard — description-level guidance is best-effort.
  • Consistency with existing governance model: writeMode enforces write safety at config level. allowedDatasets enforces data access at config level. maximumBytesBilled would enforce cost safety at config level — completing the governance trifecta.
  • Zero behavior change for existing users: The field is optional and defaults to no limit (current behavior).

Alternatives Considered

  • BigQuery custom daily quotas: Project-level or per-user daily byte limits via Cloud Console. Works but is a blunt instrument — one expensive query burns the quota for all subsequent queries that day.
  • Agent instructions in tool descriptions: Telling the agent to use dry_run=true first. Unreliable — depends on the agent following instructions correctly every time.
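
To make the dry-run alternative concrete, the sketch below shows the two-step check a caller would have to perform on every query. It assumes the BigQuery REST jobs resource, where a dry run is requested with `configuration.dryRun` and the estimate comes back as the string field `statistics.totalBytesProcessed`. The function names and the sample response are hypothetical, and bytes processed can differ from bytes billed, so this is an approximation of what maximumBytesBilled enforces.

```python
from typing import Any, Dict


def build_dry_run_body(sql: str) -> Dict[str, Any]:
    """Request body for a dry run: BigQuery validates and estimates without executing."""
    return {
        "configuration": {
            "dryRun": True,
            "query": {"query": sql, "useLegacySql": False},
        }
    }


def dry_run_exceeds_limit(dry_run_job: Dict[str, Any], limit_bytes: int) -> bool:
    """Compare the dry-run estimate against a client-side limit."""
    # int64 statistics arrive as decimal strings in the REST response.
    processed = int(dry_run_job["statistics"]["totalBytesProcessed"])
    return processed > limit_bytes


# Made-up response for illustration: an estimate of 2 TiB against a 10 GiB limit.
sample_response = {"statistics": {"totalBytesProcessed": str(2 * 1024**4)}}
too_expensive = dry_run_exceeds_limit(sample_response, 10 * 1024**3)
```

The check only protects queries whose caller remembers to run it first, which is the gap a source-level maximumBytesBilled would close.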

Labels

  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • product: bigquery (BigQuery)
  • type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)