Prerequisites
What are you trying to do that currently feels hard or impossible?
MCP Toolbox currently has no way to query or analyze data lineage. Data engineers and analysts who work with LLM agents therefore cannot perform impact analysis or trace data flows (e.g., from source tables to target tables) programmatically through the toolbox.
AI agents cannot answer contextual lineage questions such as:
- "What are the upstream sources for this BigQuery table?"
- "If I modify this table column, what downstream assets will be affected?"
- "Which process created this data flow link, and what was its origin and display name?"
As a result, data governance, debugging, and impact analysis remain manual, high-friction work when using AI assistants.
Suggested Solution(s)
Implement a new native Google Cloud Data Lineage integration in MCP Toolbox, containing:
- Data Lineage Source (datalineage):
  - A new source wrapper that connects to the Data Lineage API (cloud.google.com/go/datacatalog/lineage/apiv1).
  - Implements a robust SearchLineageStreaming accumulator that executes graph traversal and collects all streaming link results synchronously.
- Datalineage Search Tool (datalineage-search-lineage):
  - Exposes a native search tool with full support for search parameters: locations, root_entities (with Column-Level Lineage / CLL wildcard support), direction (UPSTREAM/DOWNSTREAM), and limits (max_depth, max_results, max_process_per_link).
  - Implements proper validation for tools.
- E2E Black-Box Integration Tests:
  - Creates temporary synthetic lineage processes, runs, and events on the fly.
  - Starts the server and validates manifests, upstream searches, casing, and process details under eventual consistency with robust polling backoff.
- Documentation:
  - Complete Hugo-based reference guide and connection setup guide.
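To make the accumulator idea concrete, here is a minimal Go sketch of draining a streaming response into a single slice before returning it to the tool caller. It does not use the real Data Lineage client: the Link type, the linkStream interface, the collectLinks function, and the sliceStream test double are all hypothetical names introduced for illustration. The only assumption about the real stream is that it follows the usual Go convention of returning io.EOF from Recv when it is finished, and that max_results can be enforced client-side as a cap.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Link is a hypothetical, simplified stand-in for one lineage link.
type Link struct {
	Source string
	Target string
}

// linkStream is a hypothetical interface that assumes the stream
// returns batches of links and signals completion with io.EOF.
type linkStream interface {
	Recv() ([]Link, error)
}

// collectLinks drains the stream synchronously and returns every link
// received. A maxResults of 0 means no cap; otherwise collection stops
// as soon as maxResults links have been gathered.
func collectLinks(s linkStream, maxResults int) ([]Link, error) {
	var out []Link
	for {
		batch, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			// Return what was collected so far along with the error.
			return out, fmt.Errorf("receive lineage links: %w", err)
		}
		for _, l := range batch {
			if maxResults > 0 && len(out) >= maxResults {
				return out, nil
			}
			out = append(out, l)
		}
	}
}

// sliceStream is an in-memory linkStream used only for demonstration.
type sliceStream struct{ batches [][]Link }

func (s *sliceStream) Recv() ([]Link, error) {
	if len(s.batches) == 0 {
		return nil, io.EOF
	}
	b := s.batches[0]
	s.batches = s.batches[1:]
	return b, nil
}

func main() {
	stream := &sliceStream{batches: [][]Link{
		{{Source: "raw.orders", Target: "staging.orders"}},
		{{Source: "staging.orders", Target: "mart.revenue"}},
	}}
	links, err := collectLinks(stream, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Printf("%s -> %s\n", l.Source, l.Target)
	}
}
```

Hiding the stream behind a small interface is one possible design choice: it would let the unit tests exercise the accumulator with an in-memory double, leaving the E2E tests to cover the real API and its eventual consistency.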
Additional Details
- MCP Toolbox version: latest / dev
- Database/Service: Google Cloud Data Lineage API
- Deployment: local / CI