Databricks

Connect your AI agents to Databricks.

Authentication: OAuth or personal access token. See Magic Link for the runtime auth flow, or Application credentials to bring your own OAuth app.

Sample use cases

  • List the 10 most recent failed jobs in the prod workspace and surface the error.
  • Run the nightly_etl job and notify #data-eng when it finishes.
  • Show me clusters with idle time >2 hours so we can shut them down.

Available Tools

list_clusters

List all clusters in the Databricks workspace. Returns cluster IDs, names, states, and configurations. Use this to find cluster IDs for other operations.

get_cluster

Get detailed information about a specific Databricks cluster including state, configuration, and resource allocation. Use list_clusters first to find the cluster ID.

create_cluster

Create a new Databricks cluster. Requires cluster name, Spark version, and node type. Specify num_workers for fixed size or autoscale_min/max_workers for autoscaling.
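
The two sizing modes read as mutually exclusive, so it can help to validate arguments before calling the tool. Below is a minimal sketch of such a check. The helper name and the keys cluster_name, spark_version, node_type_id, autoscale_min_workers, and autoscale_max_workers are assumptions based on the description above, not a confirmed schema.

```python
from typing import Any, Dict, Optional


def build_create_cluster_args(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: Optional[int] = None,
    autoscale_min_workers: Optional[int] = None,
    autoscale_max_workers: Optional[int] = None,
) -> Dict[str, Any]:
    """Build create_cluster arguments with exactly one sizing mode."""
    has_min = autoscale_min_workers is not None
    has_max = autoscale_max_workers is not None
    if has_min != has_max:
        raise ValueError("autoscaling needs both a minimum and a maximum")
    fixed = num_workers is not None
    if fixed == has_min:
        raise ValueError("set either num_workers or the autoscale bounds")

    # Key names other than num_workers are assumed, not taken from the tool schema.
    args: Dict[str, Any] = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
    }
    if fixed:
        args["num_workers"] = num_workers
    else:
        if autoscale_min_workers > autoscale_max_workers:
            raise ValueError("autoscale minimum exceeds maximum")
        args["autoscale_min_workers"] = autoscale_min_workers
        args["autoscale_max_workers"] = autoscale_max_workers
    return args
```

Passing neither sizing mode, or both, raises ValueError before any request is made.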

start_cluster

Start a terminated Databricks cluster. The cluster must be in TERMINATED state. Use list_clusters to find clusters and their states.

terminate_cluster

Terminate a running Databricks cluster. This stops the cluster but preserves its configuration for restarting. Use list_clusters to find cluster IDs.

list_jobs

List jobs in the Databricks workspace with optional name filter and pagination. Returns job IDs, names, and settings. Use page_token from response for next page.
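
Token-based paging can be sketched as a simple loop. The example assumes a hypothetical call_tool(tool_name, arguments) function that returns a dict, a jobs list in the response, and a next_page_token response key. None of these are confirmed by this page, so adjust them to the actual response shape.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_jobs(call_tool: ToolCaller, name: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield every job, following page tokens until none is returned."""
    page_token: Optional[str] = None
    while True:
        args: Dict[str, Any] = {}
        if name is not None:
            args["name"] = name
        if page_token is not None:
            args["page_token"] = page_token
        response = call_tool("list_jobs", args)
        yield from response.get("jobs", [])
        # Assumed response key; an empty or missing token ends the loop.
        page_token = response.get("next_page_token") or None
        if page_token is None:
            break


def fake_call_tool(tool_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory stand-in that serves two pages, for demonstration only."""
    if args.get("page_token") == "p2":
        return {"jobs": [{"job_id": 3}]}
    return {"jobs": [{"job_id": 1}, {"job_id": 2}], "next_page_token": "p2"}


job_ids = [job["job_id"] for job in iter_jobs(fake_call_tool)]
```

With the stand-in above, job_ids collects the IDs from both pages in order.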

get_job

Get detailed information about a specific Databricks job including tasks, schedule, and configuration. Use list_jobs first to find the job ID.

create_job

Create a new Databricks job with one or more tasks. Each task needs a task_key and a task type (notebook_task, spark_python_task, sql_task, etc.). Supports scheduling with cron expressions.

delete_job

Permanently delete a Databricks job. This also cancels any active runs. Use list_jobs to find the job ID.

run_job_now

Trigger an immediate run of a Databricks job. Optionally pass notebook_params or python_named_params to override defaults. Use list_jobs to find the job ID.

list_job_runs

List job runs in the Databricks workspace. Filter by job_id, active_only, or completed_only. Supports offset/limit pagination. Returns run IDs, states, and timing info.

get_job_run

Get detailed information about a specific job run including state, timing, and task details. Use list_job_runs to find the run ID.

cancel_job_run

Cancel an active job run. The run must be in PENDING or RUNNING state. Use list_job_runs with active_only=true to find cancellable runs.
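
A rough sketch of the find-then-cancel flow described above. It assumes a hypothetical call_tool(tool_name, arguments) function, a runs list in the list_job_runs response, and a flat state string on each run. The real response may nest state differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

CANCELLABLE_STATES = {"PENDING", "RUNNING"}


def cancel_active_runs(call_tool: ToolCaller, job_id: int) -> List[int]:
    """Cancel every PENDING or RUNNING run of a job and return their run IDs."""
    response = call_tool("list_job_runs", {"job_id": job_id, "active_only": True})
    cancelled: List[int] = []
    for run in response.get("runs", []):
        # Re-check the state locally; "state" as a flat key is an assumption.
        if run.get("state") in CANCELLABLE_STATES:
            call_tool("cancel_job_run", {"run_id": run["run_id"]})
            cancelled.append(run["run_id"])
    return cancelled
```

Checking the state again on the client side is redundant when active_only already filters, but it keeps the sketch safe if the listing includes runs in other states.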

get_job_run_output

Get the output of a completed job run including notebook results, SQL output, logs, and error traces. Use list_job_runs to find the run ID.

execute_sql_statement

Execute a SQL statement on a Databricks SQL warehouse. Returns results synchronously within wait_timeout (default 10s) or a statement_id for async polling via get_sql_statement.

get_sql_statement

Get the status and results of a SQL statement execution. Use this to poll for results of async statements started with execute_sql_statement.
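
The execute-then-poll pattern might look like the sketch below. It assumes a hypothetical call_tool(tool_name, arguments) function, argument names warehouse_id and statement, and a flat state key in both responses. The poll interval and poll limit are illustrative values, not documented defaults.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

IN_PROGRESS = {"PENDING", "RUNNING"}


def run_sql(
    call_tool: ToolCaller,
    warehouse_id: str,
    statement: str,
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Execute a statement, then poll get_sql_statement until it leaves PENDING/RUNNING."""
    response = call_tool(
        "execute_sql_statement",
        {"warehouse_id": warehouse_id, "statement": statement},
    )
    statement_id = response.get("statement_id")
    for _ in range(max_polls):
        # A response that finished within wait_timeout returns immediately.
        if response.get("state") not in IN_PROGRESS:
            return response
        sleep(poll_interval)
        response = call_tool("get_sql_statement", {"statement_id": statement_id})
    if response.get("state") in IN_PROGRESS:
        raise TimeoutError(f"statement {statement_id} still running after {max_polls} polls")
    return response
```

Injecting sleep makes the loop easy to exercise without real delays.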

cancel_sql_statement

Cancel a running SQL statement execution. Use get_sql_statement first to verify the statement is still in PENDING or RUNNING state.

list_sql_warehouses

List all SQL warehouses in the Databricks workspace. Returns warehouse IDs, names, sizes, and states. Use this to find warehouse IDs for SQL execution.

get_sql_warehouse

Get detailed information about a specific SQL warehouse including state, size, cluster count, and active sessions. Use list_sql_warehouses to find the warehouse ID.

create_sql_warehouse

Create a new Databricks SQL warehouse. Requires a name and cluster_size (T-shirt sizing from 2X-Small to 4X-Large). Optionally configure autoscaling and auto-stop.
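
A small sketch for normalizing cluster_size before calling the tool. The intermediate size names follow Databricks' usual T-shirt naming between the two endpoints given above. Treat the exact list and the helper name as assumptions and confirm them against the tool schema.

```python
# Assumed size ladder between the documented endpoints 2X-Small and 4X-Large.
WAREHOUSE_SIZES = (
    "2X-Small", "X-Small", "Small", "Medium", "Large",
    "X-Large", "2X-Large", "3X-Large", "4X-Large",
)

_SIZE_LOOKUP = {size.lower(): size for size in WAREHOUSE_SIZES}


def normalize_cluster_size(value: str) -> str:
    """Return the canonical spelling of a size, ignoring case and outer whitespace."""
    key = value.strip().lower()
    if key not in _SIZE_LOOKUP:
        raise ValueError(f"unknown cluster_size {value!r}; expected one of {WAREHOUSE_SIZES}")
    return _SIZE_LOOKUP[key]
```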

start_sql_warehouse

Start a stopped SQL warehouse. The warehouse must be in STOPPED state. Use list_sql_warehouses to find warehouses and their states.

stop_sql_warehouse

Stop a running SQL warehouse. This deallocates compute resources. Use list_sql_warehouses to find warehouses and their states.

list_workspace

List objects in a Databricks workspace directory. Returns notebooks, directories, files, repos, and libraries at the given path. Use '/' for the root directory.

get_workspace_object_status

Get metadata about a workspace object including type, language (for notebooks), and timestamps. Use list_workspace to find valid paths.

delete_workspace_object

Delete a workspace object (notebook, file, or directory). For non-empty directories, set recursive=true. Use list_workspace to find valid paths.

ask_genie

Ask a natural language question about your data using Databricks Genie. Provide either space_id or space_name to identify the Genie room.

create_vector_search_endpoint

Create a vector search endpoint to host vector search indexes. An endpoint must exist before creating indexes on it.

create_vector_search_index

Create a vector search index on an endpoint. Use DELTA_SYNC to auto-sync from a Delta table, or DIRECT_ACCESS for manual vector upserts.

delete_vector_search_endpoint

Delete a vector search endpoint. All indexes on the endpoint must be deleted first.

delete_vector_search_index

Delete a vector search index. This permanently removes the index and its data. Use query_vector_index to verify the index before deleting.

execute_sql_statement_readonly

Execute a read-only SQL statement on a Databricks SQL warehouse (SELECT, WITH, SHOW, DESCRIBE, EXPLAIN, VALUES, LIST).
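
A caller can pre-screen statements against the keyword list above before sending them. The sketch below checks only the leading keyword. It is a convenience filter under that assumption, not a description of how the tool itself validates statements, and it does not handle leading comments or a WITH clause that feeds a write.

```python
import re

# Leading keywords listed in the tool description.
READ_ONLY_KEYWORDS = frozenset(
    {"SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN", "VALUES", "LIST"}
)

_FIRST_WORD = re.compile(r"\s*([A-Za-z]+)")


def looks_read_only(statement: str) -> bool:
    """Return True when the statement's first keyword is in the read-only list."""
    match = _FIRST_WORD.match(statement)
    return match is not None and match.group(1).upper() in READ_ONLY_KEYWORDS
```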

export_notebook

Export a notebook’s content from the workspace. Returns base64-encoded content in the specified format (SOURCE, HTML, JUPYTER, DBC).
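
Decoding the exported content is a single standard-library call. The sketch assumes the response carries the payload under a content key, which this page does not confirm.

```python
import base64
from typing import Any, Dict


def decode_export(response: Dict[str, Any]) -> bytes:
    """Return the raw bytes of an exported notebook ("content" is an assumed key)."""
    return base64.b64decode(response["content"])


# Simulated response for a SOURCE export, for demonstration only.
sample_response = {"content": base64.b64encode(b"print('hello')\n").decode("ascii")}
source_text = decode_export(sample_response).decode("utf-8")
```

The helper returns bytes so that archive formats can be written to disk unchanged. Decode to text only for text-based formats such as SOURCE.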

get_cluster_events

Get events for a cluster to diagnose issues. Returns creation, termination, resizing, errors, and driver events. Filter by time range or event type.

get_current_user

Get the current authenticated Databricks user and their workspace home directory. Returns user ID, username, display name, and home path (e.g.

get_genie_message

Get the status and results of a Genie message. Use this to poll for results when ask_genie returns status EXECUTING_QUERY.

get_table_info

Get detailed information about a table including column definitions, types, and properties. Provide the full three-level name (catalog.schema.table).
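
A minimal sketch for checking the three-level name before calling the tool. The helper name is hypothetical, and the split is naive: it does not handle backtick-quoted identifiers that contain dots.

```python
from typing import Tuple


def split_table_name(full_name: str) -> Tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts, rejecting anything else."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    catalog, schema, table = parts
    return catalog, schema, table
```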

import_notebook

Create or overwrite a notebook in the workspace. Content must be base64-encoded. For SOURCE format, specify the language (PYTHON, SCALA, SQL, R).
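
Preparing the arguments for a SOURCE import might look like this. The argument keys path, format, language, and content are assumptions inferred from the description, and the workspace path in the usage line is a made-up example.

```python
import base64
from typing import Dict

LANGUAGES = {"PYTHON", "SCALA", "SQL", "R"}


def build_import_notebook_args(path: str, source: str, language: str) -> Dict[str, str]:
    """Base64-encode notebook source and bundle it with the assumed argument keys."""
    language = language.upper()
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    return {"path": path, "format": "SOURCE", "language": language, "content": encoded}


# Hypothetical workspace path, for illustration only.
import_args = build_import_notebook_args("/Shared/demo_notebook", "print('hello')\n", "python")
```

Because the tool overwrites an existing notebook at the same path, it may be worth checking the path with get_workspace_object_status first.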

list_catalogs

List all catalogs in Unity Catalog. Returns catalog names, owners, types, and descriptions.

list_genie_spaces

List available Genie spaces (rooms). Returns space IDs, names, and descriptions. Use this to find a space before asking questions with ask_genie.

list_schemas

List schemas within a Unity Catalog catalog. Returns schema names, owners, and descriptions. Use list_catalogs first to find valid catalog names.

list_serving_endpoints

List all model serving endpoints in the workspace. Returns endpoint names, states, served models, and configuration.

list_tables

List tables within a Unity Catalog schema. Returns table names, types, formats, and owners.

query_serving_endpoint

Query a model serving endpoint for predictions or chat completions. Automatically detects Foundation Model API endpoints (chat, completions, embedding…

query_vector_index

Query a vector search index using text or a vector. Returns the most similar documents with scores. Supports filtering and column selection.

rename_workspace_file

Rename or move a workspace file or notebook to a new path. Returns destination_url for the new location.

repair_run

Re-run failed tasks in a completed job run without re-running succeeded ones. Set rerun_all_failed_tasks=true to retry all failures, or specify indivi…

run_notebook

Submit a one-time notebook run. Requires either warehouse_id (SQL warehouse) or existing_cluster_id (cluster) for compute.