Databricks
Authentication: OAuth or personal access token. See Magic Link for the runtime auth flow, or Application credentials to bring your own OAuth app.
Sample use cases
- List the 10 most recent failed jobs in the prod workspace and surface the error.
- Run the nightly_etl job and notify #data-eng when it finishes.
- Show me clusters with idle time >2 hours so we can shut them down.
Available Tools
List all clusters in the Databricks workspace. Returns cluster IDs, names, states, and configurations. Use this to find cluster IDs for other operations.
Get detailed information about a specific Databricks cluster including state, configuration, and resource allocation. Use list_clusters first to find the cluster ID.
Create a new Databricks cluster. Requires cluster name, Spark version, and node type. Specify num_workers for fixed size or autoscale_min/max_workers for autoscaling.
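The fixed-size versus autoscaling choice described above can be checked before the call is made. The following is a minimal sketch, not the integration's actual schema: the argument names cluster_name, spark_version, and node_type_id are assumptions, and autoscale_min_workers / autoscale_max_workers are an assumed expansion of "autoscale_min/max_workers".

```python
from typing import Any, Dict, Optional


def build_cluster_args(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: Optional[int] = None,
    autoscale_min_workers: Optional[int] = None,
    autoscale_max_workers: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble cluster-creation arguments (hypothetical argument names)."""
    fixed = num_workers is not None
    autoscale = (
        autoscale_min_workers is not None or autoscale_max_workers is not None
    )
    # Exactly one sizing mode must be chosen.
    if fixed == autoscale:
        raise ValueError("give num_workers or the autoscale bounds, not both or neither")

    args: Dict[str, Any] = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
    }
    if fixed:
        args["num_workers"] = num_workers
        return args

    # Autoscaling needs both bounds, in a sensible order.
    if autoscale_min_workers is None or autoscale_max_workers is None:
        raise ValueError("autoscaling needs both a minimum and a maximum")
    if autoscale_min_workers > autoscale_max_workers:
        raise ValueError("autoscale minimum exceeds maximum")
    args["autoscale_min_workers"] = autoscale_min_workers
    args["autoscale_max_workers"] = autoscale_max_workers
    return args
```

Rejecting a mixed request early keeps an ambiguous sizing choice from reaching the workspace.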
Start a terminated Databricks cluster. The cluster must be in TERMINATED state. Use list_clusters to find clusters and their states.
Terminate a running Databricks cluster. This stops the cluster but preserves its configuration for restarting. Use list_clusters to find cluster IDs.
List jobs in the Databricks workspace with optional name filter and pagination. Returns job IDs, names, and settings. Use page_token from response for next page.
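Token-based pagination of this kind is usually consumed with a loop that feeds each response's token into the next request. The sketch below assumes a caller-supplied function call_list_jobs that invokes the tool, and assumes the response is a dict with "jobs" and "next_page_token" keys and that the filter argument is called name; none of those names are confirmed by this page.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_jobs(
    call_list_jobs: Callable[..., Dict[str, Any]],
    name: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every job across all pages (response keys are assumptions)."""
    page_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {}
        if name is not None:
            kwargs["name"] = name
        if page_token:
            kwargs["page_token"] = page_token

        response = call_list_jobs(**kwargs)
        yield from response.get("jobs", [])

        # An absent or empty token is taken to mean the last page.
        page_token = response.get("next_page_token")
        if not page_token:
            break
```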
Get detailed information about a specific Databricks job including tasks, schedule, and configuration. Use list_jobs first to find the job ID.
Create a new Databricks job with one or more tasks. Each task needs a task_key and type (notebook_task, spark_python_task, sql_task, etc). Supports scheduling with cron expressions.
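As an illustration of the task requirements above, the sketch below validates a task list before submitting it. The set of task types covers only the three named in the description (the "etc" means others exist), and the top-level name and tasks arguments, the uniqueness check on task_key, and the notebook_path field in the usage note are assumptions.

```python
from typing import Any, Dict, List

# Only the task types named in the description; the real list is longer.
KNOWN_TASK_TYPES = {"notebook_task", "spark_python_task", "sql_task"}


def task_type(task: Dict[str, Any]) -> str:
    """Return the single task type present on a task, or raise ValueError."""
    if not task.get("task_key"):
        raise ValueError("each task needs a task_key")
    present = sorted(KNOWN_TASK_TYPES.intersection(task))
    if len(present) != 1:
        raise ValueError(f"task {task['task_key']!r} needs exactly one task type")
    return present[0]


def build_job_args(name: str, tasks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble job-creation arguments (hypothetical top-level names)."""
    if not tasks:
        raise ValueError("a job needs at least one task")
    for task in tasks:
        task_type(task)
    keys = [task["task_key"] for task in tasks]
    if len(set(keys)) != len(keys):
        raise ValueError("task_key values should be unique within a job")
    return {"name": name, "tasks": tasks}
```

A one-task job could then be described as build_job_args("nightly_etl", [{"task_key": "load", "notebook_task": {"notebook_path": "/Jobs/load"}}]).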
Permanently delete a Databricks job. This also cancels any active runs. Use list_jobs to find the job ID.
Trigger an immediate run of a Databricks job. Optionally pass notebook_params or python_named_params to override defaults. Use list_jobs to find the job ID.
List job runs in the Databricks workspace. Filter by job_id, active_only, or completed_only. Supports offset/limit pagination. Returns run IDs, states, and timing info.
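Unlike the token-based job listing, offset/limit pagination is driven by the caller. A minimal sketch follows, assuming a caller-supplied call_list_job_runs function and a response dict with a "runs" key; stopping on a short page is a convention chosen here, not documented behavior.

```python
from typing import Any, Callable, Dict, Iterator


def iter_job_runs(
    call_list_job_runs: Callable[..., Dict[str, Any]],
    limit: int = 25,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield runs page by page; filters may include job_id or active_only."""
    offset = 0
    while True:
        response = call_list_job_runs(offset=offset, limit=limit, **filters)
        runs = response.get("runs", [])
        yield from runs
        # A page shorter than the limit is treated as the final page.
        if len(runs) < limit:
            break
        offset += limit
```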
Get detailed information about a specific job run including state, timing, and task details. Use list_job_runs to find the run ID.
Cancel an active job run. The run must be in PENDING or RUNNING state. Use list_job_runs with active_only=true to find cancellable runs.
Get the output of a completed job run including notebook results, SQL output, logs, and error traces. Use list_job_runs to find the run ID.
Execute a SQL statement on a Databricks SQL warehouse. Returns results synchronously within wait_timeout (default 10s) or a statement_id for async polling via get_sql_statement.
Get the status and results of a SQL statement execution. Use this to poll for results of async statements started with execute_sql_statement.
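The asynchronous path can be sketched as a polling loop. This assumes a caller-supplied get_sql_statement function that invokes the tool and returns a dict with a "state" key; the key name, the polling interval, and the time budget are all assumptions. Only PENDING and RUNNING are treated as in-progress, matching the states named on this page.

```python
import time
from typing import Any, Callable, Dict

IN_PROGRESS_STATES = ("PENDING", "RUNNING")


def wait_for_statement(
    get_sql_statement: Callable[..., Dict[str, Any]],
    statement_id: str,
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the statement leaves PENDING/RUNNING or max_wait elapses."""
    waited = 0.0
    while True:
        result = get_sql_statement(statement_id=statement_id)
        if result.get("state") not in IN_PROGRESS_STATES:
            return result
        if waited >= max_wait:
            raise TimeoutError(f"statement {statement_id} still running")
        sleep(poll_interval)
        waited += poll_interval
```

Passing sleep as a parameter keeps the loop testable without real delays.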
Cancel a running SQL statement execution. Use get_sql_statement first to verify the statement is still in PENDING or RUNNING state.
List all SQL warehouses in the Databricks workspace. Returns warehouse IDs, names, sizes, and states. Use this to find warehouse IDs for SQL execution.
Get detailed information about a specific SQL warehouse including state, size, cluster count, and active sessions. Use list_sql_warehouses to find the warehouse ID.
Create a new Databricks SQL warehouse. Requires a name and cluster_size (T-shirt sizing from 2X-Small to 4X-Large). Optionally configure autoscaling and auto-stop.
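A size check before the call can catch typos in the T-shirt size. The sketch below assumes the intermediate size names follow the usual Databricks labels between the two endpoints given above, and that the arguments are called name and cluster_size as in the description; the optional autoscaling and auto-stop settings are left out because their argument names are not given here.

```python
from typing import Dict

# Assumed ordering of sizes between the documented endpoints.
WAREHOUSE_SIZES = [
    "2X-Small", "X-Small", "Small", "Medium", "Large",
    "X-Large", "2X-Large", "3X-Large", "4X-Large",
]


def build_warehouse_args(name: str, cluster_size: str) -> Dict[str, str]:
    """Assemble warehouse-creation arguments after validating the size."""
    if not name:
        raise ValueError("a warehouse needs a name")
    if cluster_size not in WAREHOUSE_SIZES:
        raise ValueError(
            f"unknown cluster_size {cluster_size!r}; expected one of {WAREHOUSE_SIZES}"
        )
    return {"name": name, "cluster_size": cluster_size}
```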
Start a stopped SQL warehouse. The warehouse must be in STOPPED state. Use list_sql_warehouses to find warehouses and their states.
Stop a running SQL warehouse. This deallocates compute resources. Use list_sql_warehouses to find warehouses and their states.
List objects in a Databricks workspace directory. Returns notebooks, directories, files, repos, and libraries at the given path. Use '/' for the root directory.
Get metadata about a workspace object including type, language (for notebooks), and timestamps. Use list_workspace to find valid paths.
Delete a workspace object (notebook, file, or directory). For non-empty directories, set recursive=true. Use list_workspace to find valid paths.
Ask a natural language question about your data using Databricks Genie. Provide either space_id or space_name to identify the Genie room.
Create a vector search endpoint to host vector search indexes. An endpoint must exist before creating indexes on it.
Create a vector search index on an endpoint. Use DELTA_SYNC to auto-sync from a Delta table, or DIRECT_ACCESS for manual vector upserts.
Delete a vector search endpoint. All indexes on the endpoint must be deleted first.
Delete a vector search index. This permanently removes the index and its data. Use query_vector_index to verify the index before deleting.
Execute a read-only SQL statement on a Databricks SQL warehouse (SELECT, WITH, SHOW, DESCRIBE, EXPLAIN, VALUES, LIST).
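A client can pre-screen statements against the allowed keywords before sending them. This is a deliberately naive sketch that inspects only the leading keyword; it does not strip comments or parse the statement, it is not how the tool itself enforces read-only access, and a statement such as a WITH clause feeding an INSERT would pass it.

```python
import re

READ_ONLY_KEYWORDS = {
    "SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN", "VALUES", "LIST",
}


def looks_read_only(sql: str) -> bool:
    """Return True if the first keyword is one of the listed read-only ones."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if match is None:
        return False
    return match.group(1).upper() in READ_ONLY_KEYWORDS
```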
Export a notebook’s content from the workspace. Returns base64-encoded content in the specified format (SOURCE, HTML, JUPYTER, DBC).
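Since the export comes back base64-encoded, the caller has to decode it. A minimal sketch follows, assuming the encoded string has already been pulled out of the response; decoding to text assumes a UTF-8 textual format such as SOURCE, while other formats may be better kept as raw bytes.

```python
import base64


def decode_export(content_b64: str) -> bytes:
    """Decode the base64 payload to raw bytes (safe for any format)."""
    return base64.b64decode(content_b64)


def decode_export_text(content_b64: str, encoding: str = "utf-8") -> str:
    """Decode a textual export; the UTF-8 default is an assumption."""
    return decode_export(content_b64).decode(encoding)
```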
Get events for a cluster to diagnose issues. Returns creation, termination, resizing, errors, and driver events. Filter by time range or event type.
Get the current authenticated Databricks user and their workspace home directory. Returns user ID, username, display name, and home path (e.g.
Get the status and results of a Genie message. Use this to poll for results when ask_genie returns status EXECUTING_QUERY.
Get detailed information about a table including column definitions, types, and properties. Provide the full three-level name (catalog.schema.table).
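A three-level name can be validated before the lookup. The sketch below is a simple split on dots and therefore does not handle backtick-quoted identifiers that themselves contain a dot; the function name is hypothetical.

```python
from typing import Tuple


def split_table_name(full_name: str) -> Tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(
            f"expected catalog.schema.table, got {full_name!r}"
        )
    catalog, schema, table = parts
    return catalog, schema, table
```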
Create or overwrite a notebook in the workspace. Content must be base64-encoded. For SOURCE format, specify the language (PYTHON, SCALA, SQL, R).
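Preparing the base64 content for a SOURCE-format notebook might look like the sketch below. The argument names path, format, language, and content, and the example path, are assumptions for illustration; only the base64 requirement and the format and language values come from the description above.

```python
import base64
from typing import Dict


def encode_notebook_source(source: str) -> str:
    """Base64-encode notebook source text as an ASCII string."""
    return base64.b64encode(source.encode("utf-8")).decode("ascii")


def build_notebook_args(path: str, source: str, language: str) -> Dict[str, str]:
    """Assemble arguments for a SOURCE-format notebook (hypothetical names)."""
    if language not in {"PYTHON", "SCALA", "SQL", "R"}:
        raise ValueError(f"unsupported language {language!r}")
    return {
        "path": path,
        "format": "SOURCE",
        "language": language,
        "content": encode_notebook_source(source),
    }


example_args = build_notebook_args(
    "/Users/someone@example.com/hello",  # hypothetical path
    "print('hello')\n",
    "PYTHON",
)
```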
List all catalogs in Unity Catalog. Returns catalog names, owners, types, and descriptions.
List available Genie spaces (rooms). Returns space IDs, names, and descriptions. Use this to find a space before asking questions with ask_genie.
List schemas within a Unity Catalog catalog. Returns schema names, owners, and descriptions. Use list_catalogs first to find valid catalog names.
List all model serving endpoints in the workspace. Returns endpoint names, states, served models, and configuration.
List tables within a Unity Catalog schema. Returns table names, types, formats, and owners.
Query a model serving endpoint for predictions or chat completions. Automatically detects Foundation Model API endpoints (chat, completions, embedding…
Query a vector search index using text or a vector. Returns the most similar documents with scores. Supports filtering and column selection.
Rename or move a workspace file or notebook to a new path. Returns destination_url for the new location.
Re-run failed tasks in a completed job run without re-running succeeded ones. Set rerun_all_failed_tasks=true to retry all failures, or specify indivi…
Submit a one-time notebook run. Requires either warehouse_id (SQL warehouse) or existing_cluster_id (cluster) for compute.
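The compute requirement can be checked up front. In this sketch the notebook_path argument name is an assumption, and treating warehouse_id and existing_cluster_id as mutually exclusive is one reading of "either"; the tool may accept other combinations.

```python
from typing import Dict, Optional


def build_notebook_run_args(
    notebook_path: str,
    warehouse_id: Optional[str] = None,
    existing_cluster_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble one-time run arguments with exactly one compute target."""
    if (warehouse_id is None) == (existing_cluster_id is None):
        raise ValueError(
            "provide exactly one of warehouse_id or existing_cluster_id"
        )
    args = {"notebook_path": notebook_path}
    if warehouse_id is not None:
        args["warehouse_id"] = warehouse_id
    else:
        args["existing_cluster_id"] = existing_cluster_id
    return args
```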