Databricks connector
Automate Data Pipelines and AI Workflows with Databricks Integrations
Connect Databricks to your data stack and business tools to run end-to-end analytics, ML model deployment, and real-time data workflows.

What can you do with the Databricks connector?
Databricks is the unified data intelligence platform where engineering, analytics, and AI teams run their most demanding workloads — Delta Lake ETL pipelines, large-scale model training, and everything in between. Integrating Databricks with your CRM, data warehouse, BI tools, and operational systems lets you build automated pipelines that cut out manual handoffs and get insights to the people making decisions. With tray.ai, you can trigger Databricks jobs, sync query results downstream, and drop AI model outputs directly into business processes — no custom orchestration code required.
Automate & integrate Databricks
Automating Databricks business processes and integrating Databricks data is made easy with tray.ai
Use case
Automated ETL Pipeline Orchestration
Trigger Databricks notebooks and jobs automatically when new data lands in cloud storage, a database, or an upstream SaaS tool. Instead of scheduling jobs on fixed cron intervals, you can build event-driven pipelines that process data the moment it arrives and route results to downstream systems like Snowflake, BigQuery, or Redshift.
Use case
ML Model Deployment and Inference Automation
Automate the full lifecycle from model training to production inference by connecting Databricks MLflow with your serving infrastructure and operational applications. When a new model version is registered and passes evaluation thresholds, tray.ai can promote it through staging environments and notify downstream consumers.
Use case
Real-Time Analytics Sync to Business Applications
Run Databricks SQL queries on a schedule or in response to business events and push fresh analytics directly into the tools your teams already use — dashboards, spreadsheets, CRM records, or custom apps. No more waiting for an analyst to manually pull and distribute the same report.
Use case
Data Quality Monitoring and Alerting
Connect Databricks Delta Live Tables and data quality checks to your incident response and communication tools so issues are caught and routed before they reach downstream consumers. tray.ai can read job run outcomes, flag row count anomalies or expectation failures, and turn them into actionable alerts.
Use case
Customer Data Enrichment Pipelines
Use Databricks as the computation engine for enriching CRM and product data at scale, then write results back to operational systems automatically. Combine behavioral event data, transaction history, and third-party signals in Delta Lake to produce enriched customer profiles that sync back to Salesforce, Segment, or your data warehouse.
Use case
Automated Reporting and Executive Dashboards
Schedule Databricks SQL queries and notebooks to produce reports that are automatically formatted and distributed to stakeholders via email, Slack, or collaboration tools. Stop asking analysts to manually export data and assemble the same recurring report every week.
Use case
AI Agent Data Retrieval and Context Injection
Use Databricks as the real-time data backend for AI agents built on tray.ai, so agents can query Delta Lake tables, retrieve vector search results, or run feature lookups mid-conversation. Your proprietary data stays in Databricks — no raw database credentials exposed to agent runtimes.
Build Databricks Agents
Give agents secure and governed access to Databricks through Agent Builder and Agent Gateway for MCP.
Agent Tool
Run SQL Query
Execute SQL queries against Databricks SQL warehouses to retrieve, transform, or aggregate data. Good for on-demand analytics or pulling specific datasets to inform downstream decisions.
Data Source
Fetch Query Results
Retrieve the results of previously executed SQL statements or notebook runs to use as context in agent reasoning. Agents can pull structured data outputs from Databricks pipelines directly into their responses.
Agent Tool
Trigger a Job Run
Start a Databricks job or workflow on demand, letting an agent kick off data pipelines, ETL processes, or ML model training in response to business events or user requests.
Data Source
Monitor Job Run Status
Check the status and progress of active or completed Databricks job runs. An agent can report on pipeline health, alert on failures, or wait for a job to finish before moving on.
Data Source
List and Search Clusters
Retrieve information about available Databricks clusters, including their state, configuration, and resource usage. Useful for capacity planning or routing workloads to the right compute resources.
Agent Tool
Start or Terminate a Cluster
Programmatically start or shut down Databricks clusters to manage compute costs and availability. An agent can automate cluster lifecycle based on workload schedules or cost thresholds.
Data Source
Read Delta Table Metadata
Fetch schema, partition details, and table statistics from Delta Lake tables registered in the Databricks Unity Catalog. Helps agents understand data structure before querying or transforming datasets.
Agent Tool
Submit a Notebook Run
Execute a specific Databricks notebook with configurable parameters, letting agents trigger ad-hoc data exploration, reporting, or model inference workflows programmatically.
Data Source
Query Unity Catalog for Data Assets
Search and retrieve metadata about tables, views, and schemas stored in Databricks Unity Catalog. Agents can discover available datasets and trace data lineage across the lakehouse.
Data Source
Retrieve Model Serving Predictions
Call a deployed Databricks Model Serving endpoint to get real-time ML model predictions. Agents can run model inference directly inside automated decision-making workflows.
Agent Tool
Create or Update a Job
Define or modify Databricks job configurations, including schedules, task dependencies, and cluster settings. Useful for agents that automate pipeline management or need to adjust orchestration logic on the fly.
Agent Tool
Upload Files to DBFS or Volumes
Write files or datasets to Databricks File System (DBFS) or Unity Catalog Volumes, letting agents stage data for downstream pipeline consumption or archive processed outputs.
Data Source
Fetch Experiment and Run Metrics from MLflow
Pull MLflow experiment runs, metrics, and model parameters tracked within Databricks. Agents can compare model performance and surface findings for data science teams without anyone having to dig through the UI manually.
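Under the hood, an agent tool like "Retrieve Model Serving Predictions" amounts to a single authenticated POST to a serving endpoint's invocations URL. A minimal Python sketch of that call — the workspace URL, token, endpoint name, and feature names below are placeholders, and tray.ai handles this plumbing for you:

```python
import json
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

def build_invocation(records: list[dict]) -> dict:
    """Databricks Model Serving accepts row-oriented input as dataframe_records."""
    return {"dataframe_records": records}

def predict(endpoint: str, records: list[dict]) -> dict:
    """POST feature rows to /serving-endpoints/{endpoint}/invocations."""
    req = urllib.request.Request(
        f"{HOST}/serving-endpoints/{endpoint}/invocations",
        data=json.dumps(build_invocation(records)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An agent calling `predict("churn-model", [{"tenure_days": 412}])` would get back the endpoint's prediction payload to use mid-conversation.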
Get started with our Databricks connector today
If you would like to get started with the tray.ai Databricks connector today, speak to a member of our team.
Databricks Challenges
What challenges are there when working with Databricks and how will using Tray.ai help?
Challenge
Orchestrating Multi-Step Pipelines Across Databricks and Downstream Systems
Databricks is excellent at compute but wasn't designed to orchestrate the full round-trip of data moving between cloud storage, transformation jobs, and the operational SaaS tools that need the output. Teams end up writing brittle custom scripts or Airflow DAGs just to move results from a completed job into Salesforce or Slack.
How Tray.ai Can Help:
tray.ai has a visual workflow builder where you can model the entire pipeline — triggering a Databricks job, waiting for completion, handling success and failure branches, and routing results to any connected system — without managing orchestration infrastructure or writing boilerplate API code.
Challenge
Handling Asynchronous Job Completion and Long-Running Notebooks
Databricks jobs can take seconds or hours to complete, which makes it genuinely hard to build integrations that wait for results before proceeding. Polling logic, timeout handling, and partial failure states require significant engineering work when done outside a dedicated integration platform.
How Tray.ai Can Help:
tray.ai's Databricks connector has built-in job run polling with configurable intervals and timeout thresholds, so workflows automatically wait for job completion, detect terminal states, and branch on success or failure — no custom polling code needed.
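For a sense of what the connector abstracts away, here is a rough sketch of the polling loop in Python against the Databricks Jobs API (`GET /api/2.1/jobs/runs/get`). The workspace URL and token are placeholders; with tray.ai you configure the interval and timeout instead of writing this code:

```python
import json
import time
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

# Life-cycle states from which a run will not progress further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def is_terminal(life_cycle_state: str) -> bool:
    """A run is finished once its life_cycle_state is terminal."""
    return life_cycle_state in TERMINAL_STATES

def get_run(run_id: int) -> dict:
    """Fetch run status via GET /api/2.1/jobs/runs/get."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_run(run_id: int, poll_seconds: int = 30,
                 timeout_seconds: int = 3600) -> str:
    """Poll until the run reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_run(run_id)["state"]
        if is_terminal(state["life_cycle_state"]):
            # e.g. SUCCESS, FAILED, or CANCELED
            return state.get("result_state", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish within {timeout_seconds}s")
```

The returned result state is what a workflow branches on for its success and failure paths.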
Challenge
Securely Passing Credentials and Parameters Between Systems
Triggering parameterized Databricks jobs from external systems means securely injecting runtime parameters like file paths, account IDs, or date ranges. Teams often hardcode these values or store credentials insecurely in scripts, which creates real maintenance and security problems over time.
How Tray.ai Can Help:
tray.ai stores all API tokens and credentials in an encrypted credential store and lets you map dynamic values from upstream workflow steps directly into Databricks job parameters at runtime, so sensitive data never appears in plaintext configuration files.
Challenge
Keeping Business Systems in Sync with Databricks-Computed Data
Data teams compute genuinely useful signals — churn scores, product segments, revenue forecasts — inside Databricks, then struggle to push those outputs into the CRM, marketing, and support tools where GTM teams can actually do something with them. The result is a persistent gap between the best available data and the systems driving customer decisions.
How Tray.ai Can Help:
tray.ai handles the last-mile delivery of Databricks outputs by reading from Delta tables or job results and writing directly to CRM fields, marketing platform attributes, or any API-connected tool — on a schedule or in real time — without requiring data engineers to build custom connectors.
Challenge
Monitoring Data Pipeline Health Across Complex Workflows
When a Databricks job fails at 3 AM, teams often don't find out until analysts notice missing data the next morning. Without automated alerting wired into incident response tools, pipeline failures go undetected and data quality issues cascade into reporting errors that affect real business decisions.
How Tray.ai Can Help:
tray.ai monitors Databricks job run outcomes continuously and routes failure events to PagerDuty, Slack, or Jira the moment they occur — with full context including error messages, affected jobs, and run IDs — so data engineering teams can respond before downstream consumers are impacted.
Talk to our team to learn how to connect Databricks with your stack
Find the Databricks connector alongside 700+ other connectors in the tray.ai connector library to integrate your stack.
Integrate Databricks With Your Stack
The Tray.ai connector library can help you integrate Databricks with the rest of your stack. Browse the library to see everything you can connect Databricks to.
Start using our pre-built Databricks templates today
Start from scratch or use one of our pre-built Databricks templates to quickly solve your most common use cases.
Template
Databricks Job Failure → Slack Alert + Jira Ticket
Monitors Databricks job run status and, when a run fails or is cancelled, posts a structured alert to a designated Slack channel and creates a Jira issue with job name, run ID, error message, and a direct link to the Databricks run page.
Steps:
- Poll Databricks Jobs API on a schedule or receive webhook trigger when a job run completes
- Check run result state for FAILED or CANCELLED outcomes and extract error details
- Post a formatted Slack message with job name, cluster ID, error trace, and run URL
- Create a Jira bug ticket with full run context and assign it to the on-call data engineering team
Connectors Used: Databricks, Slack, Jira
Template
Salesforce New Contact → Databricks Enrichment Job → Write Back Scores
When a new contact is created in Salesforce, triggers a Databricks notebook that computes lead scoring, firmographic enrichment, and propensity-to-buy signals, then writes the results back to custom Salesforce fields on the same contact record.
Steps:
- Trigger on Salesforce new contact creation event via tray.ai Salesforce connector
- Start a Databricks notebook run, passing the contact email and account ID as parameters
- Poll for job completion and retrieve output scores from Databricks DBFS or a Delta table
- Update the original Salesforce contact with lead score, segment, and enrichment fields
Connectors Used: Salesforce, Databricks
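The "start a notebook run with parameters" step maps onto the Jobs API `POST /api/2.1/jobs/run-now` call, where `notebook_params` surface as widget values inside the notebook. A minimal sketch — the workspace URL, token, and parameter names are placeholders for this template:

```python
import json
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

def build_run_now(job_id: int, email: str, account_id: str) -> dict:
    """notebook_params become widget values inside the target notebook."""
    return {
        "job_id": job_id,
        "notebook_params": {"contact_email": email, "account_id": account_id},
    }

def trigger_enrichment(job_id: int, email: str, account_id: str) -> int:
    """POST /api/2.1/jobs/run-now and return the new run_id for polling."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/run-now",
        data=json.dumps(build_run_now(job_id, email, account_id)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]
```

The returned `run_id` is what the workflow polls before reading scores back and updating the Salesforce contact.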
Template
Scheduled Databricks SQL → Google Sheets Dashboard Refresh
Runs a Databricks SQL query on a configurable schedule and writes the latest results into a designated Google Sheet, replacing stale data so stakeholders always have current metrics without analyst intervention.
Steps:
- Trigger the workflow on a daily or weekly schedule using tray.ai's built-in scheduler
- Execute a parameterized Databricks SQL statement and retrieve the result set
- Clear existing rows in the target Google Sheet tab and write fresh results with headers
- Post a Slack notification confirming the sheet has been updated with a direct link
Connectors Used: Databricks, Google Sheets
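The "retrieve the result set and write it with headers" step can be sketched as a small transform over the Databricks SQL Statement Execution API response, which returns column metadata under `manifest.schema.columns` and rows under `result.data_array`. A minimal sketch, assuming that response shape:

```python
def to_sheet_rows(statement_response: dict) -> list[list]:
    """Flatten a SQL Statement Execution API response into a header row
    followed by data rows -- the layout a spreadsheet write expects."""
    columns = [c["name"] for c in
               statement_response["manifest"]["schema"]["columns"]]
    data = [list(row) for row in statement_response["result"]["data_array"]]
    return [columns] + data
```

The resulting list of lists is written directly into the cleared Google Sheet tab.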
Template
New S3 File Upload → Trigger Databricks Ingestion Notebook
Watches for new files landing in an S3 bucket and automatically triggers the appropriate Databricks ingestion notebook, passing the file path, schema, and metadata as parameters so the pipeline processes data immediately without waiting for a scheduled run.
Steps:
- Detect new object creation events in a specified S3 bucket and prefix via AWS S3 trigger
- Extract file name, path, size, and timestamp metadata from the S3 event payload
- Submit a Databricks job run with the file path injected as a widget parameter
- Notify the data engineering Slack channel when ingestion completes or post an alert on failure
Connectors Used: AWS S3, Databricks, Slack
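The metadata-extraction step pulls a few fields out of the standard S3 event notification record. A sketch of that extraction, assuming the usual `Records[0].s3` event shape:

```python
def extract_file_info(s3_event: dict) -> dict:
    """Pull path and size metadata from a standard S3 event notification."""
    record = s3_event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    return {
        "path": f"s3://{bucket}/{key}",
        "size": record["s3"]["object"]["size"],
        "event_time": record["eventTime"],
    }
```

The `path` value is what gets injected into the Databricks run as a widget parameter (e.g. via `notebook_params`) so the ingestion notebook knows which file to process.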
Template
Databricks ML Model Registered → Promote to Production + Notify Team
Listens for new model versions registered in Databricks MLflow Model Registry and, when a model passes a defined accuracy threshold, automatically transitions it to the Production stage and notifies the ML engineering team via Slack and updates a Confluence tracking page.
Steps:
- Poll Databricks MLflow Model Registry for new model versions in the Staging state
- Retrieve evaluation metrics for the new version and compare against defined thresholds
- Transition the model version to Production via Databricks MLflow API if thresholds are met
- Post a Slack announcement to the ML team channel with model name, version, and metrics
- Update the Confluence model changelog page with promotion details and timestamp
Connectors Used: Databricks, Slack, Confluence
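The threshold-check and promotion steps can be sketched with the MLflow client's stage-based registry workflow. The threshold names and values below are hypothetical, and the sketch assumes `mlflow` is installed and configured to point at the Databricks tracking server:

```python
def passes_thresholds(metrics: dict, thresholds: dict) -> bool:
    """True only if every tracked metric meets or beats its floor."""
    return all(metrics.get(name, float("-inf")) >= floor
               for name, floor in thresholds.items())

def promote_if_ready(model_name: str, version: str, thresholds: dict) -> bool:
    """Promote a registered model version to Production if it clears
    the thresholds; returns whether the promotion happened."""
    # Imported here so the threshold check above stays dependency-free.
    from mlflow.tracking import MlflowClient  # assumes mlflow is installed
    client = MlflowClient()
    run_id = client.get_model_version(model_name, version).run_id
    metrics = client.get_run(run_id).data.metrics
    if not passes_thresholds(metrics, thresholds):
        return False
    # Stage-based promotion, e.g. Staging -> Production.
    client.transition_model_version_stage(model_name, version,
                                          stage="Production")
    return True
```

A workflow would call `promote_if_ready("churn-model", "7", {"accuracy": 0.9})` and branch on the boolean to drive the Slack announcement and Confluence update.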
Template
Databricks Batch Inference Results → HubSpot Contact Property Sync
After a Databricks batch scoring job completes, reads the inference output from a Delta table and syncs predicted scores, segments, and propensity values back to the corresponding HubSpot contact records, so sales and marketing teams can act on AI-generated signals.
Steps:
- Trigger when a Databricks inference job run reaches a completed state
- Query the output Delta table to retrieve email-to-score mappings from the latest run
- Paginate through results and batch upsert contact properties in HubSpot using email as the key
- Log sync statistics and any failed records to a Databricks audit table for reconciliation
Connectors Used: Databricks, HubSpot
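The pagination-and-batch step can be sketched as a generator that chunks the Delta table output into batch bodies keyed by email, in the style of HubSpot's batch contact APIs (which cap batches at 100 records). The property names `churn_score` and `lifecycle_segment` are hypothetical examples:

```python
def batch_payloads(score_rows: list[dict], batch_size: int = 100):
    """Yield HubSpot-style batch bodies keyed by email.

    score_rows come from the inference Delta table; the property
    names below are illustrative, not real HubSpot defaults.
    """
    for start in range(0, len(score_rows), batch_size):
        chunk = score_rows[start:start + batch_size]
        yield {
            "inputs": [
                {
                    "idProperty": "email",
                    "id": row["email"],
                    "properties": {
                        "churn_score": row["score"],
                        "lifecycle_segment": row["segment"],
                    },
                }
                for row in chunk
            ]
        }
```

Each yielded body maps to one upsert request; failed records from each response would be appended to the Databricks audit table for reconciliation.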



