Databricks connector
Automate Data Pipelines and AI Workflows with Databricks Integrations
Connect Databricks to your data stack and business tools to run end-to-end analytics, ML model deployment, and real-time data workflows.

What can you do with the Databricks connector?
Databricks is the unified data intelligence platform where engineering, analytics, and AI teams run their most demanding workloads — Delta Lake ETL pipelines, large-scale model training, and everything in between. Integrating Databricks with your CRM, data warehouse, BI tools, and operational systems lets you build automated pipelines that cut out manual handoffs and get insights to the people making decisions. With tray.ai, you can trigger Databricks jobs, sync query results downstream, and drop AI model outputs directly into business processes — no custom orchestration code required.
Automate & integrate Databricks
Automating Databricks business processes and integrating Databricks data is made easy with tray.ai
Use case
Automated ETL Pipeline Orchestration
Trigger Databricks notebooks and jobs automatically when new data lands in cloud storage, a database, or an upstream SaaS tool. Instead of scheduling jobs on fixed cron intervals, you can build event-driven pipelines that process data the moment it arrives and route results to downstream systems like Snowflake, BigQuery, or Redshift.
Use case
ML Model Deployment and Inference Automation
Automate the full lifecycle from model training to production inference by connecting Databricks MLflow with your serving infrastructure and operational applications. When a new model version is registered and passes evaluation thresholds, tray.ai can promote it through staging environments and notify downstream consumers.
Use case
Real-Time Analytics Sync to Business Applications
Run Databricks SQL queries on a schedule or in response to business events and push fresh analytics directly into the tools your teams already use — dashboards, spreadsheets, CRM records, or custom apps. No more waiting for an analyst to manually pull and distribute the same report.
Use case
Data Quality Monitoring and Alerting
Connect Databricks Delta Live Tables and data quality checks to your incident response and communication tools so issues are caught and routed before they reach downstream consumers. tray.ai can read job run outcomes, flag row count anomalies or expectation failures, and turn them into actionable alerts.
Use case
Customer Data Enrichment Pipelines
Use Databricks as the computation engine for enriching CRM and product data at scale, then write results back to operational systems automatically. Combine behavioral event data, transaction history, and third-party signals in Delta Lake to produce enriched customer profiles that sync back to Salesforce, Segment, or your data warehouse.
Use case
Automated Reporting and Executive Dashboards
Schedule Databricks SQL queries and notebooks to produce reports that are automatically formatted and distributed to stakeholders via email, Slack, or collaboration tools. Stop asking analysts to manually export data and assemble the same recurring report every week.
Use case
AI Agent Data Retrieval and Context Injection
Use Databricks as the real-time data backend for AI agents built on tray.ai, so agents can query Delta Lake tables, retrieve vector search results, or run feature lookups mid-conversation. Your proprietary data stays in Databricks — no raw database credentials exposed to agent runtimes.
Build Databricks Agents
Give agents secure and governed access to Databricks through Agent Builder and Agent Gateway for MCP.
Agent Tool
Run SQL Query
Execute SQL queries against Databricks SQL warehouses to retrieve, transform, or aggregate data. Good for on-demand analytics or pulling specific datasets to inform downstream decisions.
Data Source
Fetch Query Results
Retrieve the results of previously executed SQL statements or notebook runs to use as context in agent reasoning. Agents can pull structured data outputs from Databricks pipelines directly into their responses.
Agent Tool
Trigger a Job Run
Start a Databricks job or workflow on demand, letting an agent kick off data pipelines, ETL processes, or ML model training in response to business events or user requests.
Data Source
Monitor Job Run Status
Check the status and progress of active or completed Databricks job runs. An agent can report on pipeline health, alert on failures, or wait for a job to finish before moving on.
Data Source
List and Search Clusters
Retrieve information about available Databricks clusters, including their state, configuration, and resource usage. Useful for capacity planning or routing workloads to the right compute resources.
Agent Tool
Start or Terminate a Cluster
Programmatically start or shut down Databricks clusters to manage compute costs and availability. An agent can automate cluster lifecycle based on workload schedules or cost thresholds.
Data Source
Read Delta Table Metadata
Fetch schema, partition details, and table statistics from Delta Lake tables registered in the Databricks Unity Catalog. Helps agents understand data structure before querying or transforming datasets.
Agent Tool
Submit a Notebook Run
Execute a specific Databricks notebook with configurable parameters, letting agents trigger ad-hoc data exploration, reporting, or model inference workflows programmatically.
Data Source
Query Unity Catalog for Data Assets
Search and retrieve metadata about tables, views, and schemas stored in Databricks Unity Catalog. Agents can discover available datasets and trace data lineage across the lakehouse.
Data Source
Retrieve Model Serving Predictions
Call a deployed Databricks Model Serving endpoint to get real-time ML model predictions. Agents can run model inference directly inside automated decision-making workflows.
Agent Tool
Create or Update a Job
Define or modify Databricks job configurations, including schedules, task dependencies, and cluster settings. Useful for agents that automate pipeline management or need to adjust orchestration logic on the fly.
Agent Tool
Upload Files to DBFS or Volumes
Write files or datasets to Databricks File System (DBFS) or Unity Catalog Volumes, letting agents stage data for downstream pipeline consumption or archive processed outputs.
Data Source
Fetch Experiment and Run Metrics from MLflow
Pull MLflow experiment runs, metrics, and model parameters tracked within Databricks. Agents can compare model performance and surface findings for data science teams without anyone having to dig through the UI manually.
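Under the hood, an agent tool like "Retrieve Model Serving Predictions" amounts to a single authenticated POST to a serving endpoint's invocations URL. A minimal Python sketch of that call — the workspace URL, token, endpoint name, and feature names below are placeholders, and tray.ai handles this plumbing for you:

```python
import json
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

def build_invocation(records: list[dict]) -> dict:
    """Databricks Model Serving accepts row-oriented input as dataframe_records."""
    return {"dataframe_records": records}

def predict(endpoint: str, records: list[dict]) -> dict:
    """POST feature rows to /serving-endpoints/{endpoint}/invocations."""
    req = urllib.request.Request(
        f"{HOST}/serving-endpoints/{endpoint}/invocations",
        data=json.dumps(build_invocation(records)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An agent calling `predict("churn-model", [{"tenure_days": 412}])` would get back the endpoint's prediction payload to use mid-conversation.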
Get started with our Databricks connector today
If you would like to get started with the tray.ai Databricks connector today, speak to a member of our team.
Databricks Challenges
What challenges are there when working with Databricks and how will using Tray.ai help?
Challenge
Orchestrating Multi-Step Pipelines Across Databricks and Downstream Systems
Databricks is excellent at compute but wasn't designed to orchestrate the full round-trip of data moving between cloud storage, transformation jobs, and the operational SaaS tools that need the output. Teams end up writing brittle custom scripts or Airflow DAGs just to move results from a completed job into Salesforce or Slack.
How Tray.ai Can Help:
tray.ai has a visual workflow builder where you can model the entire pipeline — triggering a Databricks job, waiting for completion, handling success and failure branches, and routing results to any connected system — without managing orchestration infrastructure or writing boilerplate API code.
Challenge
Handling Asynchronous Job Completion and Long-Running Notebooks
Databricks jobs can take seconds or hours to complete, which makes it genuinely hard to build integrations that wait for results before proceeding. Polling logic, timeout handling, and partial failure states require significant engineering work when done outside a dedicated integration platform.
How Tray.ai Can Help:
tray.ai's Databricks connector has built-in job run polling with configurable intervals and timeout thresholds, so workflows automatically wait for job completion, detect terminal states, and branch on success or failure — no custom polling code needed.
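For a sense of what the connector abstracts away, here is a rough sketch of the polling loop in Python against the Databricks Jobs API (`GET /api/2.1/jobs/runs/get`). The workspace URL and token are placeholders; with tray.ai you configure the interval and timeout instead of writing this code:

```python
import json
import time
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

# Life-cycle states from which a run will not progress further.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def is_terminal(life_cycle_state: str) -> bool:
    """A run is finished once its life_cycle_state is terminal."""
    return life_cycle_state in TERMINAL_STATES

def get_run(run_id: int) -> dict:
    """Fetch run status via GET /api/2.1/jobs/runs/get."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_run(run_id: int, poll_seconds: int = 30,
                 timeout_seconds: int = 3600) -> str:
    """Poll until the run reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_run(run_id)["state"]
        if is_terminal(state["life_cycle_state"]):
            # e.g. SUCCESS, FAILED, or CANCELED
            return state.get("result_state", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish within {timeout_seconds}s")
```

The returned result state is what a workflow branches on for its success and failure paths.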
Challenge
Securely Passing Credentials and Parameters Between Systems
Triggering parameterized Databricks jobs from external systems means securely injecting runtime parameters like file paths, account IDs, or date ranges. Teams often hardcode these values or store credentials insecurely in scripts, which creates real maintenance and security problems over time.
How Tray.ai Can Help:
tray.ai stores all API tokens and credentials in an encrypted credential store and lets you map dynamic values from upstream workflow steps directly into Databricks job parameters at runtime, so sensitive data never appears in plaintext configuration files.
Challenge
Keeping Business Systems in Sync with Databricks-Computed Data
Data teams compute genuinely useful signals — churn scores, product segments, revenue forecasts — inside Databricks, then struggle to push those outputs into the CRM, marketing, and support tools where GTM teams can actually do something with them. The result is a persistent gap between the best available data and the systems driving customer decisions.
How Tray.ai Can Help:
tray.ai handles the last-mile delivery of Databricks outputs by reading from Delta tables or job results and writing directly to CRM fields, marketing platform attributes, or any API-connected tool — on a schedule or in real time — without requiring data engineers to build custom connectors.
Challenge
Monitoring Data Pipeline Health Across Complex Workflows
When a Databricks job fails at 3 AM, teams often don't find out until analysts notice missing data the next morning. Without automated alerting wired into incident response tools, pipeline failures go undetected and data quality issues cascade into reporting errors that affect real business decisions.
How Tray.ai Can Help:
tray.ai monitors Databricks job run outcomes continuously and routes failure events to PagerDuty, Slack, or Jira the moment they occur — with full context including error messages, affected jobs, and run IDs — so data engineering teams can respond before downstream consumers are impacted.
Talk to our team to learn how to connect Databricks with your stack
Find the Databricks connector alongside 700+ other connectors in the tray.ai connector library to integrate your stack.
Integrate Databricks With Your Stack
The Tray.ai connector library can help you integrate Databricks with the rest of your stack. Browse the library to see everything you can connect Databricks to.
Start using our pre-built Databricks templates today
Start from scratch or use one of our pre-built Databricks templates to quickly solve your most common use cases.
Template
Databricks Job Failure → Slack Alert + Jira Ticket
Monitors Databricks job run status and, when a run fails or is cancelled, posts a structured alert to a designated Slack channel and creates a Jira issue with job name, run ID, error message, and a direct link to the Databricks run page.
Steps:
- Poll Databricks Jobs API on a schedule or receive webhook trigger when a job run completes
- Check run result state for FAILED or CANCELLED outcomes and extract error details
- Post a formatted Slack message with job name, cluster ID, error trace, and run URL
- Create a Jira bug ticket with full run context and assign it to the on-call data engineering team
Connectors Used: Databricks, Slack, Jira
Template
Salesforce New Contact → Databricks Enrichment Job → Write Back Scores
When a new contact is created in Salesforce, triggers a Databricks notebook that computes lead scoring, firmographic enrichment, and propensity-to-buy signals, then writes the results back to custom Salesforce fields on the same contact record.
Steps:
- Trigger on Salesforce new contact creation event via tray.ai Salesforce connector
- Start a Databricks notebook run, passing the contact email and account ID as parameters
- Poll for job completion and retrieve output scores from Databricks DBFS or a Delta table
- Update the original Salesforce contact with lead score, segment, and enrichment fields
Connectors Used: Salesforce, Databricks
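The "start a notebook run with parameters" step maps onto the Jobs API `POST /api/2.1/jobs/run-now` call, where `notebook_params` surface as widget values inside the notebook. A minimal sketch — the workspace URL, token, and parameter names are placeholders for this template:

```python
import json
import urllib.request

# Placeholder workspace URL and token -- substitute your own.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

def build_run_now(job_id: int, email: str, account_id: str) -> dict:
    """notebook_params become widget values inside the target notebook."""
    return {
        "job_id": job_id,
        "notebook_params": {"contact_email": email, "account_id": account_id},
    }

def trigger_enrichment(job_id: int, email: str, account_id: str) -> int:
    """POST /api/2.1/jobs/run-now and return the new run_id for polling."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/run-now",
        data=json.dumps(build_run_now(job_id, email, account_id)).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]
```

The returned `run_id` is what the workflow polls before reading scores back and updating the Salesforce contact.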
Template
Scheduled Databricks SQL → Google Sheets Dashboard Refresh
Runs a Databricks SQL query on a configurable schedule and writes the latest results into a designated Google Sheet, replacing stale data so stakeholders always have current metrics without analyst intervention.
Steps:
- Trigger the workflow on a daily or weekly schedule using tray.ai's built-in scheduler
- Execute a parameterized Databricks SQL statement and retrieve the result set
- Clear existing rows in the target Google Sheet tab and write fresh results with headers
- Post a Slack notification confirming the sheet has been updated with a direct link
Connectors Used: Databricks, Google Sheets
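The "retrieve the result set and write it with headers" step can be sketched as a small transform over the Databricks SQL Statement Execution API response, which returns column metadata under `manifest.schema.columns` and rows under `result.data_array`. A minimal sketch, assuming that response shape:

```python
def to_sheet_rows(statement_response: dict) -> list[list]:
    """Flatten a SQL Statement Execution API response into a header row
    followed by data rows -- the layout a spreadsheet write expects."""
    columns = [c["name"] for c in
               statement_response["manifest"]["schema"]["columns"]]
    data = [list(row) for row in statement_response["result"]["data_array"]]
    return [columns] + data
```

The resulting list of lists is written directly into the cleared Google Sheet tab.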
Template
New S3 File Upload → Trigger Databricks Ingestion Notebook
Watches for new files landing in an S3 bucket and automatically triggers the appropriate Databricks ingestion notebook, passing the file path, schema, and metadata as parameters so the pipeline processes data immediately without waiting for a scheduled run.
Steps:
- Detect new object creation events in a specified S3 bucket and prefix via AWS S3 trigger
- Extract file name, path, size, and timestamp metadata from the S3 event payload
- Submit a Databricks job run with the file path injected as a widget parameter
- Notify the data engineering Slack channel when ingestion completes or post an alert on failure
Connectors Used: AWS S3, Databricks, Slack
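The metadata-extraction step pulls a few fields out of the standard S3 event notification record. A sketch of that extraction, assuming the usual `Records[0].s3` event shape:

```python
def extract_file_info(s3_event: dict) -> dict:
    """Pull path and size metadata from a standard S3 event notification."""
    record = s3_event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    return {
        "path": f"s3://{bucket}/{key}",
        "size": record["s3"]["object"]["size"],
        "event_time": record["eventTime"],
    }
```

The `path` value is what gets injected into the Databricks run as a widget parameter (e.g. via `notebook_params`) so the ingestion notebook knows which file to process.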
Template
Databricks ML Model Registered → Promote to Production + Notify Team
Listens for new model versions registered in Databricks MLflow Model Registry and, when a model passes a defined accuracy threshold, automatically transitions it to the Production stage and notifies the ML engineering team via Slack and updates a Confluence tracking page.
Steps:
- Poll Databricks MLflow Model Registry for new model versions in the Staging state
- Retrieve evaluation metrics for the new version and compare against defined thresholds
- Transition the model version to Production via Databricks MLflow API if thresholds are met
- Post a Slack announcement to the ML team channel with model name, version, and metrics
- Update the Confluence model changelog page with promotion details and timestamp
Connectors Used: Databricks, Slack, Confluence
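The threshold-check and promotion steps can be sketched with the MLflow client's stage-based registry workflow. The threshold names and values below are hypothetical, and the sketch assumes `mlflow` is installed and configured to point at the Databricks tracking server:

```python
def passes_thresholds(metrics: dict, thresholds: dict) -> bool:
    """True only if every tracked metric meets or beats its floor."""
    return all(metrics.get(name, float("-inf")) >= floor
               for name, floor in thresholds.items())

def promote_if_ready(model_name: str, version: str, thresholds: dict) -> bool:
    """Promote a registered model version to Production if it clears
    the thresholds; returns whether the promotion happened."""
    # Imported here so the threshold check above stays dependency-free.
    from mlflow.tracking import MlflowClient  # assumes mlflow is installed
    client = MlflowClient()
    run_id = client.get_model_version(model_name, version).run_id
    metrics = client.get_run(run_id).data.metrics
    if not passes_thresholds(metrics, thresholds):
        return False
    # Stage-based promotion, e.g. Staging -> Production.
    client.transition_model_version_stage(model_name, version,
                                          stage="Production")
    return True
```

A workflow would call `promote_if_ready("churn-model", "7", {"accuracy": 0.9})` and branch on the boolean to drive the Slack announcement and Confluence update.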
Template
Databricks Batch Inference Results → HubSpot Contact Property Sync
After a Databricks batch scoring job completes, reads the inference output from a Delta table and syncs predicted scores, segments, and propensity values back to the corresponding HubSpot contact records, so sales and marketing teams can act on AI-generated signals.
Steps:
- Trigger when a Databricks inference job run reaches a completed state
- Query the output Delta table to retrieve email-to-score mappings from the latest run
- Paginate through results and batch upsert contact properties in HubSpot using email as the key
- Log sync statistics and any failed records to a Databricks audit table for reconciliation
Connectors Used: Databricks, HubSpot
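The pagination-and-batch step can be sketched as a generator that chunks the Delta table output into batch bodies keyed by email, in the style of HubSpot's batch contact APIs (which cap batches at 100 records). The property names `churn_score` and `lifecycle_segment` are hypothetical examples:

```python
def batch_payloads(score_rows: list[dict], batch_size: int = 100):
    """Yield HubSpot-style batch bodies keyed by email.

    score_rows come from the inference Delta table; the property
    names below are illustrative, not real HubSpot defaults.
    """
    for start in range(0, len(score_rows), batch_size):
        chunk = score_rows[start:start + batch_size]
        yield {
            "inputs": [
                {
                    "idProperty": "email",
                    "id": row["email"],
                    "properties": {
                        "churn_score": row["score"],
                        "lifecycle_segment": row["segment"],
                    },
                }
                for row in chunk
            ]
        }
```

Each yielded body maps to one upsert request; failed records from each response would be appended to the Databricks audit table for reconciliation.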



