Connect Databricks and Snowflake to Unify Your Data Stack

Automate data pipelines between your lakehouse and cloud data warehouse to power faster analytics, ML, and business intelligence.

Databricks + Snowflake integration

Databricks and Snowflake cover most of what a modern data organization needs — large-scale data engineering on one side, governed SQL analytics on the other. Data science teams lean on Databricks for machine learning, feature engineering, and Spark-based transformations. BI teams depend on Snowflake's high-performance SQL warehouse for reporting and sharing. Connecting the two eliminates silos, cuts manual data movement, and lets processed insights flow automatically to where decisions actually get made.

When Databricks and Snowflake run separately, you end up with duplicated data, stale reports, and expensive manual handoffs between engineering and analytics. Connecting them through tray.ai lets you automate the full lifecycle — raw ingestion and transformation in Databricks, governed query-ready tables in Snowflake — without writing and maintaining custom ETL scripts. You get real-time or scheduled syncs of ML model outputs, aggregated metrics, and cleansed datasets straight into Snowflake schemas, where analysts, dashboards, and downstream tools can use them immediately. Engineering overhead drops, data stays fresh, and every team works from the same source of truth.

Automate & integrate Databricks + Snowflake

Tray.ai makes it easy to automate business processes and integrate data across Databricks and Snowflake.

Use case

Sync ML Model Outputs from Databricks to Snowflake

After training and scoring models in Databricks, teams need their predictions, scores, and feature outputs available to business users in Snowflake. tray.ai automates the transfer of model inference results from Databricks Delta tables directly into Snowflake target schemas on a scheduled or event-driven basis, so sales, marketing, and operations teams can act on ML-generated insights without waiting on manual data exports.

  • Eliminate manual CSV exports of model outputs between platforms
  • Make ML predictions available to BI tools connected to Snowflake within minutes of scoring
  • Maintain a full audit trail of when model results were synced and to which tables

Use case

Automate ETL Pipelines from Snowflake to Databricks for Feature Engineering

Data science teams frequently need raw or semi-processed data from Snowflake loaded into Databricks to build features for machine learning models. tray.ai can trigger Databricks notebook runs or Delta Live Tables pipelines whenever new data lands in Snowflake, creating a clean upstream-to-downstream workflow without engineers manually kicking off jobs or writing bespoke scheduling scripts.

  • Automatically trigger Databricks jobs when Snowflake tables are updated
  • Reduce feature engineering pipeline latency with event-driven automation
  • Decrease dependency on manual engineering intervention for routine data prep

Use case

Write Aggregated Databricks Metrics Back to Snowflake for Reporting

Databricks jobs that produce aggregated KPIs, summary statistics, or transformed datasets can have their results written back to Snowflake so that existing BI tools and dashboards always reflect the latest numbers. tray.ai handles this write-back process — schema mapping, table upserts, and error notifications — without custom code, so analytics teams can trust that Snowflake always has the freshest processed data.

  • Keep Snowflake reporting tables current without manual refresh cycles
  • Support upserts and incremental loads to avoid full table rewrites
  • Alert data teams via Slack or email when write-back jobs fail or produce anomalies
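
To make the upsert idea concrete, here is a minimal sketch of the kind of Snowflake MERGE statement a write-back step might issue. It is not tray.ai's implementation. The function name `build_merge_sql` and the table and column names are hypothetical, and the sketch assumes the Databricks results have already been landed in a staging table.

```python
from typing import Sequence


def build_merge_sql(
    target: str,
    staging: str,
    keys: Sequence[str],
    columns: Sequence[str],
) -> str:
    """Build a Snowflake MERGE that upserts staging rows into the target.

    `columns` is the full column list, including the key columns.
    Identifiers are interpolated as-is, so they must come from trusted
    configuration and never from user input.
    """
    if not keys:
        raise ValueError("at least one key column is required")

    non_keys = [c for c in columns if c not in keys]
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)

    parts = [
        f"MERGE INTO {target} AS t",
        f"USING {staging} AS s",
        f"ON {on_clause}",
    ]
    # Only emit an UPDATE branch when there is something besides keys to update.
    if non_keys:
        set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
        parts.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    parts.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_merge_sql(
            "ANALYTICS.KPI_DAILY",
            "ANALYTICS.KPI_DAILY_STAGE",
            keys=["metric_date"],
            columns=["metric_date", "revenue", "orders"],
        )
    )
```

Because only rows present in the staging table are touched, an incremental load updates or inserts the changed rows and leaves the rest of the reporting table alone.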

Use case

Replicate Reference and Lookup Tables from Snowflake into Databricks

Databricks workloads often depend on reference data — product catalogs, customer segments, currency tables — that lives in Snowflake. tray.ai automates scheduled replication of these lookup tables into Databricks so that notebooks and pipelines always join against current reference data, preventing stale enrichment from silently corrupting model training or transformation logic.

  • Keep Databricks enrichment data in sync with Snowflake master records
  • Schedule replication to run before nightly batch jobs automatically
  • Receive alerts when reference tables fail to sync on time

Use case

Orchestrate Cross-Platform Data Quality Checks

Keeping data consistent between Databricks and Snowflake is a persistent headache for data engineering teams running parallel pipelines. tray.ai orchestrates automated reconciliation checks — comparing row counts, checksums, or aggregate values across both platforms — and routes discrepancy alerts to the right team channels, so you get visibility into pipeline health without building custom monitoring infrastructure.

  • Catch data drift between platforms before it impacts downstream reports
  • Route quality alerts directly to Slack, PagerDuty, or email
  • Log reconciliation results to a central audit table in Snowflake or Databricks
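
A reconciliation check of this kind can be sketched in a few lines. The example below is illustrative only and assumes the row counts and aggregate values have already been queried from each platform and collected into dictionaries. `Discrepancy` and `reconcile` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Discrepancy:
    metric: str
    databricks_value: Optional[float]
    snowflake_value: Optional[float]


def reconcile(
    databricks: Dict[str, float],
    snowflake: Dict[str, float],
    rel_tol: float = 0.0,
) -> List[Discrepancy]:
    """Return one Discrepancy per metric that differs between platforms.

    A metric missing on either side counts as a discrepancy. With the
    default rel_tol of 0.0, values must match exactly.
    """
    issues: List[Discrepancy] = []
    for metric in sorted(set(databricks) | set(snowflake)):
        a = databricks.get(metric)
        b = snowflake.get(metric)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0):
            issues.append(Discrepancy(metric, a, b))
    return issues


if __name__ == "__main__":
    found = reconcile(
        {"row_count": 1000, "sum_amount": 5000.0},
        {"row_count": 998, "sum_amount": 5000.0},
    )
    for d in found:
        print(f"{d.metric}: Databricks={d.databricks_value} Snowflake={d.snowflake_value}")
```

The returned list is what a workflow would route to an alert channel or append to an audit table. An empty list means the two platforms agree on every metric checked.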

Use case

Trigger Databricks Job Runs Based on Snowflake Data Events

Many data workflows require Databricks processing to kick off only when specific conditions are met in Snowflake — a new batch of records arriving, a table exceeding a row threshold, or a status flag being updated. tray.ai monitors Snowflake for these conditions and automatically triggers the corresponding Databricks job or workflow. Your pipelines react to actual data availability instead of running on a fixed clock, which cuts unnecessary compute spend.

  • Eliminate over-scheduled Databricks jobs that run on empty datasets
  • Reduce Databricks compute costs by running jobs only when data is ready
  • Enable event-driven architecture between your lakehouse and data warehouse
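
As a rough illustration of condition-based triggering, the sketch below decides whether a Snowflake row-count condition is met and, if so, builds a request for the Databricks Jobs API `run-now` endpoint. It only constructs the request and does not send it. The helper name, the threshold logic, and the example host are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def build_run_now_request(
    host: str,
    job_id: int,
    new_row_count: int,
    min_rows: int,
    notebook_params: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, Any]]:
    """Return a run-now request description, or None if too few rows arrived.

    The caller is expected to send the request with its own HTTP client
    and to add an Authorization header from a secure credential store.
    """
    if new_row_count < min_rows:
        # Not enough new data: skip the run and avoid paying for empty compute.
        return None
    return {
        "method": "POST",
        "url": f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        "json": {"job_id": job_id, "notebook_params": notebook_params or {}},
    }


if __name__ == "__main__":
    request = build_run_now_request(
        "https://example.cloud.databricks.com",  # placeholder workspace URL
        job_id=42,
        new_row_count=250,
        min_rows=100,
        notebook_params={"batch_date": "2024-01-01"},
    )
    print(request)
```

Returning `None` when the threshold is not met is what keeps jobs from running on empty datasets.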

Challenges Tray.ai solves

Common obstacles when integrating Databricks and Snowflake — and how Tray.ai handles them.

Challenge

Managing Authentication Across Two Enterprise Platforms

Databricks and Snowflake use distinct authentication mechanisms: Databricks relies on personal access tokens or OAuth for service principals, while Snowflake uses key-pair authentication, OAuth, or username and password with MFA. Rotating credentials for both platforms in custom scripts is error-prone, and when tokens expire mid-pipeline, you typically find out only after data has already stopped moving.

How Tray.ai helps

tray.ai provides a secure, centralized credential store for both Databricks and Snowflake connections. Authentication is configured once per connector and reused across all workflows, with no credentials embedded in pipeline code. When tokens need rotation, only the tray.ai connector configuration needs updating — every workflow using it picks up the change automatically.

Challenge

Handling Schema Evolution and Mismatches Between Platforms

Databricks Delta tables and Snowflake schemas evolve independently as teams add columns, change data types, or rename fields. Custom ETL scripts moving data between the two frequently break when schemas drift, causing silent data loss or failed loads that are difficult to diagnose.

How Tray.ai helps

tray.ai's visual data mapper lets teams explicitly define and maintain column mappings between Databricks and Snowflake schemas. When a source schema changes, the workflow surfaces a clear mapping error rather than silently dropping or misrouting data. Teams can update mappings in the tray.ai UI without rewriting pipeline code.
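
The fail-loudly behavior described above can be approximated in plain Python. This is a simplified sketch, not the tray.ai data mapper. `MappingError`, `apply_mapping`, and the column names are hypothetical.

```python
from typing import Dict


class MappingError(Exception):
    """Raised when a source row does not match the declared column mapping."""


def apply_mapping(row: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename source columns to target columns using an explicit mapping.

    Raises MappingError if the row lacks a mapped column or carries a
    column the mapping does not know about, instead of silently dropping
    or misrouting data.
    """
    missing = sorted(src for src in mapping if src not in row)
    unmapped = sorted(col for col in row if col not in mapping)
    if missing or unmapped:
        raise MappingError(
            f"missing source columns: {missing}; unmapped source columns: {unmapped}"
        )
    return {target: row[src] for src, target in mapping.items()}


if __name__ == "__main__":
    mapping = {"customer_id": "CUSTOMER_ID", "score": "CHURN_SCORE"}
    print(apply_mapping({"customer_id": 7, "score": 0.9}, mapping))
    try:
        # A renamed source column surfaces as an explicit error.
        apply_mapping({"customer_id": 7, "score_v2": 0.9}, mapping)
    except MappingError as exc:
        print(f"mapping error: {exc}")
```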

Challenge

Orchestrating Job Dependencies Across Platform Boundaries

Many data pipelines require Databricks jobs to finish before Snowflake tables are loaded, or Snowflake queries to complete before Databricks notebooks are triggered. Building these cross-platform dependencies using each platform's native scheduler in isolation leads to hard-coded wait times, race conditions, and fragile cron-based coupling.

How Tray.ai helps

tray.ai acts as a cross-platform orchestration layer, letting teams build conditional, event-driven workflows that wait for job completion signals from one platform before triggering actions on the other. Native branching, retry logic, and status polling replace brittle time-based scheduling with reliable dependency management.
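
Status polling can be sketched as a small loop that waits for a terminal state before the next step runs. The example assumes a caller-supplied `get_state` function that returns the `state` object of a Databricks Jobs API run (with `life_cycle_state` and `result_state` fields). The function name, poll interval, and limits are illustrative, not tray.ai defaults.

```python
import time
from typing import Callable, Dict

# Life-cycle states after which a Databricks run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(
    get_state: Callable[[], Dict[str, str]],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the run reaches a terminal state.

    Returns True only if the run finished with result_state SUCCESS.
    Raises TimeoutError if no terminal state is seen within max_polls.
    """
    for attempt in range(max_polls):
        state = get_state()
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state.get("result_state") == "SUCCESS"
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"run did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated sequence of states standing in for real API responses.
    states = iter(
        [
            {"life_cycle_state": "PENDING"},
            {"life_cycle_state": "RUNNING"},
            {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
        ]
    )
    ok = wait_for_run(lambda: next(states), poll_seconds=0.0)
    print("load Snowflake table" if ok else "skip load and alert")
```

Gating the Snowflake load on the return value replaces a hard-coded wait with an explicit completion signal.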

Templates

Pre-built workflows for Databricks and Snowflake you can deploy in minutes.

Databricks Job Completion → Write Results to Snowflake Table

This template monitors for Databricks job run completions and automatically writes the resulting Delta table data into a specified Snowflake target table, handling schema mapping and upsert logic.

Snowflake New Records → Trigger Databricks Notebook Run

This template polls a Snowflake table for new or updated rows on a schedule and automatically triggers a Databricks notebook or workflow job, passing relevant parameters such as record IDs or date ranges.
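
One common way to detect new or updated rows on a schedule is a high-watermark check. The sketch below is an assumption about how such polling could work, not a description of the template's internals. It assumes each row carries an `id` and an `updated_at` timestamp, and the parameter names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def collect_new_rows(
    rows: List[Dict],
    watermark: datetime,
) -> Tuple[Optional[Dict[str, str]], datetime]:
    """Pick rows updated after the watermark and derive job parameters.

    Returns (params, new_watermark). params is None when nothing changed,
    in which case the watermark is returned unchanged.
    """
    fresh = [r for r in rows if r["updated_at"] > watermark]
    if not fresh:
        return None, watermark

    new_watermark = max(r["updated_at"] for r in fresh)
    params = {
        # Parameters are passed as strings, e.g. to a notebook run.
        "record_ids": ",".join(str(r["id"]) for r in fresh),
        "start": min(r["updated_at"] for r in fresh).isoformat(),
        "end": new_watermark.isoformat(),
    }
    return params, new_watermark


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 1, 1)},
        {"id": 2, "updated_at": datetime(2024, 1, 3)},
    ]
    params, watermark = collect_new_rows(rows, datetime(2024, 1, 2))
    print(params, watermark)
```

Persisting the returned watermark between polls ensures each row triggers at most one run.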

Scheduled Snowflake-to-Databricks Reference Data Sync

This template runs on a nightly schedule to copy reference and lookup tables from Snowflake into Databricks Delta tables, so that all batch jobs and ML pipelines use up-to-date enrichment data.

Databricks ML Scoring → Push Predictions to Snowflake

After a Databricks model scoring job finishes, this template extracts prediction results and loads them into a Snowflake table so that downstream BI tools and operational applications can immediately consume ML outputs.

Cross-Platform Data Reconciliation and Alerting

This template runs automated reconciliation checks between Databricks and Snowflake tables on a scheduled basis, comparing row counts and aggregate metrics, and routes discrepancy alerts to the responsible data team.

Snowflake Data Export → Databricks Feature Store Ingestion

This template automates the extraction of curated datasets from Snowflake and loads them into the Databricks Feature Store, so data science teams always train models on the latest available features.

Ship your Databricks + Snowflake integration.

We'll walk through the exact integration you're imagining in a tailored demo.