Databricks + Snowflake

Connect Databricks and Snowflake to Unify Your Data Stack

Automate data pipelines between your lakehouse and cloud data warehouse to power faster analytics, ML, and business intelligence.

Why integrate Databricks and Snowflake?

Databricks and Snowflake cover most of what a modern data organization needs — large-scale data engineering on one side, governed SQL analytics on the other. Data science teams lean on Databricks for machine learning, feature engineering, and Spark-based transformations. BI teams depend on Snowflake's high-performance SQL warehouse for reporting and sharing. Connecting the two eliminates silos, cuts manual data movement, and lets processed insights flow automatically to where decisions actually get made.

Automate & integrate Databricks & Snowflake

Use case

Sync ML Model Outputs from Databricks to Snowflake

After training and scoring models in Databricks, teams need their predictions, scores, and feature outputs available to business users in Snowflake. tray.ai automates the transfer of model inference results from Databricks Delta tables directly into Snowflake target schemas on a scheduled or event-driven basis, so sales, marketing, and operations teams can act on ML-generated insights without waiting on manual data exports.

Use case

Automate ETL Pipelines from Snowflake to Databricks for Feature Engineering

Data science teams frequently need raw or semi-processed data from Snowflake loaded into Databricks to build features for machine learning models. tray.ai can trigger Databricks notebook runs or Delta Live Tables pipelines whenever new data lands in Snowflake, creating a clean upstream-to-downstream workflow without engineers manually kicking off jobs or writing bespoke scheduling scripts.

Use case

Write Aggregated Databricks Metrics Back to Snowflake for Reporting

Databricks jobs that produce aggregated KPIs, summary statistics, or transformed datasets can have their results written back to Snowflake so that existing BI tools and dashboards always reflect the latest numbers. tray.ai handles this write-back process — schema mapping, table upserts, and error notifications — without custom code, so analytics teams can trust that Snowflake always has the freshest processed data.

Use case

Replicate Reference and Lookup Tables from Snowflake into Databricks

Databricks workloads often depend on reference data — product catalogs, customer segments, currency tables — that lives in Snowflake. tray.ai automates scheduled replication of these lookup tables into Databricks so that notebooks and pipelines always join against current reference data, preventing stale enrichment from silently corrupting model training or transformation logic.

Use case

Orchestrate Cross-Platform Data Quality Checks

Keeping data consistent between Databricks and Snowflake is a persistent headache for data engineering teams running parallel pipelines. tray.ai orchestrates automated reconciliation checks — comparing row counts, checksums, or aggregate values across both platforms — and routes discrepancy alerts to the right team channels, so you get visibility into pipeline health without building custom monitoring infrastructure.

Use case

Trigger Databricks Job Runs Based on Snowflake Data Events

Many data workflows require Databricks processing to kick off only when specific conditions are met in Snowflake — a new batch of records arriving, a table exceeding a row threshold, or a status flag being updated. tray.ai monitors Snowflake for these conditions and automatically triggers the corresponding Databricks job or workflow. Your pipelines react to actual data availability instead of running on a fixed clock, which cuts unnecessary compute spend.
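The trigger logic described above amounts to a simple decision over whatever a Snowflake poll returns. A minimal sketch, with hypothetical field names and thresholds, of the kind of condition check involved:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of condition evaluation for event-driven triggering:
# decide whether a Databricks job should run based on what a Snowflake poll
# returned. Field names, statuses, and thresholds are illustrative only.

@dataclass
class PollResult:
    new_rows: int          # rows added since the last watermark
    status_flag: str       # e.g. value of a batch-status column
    watermark: datetime    # high-water mark of the last processed batch

def should_trigger(poll: PollResult, min_rows: int = 1000,
                   required_status: str = "READY") -> bool:
    """Trigger only when enough new data has landed AND the upstream
    pipeline has marked the batch as ready."""
    return poll.new_rows >= min_rows and poll.status_flag == required_status
```

In a workflow tool this branching is configured visually rather than coded, but the underlying check is the same: compare the poll result against the configured conditions before spending any Databricks compute.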

Use case

Sync Databricks Unity Catalog Metadata to Snowflake for Governance

Organizations managing data governance across Databricks Unity Catalog and Snowflake need a consistent view of datasets, owners, and lineage metadata. tray.ai can extract catalog metadata from Databricks and sync relevant records — table definitions, tags, ownership, and descriptions — into Snowflake governance tables or external catalogs, so you're not relying on engineers to manually keep separate documentation in sync.

Get started with Databricks & Snowflake integration today

Databricks & Snowflake Challenges

What challenges arise when working with Databricks & Snowflake, and how does Tray.ai help?

Challenge

Managing Authentication Across Two Enterprise Platforms

Databricks and Snowflake use distinct authentication mechanisms — Databricks relies on personal access tokens and service principals, while Snowflake uses key-pair authentication, OAuth, or username-password with MFA. Rotating credentials for both platforms in custom scripts is error-prone, and when tokens expire mid-pipeline, you typically find out after data has already stopped moving.

How Tray.ai Can Help:

tray.ai provides a secure, centralized credential store for both Databricks and Snowflake connections. Authentication is configured once per connector and reused across all workflows, with no credentials embedded in pipeline code. When tokens need rotation, only the tray.ai connector configuration needs updating — every workflow using it picks up the change automatically.

Challenge

Handling Schema Evolution and Mismatches Between Platforms

Databricks Delta tables and Snowflake schemas evolve independently as teams add columns, change data types, or rename fields. Custom ETL scripts moving data between the two frequently break when schemas drift, causing silent data loss or failed loads that are difficult to diagnose.

How Tray.ai Can Help:

tray.ai's visual data mapper lets teams explicitly define and maintain column mappings between Databricks and Snowflake schemas. When a source schema changes, the workflow surfaces a clear mapping error rather than silently dropping or misrouting data. Teams can update mappings in the tray.ai UI without rewriting pipeline code.

Challenge

Orchestrating Job Dependencies Across Platform Boundaries

Many data pipelines require Databricks jobs to finish before Snowflake tables are loaded, or Snowflake queries to complete before Databricks notebooks are triggered. Building these cross-platform dependencies using each platform's native scheduler in isolation leads to hard-coded wait times, race conditions, and fragile cron-based coupling.

How Tray.ai Can Help:

tray.ai acts as a cross-platform orchestration layer, letting teams build conditional, event-driven workflows that wait for job completion signals from one platform before triggering actions on the other. Native branching, retry logic, and status polling replace brittle time-based scheduling with reliable dependency management.

Challenge

Monitoring Pipeline Failures and Alerting the Right Teams

When data pipelines between Databricks and Snowflake fail — due to job timeouts, API rate limits, authentication errors, or data quality issues — ops and engineering teams often have no centralized view of what went wrong and must check each platform's native logs separately. By the time someone notices, bad data has frequently already propagated downstream.

How Tray.ai Can Help:

tray.ai provides workflow-level error handling with configurable retry policies and failure branches. When a Databricks-to-Snowflake pipeline step fails, tray.ai can automatically send alerts to Slack, PagerDuty, or email, log failure details to a designated Snowflake audit table, and optionally retry the failed step with exponential backoff — no custom monitoring code required.
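Retry with exponential backoff is the standard pattern behind the behavior described above. A workflow engine handles it declaratively; the sketch below shows what hand-rolled pipeline code would otherwise have to implement (the delay values are illustrative):

```python
import time
import random

# Illustrative sketch of retry-with-exponential-backoff. Each failed attempt
# waits roughly twice as long as the previous one, plus random jitter so
# many retrying clients don't hammer an API at the same instant.

def retry_with_backoff(step, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `step` (a zero-argument callable), retrying failures with
    exponentially growing, jittered delays. Re-raises the last error once
    max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```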

Challenge

Scaling Data Volumes Without Breaking Pipeline Logic

Integration logic that works fine on thousands of rows can fail or time out on millions — a common reality when syncing Databricks query outputs or Snowflake analytical datasets. Custom scripts often lack built-in pagination, chunked writes, or bulk load support, making them brittle at production data volumes.

How Tray.ai Can Help:

tray.ai supports paginated data reads, looping constructs for batch processing, and bulk write operations to handle large data volumes between Databricks and Snowflake without timeouts. Workflows can be configured to process data in configurable chunk sizes, enabling reliable syncs at scale while staying within API rate limits for both platforms.
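The chunked-batch pattern underlying this is straightforward: process a large result set in fixed-size slices rather than one oversized write. A minimal sketch, where the chunk size and the writer callable stand in for real connector steps:

```python
# Minimal sketch of chunked batch writes. `write_batch` is a placeholder for
# a real bulk-load step (e.g. one bulk INSERT per batch); the chunk size
# would be tuned to platform limits and API rate limits.

def chunked(rows, size):
    """Yield successive slices of `rows` with at most `size` elements each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def batched_write(rows, write_batch, size=10_000):
    """Write `rows` in chunks via `write_batch`; return total rows written."""
    written = 0
    for batch in chunked(rows, size):
        write_batch(batch)
        written += len(batch)
    return written
```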

Start using our pre-built Databricks & Snowflake templates today

Start from scratch or use one of our pre-built Databricks & Snowflake templates to quickly solve your most common use cases.

Databricks & Snowflake Templates

Find pre-built Databricks & Snowflake solutions for common use cases

Browse all templates

Template

Databricks Job Completion → Write Results to Snowflake Table

This template monitors for Databricks job run completions and automatically writes the resulting Delta table data into a specified Snowflake target table, handling schema mapping and upsert logic.

Steps:

  • Poll Databricks Jobs API for completed job runs or receive a webhook on job completion
  • Read output data from the specified Databricks Delta table using a SQL query
  • Map source columns to destination Snowflake schema fields
  • Execute a Snowflake MERGE or INSERT INTO statement to load data with upsert support
  • Log sync status and row count to a Snowflake audit table and send a Slack notification

Connectors Used: Databricks, Snowflake
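The upsert step in this template ultimately reduces to a Snowflake MERGE keyed on the target's primary key. A hypothetical sketch of generating that statement from a column mapping (table, key, and column names are illustrative; a production workflow would load via a staged table rather than interpolated strings):

```python
# Hypothetical sketch of MERGE generation for the upsert step above.
# All identifiers are illustrative.

def build_merge(target, staging, key, columns):
    """Build a Snowflake MERGE that upserts `staging` into `target`,
    matching on `key` and updating/inserting the listed columns."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    col_list = ", ".join([key] + columns)
    val_list = ", ".join(f"s.{c}" for c in [key] + columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )
```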

Template

Snowflake New Records → Trigger Databricks Notebook Run

This template polls a Snowflake table for new or updated rows on a schedule and automatically triggers a Databricks notebook or workflow job, passing relevant parameters such as record IDs or date ranges.

Steps:

  • Query Snowflake on a scheduled interval for records added or updated since the last run
  • Check if new records exist; branch the workflow to exit gracefully if none are found
  • Prepare job parameters including record IDs, timestamps, or batch identifiers
  • Submit a Databricks job run via the Jobs API with the constructed parameters
  • Monitor job run status and alert via email or Slack if the run fails

Connectors Used: Snowflake, Databricks
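The submit step in this template corresponds to a Databricks Jobs API `run-now` call. A sketch of assembling that request body, assuming the target is a notebook job (the job id and parameter names are placeholders; notebook widget parameters are passed as strings):

```python
# Sketch of constructing a /api/2.1/jobs/run-now request body. Job id and
# parameter names are hypothetical; notebook_params values must be strings
# because Databricks notebook widgets receive them as text.

def build_run_now_payload(job_id, record_ids, batch_ts):
    """Assemble a run-now body, passing workflow context as notebook
    parameters."""
    return {
        "job_id": job_id,
        "notebook_params": {
            "record_ids": ",".join(str(r) for r in record_ids),
            "batch_ts": batch_ts,
        },
    }
```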

Template

Scheduled Snowflake-to-Databricks Reference Data Sync

This template runs on a nightly schedule to copy reference and lookup tables from Snowflake into Databricks Delta tables, so that all batch jobs and ML pipelines use up-to-date enrichment data.

Steps:

  • Trigger on a nightly cron schedule before batch processing windows begin
  • Execute a SELECT query against each configured Snowflake reference table
  • Write results to corresponding Databricks Delta tables using the Databricks API or DBFS
  • Validate row counts on both sides and flag discrepancies to the data engineering team
  • Update a sync log table in Snowflake with timestamp, table name, and row counts

Connectors Used: Snowflake, Databricks

Template

Databricks ML Scoring → Push Predictions to Snowflake

After a Databricks model scoring job finishes, this template extracts prediction results and loads them into a Snowflake table so that downstream BI tools and operational applications can immediately consume ML outputs.

Steps:

  • Detect Databricks scoring job completion via API poll or webhook trigger
  • Query the Databricks Delta output table to retrieve scored records and prediction values
  • Transform and flatten nested fields to match the Snowflake destination schema
  • Bulk load prediction records into Snowflake using staged file upload or INSERT batches
  • Notify analytics stakeholders via Slack or email that fresh predictions are available

Connectors Used: Databricks, Snowflake

Template

Cross-Platform Data Reconciliation and Alerting

This template runs automated reconciliation checks between Databricks and Snowflake tables on a scheduled basis, comparing row counts and aggregate metrics, and routes discrepancy alerts to the responsible data team.

Steps:

  • Run a COUNT and SUM aggregate query against the source table in Databricks
  • Run the equivalent query against the corresponding table in Snowflake
  • Compare results and calculate variance percentage between the two platforms
  • If variance exceeds the configured threshold, post an alert to a Slack channel or PagerDuty
  • Write reconciliation results — including timestamp, table names, and variance — to an audit table

Connectors Used: Databricks, Snowflake
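Once each platform's aggregate query has returned, the comparison step in this template is plain arithmetic. A minimal sketch, with an illustrative threshold value:

```python
# Minimal sketch of the variance check behind the reconciliation template.
# The 0.5% threshold is illustrative; real pipelines tune it per table.

def variance_pct(source_value, target_value):
    """Absolute variance of target vs source, as a percentage of source."""
    if source_value == 0:
        return 0.0 if target_value == 0 else 100.0
    return abs(target_value - source_value) / abs(source_value) * 100

def needs_alert(source_count, target_count, threshold_pct=0.5):
    """True when the row-count variance exceeds the configured threshold."""
    return variance_pct(source_count, target_count) > threshold_pct
```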

Template

Snowflake Data Export → Databricks Feature Store Ingestion

This template automates the extraction of curated datasets from Snowflake and loads them into the Databricks Feature Store, so data science teams always train models on the latest available features.

Steps:

  • Trigger on a schedule or on demand via a webhook from an upstream data pipeline
  • Execute a parameterized SELECT query in Snowflake to extract the feature dataset
  • Stage the result set as a Parquet file in cloud object storage accessible to Databricks
  • Call the Databricks Feature Store API to ingest the staged file into the appropriate feature table
  • Send a confirmation notification with row count and feature table name to the data science team

Connectors Used: Snowflake, Databricks