How to Connect Databricks to Excel
Getting your data from a powerful platform like Databricks into a familiar tool like Excel doesn’t have to be a multi-day technical project. While exporting CSV files is always an option, there's a much better way to create a live, refreshable connection. This article will walk you through the exact steps to connect Databricks directly to Excel using an ODBC driver, enabling you to pull data for analysis, build reports, and create visualizations without constantly downloading static files.
Why Connect Databricks to Excel in the First Place?
Databricks is an incredible platform for processing, transforming, and analyzing enormous datasets. It's fast, scalable, and built for data science and engineering workflows. Excel, on the other hand, is the go-to tool for business users, financial analysts, and marketing teams everywhere. So why bridge the gap?
Familiarity and Accessibility: Almost everyone in a business setting knows their way around a spreadsheet. Connecting Databricks to Excel allows non-technical team members to interact with massive datasets using the tool they're most comfortable with - pivot tables, formulas, and charts.
Ad-Hoc Analysis: Sometimes you just need to quickly probe a subset of your data to answer a specific question. Instead of waiting for a formal report, you can pull the relevant data into Excel and start slicing and dicing it immediately.
Simple Reporting and Dashboards: For quick updates or straightforward reports, Excel is more than enough. You can build simple dashboards for stakeholders who don't need access to a full BI platform but want to see key performance indicators sourced from a reliable Databricks table.
Live, Refreshable Data: The biggest advantage over exporting files is the live connection. Once set up, you can refresh your spreadsheet with a single click to pull the most up-to-date data from Databricks. No more downloading a new CSV every Monday morning and rebuilding your pivot tables from scratch.
Connecting the two platforms gives you the best of both worlds: the raw processing power of Databricks and the universal accessibility of Excel.
What You’ll Need Before You Start
Before you dive in, let’s gather the ingredients. The process is straightforward, but it helps to have all your information ready to go. You will need:
Databricks Account Credentials: You’ll need to be able to log into your Databricks workspace.
Connection Details for Your Cluster or SQL Warehouse: Every Databricks cluster or SQL warehouse has specific connection details. You can find these within your Databricks workspace. They typically include:
Server Hostname: The URL for your Databricks instance.
Port: This is usually 443.
HTTP Path: The unique path to your specific compute resource.
A Personal Access Token (PAT): Think of a PAT as a secure password that applications like Excel can use to access Databricks. You generate this token inside your Databricks workspace under User Settings. Make sure to copy it somewhere safe immediately after creating it, as you won't be able to see it again.
The 64-bit Databricks ODBC Driver: This is a small piece of software that acts as a translator between Excel and Databricks. You can download it directly from the Databricks website. Be sure to grab the 64-bit version, since modern Microsoft 365 installations of Excel are 64-bit by default.
Note: To confirm which version of Excel you have, open Excel and go to File → Account, then click About Excel.
Step 1: Get Your Databricks Connection Details and PAT
First, you need to grab the connection information and create an access token from your Databricks account. These details are essential for establishing the connection.
Find Your Hostname and HTTP Path
Log into your Databricks workspace.
Navigate to the SQL warehouse (or cluster) you want to connect to. Typically, you can find this under SQL in the left sidebar, then select SQL Warehouses.
Click on the specific warehouse you want to connect with.
Select the Connection details tab.
Here, you'll find the Server hostname and HTTP path. Copy these values into a safe place like a notepad - you'll need them in a moment.
Generate a Personal Access Token (PAT)
In Databricks, click your username in the top-right corner and select User Settings from the dropdown.
Switch to the Developer tab.
Next to Access tokens, click the Manage button.
Click the blue Generate new token button.
Give your token a descriptive comment (e.g., "Excel Connection") and set a lifetime for it. For security, don’t set it to last forever. A 90-day lifetime is often a good starting point.
Click Generate.
Important: Copy the generated token immediately and store it securely. You will not be able to see this token again once you close the window.
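Before moving on to the ODBC setup, you can optionally sanity-check the hostname, HTTP path, and token from Python using the open-source databricks-sql-connector package. This is just a rough sketch - it is not part of the Excel setup, and the hostname, HTTP path, and token values shown are placeholders you would replace with the details you just copied.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholder values - substitute the details you copied from your workspace.
SERVER_HOSTNAME = "dbc-a1b2c3d4-e5f6.cloud.databricks.com"
HTTP_PATH = "/sql/1.0/warehouses/abc123def456"
ACCESS_TOKEN = "dapi...your-personal-access-token..."

with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        # A trivial query: if this prints a row, the hostname, HTTP path,
        # and token are all valid and the warehouse is reachable.
        cursor.execute("SELECT 1 AS connection_ok")
        print(cursor.fetchone())
```

If this prints a row, every value you are about to type into the ODBC dialog in Step 2 is known to be good, which makes any later connection error much easier to diagnose.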
Step 2: Install and Configure the ODBC Driver
With your connection details and PAT in hand, it's time to install the ODBC driver and set up a Data Source Name (DSN). A DSN is simply a saved configuration that tells your computer how to connect to a specific data source - in this case, your Databricks warehouse.
Download and Install the Driver: Visit the Databricks JDBC/ODBC Drivers page and install the 64-bit ODBC driver to match your 64-bit Excel installation.
Open the ODBC Data Source Administrator: Open your Windows Start Menu and simply type "ODBC." You should see an application called ODBC Data Sources (64-bit). Click to open it.
Create the DSN:
In the ODBC Data Source Administrator, select the User DSN tab (this makes the connection available only for your user profile, which is perfect for most cases).
Click the Add... button.
A new window will appear. Scroll through the list and select the Simba Spark ODBC Driver. This is the driver you just installed. Then click Finish.
Configure the Connection: This is where you'll plug in the details you gathered from Databricks.
Data Source Name: Give your connection a memorable, friendly name, like Databricks Sales Data or My Databricks Warehouse Link. You will select this name later in Excel.
Host(s): Paste the Server Hostname you copied from Databricks here.
Port: Enter 443.
Authentication → Mechanism: Select OAuth 2.0 from the dropdown list.
HTTP Options → HTTP Path: Paste the HTTP path you copied from Databricks here.
Important: Don't proceed quite yet. First select the Advanced tab and, under properties, check the box labeled Enable user based authentication.
Then return to the previous screen to fill out the last few details. With the Mechanism still set to OAuth 2.0, click Login and enter token as the username and the PAT you generated above as the password.
Note: A DSN is simply a saved alias that Windows uses to look up the detailed connection and login information for an outside data source.
Select Save to close out the open dialog screens, and you should be good to go! The final steps, covered next, happen inside Excel and complete the connection process.
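If you'd rather confirm the DSN works before opening Excel, a short Python script using the pyodbc package exercises exactly the same saved configuration. This is an optional sketch that assumes Python and pyodbc are installed and that your DSN uses the example name Databricks Sales Data - swap in whatever name you chose.

```python
# pip install pyodbc
import pyodbc

# "Databricks Sales Data" is the example DSN name from the setup above.
# autocommit=True keeps the driver from starting transactions it doesn't need
# for simple read-only queries.
connection = pyodbc.connect("DSN=Databricks Sales Data", autocommit=True)

cursor = connection.cursor()
# current_catalog() and current_schema() are built-in Databricks SQL functions,
# so a successful result proves the query really ran on the warehouse.
cursor.execute("SELECT current_catalog(), current_schema()")
print(cursor.fetchone())

cursor.close()
connection.close()
```

Any error raised here (an expired token, a wrong HTTP path, a stopped warehouse) is the same problem Excel would run into, just reported in a form that is usually easier to read.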
Step 3: Connect and Pull Data into Excel
Now for the fun part! With all the configuration done, pulling the data into Excel takes just a few clicks.
Open a new or existing workbook in Microsoft Excel.
Go to the Data tab on the Ribbon.
Click Get Data → From Other Sources → From ODBC.
In the "From ODBC" window, click the Data Source Name (DSN) dropdown menu. You should see the friendly name you created earlier (e.g., Databricks Sales Data). Select it.
Click the arrow next to Advanced options. This part is especially useful if you only need a specific dataset. In the box labeled SQL statement, you can enter a specific SQL query (like SELECT * FROM a_catalog.some_schema.your_table LIMIT 100) to pull exactly the data you need. If you leave this blank, Excel will let you browse all available tables visually.
Click OK.
A Navigator window will appear. It's an interactive explorer of your Databricks environment. You can click down through your catalogs and schemas to find the specific table or view you want to import.
Select a table, and a preview of the data will appear on the right.
At the bottom of the window, you have two choices:
Load: Click this to pull the entire table directly into a new worksheet as a formatted Excel table. Use this for smaller tables or when you need all the raw data.
Transform Data: Click this to open Excel's Power Query Editor before loading the data. Power Query is a powerful tool where you can filter rows, remove or rename columns, join tables, and perform a variety of data-cleaning tasks. This is highly recommended for large tables, as you can dramatically reduce the amount of data loaded into the worksheet, which helps keep Excel running smoothly.
And that's it! Your Databricks data is now flowing directly into your Excel spreadsheet.
Tips for Managing Your Live Data Connection
You’ve established the connection, but here are a few tips to make the experience better.
Refreshing is Your Best Friend: The most powerful part of this connection is its ability to refresh. Any time you want the latest data, simply go to the Data tab and click Refresh All. Excel will communicate with Databricks and update your tables without any manual effort.
Be Mindful of Data Volume: Databricks can handle petabytes, but Excel cannot. Avoid trying to load millions of rows directly into a worksheet. If you're working with a large table, always use the Transform Data (Power Query) option to filter and aggregate your data before it gets loaded. Your computer's memory will thank you.
Use SQL for Precision: The "Advanced options" in the ODBC connection process where you can write a SQL statement is incredibly useful. For experienced users, writing a query is much faster than clicking through the Power Query interface to filter and select columns. It lets the powerful Databricks engine do the heavy lifting.
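As a concrete illustration of that last tip, here is the kind of aggregating statement you might paste into the SQL statement box. It is shown running through the same DSN via pyodbc so you can test it outside Excel first; the catalog, table, and column names are hypothetical, so substitute your own.

```python
import pyodbc

# Hypothetical catalog, table, and columns - replace with your own. Because the
# GROUP BY runs inside Databricks, Excel (or this script) only ever receives
# one row per region instead of every underlying order.
QUERY = """
    SELECT region,
           SUM(order_total) AS total_sales,
           COUNT(*)         AS order_count
    FROM a_catalog.some_schema.orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY region
    ORDER BY total_sales DESC
"""

connection = pyodbc.connect("DSN=Databricks Sales Data", autocommit=True)
cursor = connection.cursor()
cursor.execute(QUERY)
for row in cursor.fetchall():
    print(row)

cursor.close()
connection.close()
```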
Final Thoughts
Connecting Databricks directly to Excel via the ODBC driver turns your spreadsheet from a static reporting tool into a dynamic analysis console. By following the steps to configure the driver and set up a DSN, you empower your team to query and analyze data in real time, all within a familiar environment, eliminating tedious manual exports for good.
While this direct connection is a significant step up from manual CSV pulls, creating reports across multiple sources or for teams unfamiliar with DSNs can still introduce friction. That's why we created Graphed. We provide a central hub where you can connect all of your data sources in seconds, and then use simple, natural language to ask questions, create dashboards, and generate reports automatically. Instead of configuring drivers and working in spreadsheets, you can just ask, "Show me last month's sales by region from Databricks," and get an interactive chart instantly.