How to Query Google Analytics 4 Data in BigQuery
Ditching the limitations of the standard Google Analytics 4 interface is the first step toward true data mastery. By connecting GA4 to BigQuery, you gain access to your raw, unsampled event data, opening up a world of advanced analysis that simply isn't possible otherwise. This article will guide you through connecting your properties, understanding the data structure, and writing your first custom SQL queries from scratch.
Why Connect GA4 to BigQuery?
The standard GA4 reporting interface is excellent for top-level metrics and simple reports. But when you need to answer complex, business-specific questions, you’ll quickly hit a wall. Connecting to BigQuery, Google’s cloud data warehouse, provides several powerful advantages:
- Access to Raw Data: Every single event fired from your site or app is sent to BigQuery. This means you can work with unsampled data, providing a more accurate and granular view of user behavior.
- Data Ownership: Your GA4 data in BigQuery belongs to you. You can retain it for as long as you need, well beyond GA4's maximum 14-month data retention limit.
- Break Down Data Silos: Your Google Analytics data becomes immensely more valuable when you combine it with other information. In BigQuery, you can join your website behavior data with customer information from your CRM (like Salesforce or HubSpot), transaction data from your sales platform (like Shopify), or cost data from ad platforms (like Facebook Ads). This creates a single source of truth for your customer journey.
- Perform Advanced Analysis: With your raw data in a powerful SQL environment, you can perform sophisticated analyses like complex customer segmentation, custom attribution modeling, churn prediction with BigQuery ML, and deep-dive funnel analysis that’s impossible in the GA4 UI.
Step-by-Step: Setting up the GA4 to BigQuery Export
The process of linking GA4 to BigQuery is surprisingly straightforward. Before you begin, you'll need two things: Admin access to your Google Analytics 4 property and access to a Google Cloud Platform (GCP) project with billing enabled. Don’t worry - BigQuery has a generous free tier, and you won’t incur any costs unless your data processing surpasses the monthly free limits, which is unlikely when you're just starting out.
Step 1: Navigate to the BigQuery Links in GA4
In your GA4 property, go to the Admin section by clicking the gear icon in the bottom-left corner. In the 'Product Links' column, you'll see an option for BigQuery Links. Click it.
Step 2: Create a New Link
You’ll see a list of any existing links. Click the blue Link button in the top right corner. Then, click Choose a BigQuery project. This will show you a list of all GCP projects your Google account has access to. Select the project you want to use and click Confirm.
Step 3: Configure Your Export Settings
Next, you’ll configure the data export. You’ll need to:
- Select data streams: Choose which web or app data streams you want to export. Most users will just have one web stream to select here.
- Choose export frequency: You have two options, and you can enable both: Daily, which writes a complete export of the previous day's events once per day, and Streaming, which continuously writes events to an intraday table shortly after they're collected.
Step 4: Review and Submit
Finally, review your settings. Once you're sure everything is correct, click the Submit button. Your link will be created, and GA4 will begin exporting data to your BigQuery project. Be patient - it can take up to 24 hours for the first full day's worth of data to appear in BigQuery.
Understanding the GA4 Data Structure in BigQuery
Once data starts flowing, you’ll see new datasets and tables in your BigQuery project. The structure can be a bit confusing at first because it's designed to be efficient, not necessarily user-friendly.
If you enabled the "Daily" export, you'll see a table named events_YYYYMMDD for each day. For example, the data for October 26th, 2023, will be in a table called events_20231026. If you enabled the "Streaming" export, you’ll see a table called events_intraday_YYYYMMDD, which is continuously updated.
The real challenge comes from understanding the table schema. Instead of having columns like page_url, form_name, or link_text, GA4 stores most of this contextual information inside a single, complex column called event_params.
This column is what's known as a "repeated record," which is basically an array (a list) of key/value pairs. For any given event, like a page_view, the event_params column will contain several entries:
- One entry where the key is 'page_location' and the value is the page URL.
- One entry where the key is 'page_title' and the value is the page title.
- One entry where the key is 'ga_session_id' and the value is the session identifier.
To access the information inside this structure, you need to use a special SQL function: UNNEST().
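To see what this structure actually looks like, you can flatten the array directly. The query below is a minimal sketch (it uses the same placeholder table name as the examples later in this article); it turns each event into one row per parameter so you can inspect the keys and values yourself:

```sql
-- Flatten event_params into one row per (event, parameter) pair,
-- so you can see every key and value a page_view event carries.
-- Replace the table name with your own project, dataset, and date.
SELECT
  event_name,
  params.key,
  params.value.string_value,
  params.value.int_value
FROM
  `your-project.your_dataset.events_20231026`,
  UNNEST(event_params) AS params
WHERE
  event_name = 'page_view'
LIMIT 20
```

Running an exploratory query like this before writing anything more ambitious is a quick way to learn which parameters your events actually send, and whether their values live in `string_value` or `int_value`.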
Writing Your First GA4 Queries in BigQuery
Let's move on to the practical part: writing SQL queries. The goal here isn't to become a SQL expert overnight. It's to learn a few core patterns that will empower you to answer your most common business questions. Remember to replace your-project.your_dataset.events_YYYYMMDD with the actual name of your project, dataset, and the specific daily table you want to query.
Query 1: Counting Total Events on a Single Day
This is the simplest query you can run. It’s a great way to confirm that your data is flowing in correctly. It simply counts the total number of rows (events) in a given day's table.
```sql
SELECT
  COUNT(*) AS total_events
FROM
  `your-project.your_dataset.events_20231026`
```
Query 2: Unlocking Parameters with UNNEST()
This is the most critical concept to master. The UNNEST() function takes your event_params array and flattens it into individual rows that you can query. Let's write a query to find the top 10 most viewed pages for a given day.
```sql
SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
  COUNT(*) AS pageviews
FROM
  `your-project.your_dataset.events_20231026`
WHERE
  event_name = 'page_view'
GROUP BY
  page_url
ORDER BY
  pageviews DESC
LIMIT 10
```
Let's break that down:
- The subquery `(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')` is doing the heavy lifting. It unnests the parameters for each event, finds the one where the `key` is `page_location`, and pulls out its string value.
- `WHERE event_name = 'page_view'` ensures we're only looking at page view events.
- The rest is standard SQL: grouping by the extracted URL, counting the occurrences, and ordering to find the top 10.
Query 3: Querying Across Multiple Days
Querying a single day at a time is limiting. BigQuery allows you to use a wildcard (*) character in your table name to query across multiple days, weeks, or months.
Let's find the daily number of users and sessions for the entire month of October 2023.
```sql
SELECT
  DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_date,
  COUNT(DISTINCT user_pseudo_id) AS total_users,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS total_sessions
FROM
  `your-project.your_dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'
GROUP BY
  event_date
ORDER BY
  event_date
```
Key elements in this query:
- `events_*` tells BigQuery to look at all tables that start with "events_".
- `WHERE _TABLE_SUFFIX BETWEEN '20231001' AND '20231031'` is how you filter the date range when using a wildcard. It's much more efficient than using a `WHERE` clause on a date column, because BigQuery skips the tables outside the range entirely.
- We create a unique session ID by concatenating the user's pseudo ID and the session ID, then count the distinct values.
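If you'd rather not hard-code the dates, the same `_TABLE_SUFFIX` filter can be computed dynamically. This sketch (using the same placeholder table name) counts users over roughly the last 30 days relative to whenever the query runs:

```sql
-- Dynamic date range: format today's date and today-minus-30-days
-- into the YYYYMMDD suffix format the daily tables use.
SELECT
  COUNT(DISTINCT user_pseudo_id) AS total_users
FROM
  `your-project.your_dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
```

This pattern is handy for scheduled queries and dashboards, where a fixed date range would go stale.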
Common Pitfalls and Best Practices
As you venture into BigQuery, keep these tips in mind to save time, money, and frustration.
- Manage Your Costs: BigQuery charges you based on the amount of data processed by your query, not the amount of data returned. Always use `_TABLE_SUFFIX` to limit your date range, and only select the columns you actually need. Avoid using `SELECT *` on large tables.
- Check the Validator: Before running a query, look at the green checkmark in the upper-right corner of the editor. It will estimate how much data the query will process, giving you a chance to catch overly expensive queries before you run them.
- Mind Your Data Types: Parameters in the `value` field have different data types (`string_value`, `int_value`, `double_value`, `float_value`). Make sure you're querying the right one. For example, `ga_session_id` is an integer, so you have to use `value.int_value` to retrieve it.
- Timestamp Conversion: The `event_timestamp` field is stored in microseconds since the Unix epoch. Use the `TIMESTAMP_MICROS()` function to convert it into a human-readable timestamp.
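The tips above can be combined into a single query shape worth copying. This is a sketch (placeholder table name, and it assumes your events carry the standard `page_location` and `ga_session_id` parameters): it selects only the columns it needs, filters by `_TABLE_SUFFIX`, reads each parameter from the correct value type, and converts the timestamp:

```sql
-- Cost-conscious query pattern: narrow column list, suffix-based
-- date filter, correct value type per parameter, readable timestamps.
SELECT
  TIMESTAMP_MICROS(event_timestamp) AS event_time,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
  `your-project.your_dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20231001' AND '20231007'
LIMIT 100
```

Before running it, glance at the validator's processing estimate; if it looks large, narrow the suffix range or the column list further.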
Final Thoughts
Connecting GA4 to BigQuery is like upgrading from a point-and-shoot camera to a full-fledged DSLR. You move from appreciating pre-canned reports to having complete creative control over your raw data. By learning a few fundamental SQL queries and understanding the BigQuery data structure, you can unlock insights that are simply inaccessible within the GA4 interface.
We built Graphed because we believe gaining these deep insights shouldn't require you to become a SQL expert. While the process above is powerful, it comes with a steep learning curve. We created a way to get the same raw data analysis without the complexities. You can connect your data sources like Google Analytics just once, then ask questions in simple, natural language. Graphed automatically translates your requests into the correct queries and instantly builds live, interactive dashboards, saving you hours of wrestling with SQL and giving you back time to act on the insights.
Related Articles
How to Enable Data Analysis in Excel
Enable Excel's hidden data analysis tools with our step-by-step guide. Uncover trends, make forecasts, and turn raw numbers into actionable insights today!
What SEO Tools Work with Google Analytics?
Discover which SEO tools integrate seamlessly with Google Analytics to provide a comprehensive view of your site's performance. Optimize your SEO strategy now!
Looker Studio vs Metabase: Which BI Tool Actually Fits Your Team?
Looker Studio and Metabase both help you turn raw data into dashboards, but they take completely different approaches. This guide breaks down where each tool fits, what they are good at, and which one matches your actual workflow.