How to Schedule Google Analytics 4 Data Backfill in BigQuery

Cody Schneider · 7 min read

Setting up the Google Analytics 4 export to BigQuery opens up a world of advanced data analysis, but many users quickly discover a frustrating limitation: the data export only starts from the day you establish the link. It doesn't include your historical data. This article will show you exactly how to perform a data backfill and, more importantly, how to schedule it so you can be confident your dataset is as complete as possible.

Why Is a GA4 BigQuery Backfill Necessary?

When you first link GA4 to BigQuery, Google creates daily tables with your event data moving forward. It’s a great start, but it leaves you with an incomplete picture. You can't analyze long-term trends, compare this year's performance to last year's, or build complete user journey models without that historical context. This is what makes a manual backfill, and eventually a scheduled one, so valuable.

Backfilling lets you:

  • Achieve a Complete Dataset: Pull in data from the weeks or months before you set up the initial export, creating a unified dataset for comprehensive analysis.
  • Improve Trend Analysis: A longer time horizon allows you to spot seasonal trends, measure the long-term impact of campaigns, and make more accurate forecasts.
  • Fix Data Gaps: Even with an active link, GA4 data processing can sometimes have slight delays. A scheduled backfill of recent days can help ensure any straggling data is accounted for.

By scheduling a backfill, you automate the process of keeping your historical data up-to-date, ensuring your reports and dashboards are always built on the most complete information available.

Getting Started: Prerequisites for a Backfill

Before you jump in, you’ll want to have a few things in place. Think of this as getting your tools ready before you start the project to save headaches later.

You will need:

  1. A Linked GA4 and BigQuery Project: This process assumes you’ve already gone through the initial GA4 property settings to link to an active Google Cloud Platform (GCP) project with BigQuery enabled.
  2. Sufficient Permissions: In your GCP project, your user account needs the right permissions to manage data transfers. Key roles include BigQuery Admin (scheduling backfills requires the bigquery.transfers.update permission, which it contains), plus Cloud Functions Developer and Cloud Scheduler Admin for the automation steps.
  3. Billing Enabled: BigQuery, Cloud Functions, and Cloud Scheduler are paid services. While there are generous free tiers, you must have an active billing account linked to your project for any of this to work. Costs are typically low for this kind of process but are important to be aware of.
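If these roles still need to be granted, a project owner can do so from the gcloud CLI. A minimal sketch, assuming placeholder values for the project ID and user email:

```shell
# Hypothetical project and user; replace before running.
PROJECT_ID="your-project-id"
USER_EMAIL="you@example.com"

# BigQuery Admin covers managing and backfilling data transfers.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="user:$USER_EMAIL" \
  --role="roles/bigquery.admin"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="user:$USER_EMAIL" \
  --role="roles/cloudfunctions.developer"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="user:$USER_EMAIL" \
  --role="roles/cloudscheduler.admin"
```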

How to Manually Backfill GA4 Data in BigQuery

First, let's walk through the manual process. This is useful for one-time historical data pulls and serves as the foundation for the automated schedule we'll build later.

  1. Open the Google Cloud Console and navigate to BigQuery.
  2. In the BigQuery navigation menu, find and click on "Data transfers".
  3. You should see your existing Google Analytics transfer configuration listed. It will be named something like "Google Analytics GA4" followed by your property ID. Click on its name to go to the transfer details page.
  4. In the top right corner of the transfer details page, you will see a button labeled "SCHEDULE BACKFILL". Click it.
  5. A pop-up will appear allowing you to select a start and end date for your backfill. Select the date range for the historical data you wish to import. Note that there are limits to how far back you can go, typically correlating with when data collection in your GA4 property began.
  6. Click "OK". BigQuery will queue up a transfer job for each day in your selected range. You can monitor progress on the "RUN HISTORY" tab.
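The same backfill can also be queued from the command line with the bq tool, which is handy if you script one-off imports. A sketch, using placeholder dates and the placeholder resource name format covered in Step 1 below:

```shell
# Hypothetical date range and transfer config name; replace before running.
bq mk --transfer_run \
  --start_time="2024-01-01T00:00:00Z" \
  --end_time="2024-01-31T00:00:00Z" \
  projects/your-project-name/locations/us/transferConfigs/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

Check the bq reference for the exact semantics of the range bounds before relying on this for a precise window.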

This is great for a one-off import, but nobody wants to do this every morning to ensure yesterday's data is fully synced. That's where automation comes in.

How to Schedule GA4 Backfills with Cloud Functions and Cloud Scheduler

To fully automate this process, we'll use two powerful GCP services together: Cloud Functions (to execute the backfill script) and Cloud Scheduler (to run that function on a daily schedule). This might sound technical, but we’ll break it down into manageable steps.

Step 1: Get Your Transfer Configuration Name

Before writing any code, you need a specific piece of information: the full resource name of your data transfer configuration. This is how your script will tell BigQuery which transfer to run.

Go back to the transfer details page you visited in the manual steps. Look at the URL in your browser's address bar. It will look something like this:

console.cloud.google.com/bigquery/transfers/locations/us/configs/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx?project=your-project-name

The URL gives you two of the three pieces you need: the location (us in this example) and the config ID (the long string after configs/). Combine them with your project ID to construct the full resource name in this format:

projects/your-project-name/locations/us/transferConfigs/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Note that the URL says "configs" but the resource name uses "transferConfigs". Copy the entire constructed string and save it in a text editor; you'll paste it into the Cloud Function in a later step.
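If you'd rather not assemble the string by hand, a few lines of Python make the format explicit. The values below are the placeholders from the example URL; substitute your own:

```python
# Placeholder values taken from the example URL; replace with your own.
project = "your-project-name"
location = "us"  # from ".../locations/us/..." in the URL
config_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # from ".../configs/<id>"

parent = f"projects/{project}/locations/{location}/transferConfigs/{config_id}"
print(parent)
```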

Step 2: Create a Cloud Function to Trigger the Backfill

A Cloud Function is a small piece of code that runs in response to a trigger — in our case, an HTTP request from Cloud Scheduler.

  1. In the Google Cloud Console, navigate to Cloud Functions.
  2. Click "CREATE FUNCTION". Configure the initial settings: give the function a name (for example, ga4-backfill-trigger), choose a region, and select an HTTP trigger.
  3. Click "Next" to move to the code editor. Set the runtime to a recent Python version (for example, Python 3.11) and the entry point to schedule_daily_backfill, matching the function name in the code below.
  4. In the requirements.txt file, add the necessary Google Cloud client library:
# requirements.txt
google-cloud-bigquery-datatransfer
  5. In the main.py file, replace the boilerplate code with the following Python script. Remember to replace the placeholder in the parent variable with your unique Transfer Configuration Name from Step 1.
# main.py
from datetime import datetime, timedelta, time, timezone

from google.cloud import bigquery_datatransfer_v1

def schedule_daily_backfill(request):
    """
    HTTP Cloud Function that triggers a backfill for the previous day,
    or for a specific date passed in the request payload.
    """
    transfer_client = bigquery_datatransfer_v1.DataTransferServiceClient()

    # --- IMPORTANT: PASTE YOUR TRANSFER CONFIG PARENT NAME HERE ---
    parent = "projects/your-project-id/locations/us/transferConfigs/your-transfer-id"

    # Check for a specific date in the request first
    request_json = request.get_json(silent=True)
    if request_json and 'run_date' in request_json:
        # Expected date format: "YYYY-MM-DD"
        target_date_str = request_json['run_date']
        try:
            target_date = datetime.strptime(target_date_str, '%Y-%m-%d')
        except ValueError:
            return "Invalid date format. Please use YYYY-MM-DD.", 400
    else:
        # Default to yesterday (UTC) if no date is provided
        target_date = datetime.now(timezone.utc) - timedelta(days=1)

    # Pin the requested run time to midnight UTC on the target date
    run_time = datetime.combine(target_date, time(0, 0), tzinfo=timezone.utc)

    start_manual_transfer_request = bigquery_datatransfer_v1.StartManualTransferRunsRequest(
        parent=parent,
        requested_run_time=run_time,
    )

    try:
        response = transfer_client.start_manual_transfer_runs(
            request=start_manual_transfer_request
        )
        run_count = len(response.runs)
        print(f"Successfully started {run_count} manual transfer run(s).")
        return f"Successfully started backfill for {target_date.strftime('%Y-%m-%d')}.", 200
    except Exception as e:
        print(f"Error starting transfer run: {e}")
        return f"Error starting transfer run: {e}", 500
  6. Click "DEPLOY" and wait for the function to be created. It can take a few minutes. Once deployed, find and copy its Trigger URL.
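Before deploying, you can sanity-check the date handling the function relies on. This standalone sketch mirrors the same logic; the helper name resolve_target_date is ours for illustration, not part of any Google API:

```python
from datetime import datetime, timedelta, timezone

def resolve_target_date(run_date=None):
    """Mirror of the Cloud Function's date handling: parse an explicit
    YYYY-MM-DD string, or default to yesterday in UTC."""
    if run_date is not None:
        return datetime.strptime(run_date, "%Y-%m-%d")
    return datetime.now(timezone.utc) - timedelta(days=1)

print(resolve_target_date("2024-01-15").strftime("%Y-%m-%d"))  # 2024-01-15
print(resolve_target_date().strftime("%Y-%m-%d"))  # yesterday's date
```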
Step 3: Create a Cloud Scheduler Job to Automate the Function

Finally, we need to create a scheduled "cron job" to call our function automatically.

  1. In the Google Cloud Console, navigate to Cloud Scheduler.
  2. Click "CREATE JOB".
  3. Configure the job settings: give the job a name (for example, ga4-daily-backfill), choose a region, and set the frequency with a cron expression. For example, 0 8 * * * runs the job every day at 8:00 AM.
  4. Under "Configure the execution," set the target type to HTTP, paste your Cloud Function's Trigger URL, and choose POST as the HTTP method. If the function requires authentication, add an OIDC token from a service account that is allowed to invoke it.
  5. Click "CREATE". Your job is now active!

You can test it by clicking the "RUN NOW" button in the Cloud Scheduler list. Then, check the Run History back in the BigQuery Data Transfers interface to confirm a new manual run was successfully triggered for yesterday.
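If you prefer the CLI over the console, an equivalent job can be created with gcloud. A sketch, assuming a hypothetical function URL, schedule, and invoker service account (all placeholders here):

```shell
# Hypothetical URL, schedule, and service account; replace before running.
gcloud scheduler jobs create http ga4-daily-backfill \
  --location="us-central1" \
  --schedule="0 8 * * *" \
  --uri="https://REGION-PROJECT.cloudfunctions.net/schedule_daily_backfill" \
  --http-method=POST \
  --oidc-service-account-email="scheduler-invoker@your-project-id.iam.gserviceaccount.com"
```

If your function allows unauthenticated invocations, the OIDC flag can be dropped.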

Final Thoughts

Manually starting a backfill for historical GA4 data is a solid first step, but automating the process with Cloud Functions and Scheduler transforms your setup into a resilient and reliable data pipeline. Taking the time to build this automation ensures your datasets are always as rich and complete as possible, empowering you to conduct deeper and more accurate analysis.

While this Google Cloud setup gives you powerful control, it also requires managing code, monitoring logs, and wrangling different tools to get your data in one place. We built Graphed to remove that friction completely. We connect directly to your Google Analytics account via a simple, one-click integration, handling the entire data pipeline automatically. We pull your entire historical data back-catalog without you having to touch a line of code or step inside the Google Cloud Console, allowing you to go from connecting your sources to building real-time dashboards in seconds.
