How to Enforce Data Retention Policies in Looker
Keeping every piece of data forever isn't just impractical - it's expensive and can introduce serious security and compliance risks. Setting up clear data retention policies is essential, but enforcing them within a powerful BI tool like Looker requires a specific approach. This guide will show you exactly how to manage data access and implement retention policies, focusing on where the work needs to happen: in your database and within Looker's caching and PDT settings.
Why Data Retention Policies Matter
Before jumping into the "how," it’s important to understand the "why." A well-defined data retention policy isn't about arbitrarily deleting data. It's a strategic plan that offers several significant benefits for your business.
- Compliance and Legal Requirements: Regulations like GDPR in Europe and CCPA in California mandate that personal data should only be kept for as long as necessary. Failing to comply can result in hefty fines and damage to your brand's reputation.
- Cost Management: Data storage isn't free. The more data you warehouse in platforms like BigQuery, Snowflake, or Redshift, the more you pay. Purging old, irrelevant data is a straightforward way to control escalating cloud costs.
- Improved Performance: Querying a massive, cluttered database is slower and less efficient. A leaner database leads to faster queries in Looker, which means your dashboards load quicker and your team can get answers without drumming their fingers on the desk.
- Reduced Security Risk: Every piece of data you store is a potential liability in the event of a data breach. Minimizing your data footprint reduces the amount of sensitive information that could be exposed.
Understanding Looker's Role in Data Retention
Here’s a common point of confusion: Looker does not store your core business data. It’s a modeling and visualization layer that sits on top of your existing database or data warehouse. When you run a query in Looker, it sends SQL code to your database (like BigQuery or Snowflake) and then visualizes the results that the database sends back.
This means the primary enforcement of your data retention policy - the actual deletion of old records - must happen directly in your source database. Looker will simply reflect whatever data is available there.
However, Looker does save some data temporarily for performance reasons. This is where your focus should be within the Looker platform itself. Specifically, you need to manage:
- Looker's Query Cache: To speed things up, Looker temporarily stores the results of recent queries. If you delete data from your database but Looker serves an answer from its cache, users might see stale information.
- Persistent Derived Tables (PDTs): These are special tables that Looker materializes (builds and writes) in your database on a set schedule. If you have PDTs that summarize old data, they won't automatically update just because you deleted the source records.
The goal is to align Looker’s caching and PDT refresh schedules with your database's cleanup schedule so that Looks and dashboards always reflect the current state of your data.
Setting the Policy in Your Data Warehouse
Since your database holds the data, the process starts there. Enforcing a retention policy involves defining it, scripting the deletion logic, and automating the process.
Step 1: Define Your Policy
First, decide on your rules. Don't do this in a vacuum; involve stakeholders from legal, marketing, and operations. Ask fundamental questions:
- What data needs a retention rule? (e.g., user PII, event logs, marketing campaign results).
- How long should different types of data be kept? (e.g., "delete user-level web events after 18 months," "anonymize customer PII 30 days after an account is closed").
- What is the legal or business justification for this timeframe? Having this documented is crucial for compliance audits.
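As a concrete sketch, a rule like "anonymize customer PII 30 days after an account is closed" might translate into SQL such as the following. The table and column names here are hypothetical; adapt them to your own schema.

```sql
-- Hypothetical example: redact PII for accounts closed more than 30 days ago.
-- Table and column names are illustrative, not from a real schema.
UPDATE crm.customers
SET email = NULL,
    full_name = 'REDACTED',
    phone = NULL
WHERE account_closed_at < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND email IS NOT NULL;
```

Anonymizing in place like this preserves non-identifying aggregates (order counts, revenue) while satisfying a PII rule, which is often preferable to a hard delete.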
Step 2: Implement Deletion Logic at the Database Level
Once you have a defined policy, you need to create a script that executes it. This is usually a SQL script containing a DELETE statement. For example, if your policy is to delete user session data older than two years, your pseudo-SQL might look like this:
DELETE FROM web_analytics.sessions
WHERE session_start_timestamp < DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR);
Some data warehouses offer features to make this easier. Google BigQuery, for instance, allows you to set a "partition expiration," which automatically deletes data partitions after a specified time. This is far more efficient than running a massive DELETE query.
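In BigQuery, partition expiration can be set with a single DDL statement. This is a sketch that assumes the sessions table is partitioned on a date or timestamp column:

```sql
-- Assumes web_analytics.sessions is a date/timestamp-partitioned table.
-- 730 days is roughly 2 years; expired partitions are dropped automatically.
ALTER TABLE web_analytics.sessions
SET OPTIONS (partition_expiration_days = 730);
```

Once set, the expiration applies continuously with no scheduled job to maintain, and dropping whole partitions avoids the scan and slot costs of a large DELETE.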
Step 3: Schedule a Regular Cleanup Job
Manually running your deletion script is a recipe for forgetting. You need to automate this process. Nearly every modern data warehouse and workflow management tool provides a way to schedule jobs. This could be:
- A daily or weekly stored procedure.
- A cron job that executes your SQL script.
- A scheduled task within a cloud service like Google Cloud Scheduler or AWS Lambda.
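As a minimal sketch, a cron entry for a nightly cleanup might look like the following. The schedule, file path, and choice of the BigQuery bq CLI are all illustrative; adapt them to your warehouse and tooling.

```shell
# Illustrative crontab entry: run the retention cleanup daily at 02:00,
# after the nightly ETL finishes. Path and CLI are placeholders.
0 2 * * * bq query --use_legacy_sql=false < /opt/etl/retention_cleanup.sql
```

Whatever scheduler you choose, make sure the job logs its outcome somewhere visible so a silent failure doesn't leave expired data in place.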
Managing Data and Caching within Looker
Now that your database is being regularly cleaned, you need to make sure Looker doesn't hold onto old results. You'll do this in your LookML project by configuring datagroups.
What is a Datagroup?
A datagroup is Looker's primary tool for managing caching: an object you define in a LookML model file that sets a caching policy. You can then apply this policy to Explores, models, and PDTs to ensure they all refresh on the same schedule.
Controlling Looker Queries with Datagroups
Let's say your data warehouse purges old data every night at 2 AM. You want Looker to clear its cache shortly after to fetch fresh results. You can define a datagroup that triggers this refresh.
In your LookML model file (e.g., my_model.model.lkml), you would add:
datagroup: hourly_refresh {
  max_cache_age: "1 hour"
  sql_trigger: SELECT MAX(id) FROM etl_log
}
Here's what that does:
- max_cache_age sets a hard limit. In this example, Looker will not serve cached results older than one hour, no matter what.
- sql_trigger is the star of the show. Looker runs this query periodically; if the result differs from the last run, the datagroup is triggered and Looker invalidates all cached data associated with it.
To apply this to your Explores, you would set a persist_with parameter:
explore: orders {
persist_with: hourly_refresh
}
explore: users {
persist_with: hourly_refresh
}
Now, any dashboards or Looks built from the orders or users Explores will adhere to the hourly_refresh cache policy.
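If most Explores in a model share the same policy, you can also declare persist_with once at the model level; individual Explores can still override it with their own setting:

```lookml
# Model-level default: every Explore in this file uses hourly_refresh
# unless it declares its own persist_with.
persist_with: hourly_refresh
```

This keeps the caching policy in one place, so adding a new Explore doesn't require remembering to attach the datagroup.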
Handling Persistent Derived Tables (PDTs)
PDTs that summarize data over long periods can easily become stale after your cleanup scripts run. You can tie their rebuild schedule to your datagroup as well.
In your PDT's view file, you would use datagroup_trigger:
view: user_lifetime_value {
derived_table: {
datagroup_trigger: hourly_refresh
sql: SELECT user_id, SUM(order_total) as total_revenue FROM orders GROUP BY 1
}
}
With this configuration, whenever the hourly_refresh datagroup is triggered, Looker will not only clear the cache but also rebuild the user_lifetime_value table from scratch, ensuring it's always based on the current data in your warehouse.
Best Practices for a Healthy Data Environment
Implementing the steps above is a solid start. To maintain a truly clean and compliant environment, consider these additional practices.
- Document Everything: Your data retention policy should live in a central, accessible document. It should detail what data is purged, when, and why. This is incredibly important for training new team members and satisfying any compliance requests.
- Perform Regular Audits: Periodically verify that your deletion scripts are running correctly and that your Looker caches are behaving as expected. Check your database costs to see if your policies are having the intended financial impact.
- Align Teams: Ensure that your marketing, sales, and product teams are aware of the retention policies. If the marketing team knows that user-level granular data is only available for 18 months, they won't try to build a 24-month lookback report and wonder why the data is incomplete.
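For the audit step, a quick query against a purged table can confirm the retention rule is holding. Reusing the session example from earlier, this should return zero rows older than the cutoff if the nightly purge is working:

```sql
-- Audit check: count rows older than the 2-year retention cutoff.
-- A non-zero result means the cleanup job is not keeping up.
SELECT COUNT(*) AS stale_rows
FROM web_analytics.sessions
WHERE session_start_timestamp < DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR);
```

Running a check like this on a schedule, and alerting when stale_rows is non-zero, turns the audit from a periodic chore into an automatic safeguard.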
Final Thoughts
Enforcing a data retention policy in Looker is a two-part process. The most critical part happens in your data warehouse, where you define and automate the deletion of records. The second part, handled within Looker, is about aligning your caching policies and PDT rebuilds to ensure users are always seeing fresh, accurate data that reflects your rules.
Managing LookML files, sql_trigger queries, and datagroups for data governance is powerful, but it's also a reflection of the technical configurations needed to maintain traditional BI tools. At Graphed, we created a different experience. We believe you should be able to get answers from all your marketing and sales platforms instantly, without the engineering overhead. Instead of building complex models, you can connect your data sources in seconds and ask questions in plain English to create live, interactive dashboards that are always up-to-date.