As we continue our journey into the world of the GA4 BigQuery export, we’re faced with a common scenario: querying data spanning multiple days. The GA4 export organizes data into individual daily tables, and to analyze trends over time, we’ll need to learn how to query these tables collectively.
This is part 7 of 10 in our series, “Get Started with the GA4 BigQuery Export”. Check out the course overview for a full list of all the topics covered.
Introduction
As we discovered in tutorial 3, the GA4 BigQuery dataset is made up of multiple individual daily tables, each containing data collected on a specific date. This means that if you want to extract data for more than one day, you need to query multiple tables to do so.
In this tutorial, we’ll harness the capabilities of the * wildcard and acquaint you with the versatile _TABLE_SUFFIX parameter, equipping you with the skills to efficiently query across multiple GA4 tables. We’ll also introduce the concept of declaring variables, which not only streamlines date range specifications but also offers versatility across various use cases.
Important Note: In all the queries provided below, make sure to replace the placeholders within the WHERE clause with the specific values for the table you are working with. For example, if you want to extract data for January 31st 2021 from the e-commerce sample dataset, use the following values:
- <project> should be replaced with bigquery-public-data
- <dataset> should be replaced with ga4_obfuscated_sample_ecommerce
- <table> should be replaced with events_20210131
Ensure that you substitute these placeholders with your actual project, dataset, and table names when executing the queries.
Using the * wildcard
Until this point, our sample queries have been directed at individual tables, as demonstrated in the example below, which focuses on the table from January 31, 2021:
SELECT COUNT(*)
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
However, in many scenarios, you’ll need to query data spanning multiple dates. This is where the * wildcard comes into play, enabling you to effortlessly query all tables that adhere to the specified pattern.
Important Note: as you start to query multiple tables, you will be running queries against larger and larger volumes of data. As such, you should be mindful of the potential cost implications of doing so. Check the message in the top right of the query window to see how much data you will be processing with each query.
Remember, with BigQuery you get 1 TB of free queries every month (thats 1,000 GB or 1,000,000 MB) so for many people it’s not something you’ll have to worry too much about, but definitely something to be aware of!
Query all tables
Return the total count of records in all tables:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_*`
Query all tables from 2023
Return the count of records in all tables from 2023:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_2023*`
Query all tables from August 2023
Return the count of records in all tables from August 2023:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_202308*`
Using _TABLE_SUFFIX
The * wildcard can be further extended with the use of the _TABLE_SUFFIX parameter. This addition empowers you to target particular dates or date ranges in your queries:
Static dates
Return a count of records collected between 2023-08-08 and 2023-08-15:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230808' AND '20230815'
Dynamic dates
Return a count of records collected in the last 7 full days:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 day))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
Static & dynamic dates
Return a count of records collected from the start of 2023 to yesterday:
SELECT COUNT(*)
FROM `<project>.<dataset>.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101'
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
Declaring variables
For queries involving static dates, leveraging BigQuery’s variable declaration feature can significantly enhance code readability and maintainability. This approach proves especially valuable when handling lengthy and intricate queries.
Employ the DECLARE function to establish start and end dates, subsequently referencing these variables in the WHERE
clause:
DECLARE start_date STRING DEFAULT '20230101';
DECLARE end_date STRING DEFAULT '20230131';
SELECT COUNT(*)
FROM `<project>.<dataset>.events_*`
WHERE _TABLE_SUFFIX BETWEEN start_date AND end_date
Wrapping up
In this tutorial, we’ve unlocked the power of the * wildcard and introduced you to the versatile _TABLE_SUFFIX parameter in BigQuery. You’re now equipped with the skills to efficiently query across multiple GA4 tables, which is essential when you need to extract data from various dates.
In the next tutorial, we’ll explore how to manipulate date and time data within BigQuery, an essential skill for performing time-based analyses and gaining deeper insights into your GA4 data.