When recreating GA4 metrics in BigQuery, your queries can easily become quite complex due to the way event parameters are stored in the export. In this tutorial I demonstrate an easy 3-step process to reduce this complexity and make your queries easier to read and maintain.
Introduction
The structure of the GA4 BigQuery export provides us with flexibility to collect numerous and varied event parameters with each event. The trade-off of this flexibility however, is increased complexity when to comes to constructing queries against the data.
To demonstrate this problem, let’s walk through the process of calculating 3 common metrics from GA4:
- Sessions
- Engaged sessions
- Engagement rate
How to calculate sessions
To calculate Sessions we need to concatenate the user_pseudo_id with the ga_session_id and then count distinct values. The ga_session_id is stored as an event parameter so we utilise a subquery to extract it from event_params.
How to calculate engaged sessions
To calculate Engaged sessions we again need to concatenate the user_pseudo_id with the ga_session_id and then count distinct values, but only where the engaged_session parameter = 1. This calculation requires 2 subqueries, a CASE statement, a CONCAT and a COUNT function.
How to calculate engagement rate
To calculate Engagement rate we need to divide engaged sessions by sessions. To improve readability we can round the result by 2 decimal places. This requires 3 subqueries, a CASE statement, 2 CONCAT, 2 COUNT and a ROUND function.
I don’t know about you but that query makes my eyes hurt. And whilst I might know what it’s doing now, if I were to revisit it in a few months time it would take me a bit of time to understand what’s going on.
A simpler way to calculate metrics
To reduce this complexity I utilise CTEs and follow a simple 3-step process when constructing any query that involves more than a few basic metrics.
- Extract event parameters into an interim dataset
- Use this dataset to calculate the basic metrics
- Use these basic metrics in the formulas for calculated metrics
What is a CTE?
A CTE – or Common Table Expression – is a a temporary named result set derived from a query within a larger query. It’s basically a way of constructing a dataset which you can then refer to later on within a query.
So you could for instance, select a list of customers IDs within a CTE and then use those IDs to filter the results of another query. It’s a bit like using sub-queries but they are much more readable and easy to maintain.
A CTE is defined with the WITH command and follows the below structure. You can create multiple CTEs and refer to them within each other. You just need to make sure that it’s been defined earlier on in the query to refer to it later on.
WITH name_of_cte AS (
SELECT ...
FROM ...
),
name_of_cte_2 AS (
SELECT ...
FROM ...
),
name_of_cte_3 AS (
SELECT ...
FROM ...
)
Creating an interim dataset
The first step is to create an interim dataset that includes all the fields you’ll need. As part of this step we extract all necessary values from event_params and perform any transformations such as concatenating the user_pseudo_id and ga_session_id to create a unique value.
WITH flattened_data AS (
SELECT
geo.country,
CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS ga_session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged
FROM `<project>.<dataset>.<table>`
),
Calculate basic metrics
In the next step we calculate the basic metrics using the dataset we created in step 1. In this example that means a count of distinct ga_session_id to calculate sessions, and a count of distinct ga_session_id where session_engaged = 1 to calculate engaged sessions.
metrics AS (
SELECT
country,
COUNT(DISTINCT ga_session_id) AS sessions,
COUNT(DISTINCT CASE WHEN session_engaged = '1' THEN ga_session_id END) AS engaged_sessions
FROM flattened_data
GROUP BY country
)
Create calculated metrics
In the final step we use the metrics created in step 2 in the calculations for any dependant metrics. So in this example that just means dividing engaged_sessions by sessions. We can also reference those metrics individually so that they are included in the final results.
SELECT
country,
sessions,
engaged_sessions,
ROUND(engaged_sessions / sessions, 2) AS engagement_rate
FROM metrics
Bringing it all together
The final result will look something like this. Whilst this may end up with a larger query overall, I hope you’ll agree that it looks a lot simpler and is much easier to follow.
In terms of query performance, the size of the data being queried is the same so it’s really just about making things easier to read and maintain in the long term. Your future self will be grateful if you take the time now to structure your queries and reduce their complexity.
If you found this useful check out the following guides and sample queries for more information on calculating GA4 metrics in BigQuery.