Learn how to estimate data for the users who declined cookies on your site based on the events collected from users who accepted.
Introduction
One of GA4’s more advanced features is its ability to collect anonymised events from users who declined cookies and then use this data to model their behaviour. This means GA4 can estimate the total number of users, sessions, or conversions your site would have had if all users had opted in.
To take advantage of this feature, you must first implement consent mode to enable the collection of events from users who decline cookies. These events won’t contain any user identifiers, so you will be unable to tell how many users or sessions there were. However, GA4 can use the data collected from users who opted in to estimate the number of users who opted out.
How does Behavioural Modelling work?
GA4’s behavioural modelling uses machine learning to model the behaviour of users who decline cookies based on the behaviour of similar users who accept cookies.
Google doesn’t share details of how their algorithm works, but here’s a simplified example of how we can model the data ourselves.
Let’s assume that, in a single day, our website records 1,000 opted-in users who generate 5,000 page views. That gives us an average of 5 views per user.
If we also had 2,000 page views from opted-out users, we could simply assume that those users also averaged 5 page views each. We’d then estimate that there were an additional 400 users who did not consent:
2,000 page views / 5 views per user = 400 users
Google’s model will be far more sophisticated than this, but this is the basic principle we’ll use to model the data ourselves.
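The arithmetic above can be written as a small function. This is only a sketch of the simplified principle, not Google's algorithm; the function name `estimate_unconsented_users` is made up for illustration.

```python
def estimate_unconsented_users(consented_users, consented_views, unconsented_views):
    """Estimate opted-out users, assuming they view as many pages per user as opted-in users."""
    if consented_users <= 0 or consented_views <= 0:
        raise ValueError("need at least one consented user and one consented page view")
    views_per_user = consented_views / consented_users
    return unconsented_views / views_per_user


# Figures from the example above: 1,000 users, 5,000 views, 2,000 opted-out views.
extra_users = estimate_unconsented_users(1000, 5000, 2000)
print(extra_users)  # 400.0
```

The whole estimate rests on one assumption: opted-out users view pages at the same rate as opted-in users. If that does not hold for your site, the estimate will be off in proportion.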
How do we identify consented and unconsented users?
The key column we need to identify the different types of events is privacy_info.analytics_storage. When the user has opted in, the value will be “Yes”, and when the user has opted out, the value will be “No”.
When the value is “No”, the user_pseudo_id field will be null, and event parameters such as ga_session_id and ga_session_number will be missing.
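To make the distinction concrete, here is a minimal sketch that splits event rows by consent status. The rows are hypothetical dictionaries shaped loosely like the export (all values are made up), and `split_by_consent` is an illustrative helper, not part of any Google library.

```python
def split_by_consent(events):
    """Split event rows into (consented, unconsented) lists using analytics_storage."""
    consented, unconsented = [], []
    for event in events:
        status = event.get("privacy_info", {}).get("analytics_storage")
        if status == "Yes":
            consented.append(event)
        elif status == "No":
            unconsented.append(event)
        # Rows with any other value are left out of both lists.
    return consented, unconsented


# Hypothetical rows: opted-out events carry no user_pseudo_id.
sample_events = [
    {"event_name": "page_view", "user_pseudo_id": "111.222",
     "privacy_info": {"analytics_storage": "Yes"}},
    {"event_name": "page_view", "user_pseudo_id": None,
     "privacy_info": {"analytics_storage": "No"}},
    {"event_name": "scroll", "user_pseudo_id": None,
     "privacy_info": {"analytics_storage": "No"}},
]

consented, unconsented = split_by_consent(sample_events)
print(len(consented), len(unconsented))  # 1 2
```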
How to estimate users and sessions in BigQuery
We’re going to create a very simple model to estimate daily users and sessions based on the average page views of consented users. The query below consists of three steps:
- Calculate the number of daily users, sessions and views generated by consented users.
- Calculate the number of daily views generated by unconsented users.
- Estimate the number of unconsented users and sessions using the average views per user and per session from step 1.
Step 1 – calculate consented users and sessions
The first part of our query is a CTE that calculates daily users, sessions, views, and views per user and per session for the users who consented to tracking. Remember to replace the placeholder in the FROM clause with your own dataset and table name.
WITH consented_data AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    COUNT(DISTINCT user_pseudo_id) AS users,
    -- A session is identified by the user and ga_session_id together
    COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) AS sessions,
    COUNTIF(event_name = 'page_view') AS views,
    -- Average page views per consented user and per consented session
    ROUND(COUNTIF(event_name = 'page_view') / COUNT(DISTINCT user_pseudo_id), 2) AS views_per_user,
    ROUND(COUNTIF(event_name = 'page_view') / COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))), 2) AS views_per_session
  FROM `<project>.<dataset>.<table>`
  WHERE privacy_info.analytics_storage = 'Yes'
  GROUP BY 1
),
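If it helps to see the aggregation outside SQL, the same logic can be sketched in Python. The field names mirror the export, but the flattened `ga_session_id` key and the sample rows are assumptions made for illustration. Note that a session is keyed by the user and the session ID together, because two different users can share the same ga_session_id value.

```python
from collections import defaultdict


def consented_daily_metrics(events):
    """Aggregate consented events into daily users, sessions and page views."""
    users = defaultdict(set)
    sessions = defaultdict(set)
    views = defaultdict(int)
    for event in events:
        day = event["event_date"]
        users[day].add(event["user_pseudo_id"])
        # Key sessions by (user, session id), mirroring the CONCAT in the query.
        sessions[day].add((event["user_pseudo_id"], event["ga_session_id"]))
        if event["event_name"] == "page_view":
            views[day] += 1
    metrics = {}
    for day in users:
        n_users, n_sessions, n_views = len(users[day]), len(sessions[day]), views[day]
        metrics[day] = {
            "users": n_users,
            "sessions": n_sessions,
            "views": n_views,
            "views_per_user": n_views / n_users,
            "views_per_session": n_views / n_sessions,
        }
    return metrics


# Hypothetical rows: (event_date, user_pseudo_id, ga_session_id, event_name).
rows = [
    ("20240101", "A", 1, "page_view"),
    ("20240101", "A", 1, "page_view"),
    ("20240101", "A", 2, "page_view"),
    ("20240101", "B", 1, "page_view"),
    ("20240101", "B", 1, "scroll"),
]
keys = ("event_date", "user_pseudo_id", "ga_session_id", "event_name")
events = [dict(zip(keys, row)) for row in rows]

metrics = consented_daily_metrics(events)
print(metrics["20240101"]["users"], metrics["20240101"]["sessions"], metrics["20240101"]["views"])  # 2 3 4
```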
Step 2 – calculate unconsented views
The next CTE simply calculates daily views for the users who did not consent. Again, just remember to replace the placeholder in the FROM clause with your own dataset and table name.
unconsented_data AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    -- Only views can be counted here: opted-out events carry no user or session identifiers
    COUNTIF(event_name = 'page_view') AS views
  FROM `<project>.<dataset>.<table>`
  WHERE privacy_info.analytics_storage = 'No'
  GROUP BY 1
)
Step 3 – estimate unconsented users and sessions
In the final step we join the consented and unconsented datasets on the event date. Dividing the unconsented views by the consented views per user gives the estimated number of unconsented users, which we add to the consented user count to calculate total users.
We do the same for sessions to calculate total sessions, and then also add the views together to calculate total views.
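SELECT
  c.event_date,
  -- COALESCE treats days with no unconsented events as 0; SAFE_DIVIDE avoids division-by-zero errors
  c.users + COALESCE(ROUND(SAFE_DIVIDE(u.views, c.views_per_user), 0), 0) AS total_users,
  c.sessions + COALESCE(ROUND(SAFE_DIVIDE(u.views, c.views_per_session), 0), 0) AS total_sessions,
  c.views + COALESCE(u.views, 0) AS total_views
FROM consented_data c
LEFT JOIN unconsented_data u ON c.event_date = u.event_date
ORDER BY 1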
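For readers who prefer to check the logic outside BigQuery, the join and estimate can be sketched in Python. The dictionaries and figures are hypothetical. Treating a day with no unconsented rows as zero extra views is an assumption of this sketch, and it does not round the per-user ratio to two decimals the way the query does, so results can differ very slightly.

```python
def estimate_totals(consented, unconsented_views):
    """Combine consented metrics with unconsented views to estimate daily totals.

    consented: {day: {"users": int, "sessions": int, "views": int}}
    unconsented_views: {day: int}
    """
    totals = {}
    for day, c in sorted(consented.items()):
        # Behaves like a LEFT JOIN where a missing day counts as 0 views (assumption).
        extra_views = unconsented_views.get(day, 0)
        if c["views"] > 0:
            # extra_views / (views / users) rewritten as one expression
            extra_users = extra_views * c["users"] / c["views"]
            extra_sessions = extra_views * c["sessions"] / c["views"]
        else:
            extra_users = extra_sessions = 0
        totals[day] = {
            "total_users": c["users"] + round(extra_users),
            "total_sessions": c["sessions"] + round(extra_sessions),
            "total_views": c["views"] + extra_views,
        }
    return totals


# Hypothetical figures; the second day has no unconsented events at all.
consented = {
    "2024-01-01": {"users": 1000, "sessions": 1250, "views": 5000},
    "2024-01-02": {"users": 10, "sessions": 12, "views": 40},
}
unconsented_views = {"2024-01-01": 2000}

totals = estimate_totals(consented, unconsented_views)
print(totals["2024-01-01"])  # {'total_users': 1400, 'total_sessions': 1750, 'total_views': 7000}
```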
Generating the data
Now put all the steps together and you should be able to generate a result set with one row per day, containing the estimated total users, total sessions and total views.
Comparing to Google’s modelling
Let’s see how our modelling compares to that within Google Analytics (GA4). To do so you’ll first need to make sure that the reporting identity is set to ‘Blended’ and then create an exploration showing daily users, sessions and views.
I’ve mapped out the user and session totals from BigQuery and GA4 below. You can see that we’ve done a pretty good job of matching users: within about 5% of the total from GA4, and as close as 1.2% on the 27th!
Although we see a bit more of a difference in sessions (as much as 11.2%), the average is still only 7% off, which is not a bad result for such a simple model.
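If you want to run the same comparison on your own data, the percentage difference can be computed as sketched below. The figures are placeholders, not the results discussed above, and measuring the gap relative to the GA4 total is an assumption about how the comparison is defined.

```python
def pct_difference(estimate, reference):
    """Absolute difference between estimate and reference, as a percentage of reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) * 100 / reference


# Placeholder rows: (date, BigQuery estimate, GA4 total). Replace with your own figures.
daily_users = [
    ("day 1", 980, 1000),
    ("day 2", 1030, 1000),
]

differences = [pct_difference(estimate, ga4) for _, estimate, ga4 in daily_users]
average_difference = sum(differences) / len(differences)
print(differences, average_difference)  # [2.0, 3.0] 2.5
```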
Wrapping up
The chart above shows that our model has done a reasonably good job of estimating the unconsented data. There are, however, further steps you could take to try to match the data from GA4 more closely.
You’ll remember that we used views per user and per session as the metrics to estimate the unconsented data. However, it might be that sessions per user would prove to be a better metric.
We used the average views for each day to estimate users and sessions but you could calculate the averages over the past month and feed that into your model. Perhaps mobile and desktop users behave differently, or users from different countries, so you might be better off estimating different types of users separately.
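As a rough illustration of the segmentation idea, the sketch below estimates unconsented users separately for each segment and compares the result with a single pooled estimate. The segment names and figures are hypothetical, chosen so that the two segments have very different views per user.

```python
def estimate_unconsented(consented_users, consented_views, unconsented_views):
    """Estimated opted-out users: unconsented views divided by consented views per user."""
    if consented_views == 0:
        return 0.0
    return unconsented_views * consented_users / consented_views


def estimate_by_segment(segments):
    """Sum the per-segment estimates, so each segment uses its own views-per-user rate."""
    return sum(
        estimate_unconsented(s["consented_users"], s["consented_views"], s["unconsented_views"])
        for s in segments.values()
    )


# Hypothetical segments: mobile users average 3 views each, desktop users 8.
segments = {
    "mobile": {"consented_users": 600, "consented_views": 1800, "unconsented_views": 1200},
    "desktop": {"consented_users": 400, "consented_views": 3200, "unconsented_views": 800},
}

segmented = estimate_by_segment(segments)
pooled = estimate_unconsented(
    sum(s["consented_users"] for s in segments.values()),
    sum(s["consented_views"] for s in segments.values()),
    sum(s["unconsented_views"] for s in segments.values()),
)
print(segmented, pooled)  # 500.0 400.0
```

With these made-up numbers the pooled estimate misses 100 users, because opted-out traffic is weighted more heavily towards mobile than opted-in traffic is. Whether segmentation helps on your site depends on how different your segments really are.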
What we’ve created won’t be perfect, and there are many different ways you could enhance it, but this should provide you with a good starting point to build on.
Remember though, Google is only providing an estimate too. We’ll never know how accurate their model is so take the numbers with a pinch of salt.