Navigating Data Differences Between Weld and Google Analytics 4
With Weld you can integrate your Google Analytics 4 data for easy transformation, modelling, and combination with other data sources to build the valuable business insights you need.
However, as is normal when combining data from different platforms, you might see some differences between the values in your Google Analytics dashboard and the values imported through their Reporting API into WELD. Don't worry, your data is safe and correct! This is a common occurrence, and it's all due to Google's mechanisms for handling data in their GA4 processes.
In this post, we look into the reasons behind the data discrepancies you might encounter, and what to keep in mind when you need to balance precision and efficiency. Let's start by taking a look at some of the approaches Google Analytics takes to handle your data analysis.
Data Sampling
Google Analytics 4 uses data sampling, a process in which only a subset of a dataset is used to estimate the characteristics of the entire dataset. This allows faster data retrieval and processing, due to the smaller amounts of data involved.
>In Google Analytics, data sampling may occur when the number of events used to create a report, exploration, or request exceeds the quota limit for your property.
[[GA4] About data sampling - Analytics Help]
The quota limits for event-level queries are, as of the writing of this post, 10 million events for standard Google Analytics properties and 1 billion events for Google Analytics 360 properties. If data sampling is being used, this is indicated by the *data quality* icon in the top right of the cards and explorations in your Google Analytics 4 dashboard.
The higher the percentage of data used, the more accurate and better quality your results will be.
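To make the idea concrete, here is a minimal Python sketch of how a sampled count is scaled up to estimate a total. It is only an illustration of the general technique, not Google's actual sampling procedure: the synthetic events, the `estimate_total_from_sample` helper, and the sample rates are all assumptions made for this example.

```python
import random


def estimate_total_from_sample(events, sample_rate, target="purchase", seed=0):
    """Estimate how many events equal `target` by scanning only a random sample."""
    rng = random.Random(seed)
    # Keep each event with probability `sample_rate`.
    sample = [e for e in events if rng.random() < sample_rate]
    matches_in_sample = sum(1 for e in sample if e == target)
    # Scale the sampled count back up to the size of the full dataset.
    return round(matches_in_sample / sample_rate)


# Synthetic data (hypothetical): 100,000 events, roughly 5% of them purchases.
data_rng = random.Random(42)
events = [
    "purchase" if data_rng.random() < 0.05 else "page_view"
    for _ in range(100_000)
]
exact = sum(1 for e in events if e == "purchase")

for rate in (0.01, 0.1, 0.5, 1.0):
    estimate = estimate_total_from_sample(events, rate)
    print(f"sample rate {rate:>4}: estimate={estimate}, exact={exact}")
```

With a sample rate of 1.0 every event is scanned, so the estimate equals the exact count; smaller rates are faster to compute but drift further from it.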
HyperLogLog
Performing an exact count of distinct items (the *cardinality*) in a large dataset requires significant amounts of memory and computing resources. Therefore, to reduce memory usage and provide fast results, Google Analytics 4 utilizes the HyperLogLog++ (HLL++) algorithm, an augmented version of the HyperLogLog algorithm.
The HLL++ algorithm estimates the cardinality of several metrics in GA4, giving an approximation of the total. What this means in practice is that the values in your Google Analytics 4 dashboard are provided in a quick and efficient manner, but they are *approximations*. For most cases, the approximation is quite accurate, with a low error rate.
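For readers curious about how such an estimate works, below is a simplified sketch of the original HyperLogLog algorithm in Python. It is not HLL++ and not Google's implementation: the precision `p=12`, the use of SHA-256 as the hash, and the `hll_estimate` function name are assumptions chosen for this illustration.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Approximate the number of distinct items using plain HyperLogLog."""
    m = 1 << p                      # number of registers (4096 for p=12)
    registers = [0] * m
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = h >> (64 - p)                        # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


user_ids = [f"user-{i}" for i in range(50_000)]
approx = hll_estimate(user_ids + user_ids)         # duplicates do not matter
print(f"exact={len(set(user_ids))}, approx={approx:.0f}")
```

The sketch only ever stores 4,096 small integers, no matter how many items it sees, which is why this family of algorithms is fast and memory-efficient. The price is a small relative error, roughly 1.04 / sqrt(m) for plain HyperLogLog.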
However, when you connect Google Analytics 4 to your WELD account, the distinct counts for the same metrics will likely differ. This is due to where and how your data is stored and processed through WELD: the destinations we offer have the time and resources to perform the necessary calculations and, consequently, will give you precise distinct counts for session metrics.
You can see the effect of HLL++ in your own GA4 dashboard: the totals presented for some metrics do not equal the sum of the values in the corresponding columns.
The values also differ when the same data is explored through WELD's SQL editor. For example, your total session count from the Organic Search channel might show a value of 3959 in GA4, and a total of 3955 in your WELD account.
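As a rough illustration of what an exact distinct count looks like, the sketch below runs a `COUNT(DISTINCT ...)` query against an in-memory SQLite database from Python. The table `ga4_sessions`, its columns `session_id` and `channel_group`, and the sample rows are hypothetical; the schema in your own destination will differ.

```python
import sqlite3

# In-memory database standing in for a data warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ga4_sessions (session_id TEXT, channel_group TEXT)")
rows = [
    ("s1", "Organic Search"),
    ("s1", "Organic Search"),  # the same session recorded twice
    ("s2", "Organic Search"),
    ("s3", "Direct"),
]
conn.executemany("INSERT INTO ga4_sessions VALUES (?, ?)", rows)

# Exact distinct count per channel: every session ID is compared, none estimated.
query = """
    SELECT channel_group, COUNT(DISTINCT session_id) AS sessions
    FROM ga4_sessions
    GROUP BY channel_group
    ORDER BY channel_group
"""
exact_counts = dict(conn.execute(query).fetchall())
print(exact_counts)
```

Unlike an HLL++ estimate, this query compares every session ID, so it costs more to run on large tables but returns the precise count.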
Considerations
Whenever you need a quick look at your Google Analytics data, the GA4 dashboard will give you fast results, albeit slightly inaccurate ones, due to the use of both data sampling and the HyperLogLog++ algorithm. But if precision is what you need, having your data connected through WELD allows you to use the full power of any of our destinations to calculate exact values for all the metrics you need.
References
- [Unique Count Approximation]