
Tonight Poplin Data team members Mike and Narbeh are debating the merits of Snowplow Analytics with representatives of Google Analytics and Adobe Analytics at Web Analytics Wednesday. The head-to-head aspect of it is meant to be lighthearted, but it’s forced us to think about some of the ways Snowplow Analytics is a better match for many types of digital analytics problems.
So without further ado, here’s a rundown of the features where Snowplow has an edge over Google and Adobe Analytics.
| Feature | Snowplow Insights | Google Analytics | Adobe Analytics |
|---|---|---|---|
| Real-time | ✔ | Some | Some |
| Unsampled data and performance | ✔ | ✗ | ✗ |
| Custom structured data models | ✔ | ✗ | ✗ |
| Personally Identifiable Information | ✔ | ✗ | ✔ |
| SQL interface | ✔ | ✔ | ✗ |
| Cross-domain tracking and user identification | ✔ | ✗ | ✔ |
| Data sovereignty | ✔ | ✗ | ✗ |
| Change goals with historical data | ✔ | ✗ | ✗ |
| Accurate time spent metric | ✔ | ✗ | ✗ |
| Custom enrichments | ✔ | ✗ | ✔ |
| Incoming Webhooks | ✔ | ✗ | ✗ |
| High cardinality | ✔ | (other) | (Low-Traffic) |
| IP addresses and user-agent strings | ✔ | ✗ | ✗ |
Real-time
Snowplow Analytics can operate in real-time mode with latency down to about 5 seconds from collection to availability. Google and Adobe Analytics offer this with a subset of dimensions and events, but Snowplow gives you access to the entire enriched event stream, including all the richness of custom structured data models. See also: Custom structured data models.
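To make that concrete, here’s a minimal sketch of tailing the enriched stream on the AWS real-time pipeline using the JavaScript AWS SDK. The stream name, region and single-shard loop are assumptions for illustration, not a production consumer:

```javascript
// Minimal sketch: tail Snowplow enriched events from the real-time pipeline's
// Kinesis stream. Stream name and region here are hypothetical.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ region: 'ap-southeast-2' });

async function tail(streamName) {
  const { StreamDescription } = await kinesis
    .describeStream({ StreamName: streamName }).promise();

  // Single-shard sketch; a real consumer fans out across all shards.
  let { ShardIterator } = await kinesis.getShardIterator({
    StreamName: streamName,
    ShardId: StreamDescription.Shards[0].ShardId,
    ShardIteratorType: 'LATEST'
  }).promise();

  while (ShardIterator) {
    const batch = await kinesis.getRecords({ ShardIterator }).promise();
    batch.Records.forEach((record) => {
      // Enriched events arrive as tab-separated lines in the canonical
      // event format; app_id is the first field.
      const fields = record.Data.toString('utf8').split('\t');
      console.log(fields[0], `(${fields.length} fields)`);
    });
    ShardIterator = batch.NextShardIterator;
    await new Promise((resolve) => setTimeout(resolve, 1000)); // respect read limits
  }
}

tail('snowplow-enriched-good').catch(console.error);
```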
Unsampled data and performance
Because Snowplow runs in your own cloud infrastructure, you have control over the inevitable price-performance tradeoffs involved in collecting, processing and analysing large event volumes. You can decide to have full, unsampled data available speedily by throwing more infrastructure at the problem, or have it available less quickly to reduce cost. See also: High cardinality.
Custom structured data models
Snowplow’s event model lets customers keep their existing data models and attach them as context to events, without transforming them. Other tools force you to shoehorn that data into their flat dimension/metric model.
This is perhaps best illustrated with an example. Let’s say you want to record an action, say a page print, for a recipe on your content site.
You could record this as an event in Google Analytics, but the only context you can provide comes from the URL of the page itself, plus perhaps some custom dimensions. Custom dimensions can’t hold multiple values, so you can only model something like an ingredients list as a delimited string, which is really ugly once you get to reporting on it. And Google Analytics truncates any custom dimension value over 150 bytes. Nice!
Adobe Analytics has the concept of List Props and List Vars, which would enable this, but you still have to shoehorn the result into Adobe’s model. That means no nesting, like a list containing other items such as recommended recipes.
By contrast, here’s how you would attach a self-describing context for recipes onto a Structured Event in a Snowplow implementation. Note how the model is readable and looks just like how you might store it in your CMS. These contexts can be attached to any kind of event you like, including things like pageviews and social shares.
```javascript
window.snowplow('trackStructEvent', 'article', 'print', 'Chicken Chow Mein recipe', null, null,
  [
    {
      schema: 'iglu:com.poplindata/recipe/jsonschema/1-0-1',
      data: {
        title: 'Chicken Chow Mein recipe',
        author: ['John Smith', 'Mary Jones'],
        category: 'recipes',
        cuisine: 'Chinese',
        course: 'main',
        ingredients: ['chicken', 'noodles', 'vegetables'],
        content_tags: ['chinese recipes', 'chicken', 'quick meals']
      }
    }
  ]
);
```
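That context is validated against a self-describing JSON Schema registered in your Iglu repository. The actual com.poplindata schema isn’t published, so this is a hedged sketch of what it might look like, with the property definitions inferred from the snippet above:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a recipe context (illustrative)",
  "self": {
    "vendor": "com.poplindata",
    "name": "recipe",
    "format": "jsonschema",
    "version": "1-0-1"
  },
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "author": { "type": "array", "items": { "type": "string" } },
    "category": { "type": "string" },
    "cuisine": { "type": "string" },
    "course": { "type": "string" },
    "ingredients": { "type": "array", "items": { "type": "string" } },
    "content_tags": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["title"],
  "additionalProperties": false
}
```

Events whose contexts fail validation against the schema are diverted to the bad rows stream rather than silently landing in your table with mangled data.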
Personally Identifiable Information
Google take a particularly hard line on Personally Identifiable Information. You can’t send it, and if they find it they’ll delete all your data. Ouch.
Adobe are a bit more lenient but still cautious.
With Snowplow, the data is collected and stored in your own cloud infrastructure. It’s still best practice to be quite careful with PII, and particularly with sensitive information, but that boundary line is up to the customer. That means if PII ends up inside your Snowplow data, it can be controlled under your own policies, not those of a third-party organisation. See also: Data sovereignty.
SQL interface
Snowplow’s data shows up in a database with an SQL interface, running in your own cloud infrastructure. You can connect it to your own systems, join it with other tables, point your existing BI teams at it and generally use it like it’s your own data, because it is.
You also get to keep the raw event data, unprocessed and ready to re-process or re-examine with tools like AWS Athena or Presto.
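To sketch what that looks like in practice, assume a Redshift load target and a hypothetical crm_users table of your own; Snowplow’s events live in the atomic.events table, and querying it from node-postgres is unremarkable, which is rather the point:

```javascript
// Sketch: join Snowplow's atomic.events to one of your own tables.
// REDSHIFT_URL and the crm_users table are hypothetical.
const { Client } = require('pg');

async function topPrinters() {
  const client = new Client({ connectionString: process.env.REDSHIFT_URL });
  await client.connect();
  const { rows } = await client.query(`
    SELECT e.domain_userid, u.plan, COUNT(*) AS prints
    FROM atomic.events e
    JOIN crm_users u ON u.snowplow_id = e.domain_userid
    WHERE e.se_category = 'article'
      AND e.se_action = 'print'
    GROUP BY 1, 2
    ORDER BY prints DESC
    LIMIT 20
  `);
  console.table(rows);
  await client.end();
}

topPrinters().catch(console.error);
```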
Adobe Analytics will provide the raw data for you to ingest into a database, and these days they even give you the header row for that table, but it’s up to you to make sense of the data, stitch together visits and the like.
Google Analytics data, with the premium product, can be pushed automatically to BigQuery, which is great. BigQuery is a very capable SQL data analysis tool, and support for it is becoming much more common, so Google probably gets a pass here, although currently the latency for data to appear is around 4 hours, while Snowplow can deliver hourly or faster. See also: Change goals with historical data.
Cross-domain tracking and user identification
Snowplow gives you access to all the identifiers: first-party domain cookie, third-party domain cookie, all the ingredients for fingerprinting, IDFA, IDFV, AdId and as many versions of login information as you have. You can then make up your own mind about how you stitch together user identities and sessions.
Adobe Analytics does third-party domain cookies with fallback to first-party domain cookies. Google Analytics has a cross-domain feature with extremely narrow utility (basically, if you have a hosted checkout, it might be useful). See also: IP addresses and user-agent strings.
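For example, the Snowplow JavaScript tracker’s crossDomainLinker decorates outbound links so the domain user ID survives the hop to a sibling site (the hostname below is a placeholder):

```javascript
// Decorate links pointing at our other domain so the domain user ID
// survives the hop across sites.
window.snowplow('crossDomainLinker', function (linkElement) {
  return linkElement.hostname === 'other-site.example.com';
});
```

The decorated ID travels in the _sp querystring parameter and surfaces as refr_domain_userid on the enriched event, so the stitching logic stays entirely in your hands.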
Data sovereignty
This is simple: you get to decide where your data lives, and the security and access policies. Adobe will charge you extra to store your data in some specific jurisdictions, but you have to trust them on the security and access policies. Google, well it’s in “the cloud”. See also: Personally Identifiable Information.
Change goals with historical data
You’ve got full control of your data, so you can re-crunch the numbers any time. It might be a big, slow query, but you can do it. That means if you decide to add an OR condition to your goal and funnel after you’ve created them, it’s totally doable.
The Google Analytics premium offering has a beta dynamic funnels report, but those don’t show up as conversions.
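With Snowplow, a redefined goal is just another query over the raw events. A minimal sketch, reusing the node-postgres setup from the SQL interface section; the page path and event names are hypothetical:

```javascript
// Sketch: a goal redefined after the fact with an OR condition, counted
// retroactively over all history. Run it with client.query(retroactiveGoalSql).
const retroactiveGoalSql = `
  SELECT COUNT(DISTINCT domain_userid) AS converted_users
  FROM atomic.events
  WHERE (event = 'page_view' AND page_urlpath = '/thank-you')
     OR (se_category = 'account' AND se_action = 'signup')
`;
```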
Accurate time spent metric
The way Google Analytics and Adobe Analytics (and Nielsen and lots of others) calculate the time spent metric is badly flawed, and it has only got worse with modern traffic patterns. Snowplow is better. I’ve dealt with this in detail here.
Custom enrichments
Snowplow allows you to query external databases and APIs to automatically enrich your data as it passes through the pipeline. For example, you could look up the current exchange rate from your actual bank and record it alongside a transaction’s local value.
Incoming Webhooks
Real-time integrations with external systems can be done through Webhooks, meaning your Snowplow pipeline can have immediate access to things like email opens, logistics state changes, incoming phone calls and the like.
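Snowplow also ships a generic Iglu webhook adapter, so anything that can issue an HTTP request can land self-describing events in the pipeline. A sketch, with a hypothetical collector host and call_received schema:

```javascript
// Sketch: push a phone-call event into the pipeline via the Iglu webhook
// adapter. The collector host and the com.acme schema are hypothetical.
const params = new URLSearchParams({
  schema: 'iglu:com.acme/call_received/jsonschema/1-0-0',
  queue: 'support',
  duration_seconds: '342'
});

fetch(`https://collector.example.com/com.snowplowanalytics.iglu/v1?${params}`)
  .then((res) => console.log('collector responded with', res.status))
  .catch(console.error);
```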
High cardinality
No (other) and no (Low-Traffic) buckets in Snowplow reports: every value of every dimension is kept, no matter how long the tail. See also: Unsampled data and performance.
IP addresses and user-agent strings
All the raw materials to track users, do custom fingerprinting and analyse platforms are available; other tools throw these raw materials away. We have customers looking at fraud patterns based on IP addresses. In the past we’ve looked for dark social traffic by finding the Facebook in-app browsers that don’t send through accurate Referrer headers. (Hint: look for “FBAN” in the user-agent.) See also: Cross-domain tracking and user identification.
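As a sketch, that dark social hunt is a simple filter over the enriched events’ useragent and referrer fields, runnable through the same SQL interface described above:

```javascript
// Sketch: surface Facebook in-app browser traffic arriving without a referrer,
// using the "FBAN" user-agent hint above.
const darkSocialSql = `
  SELECT page_urlpath, COUNT(*) AS hits
  FROM atomic.events
  WHERE useragent LIKE '%FBAN%'
    AND refr_urlhost IS NULL
  GROUP BY 1
  ORDER BY hits DESC
`;
```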