Establishing data lineage practices with Google Tag Manager analytics events

In this post we’ll look at how to capture metadata from Google Tag Manager (GTM) about the tags that fire analytics requests. This post focuses on Snowplow specifically, but the same concepts can easily be adapted to Google Analytics or your other tool of choice through custom fields / dimensions / properties.

Although it’s not required to capture this information with each request, I find it’s often useful in tracking down data quality or logic errors in tags. It’s easy to focus on what data is being collected whilst not being cognisant of how that data is being collected. The latter is arguably the more important factor in data quality: if your method of data collection has issues, it doesn’t matter how accurate or comprehensive the collected data appears, as those issues may invalidate any analysis built on it.

I find that in projects it helps with establishing data lineage, that is, documenting what has produced the data. In downstream applications it becomes possible to detect regressions and to filter out data from container or tag versions that may have sent invalid or incorrect data.

In the following examples we’ll look at how to send the following data points:

  • GTM Container id – the unique ID of the GTM container
  • Container version – a numeric value that increments for each new version of a GTM container
  • Tag id – a unique (to the GTM container) number that identifies a tag within the GTM container / workspace. This id does not change after a tag has been created.
  • Tag fingerprint – a string that changes each time a tag is modified
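
Once these data points are attached to events, the downstream filtering mentioned earlier can be sketched roughly as follows. This is only an illustration: the event rows, the field names (containerId, containerVersion) and the list of bad versions are all made up, and in practice this kind of filter would more likely live in SQL or a data model than in JavaScript.

```javascript
// Hypothetical event rows as they might appear downstream. The field names
// (eventId, containerId, containerVersion) are assumptions for illustration.
var events = [
  { eventId: 'a1', containerId: 'GTM-XXXXXX', containerVersion: 41 },
  { eventId: 'b2', containerId: 'GTM-XXXXXX', containerVersion: 42 },
  { eventId: 'c3', containerId: 'GTM-XXXXXX', containerVersion: 43 }
];

// Container versions that (in this made-up scenario) sent incorrect data.
var badVersions = [42];

// Keep only the rows that were not produced by a known-bad container version.
function excludeBadVersions(rows, versions) {
  return rows.filter(function (row) {
    return versions.indexOf(row.containerVersion) === -1;
  });
}

var cleanEvents = excludeBadVersions(events, badVersions);
console.log(cleanEvents.map(function (row) { return row.eventId; }));
```

The same idea extends to tag ids and fingerprints: once they travel with each event, a faulty tag revision can be excluded without discarding everything else the container sent.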

Capturing container information

Thankfully, GTM makes it easy to capture this information via built-in variables. In Snowplow we’ll use a global context within the JavaScript tracker. The global context is attached to every event we fire, which makes things easy as we only need to write the code once.

Navigate to Variables within your GTM container and let’s start by enabling two new Built-In Variables:

If you already see Container ID and Container Version in this list you can skip ahead – otherwise select Configure and check the Container ID and Container Version items in the list of variables.

We can access these variables by reference using the familiar syntax – {{Container ID}} and {{Container Version}}.

We don’t need to configure these variables in any way, so let’s go ahead and set up an example page view tag that demonstrates how to capture these properties.
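
One thing worth guarding against is a non-numeric container version. As far as I’m aware, GTM provides the container version as a string, and in preview mode it can be a placeholder such as "QUICK_PREVIEW" rather than a number; treat that as an assumption to verify in your own container. A minimal sketch of a helper (parseContainerVersion is a hypothetical name, not part of GTM or Snowplow) that normalises the value before it is sent:

```javascript
// Convert the value of the Container Version variable into an integer.
// Returns null when the value is not purely numeric (assumption: preview
// mode may yield a placeholder string such as "QUICK_PREVIEW").
function parseContainerVersion(rawVersion) {
  var text = String(rawVersion);
  return /^\d+$/.test(text) ? parseInt(text, 10) : null;
}

console.log(parseContainerVersion('42'));
console.log(parseContainerVersion('QUICK_PREVIEW'));
```

Inside a Custom HTML tag the argument would be the built-in variable itself, i.e. parseContainerVersion({{Container Version}}). Whether null is acceptable depends on how your schema defines the version field.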

Navigate to Tags and create a new Custom HTML Tag.

  1. Tags > New > Tag Configuration > Custom HTML (if you are using a different product, e.g., GA, you can select one of the built-in tags instead).
  2. Make sure you are initialising the Snowplow tracker here if you haven’t already.
  3. Create a variable that stores the Container ID and Version as a context
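var containerContext = {
    // Schema URI for the container context (left blank here)
    schema: '',
    data: {
        id: "{{Container ID}}",
        version: {{Container Version}}
    }
};
// All global contexts that should be attached to every event
var globalContexts = [containerContext];
// globalContexts is already an array, so pass it as-is
window.snowplow('addGlobalContexts', globalContexts);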

First we construct a variable to store a self-describing JSON, in this case containerContext, which names the schema and uses the built-in variables we set up earlier in the data object. We then create a globalContexts variable which stores an array of all our global contexts. In this instance we’re only attaching our containerContext, but this would be a good place to attach any other information you might want to send with every event, such as information about the user. Finally we use the addGlobalContexts method of the Snowplow tracker to register these global contexts. Now whenever we fire any event on this page it’ll have the container ID and container version attached!

Global contexts can also be applied selectively to certain events (e.g., page views or add to cart events only). You can find out more about using global contexts here. In our instance we’re attaching this to every subsequent event we fire on the page, so we won’t be adding any conditional checks or filters.
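
For completeness, here is a minimal sketch of what a selectively applied global context might look like. It assumes, as I understand the tracker documentation, that a conditional provider is a [filter, context] pair and that the filter function receives an object with an eventType key ('pv' for page views). The schema URI, the helper names and the sample values are hypothetical.

```javascript
// Hypothetical schema URI, used purely for illustration.
var CONTAINER_SCHEMA = 'iglu:com.example/gtm_container/jsonschema/1-0-0';

// Build the context entity. In GTM the arguments would come from the
// Container ID and Container Version built-in variables.
function buildContainerContext(containerId, containerVersion) {
  return {
    schema: CONTAINER_SCHEMA,
    data: { id: containerId, version: containerVersion }
  };
}

// Filter function: returns true only for page view events ('pv').
function pageViewsOnly(args) {
  return args.eventType === 'pv';
}

// A conditional provider pairs a filter with the context it guards.
var conditionalProvider = [
  pageViewsOnly,
  buildContainerContext('GTM-XXXXXX', 42) // sample values only
];

// Register the provider only when the tracker is present on the page.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  window.snowplow('addGlobalContexts', [conditionalProvider]);
}
```

With this in place the container context would accompany page views but not, say, structured events. Check the filter argument names against the tracker version you have deployed before relying on them.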

Capturing tag information

Tag information is decidedly more difficult to capture. GTM does not expose this information internally, in built-in variables or otherwise. It is, however, accessible via the Google Tag Manager API.
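
As a rough sketch of one way to read this information, and not necessarily the approach this post had in mind, the Tag Manager API v2 lists the tags in a workspace, and each tag resource includes tagId, name and fingerprint. The helper names below are hypothetical, accessToken is assumed to be an OAuth 2.0 token with read access to Tag Manager, and pagination is ignored for brevity. Note that the API expects the numeric container ID, not the public GTM-XXXXXX one.

```javascript
// Build the Tag Manager API v2 URL that lists the tags in a workspace.
function tagsListUrl(accountId, containerId, workspaceId) {
  return 'https://www.googleapis.com/tagmanager/v2/accounts/' + accountId +
    '/containers/' + containerId + '/workspaces/' + workspaceId + '/tags';
}

// Turn a tags list response into a lookup of tagId -> { name, fingerprint }.
function indexTagsById(listResponse) {
  var lookup = {};
  (listResponse.tag || []).forEach(function (tag) {
    lookup[tag.tagId] = { name: tag.name, fingerprint: tag.fingerprint };
  });
  return lookup;
}

// Fetch the tag list and index it. Resolves to the lookup object.
function fetchTagIndex(accountId, containerId, workspaceId, accessToken) {
  return fetch(tagsListUrl(accountId, containerId, workspaceId), {
    headers: { Authorization: 'Bearer ' + accessToken }
  }).then(function (response) {
    if (!response.ok) {
      throw new Error('GTM API request failed: ' + response.status);
    }
    return response.json();
  }).then(indexTagsById);
}
```

A lookup like this could be generated whenever a container is published and then used to populate tag-level values, since the fingerprints are not available to tags at runtime.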


Credits: Thanks to Simo Ahava for his help on brainstorming some ideas on how to make this possible.

Published by Mike Robins

CTO at Poplin Data
