Paper over your mistakes with data models

Mistakes happen. In the data world, your ugly mistakes live on forever. Embarrassment isn't the only problem, though. Gaps and obvious errors in historical data distract your stakeholders from more important matters. Explaining the anomalies, and then steering your data users back to the things you don't yet know, is tiring for everyone.

If you’re using a data model on event-level data such as Snowplow data, you can make a large number of data anomalies disappear for the majority of your users. This is where an opinionated data model provides value over the unopinionated raw data underneath it.

I messed up

While working on a new version of our content data model to include the content metadata attached to our blog posts, I realised I’d dropped a letter from the content tag Snowplow Analytics. I’d further compounded the error by copying the same mistake across my entire 5-part blog series on data models. Doh!

So now we see this hilariously terrible output every time we report by content tags. A constant reminder of the fact I should pay more attention. (We’ll leave the proliferation of poor content tagging for another day.)

The fix

Fortunately this was really easy to fix! Content metadata information is recorded using a custom schema that ends up in its own table. My view folds this information into the pageview rows. All I need to do is add a really simple replace to fix my typo.

Where the query would ordinarily select the value as-is, it now corrects the typo on the way through. It’s also important to document the change so that future maintainers understand what bug I’m papering over!

    -- Fix for Simon's stupid content tag typo
    REPLACE(page_tags, 'Analytcs', 'Analytics') AS page_tags,

As simple as that: typo fixed and my dignity restored.
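For anyone who wants to try the pattern end to end, here is a minimal sketch rather than our actual model. It assumes an in-memory SQLite database standing in for the warehouse, and the table names (`page_views`, `content_metadata`), the view name (`page_views_modelled`), the join key and the comma-separated tag format are all hypothetical. The point it illustrates is that the raw table keeps the typo while everyone querying the view sees the corrected tag.

```python
import sqlite3

# In-memory SQLite stands in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Hypothetical stand-ins for the pageview table and the
    -- custom-schema content metadata table.
    CREATE TABLE page_views (page_view_id TEXT, page_url TEXT);
    CREATE TABLE content_metadata (page_view_id TEXT, page_tags TEXT);

    INSERT INTO page_views VALUES
        ('pv1', '/blog/data-models-part-1'),
        ('pv2', '/blog/time-spent');
    INSERT INTO content_metadata VALUES
        ('pv1', 'Snowplow Analytcs,Data Modelling'),  -- raw row with the typo
        ('pv2', 'Snowplow Analytics');

    -- The modelled view folds the metadata into the pageview rows
    -- and patches the typo on the way through.
    CREATE VIEW page_views_modelled AS
    SELECT
        pv.page_view_id,
        pv.page_url,
        -- Fix for the content tag typo in the raw data
        REPLACE(md.page_tags, 'Analytcs', 'Analytics') AS page_tags
    FROM page_views AS pv
    LEFT JOIN content_metadata AS md
        ON md.page_view_id = pv.page_view_id;
""")

# What users of the model see: corrected tags.
rows = conn.execute(
    "SELECT page_view_id, page_tags "
    "FROM page_views_modelled ORDER BY page_view_id"
).fetchall()

# What the raw data still contains: the typo is untouched.
raw_typos = conn.execute(
    "SELECT COUNT(*) FROM content_metadata WHERE page_tags LIKE '%Analytcs%'"
).fetchone()[0]

for page_view_id, page_tags in rows:
    print(page_view_id, page_tags)
print("raw rows still containing the typo:", raw_typos)
```

Because the correction lives in the view, the raw events stay exactly as collected, and removing the `REPLACE` later (once the tags are fixed at source) is a one-line change.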
