- Data collection
- Data Maturity
- Data Modelling
- Data Ownership
- Data Warehouses
- Open Source
- People and process
- Snowplow Insights
Last updated: 2nd December 2020. For updates, changes and errata, please see the Changelog at the end of this post. What is this guide? - Designed as a fair feature comparison between the different products - An up-to-date...
Download the JDBC driver from AWS and place it in the DataGrip JDBC driver directory. On Linux this was ~/.DataGrip2018.1/config/jdbc-drivers/. Go to File > Data Sources to open the Data Sources panel and click ‘+ > Driver’. Name it AWS Athena. Here’s the confusing bit: skip...
DataGrip is one of the most valuable tools for our engineers for exploring and querying a myriad of different database technologies. DataGrip doesn’t yet come bundled with a BigQuery driver, so in this post we’ll explore how to set up a...
A few weeks ago I discovered Monica Rogati’s fantastic Data Science Hierarchy of Needs. It’s a data science-centric riff on Maslow’s Hierarchy of Needs, a classic concept in psychology. I’ve found myself using Rogati’s diagram and the concept in conversations with colleagues,...
Mike’s recent post about compressing Snowplow tables works great for atomic.events, with clients seeing compression down to 30% of the original size or so. But what about all your shredded tables? For now you have to manually convert the output from igluctl while we wait for our pull...
A new compression option in Redshift allows you to make big storage savings, up to two-thirds in our tests, over the standard Snowplow setup. This guide shows how it works and how to get it happening. In late 2016 Facebook...
In this tutorial we’ll use AWS Lambda and Amazon CloudWatch to set up monitoring for the number of bad rows that are inserted into Elasticsearch over a period of time. This allows us to set an alert for the threshold...