Author Profile

Mike Robins

CTO at Poplin Data

More from Mike Robins:

The importance of owning your own data

As Equifax prepares to pay out as much as $US700 million in compensation for its spectacular 2017 security breach, and cosmetics retailer Sephora apologises for a leak of Asia Pacific customer data, now is a good time to consider the...

Why you need to learn SQL

Mike Robins, CTO of Poplin Data, explains how SQL helps unlock the secrets of big (& small) data. If you want to work in analytics, you need to know SQL. You wouldn’t work as a carpenter and not know how...

JetBrains DataGrip + Google BigQuery

Setting up a BigQuery datasource in JetBrains DataGrip

DataGrip is one of the most valuable tools our engineers have for exploring and querying a myriad of different database technologies. DataGrip doesn’t yet come bundled with a BigQuery driver, so in this post we’ll explore how to set up a...

Make big data small again with Redshift ZSTD compression

A new compression option in Redshift allows you to make big storage savings, up to two-thirds in our tests, over the standard Snowplow setup. This guide shows how it works and how to get it happening. In late 2016 Facebook...

Decoding Snowplow real-time bad rows (Thrift)

In this tutorial we’ll look at decoding the bad rows data that comes out of the Snowplow real-time pipeline. In that pipeline, bad rows that are inserted into Elasticsearch (and S3) are stored as base64-encoded, binary-serialized Thrift records....

Monitoring Snowplow bad rows using Lambda and Cloudwatch

In this tutorial we’ll use AWS Lambda and Amazon CloudWatch to set up monitoring for the number of bad rows that are inserted into Elasticsearch over a period of time. This allows us to set an alert for the threshold...
