Introduction: The Importance of Data Collection When we work with our clients on data and analytics, we strongly believe that the process of data collection is fundamental. Data quality should be the highest priority, and this can only be achieved through a robust and efficient workflow. It might seem obvious, but if you’re not collecting […]
Opportunities to actively engage and learn are vital to our work at Poplin Data, so it was great to attend the recent PyCon AU 2019 with other members of the engineering team. The event brought together professionals, students and enthusiast developers to discuss the many joys and challenges of programming in Python, which is literally […]
As Equifax prepares to pay out as much as $US700 million in compensation for its spectacular 2017 security breach, and cosmetics retailer Sephora apologises for a leak of Asia Pacific customer data, now is a good time to consider the advantages of data ownership. Peace of mind around security isn’t the only advantage; there are […]
Mike Robins, CTO of Poplin Data, explains how SQL helps unlock the secrets of big (& small) data
If you want to work in analytics, you need to know SQL. You wouldn’t work as a carpenter and not know how to use a chisel…
A tutorial on setting up the BigQuery driver and data source in JetBrains DataGrip
A new compression option in Redshift can cut storage significantly — up to two-thirds in our tests — compared with the standard Snowplow setup. This guide shows how it works and how to enable it.
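As a rough illustration of applying a column encoding in Redshift (the specific encoding name, table, and column names below are assumptions for the sketch, not taken from the post), a helper like this could generate the DDL:

```python
def compressed_ddl(table, columns, encoding="ZSTD"):
    """Build a Redshift CREATE TABLE statement that applies a column
    encoding to every column.

    ``encoding`` defaults to ZSTD here as an assumption; ``columns`` is a
    list of (name, type) pairs. Table and column names in the example
    usage are hypothetical, not Snowplow's actual schema.
    """
    cols = ",\n  ".join(
        f"{name} {ctype} ENCODE {encoding}" for name, ctype in columns
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"


# Example usage with made-up columns:
ddl = compressed_ddl(
    "atomic.events_compressed",
    [("event_id", "CHAR(36)"), ("collector_tstamp", "TIMESTAMP")],
)
print(ddl)
```

Running `ANALYZE COMPRESSION` against an existing table is the usual way to see which encoding Redshift recommends per column before committing to DDL like the above.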
Step-by-step instructions in Python for first decoding and then deserializing the bad rows data that comes out of the Snowplow real-time pipeline.
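The decoding step might look something like the sketch below. It assumes the real-time bad row format of the era: a JSON record with a base64-encoded `line` field and a list of `errors` objects. Fully deserializing the decoded bytes (a Thrift `CollectorPayload`) would require the generated Thrift classes, so that part is omitted.

```python
import base64
import json


def decode_bad_row(raw_record):
    """Parse one bad row JSON record and base64-decode its raw payload.

    Field names ("line", "errors", "message") are assumed from the
    Snowplow real-time bad row format; adjust if your pipeline differs.
    """
    row = json.loads(raw_record)
    payload = base64.b64decode(row["line"])          # raw collector bytes
    messages = [e["message"] for e in row["errors"]]  # human-readable errors
    return payload, messages


# Example with a fabricated record:
sample = json.dumps({
    "line": base64.b64encode(b"raw collector payload").decode(),
    "errors": [{"level": "error", "message": "Unrecognised event"}],
})
payload, messages = decode_bad_row(sample)
```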
Monitor your Snowplow Analytics bad rows using AWS Lambda and Amazon CloudWatch to track the number of bad rows turning up in Elasticsearch over time.
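The core of such a Lambda could be a small function that turns an Elasticsearch `_count` API response into a CloudWatch metric datum. The metric name and namespace below are hypothetical; actually publishing the datum (e.g. via boto3's `put_metric_data`) is left out so the sketch stays self-contained.

```python
import json


def bad_row_metric(es_count_response):
    """Turn an Elasticsearch _count API JSON response into a metric datum.

    The returned dict matches the shape boto3's put_metric_data expects
    for a single entry in MetricData; "BadRows" is a made-up metric name.
    """
    count = json.loads(es_count_response)["count"]
    return {
        "MetricName": "BadRows",
        "Value": count,
        "Unit": "Count",
    }


# Example with a fabricated _count response:
datum = bad_row_metric('{"count": 42, "_shards": {"total": 5}}')
```

In the Lambda handler you would fetch the response from your Elasticsearch cluster's `_count` endpoint (filtered to the bad rows index) on a schedule, then ship the datum to CloudWatch, where an alarm can flag unusual spikes.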