You need to own your data: Open Source vs proprietary data platforms

Key points:

  • Open Source platforms give you control over your data
  • Proprietary vendors can change direction and remove features
  • Opening up the black boxes means you can understand what’s going on
  • Open Source doesn’t mean you have to run it yourself

The businesses really nailing data have built teams, tools and processes that mine their data to find opportunities for improvement, find new markets and gain insights into their own operations. Over time they use these insights to improve the customer experience, enter new markets and optimise operations. They build competitive advantage.

When data is the source of your competitive advantage, you need to own it, control it, know it inside out and be able to evolve it.

What is Open Source?

From a fringe movement in the 1980s and 90s, Free/Libre Open Source Software (FLOSS) now forms the basis for much of the technical infrastructure we rely on, including much of what makes up the cloud services of Amazon and Google. As much a philosophical approach as a legal licensing approach, it gives some key benefits to companies that choose to use it.

The primary benefits of Open Source are all around control. The user is in control of what is happening, where it runs and all the choices involved in configuration. There’s nothing to stop you using an external vendor to help you run Open Source software, a choice made by most of our customers with our Snowplow Insights managed service offering. They always keep the option to take it in-house and use their own engineering resources to run it.

Control of features

More control comes from the fact that the software running your critical infrastructure is open. You can read—and modify—the source code that drives it to customise it to your exact requirements. If the upstream project decides to take the software in a direction with which you disagree, you have the option of taking over your own version—called a fork—because you have equal access to the source code.

Control becomes particularly relevant when you’re building data products and platforms into the core infrastructure of your business. If the platform you build on changes direction, removing a feature you need for example, you’re essentially out of luck. Open Source platforms at minimum give you the option of continuing with the earlier functionality, at the cost of having to maintain it yourself.

Ownership and control of data

When you use a proprietary platform, data resides wherever the vendor decides. You may have some influence over the physical location as a customer, perhaps defining the continent, but ultimately the vendor decides on the policies and processes for where and how it is stored. More importantly, you have no control over access within the vendor and rely on them acting ethically with your data.

To protect themselves from privacy risks, vendors typically require that you never send personally identifiable information (information that lets you tie behaviour and other attributes to an individual) to their platforms. While this manages their risks, it means some forms of analysis are difficult or even impossible to perform.

Snowplow and other open source platforms are typically self-hosted inside your own cloud accounts. This gives you, the customer, total control over what is stored, where it is stored, how long it is retained and who has what level of access.

Opening the black box

Key features and calculations in proprietary data platforms are often done for you with no ability to look under the hood and understand how the calculation works. Open Source platforms have the advantage that you can open up the black box, peer inside and learn from what you find. If you come up with a better way, you can even change it!
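To make that concrete, here is a minimal sketch of the kind of calculation that is often hidden: counting sessions from a user's event timestamps. It is purely illustrative and is not taken from Snowplow or any other platform. The function name count_sessions and the 30 minute inactivity timeout are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes of inactivity. The threshold is
# illustrative only; the point is that you can read it and change it.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count sessions in one user's event timestamps (hypothetical helper)."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        # A gap longer than the timeout starts a new session.
        if current - previous > timeout:
            sessions += 1
    return sessions


# Three events: two close together, then one after a long gap.
events = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 9, 10),
    datetime(2020, 1, 1, 10, 30),
]
print(count_sessions(events))
```

When the rule is visible like this, a different definition of a session is a one-line change (for example, passing timeout=timedelta(hours=2)) rather than a feature request to a vendor.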

Open Source managed services: best of both worlds

Running a complex Open Source platform can be quite time consuming, especially with platforms that are evolving rapidly. Snowplow has about 20 updates every year to various components of the stack. That creates a lot of operations work to stay on top of while keeping production services scalable, reliable and efficient.

Companies that specialise in managing Open Source platforms can help you run them without having to spend your own time on operations. You benefit from the efficiencies of scale they gain by managing multiple instances of the same software, and you can spend your valuable time on data products and outputs.

Poplin Data has a managed service offering of the Snowplow stack called Snowplow Insights, allowing our customers to benefit from our expertise and spend their time on great data.

Control, ownership and openness

Open Source data platforms have a lot of advantages in control, ownership and openness. They can bring some operational complexity, which working with the right partners can help you manage. Get in touch with Poplin Data today to talk about data ownership and control.
