Mike Robins, CTO of Poplin Data, explains how SQL helps unlock the secrets of big (& small) data.
There are plenty of ways to wrangle data though if you want to do it fluently you need to know the lingua franca of most databases: SQL.
It’s by far the most common language for handling data whether you want to analyse, process or transform it in a relational database management system.
4 reasons to learn SQL
- It’s easy to start learning
- You’ll gain a solid understanding of how database tools work
- You’ll develop a broad skill set that’s useful for almost any job involving data management
- You’ll learn good data principles for processing and shaping data because SQL makes you think in a declarative manner.
What is declarative programming?
Declarative programming languages such as SQL have an approach to programming that focuses on what you want to achieve, rather than how to achieve it.
You don’t need to set out the logical steps to get the result.
When you declare what you want the database to do or tell you the tool then chooses the best procedure or algorithm to give it to you.
Example of how SQL handles a declarative process:
- You specify you want the sum of the contents of a certain column
- You have the option to define your query and how you want the result returned
- You don’t have to set a process for how to fetch all the data in the column and then calculate the sum, instead you leave it to the SQL engine to worry about all the things it needs to do under the hood.
Declarative programming forces you to think about more efficient ways to get results.
Say you want to cook a specific soup recipe – minestrone – you wouldn’t just run a query for ‘soup’ in a database of recipes because you’d get a lot of irrelevant results.
To get an accurate result quickly you’ll need to define parameters so you only get minestrone soup recipes or only 5-minute minestrone soup recipes.
Building your skills in SQL gives you a powerful way of thinking about outcomes instead of processes: “Okay what do I really want?”
SQL is popular for a reason
SQL does its job really well and for the most part it’s easy to use. That’s why it’s been an industry standard since the 1970s.
You can get started in SQL without having any programming knowledge, whatsoever.
SQL is easy to start learning
You can grasp the basics of SQL within a day or so.
In reality, no one ever stops learning SQL because you’re constantly trying to improve your results.
The learning curve isn’t steep though if you want to get into the more complex or nuanced stuff you’ll want to find better ways to write your queries.
You will get better at asking better questions, I promise.
SQL is a gentle introduction to databases
If you haven’t studied computer science then SQL is an easy way to learn common database concepts and tools that frankly haven’t changed dramatically for decades.
Learning those concepts will help you write better and faster code to get more things done accurately and efficiently.
You’ll start learning things like Indexing, Compression, and row-based databases versus Columnar databases, which are all useful to know.
When you’re familiar with how SQL works you’ll have learned concepts such as processing, filtering and transforming data that will help you use other data analysis tools, like Hive and Presto (both have an SQL flavour) and actual programming languages like Python and R.
Understanding SQL helps you understand the web better
Most websites are interfaces sitting in front of a database, whether it’s something complex like an ecommerce store or a service directory.
Likewise, a lot of the business tools people use to manage data are interfaces sitting on top of databases. Marc Benioff, CEO of Salesforce, explains this important marriage of interface and databases in his book “Behind the cloud”:
“At the time, companies were paying hundreds of thousands to buy and millions to install applications that were costly and frustrating to maintain. We wanted to take advantage of a new platform – the Internet – to deliver business applications cheaply through a web site that was as easy to use as Amazon.com.”
SQL helps you share data better
Learning SQL will give you a good toolkit for selecting, transforming and creating data.
Even better, it opens up ways to create and share data more efficiently than something like Excel spreadsheets or Google Sheets, which many organisations still use.
So, if you want to make data more shareable – and more useful – learn SQL so you can:
- Create standardized tables and data sources that can be updated across the business
- Create dashboards that work off a single source of truth, which everyone has access to
- Define and track KPIs and other reporting tools for the business in a database, rather than having them in spreadsheets that are emailed around the company
You’ll learn how to retrieve and analyse data quickly
Learning how data is indexed, constructed and modelled can help you do a lot of data modelling, not just within a database, but also thinking about data sources that are offline.
SQL gives you a bunch of ways for modelling data that are critical to retrieving this data quickly, as opposed to just retrieving this data.
I recommend setting up a kind of sandbox environment where you can select things and break or modify them without any real world consequences.
Tap into a wealth of knowledge
There are plenty of quite good courses out there for learning the basics of SQL, though equally useful, there’s a wealth of good documentation and online communities that will help you build your skills.
Because SQL and its variants are used so widely, the documentation has to be good and understandable. They’re reasonably extensive and often the best place to get started.
SQL skills are in high demand
Fluency in the lingua franca of most databases will open doors in your career, including:
- Technical roles managing data
- Developing CRM or web applications
- Data analysis and data science.
If you’re working as an analyst, you’ll quite often get asked very high-level questions, like:
- How is our new marketing campaign performing compared to our previous campaign
- Which option performs best for our customers: A or B?
Your SQL skills will come into play translating those high-level natural language questions into queries to try to find the right answers.
Because you’ll need to confront the data – and all its problems – you’ll also be in a position to help improve how the business creates and handles data. This knowledge can be powerful.
How SQL helps you dig into data anomalies and fix the issue
Being close to the data in a business can be very useful because you can explore how a decision in one area impacts another.
SQL skills are useful for tracing an anomaly back to something like an incorrect input or collection point, or a task left undone.
Here’s an example…
You’re asked to report on web traffic over the last seven days…
…you discover that it mostly looks good, but on Tuesday, we had twenty percent lower traffic than usual.
So the next question is: “Well, why did we have twenty percent lower traffic than usual?”
When you dig into the traffic by querying on Tuesday and separating out a number of different dimensions such as inbound you find it was reduced paid search.
You think: “Why was our paid search traffic down on Tuesday?”
And you query further into paid search and discover that the business had a whole bunch of campaigns running.
The campaign that normally brings the most amount of traffic ran out of budget for that day – meaning the business was no longer getting paid search traffic for that campaign.
So you dig deeper and ask a lot more questions, like:
Why did that campaign run out of spend?
Is it because we were getting so much traffic?
Was it fraudulent traffic?
What was going on that caused us to run out of paid search spend on that day?
Do we need to increase the spend?
Or do we need to look at how we’re portioning the budget for that?
If you want to find answers to those introspective questions learn them with SQL.
Resources for learning SQL
- SQLBolt: An interactive course so that you learn while doing, in the browser
- Select Star SQL: Another interactive course.
- Mode: These folks make a great platform for sharing analysis and have a comprehensive tiered SQL tutorial
- Code Academy: A more structured approach to learning SQL
- Khan Academy: Videos that walk you through SQL from the always amazing Khan.