In my last post, we looked at why organisations often fail to engage properly with data initiatives, leaving them stuck in first gear or, worse, spinning in neutral.
We covered the concept of the data supply chain as a series of links, all of which need to be intact and functioning correctly to achieve the desired outcome: enhanced value through data-driven decision making.
Next in the data maturity series, I’d like to look at Data Governance and how it overlays onto this model.
What is Data Governance?
Whilst there are many definitions in use across the industry, most agree on the core concepts. This article from Dataversity provides a good summary:
“Data Governance is a collection of practices and processes which help to ensure the formal management of data assets within an organization. Data Governance often includes other concepts such as Data Stewardship, Data Quality, and others to help an enterprise gain better control over its data assets, including methods, technologies, and behaviours around the proper management of data. It also deals with security and privacy, integrity, usability, integration, compliance, availability, roles and responsibilities, and overall management of the internal and external data flows within an organization.”
These activities can be carried out explicitly as part of an overall data governance program, but they are often done (or not done) by an organisation’s data practitioners as part of their day-to-day roles, in order to achieve their goals and simplify their own tasks.
Within this broad topic, a useful distinction can be drawn between offense and defense, as presented in this HBR article, which differentiates between distinct business objectives and the activities designed to address them.
“Data defense is about minimizing downside risk. Activities include ensuring compliance with regulations (such as rules governing data privacy and the integrity of financial reports), using analytics to detect and limit fraud, and building systems to prevent theft.
Data offense focuses on supporting business objectives such as increasing revenue, profitability, and customer satisfaction. It typically includes activities that generate customer insights (data analysis and modeling, for example) or integrate disparate customer and market data to support managerial decision making through, for instance, interactive dashboards.”
In other words, defensive efforts focus on ensuring the security and integrity of the data flowing through the pipeline by means of standardisation and governance, whilst offensive activities focus on how the data can be consumed by customer-focused business functions, and tend to be more real-time than defensive work.
Every company needs both offense and defense to succeed, and the two compete for finite resources; in this post, however, we’ll focus on offensive data governance.
Offensive data governance
On the offensive side, the primary outcomes that data consumers require can be distilled to three: Find, Understand and Trust. We therefore define offensive data governance as the people, process and technology activities needed to support these outcomes.
Finding the data
Within an organisation, browsing all the available data, or finding the specific data required for a particular use case, is often very difficult. The process can be time-consuming and labour-intensive, requiring meetings and multiple interactions.
It goes without saying that data that cannot be found is not useful, and that any initiative depending on it will be derailed. Poor discoverability is also a major reason why only a small fraction of the data a company captures is actually used.
The deceptively difficult role of a Data Catalog is to let consumers discover data quickly by performing semantic searches across all of the organisation’s data assets. This means providing a usable UI, as well as features to tag, document and annotate data so that the resulting metadata can be used to search the catalog. Some catalogs also automate profiling, inventorying and even the creation of semantic relationships between siloed data assets.
Alongside the catalog, a Data Dictionary lists each data element and its definition. A dictionary is usually more technical than a catalog and typically records the technical logic behind each definition. It will usually also have features to identify critical data elements, describe how elements relate to each other, and document the data structures used to build them.
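To make the idea of metadata-driven search a little more concrete, here is a minimal sketch of a catalog in Python. The `CatalogEntry` schema, the dataset names and the owners are all hypothetical, and the simple keyword match over names, descriptions and tags is only a stand-in for the semantic search a real catalog product would provide.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset plus the metadata that makes it findable (hypothetical schema)."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)
    owner: str = "unknown"


def search(catalog: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Rank entries by how many query terms appear in their name, description or tags.

    A plain keyword match is used here as a stand-in for true semantic search.
    """
    terms = query.lower().split()
    scored = []
    for entry in catalog:
        haystack = " ".join([entry.name, entry.description, *entry.tags]).lower()
        score = sum(term in haystack for term in terms)
        if score > 0:
            scored.append((score, entry))
    # Highest score first; ties keep catalog order because sort() is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


catalog = [
    CatalogEntry("crm.customers", "Customer master records from the CRM", {"customer", "pii"}, "sales-ops"),
    CatalogEntry("billing.invoices", "Issued invoices and payment status", {"finance", "revenue"}, "finance"),
    CatalogEntry("web.sessions", "Clickstream sessions for the public website", {"marketing"}, "digital"),
]

for hit in search(catalog, "customer revenue"):
    print(hit.name, "->", hit.owner)
```

The point is that the search is only as good as the metadata: an asset nobody has tagged or described simply never appears in the results, which is why the tagging and annotation features matter as much as the UI.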
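A data dictionary is, at heart, a structured record per data element. The sketch below assumes a hypothetical `DictionaryEntry` schema holding a business definition, the technical logic (shown here as SQL-like expression strings), a critical-element flag and the elements each one is built from; the element names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One data element in the dictionary (hypothetical schema)."""
    name: str
    definition: str
    logic: str  # technical logic, e.g. the expression that derives the element
    critical: bool = False
    depends_on: list[str] = field(default_factory=list)


DICTIONARY = {
    e.name: e
    for e in [
        DictionaryEntry("order_total", "Gross value of an order",
                        "SUM(line_items.price * line_items.qty)",
                        critical=True, depends_on=["line_items.price", "line_items.qty"]),
        DictionaryEntry("order_date", "Date the order was placed",
                        "CAST(orders.created_at AS DATE)"),
        DictionaryEntry("net_revenue", "Order value after refunds",
                        "order_total - refund_amount",
                        critical=True, depends_on=["order_total", "refund_amount"]),
    ]
}


def critical_elements(dictionary: dict[str, DictionaryEntry]) -> list[str]:
    """Names of all elements flagged as critical."""
    return sorted(name for name, e in dictionary.items() if e.critical)


def dependents_of(dictionary: dict[str, DictionaryEntry], element: str) -> list[str]:
    """Dictionary elements that are built directly from the given element."""
    return sorted(name for name, e in dictionary.items() if element in e.depends_on)


print(critical_elements(DICTIONARY))             # ['net_revenue', 'order_total']
print(dependents_of(DICTIONARY, "order_total"))  # ['net_revenue']
```

Recording dependencies between elements is what lets a team answer “if I change this, what else moves?” before making the change.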
Understanding the data
Whilst it’s essential to find the data, it is equally important for consumers to understand what the data represents in order to use it. Often, the semantics that tie an organisation’s data definitions together are locked inside the minds of key employees. This represents a real risk, and many companies have seen critical corporate knowledge leave with the departure of these key employees.
Helping users understand the business context and meaning of data is typically the job of a Business Glossary. The oft-cited example is the question “what is a customer?”, which can have several different answers within one company depending on the department; resolving such conflicts is precisely what the glossary is for. Without an agreed core definition of a customer (for example), derived metrics such as Customer Lifetime Value cannot be defined either.
Whilst finding data often relates to an organisation’s data architecture, understanding the data usually relates to its information architecture, a topic we’ll explore in a later post.
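To show why a derived metric depends on the underlying definition, here is a small sketch with two hypothetical departmental definitions of “customer”. The account records, the 365-day activity rule and the revenue figure are invented for illustration, and a simple revenue-per-customer figure stands in for a richer metric such as Customer Lifetime Value.

```python
from datetime import date
from typing import Callable, NamedTuple


class Account(NamedTuple):
    account_id: str
    has_purchased: bool
    last_active: date


AS_OF = date(2024, 6, 30)


def sales_definition(acct: Account) -> bool:
    """Hypothetical Sales view: anyone holding an account is a customer."""
    return True


def finance_definition(acct: Account) -> bool:
    """Hypothetical Finance view: has purchased and was active in the last 365 days."""
    return acct.has_purchased and (AS_OF - acct.last_active).days <= 365


def revenue_per_customer(accounts: list[Account], revenue: float,
                         is_customer: Callable[[Account], bool]) -> float:
    """A derived metric whose value depends entirely on the chosen definition."""
    customers = [a for a in accounts if is_customer(a)]
    return revenue / len(customers)


# The glossary pins each business term to one agreed definition.
GLOSSARY: dict[str, Callable[[Account], bool]] = {"customer": finance_definition}

accounts = [
    Account("a1", True, date(2024, 5, 1)),
    Account("a2", True, date(2022, 1, 15)),
    Account("a3", False, date(2024, 6, 10)),
    Account("a4", True, date(2024, 3, 20)),
]

print(revenue_per_customer(accounts, 1200.0, sales_definition))      # 300.0
print(revenue_per_customer(accounts, 1200.0, finance_definition))    # 600.0
print(revenue_per_customer(accounts, 1200.0, GLOSSARY["customer"]))  # 600.0
```

The same data yields a metric that differs by a factor of two depending on which department you ask; the glossary’s job is to make one of those answers the agreed one.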
Trusting the data
All too often, the work done upstream by engineers and analysts is undermined by a lack of trust in the data presented to decision makers. Earning this trust is, more often than not, a matter of ensuring Data Quality.
Data quality is generally defined as data having the properties of completeness, timeliness, consistency, integrity, accuracy and conformity. Achieving it requires a design-led process that covers each step in the data supply chain, so that the schemas used to collect information are fit for purpose from the outset, reflecting organisational needs and the uses to which the data will be put.
To really trust the data, decision makers also need the ability to trace data, and its transformations, through the various systems. This Data Lineage capability is another topic we will tackle in a future post.
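Some of these dimensions lend themselves to simple automated checks. The sketch below measures completeness, conformity and timeliness over a handful of hypothetical customer records; the field names, the deliberately loose email pattern and the 7-day freshness window are assumptions chosen for illustration. Dimensions such as accuracy usually need a trusted reference source and are not covered here.

```python
import re
from datetime import datetime, timedelta

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ana@example.com", "updated": datetime(2024, 6, 29)},
    {"id": 2, "email": None, "updated": datetime(2024, 6, 28)},
    {"id": 3, "email": "not-an-email", "updated": datetime(2023, 1, 2)},
    {"id": 4, "email": "li@example.org", "updated": datetime(2024, 6, 30)},
]

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately loose format rule


def completeness(rows, column):
    """Share of rows where the column is populated."""
    return sum(r[column] is not None for r in rows) / len(rows)


def conformity(rows, column, pattern):
    """Share of populated values that match the expected format."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(pattern.fullmatch(v)) for v in values) / len(values)


def timeliness(rows, column, as_of, max_age):
    """Share of rows updated within the allowed age."""
    return sum(as_of - r[column] <= max_age for r in rows) / len(rows)


as_of = datetime(2024, 6, 30)
report = {
    "completeness(email)": completeness(records, "email"),
    "conformity(email)": conformity(records, "email", EMAIL_PATTERN),
    "timeliness(updated)": timeliness(records, "updated", as_of, timedelta(days=7)),
}
for check, score in report.items():
    print(f"{check}: {score:.0%}")
```

Running this prints 75% completeness, 67% conformity and 75% timeliness. Scores like these only become useful once someone agrees what threshold is acceptable for each use, which is where the design-led process comes in.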
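As a preview of what lineage involves, here is a minimal sketch that models lineage as a graph in which each dataset points to the upstream datasets, and transformations, that produce it. The dataset names and transformation descriptions are hypothetical; real lineage tools typically harvest this graph automatically from pipelines and query logs.

```python
from collections import deque

# Hypothetical lineage graph: dataset -> list of (upstream dataset, transformation) pairs.
LINEAGE = {
    "dashboard.clv": [("mart.customer_value", "aggregate by customer")],
    "mart.customer_value": [
        ("staging.orders", "join on customer_id"),
        ("staging.customers", "join on customer_id"),
    ],
    "staging.orders": [("raw.erp_orders", "deduplicate, cast types")],
    "staging.customers": [("raw.crm_contacts", "standardise addresses")],
}


def trace_upstream(dataset):
    """Breadth-first walk from a dataset back towards its raw sources.

    Returns every (source, transformation, target) edge encountered; each
    dataset is expanded only once, so shared upstream sources are not re-walked.
    """
    steps = []
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for source, transform in LINEAGE.get(current, []):
            steps.append((source, transform, current))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return steps


for source, transform, target in trace_upstream("dashboard.clv"):
    print(f"{source} --[{transform}]--> {target}")
```

Walking the graph from a dashboard back to its raw sources is exactly the question a sceptical decision maker asks: where did this number come from, and what happened to it on the way?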
So what does maturity look like?
While there isn’t (yet) a prescriptive answer to exactly what each level of the maturity curve looks like, I hope this post has provided some tools to help you self-assess where your organisation stands.
The answer to the question “Can your users find, understand and trust your data?” is a good starting point for a program that covers offensive data governance. Then, by tying this back to each step of the pipeline and viewing each step through the lens of people, process and technology, you have a framework with which to build out this program.
Ultimately, data governance means designing solutions that make sense not only from a technical point of view but also from a business and people perspective, which is something we emphasise at Poplin.