In the realm of business intelligence, data models play an important role in connecting data to decisions. This post, originally published on bipp.io, addresses the four pillars of effective data governance aided by data modeling.
When mathematician Clive Humby coined the phrase “Data is the new oil”, he emphasized the value of raw data as an asset. But, like oil, data comes with an underlying issue. While oil has undoubtedly contributed to the significant achievements of the last 100 years, it has also been a leading factor in shaping geopolitical relationships globally. A similar risk surrounds data governance within the enterprise.
Data governance refers to an overarching framework for regulating the utilization of data. Primarily, it achieves two things: establishing the processes that guard the data throughout its lifecycle and defining the policies for accessing data.
With the help of well-articulated roles and metrics, you can craft a data governance practice to align with your company’s overall business goals. But, before you think about the nuances of data governance, ask yourself: why do I need it?
Data governance tackles the problem of managing data at scale. However, each enterprise’s situation is different, and specific ways of managing data vary widely.
Let’s take a first-principles approach to unearth a possible sweet spot for data governance.
Baby Steps in Data Governance
Historically, the data revolution started with mountains of data. Each belonged to an individual department or a corporate function within a company. This is data at rest. However, inter-departmental data sharing, competitive analysis, or the cross-pollination of ideas led to the silos spilling over. Whatever the reason, the data silos became porous. This caused data to scatter across the organization, which forced a transition from data at rest to data in motion.
Enter Data Bureaucracy
As you can imagine, an unhindered flow of data across the enterprise is good for collaborative business decision-making. Without a governance structure, however, it can lead to uncontrolled data access, posing a grave threat to data security. To resolve this, companies need to spell out the rules and regulations. These are encapsulated within a process framework for managing the data lifecycle. This framework acts as a comprehensive set of guidelines for handling data generated and consumed within a company.
As a company grows or expands its reach geographically, business intelligence systems become more critical in shaping business decisions. At this stage, the attention shifts from data in motion to data in use. Now the data has a pivotal role to play. But it can only succeed if your data governance practice effectively aids business decisions. Validating this assumption entails two things: auditing the systems that capture data, and talking to stakeholders along the upstream and downstream trail of data generation and consumption to uncover any business-impacting anomalies.
If you perform this exercise within your company, the most likely missing link between data governance and business decisions is data integrity.
Data integrity depends on the quality of data. Lack of quality leads to unreliable or inconsistent information, which, in turn, impedes business success, as poor-quality data can never provide predictable outcomes. The consequences can be critical, with poor data quality costing organizations an average of $15 million per year.
This raises the question: how do you solve the data integrity problem?
Data Model to the Rescue
The solution lies in building a data layer between data governance and business processes. At bipp, we realized the importance of building this layer from the outset. Therefore, our BI platform comes with built-in data modeling capabilities.
A data model is expressed in an artificial language, defined by a consistent set of rules, that provides a unified interface for operating on data in a structured way. Its structures describe the various entities in an enterprise, such as employees, departments, assets, or vendors, and its rules are what multiple actors use to interpret those structures.
bippLang is bipp’s data modeling language. It helps data teams use SQL to build a data dictionary based on key dimensions and metrics. This means businesses are making decisions based on the same trusted business logic.
With bippLang, you can represent departments, processes, and assets within your company in the form of custom data representations. Additionally, you can generate dynamic queries for operating on these entities for further processing. bippLang achieves all of this in a data source agnostic way.
Four Pillars of Data Model Governance
Two perceived concerns about adopting a data modeling language are the burden on existing BI process workflows and the need to learn a new language. These problems are compounded by the frequent back-and-forth between technical data analysts and non-technical business teams. The two groups do not speak the same language when it comes to business analytics, so adopting a new data model meets resistance.
In response, bipp designed bippLang as a minimalistic data modeling language. It has a compact vocabulary and uses an indentation-based key-value-pair syntax for representing hierarchical structure. This small footprint keeps the learning curve gentle for people with SQL skills. bipp’s visual data modeling approach also helps teams curate data sources without writing code, meeting the needs of all but the most advanced users.
Now imagine that you could blend the data model with the top-down organizational processes in your company. The model would then form an effective bridge between data governance and BI practices. And once the data model is established, it serves as a data dictionary that all charts and dashboards reuse, with only occasional edits. The benefits therefore outweigh the perceived concerns.
But how do you do this? Our approach is what we call the four pillars of data model governance. These will help you gauge how effectively your data models connect data management and data definition.
1. Data Coherence
Coherence is achieved when data is captured and retrieved in the same form. Often you capture a piece of optional information that gets left out at a later stage. One example is an employee’s middle name. A data model ensures that the data structure representing an employee, including its optional fields, is always stored and retrieved coherently across the company.
With bippLang, you can define your organizational entities as higher-level data abstractions, represented by complex structures, which are in turn built by joining data from various tables. bippLang has a lightweight set of keywords that makes it easy to auto-generate these data abstractions. Moreover, they can be stored and retrieved coherently, creating a single source of truth.
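To make the middle-name example concrete, here is a minimal sketch in plain Python. It is not bippLang syntax, and the `Employee` entity, its field names, and the `to_record`/`from_record` helpers are all assumptions made up for illustration. The point is only that the optional field stays part of the structure whether or not a value was captured.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Employee:
    """Hypothetical employee entity; the field names are illustrative."""
    employee_id: int
    first_name: str
    last_name: str
    middle_name: Optional[str] = None  # optional, but always part of the structure


def to_record(employee: Employee) -> dict:
    """Serialize every field, including optional ones that were left empty."""
    return asdict(employee)


def from_record(record: dict) -> Employee:
    """Rebuild the entity; a missing optional key falls back to None."""
    return Employee(
        employee_id=record["employee_id"],
        first_name=record["first_name"],
        last_name=record["last_name"],
        middle_name=record.get("middle_name"),
    )


ada = Employee(employee_id=1, first_name="Ada", last_name="Lovelace")
stored = to_record(ada)          # the optional field is stored explicitly
restored = from_record(stored)   # and comes back in the same form
```

Because the stored record always carries a `middle_name` key, every consumer sees the same shape, even when the value itself is empty.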
2. Data Consistency
Consistency is the ability to represent the data uniformly across all users and output formats. Consistency is the result of coherence and has a broader impact.
In bippLang, consistency is achieved by defining a data dictionary containing dynamic queries that all users can explore to create ad-hoc reports and visualizations, enabling a single path to enlightenment.
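The idea of a shared data dictionary that generates queries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not bippLang: the `DIMENSIONS` and `METRICS` dictionaries, the `build_query` helper, and the `employees` table are hypothetical, and SQLite stands in for whatever database sits behind the BI platform.

```python
import sqlite3

# Hypothetical data dictionary: each dimension and metric is defined once.
DIMENSIONS = {"department": "department", "region": "region"}
METRICS = {
    "headcount": "COUNT(*)",
    "total_salary": "SUM(salary)",
}


def build_query(table: str, dimensions: list[str], metrics: list[str]) -> str:
    """Generate a query from the shared definitions.

    Only names present in the dictionary are accepted (anything else raises
    KeyError). The table name is assumed to come from trusted configuration.
    """
    dim_cols = [DIMENSIONS[d] for d in dimensions]
    select_parts = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select_parts += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dim_cols:
        cols = ", ".join(dim_cols)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql


# Small in-memory dataset to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, region TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Sales", "EU", 100.0),
        ("Ben", "Sales", "US", 120.0),
        ("Cy", "Ops", "EU", 90.0),
    ],
)

query = build_query("employees", ["department"], ["headcount", "total_salary"])
rows = conn.execute(query).fetchall()
```

Any report that asks for `total_salary` gets the same `SUM(salary)` expression, so two users slicing by different dimensions still agree on what the metric means.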
3. Data Compatibility
When it comes to data analysis, a company finds itself dealing with a slew of incompatibilities. There are multiple data formats generated from different tools and processes. The underlying tech stack that runs the BI platform also deals with numerous database vendors. These incompatibilities create friction in the smooth execution of BI workflows.
The bippLang data model comes to the rescue by encapsulating all data queries in a SQL-compliant way. As a result, the data is portable across all producers and consumers. bippLang is designed to become the lingua franca for your company’s BI teams, and both technical data analysts and non-technical business users can adapt to it quickly.
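As a rough illustration of why a common SQL layer reduces format friction, the sketch below loads a CSV export and a JSON export into one table and answers a question with a single query. Everything here is assumed for the example: the two sample exports, their field names, the `invoices` table, and the use of SQLite. It does not show how bippLang itself connects to data sources.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical producers emit the same kind of fact in different formats.
CSV_EXPORT = "vendor,amount\nAcme,100\nGlobex,250\n"
JSON_EXPORT = (
    '[{"vendor_name": "Acme", "total": 50},'
    ' {"vendor_name": "Initech", "total": 75}]'
)


def rows_from_csv(text: str) -> list[tuple[str, float]]:
    """Map the CSV producer's columns onto the common (vendor, amount) shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["vendor"], float(row["amount"])) for row in reader]


def rows_from_json(text: str) -> list[tuple[str, float]]:
    """Map the JSON producer's keys onto the same (vendor, amount) shape."""
    return [(item["vendor_name"], float(item["total"])) for item in json.loads(text)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    rows_from_csv(CSV_EXPORT) + rows_from_json(JSON_EXPORT),
)

# One SQL query serves every consumer, whatever format the data arrived in.
totals = conn.execute(
    "SELECT vendor, SUM(amount) FROM invoices GROUP BY vendor ORDER BY vendor"
).fetchall()
```

The format-specific code is confined to the two small loader functions; everything downstream works against the shared schema.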
4. Data Compliance
Compliance is a derivative of the above three pillars. When the data is stored and retrieved coherently, presented consistently, and kept compatible across all the BI workflows, compliance is inherently achieved. Additionally, you can extend the data model to meet statutory and regulatory requirements.
With bippLang, you can easily define and extend the data model in the cloud and use Git-based version control, which provides a single point of control for change management of your data models.
As the world generates data at a rate of 1.145 trillion MB per day, one thing is indisputable. We need to improve how this new ‘oil’ is refined to extract information and generate insights. The data model, combined with the four pillars of coherence, consistency, compatibility, and compliance, provides the necessary foundation for transforming raw data into trusted, secure, and well-governed content.
Want to see bipp’s data model in action? Sign up for a demo here.