Data Vault and Dimensional Modelling, a happy marriage! (Data Vault Series)
We need ambassadors!
I was at this one day course recently and spoke to a professional working in the business intelligence domain.
He said that he had just followed a two week course for datawarehouse architects.
I asked him, what did they tell about Data Vault?
His answer was shocking: “Not so much ..”
If recently trained datawarehouse architects do not get trained to use Data Vault in a datawarehouse architecture, there definitely is a lot of missionary work to do.
Because the Data Vault can add so much value to your datawarehouse solution.
And more important, it is not a choice, a Kimball dimensional model OR a Linstedt Data Vault. Use both and
combine each strengths!
This is a strong call for everybody who reads this to share this post with at least one person you know working in the business intelligence domain using star schemas (a.k.a. dimensional modelling) directly fed from source systems.
What are the strengths of a Dimensional Model (a.k.a. Star Schema)?
- Because the dimensional model is already around for decades, a lot of front end tools have an excellent support for it.
- A dimensional model is designed for reporting, slicing and dicing, for getting data out. That is what it is good at.
This becomes clear by the following examples:
- Query performance is usually better than of other models, like Data Vault or Relational Datawarehouse, especially for aggregations.
- A star model is an excellent source for Cube databases, like SSAS, especially when using a multidimensional model.
- Date selections and comparison of periods (e.g. Q1 this year versus Q1 Last Year) is easier than in Data Vault or Relational models.
- It enables self service BI. Give business (power) users (direct) access to a dimensional model, and there is a good chance that they can get data out in the format they want.
What are the strengths of a Data Vault?
- A Data Vault is meant to capture all the data all the time. Therefore it stores all data from the source systems, captures the changes in it, and keeps a full history of all changes (with a sidemark, that multiple changes between two load cycles, will be seen as one).
- A Data Vault captures all data exactly as it is stored in the source systems. “The good, the bad and the ugly”with no business rules applied to it (except, in some cases hard business rules like data types). This makes all data auditable.
- Data Vault 2.0 supports cross platform integration with other systems (e.g. Hadoop) and decentralized datawarehouses, for instance in different global regions.
- With Data Vault 2.0 working agile is easy, in sprints, building the system in increments.
- Data Vault has a very good support for Data Integration. Satellites, hanging off the Hub, containing data from different source systems.
So? What are the strengths of a datawarehouse architecture using both?
- You can change your mind! If your business rules change (and it happens!), just reload your dimensional model from the Data Vault with the new rules applied.
- You have one version of the facts . If you have multiple dimensional models, you have not.
- Data history is not like a destroyed monument. With a Data Vault, you capture all data from the source systems, and all changes applied to it. Source systems usually reflect only the current state of the data, so without a change capturing central storage like Data Vault, your history is gone forever!
- Your dimensional model does not have to serve two purposes: data storage and presentation of information. These two tasks can be in conflict, for instance there is currently no report need for all data that you could grap from source systems, so you leave it out of your star schema. But if the question comes tomorrow, you have no data, no history!
- You have full auditibility. Because the (raw) Data Vault stores the data from source systems unmodified, every measure, every dimensional attribute, can be traced back to the source it came from and when. This makes it able to refute claims that the data is not correct. Maybe now it will become clear that the old datawarehouse- or reporting system has lied for years!
How can you use both?
The answer is simple: by using a multi-tier architecture.
Data from source systems is first extracted to a Staging area, before it is moved into the Enterprise Data Warehouse using a Data Vault Model (with or without some optional components).
From there it will be distributed to dimensional models (star schemas) and cubes, and whilst the data is on its way, business rules can be applied.
Multi-tier architecture using both Data Vault and Dimensional Modelling techniques.
Conclusion / Wrap up
I have written this post to create more awareness about using both Data Vault and Dimensional Modelling or Star Schemas in a data warehouse architecture.
I have listed strengths and benefits of Data Vault, Dimensional Modelling and of an architecture using both.
(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.