In this blog we dive into one of the better known and most used data warehouse architectures in the world, also known as the Kimball architecture (or Kimball Lifecycle). Introduced in the late 1990s, but non-the-less still very relevant for today. In the previous blog (read: What is ETL (Extract, Transform, Load)?) the ETL process was described in short which plays a central role in data warehouse design.
The Kimball architecture follows a button-up approach. Which starts with understanding and documenting the most critical business processes, needs and business questions. This includes documenting all available and to be used data sources for an organization. Then the ETL process is designed (more about that in What is ETL (Extract, Transform, Load)?) and put in a denormalized data model (a data model that is designed to reduce the number of relationships and optimize query performance) called the dimensional model.
Kimball and his colleagues used dimensional model as core concept for their architecture, even though it existed for a long time before. It’s a data modeling technique that structures data in a way that makes it intuitive and efficient for business users to analyze and gain insights. There are two main types of tables used, which are fact tables and dimensional tables.
Fact tables are data tables (like in excel) which store numerical data that represents business events or transactions. These are so called facts of a business as they contain the measurable data of a business and its operations such as sales revenue, profit, amount of products sold etc. Each row in the fact table corresponds to one event or transaction. It is advised to store this information in one centrally located place where every business user can access it. Fact tables are tend to have many rows but not an excessive amount of columns. Every fact table as at least two foreign keys (FK, columns that connect the one table to another) that connect to dimension tables primary keys.
Dimension tables as Kimball puts it: “are integral companions to a fact table”. These provide the context and descriptive attributes to the data stored in fact tables, they describe the “who, what, where, when, how and why”. Dimensional tables tend to have fewer rows then fact tables, but many large text columns. The primary use of dimension tables is filtering, grouping and slicing the data during analysis. Here it is of importance that you try to use as little as possible cryptic abbreviations. The quality of the data warehouse design is directly tied to the quality and proportion of the dimension tables, so it is important to get these right.
There are two main forms of database schema (this defines how data is organized within a relational database). Star schema, where the fact table is at the center and dimensional tables function as the points of a star. Star schemas are simple to understand, optimize query performance, and are well-suited for reporting and analysis. The other is a snowflake schema, which can be seen as a variation of the star schema but every dimension is divided into sub-dimensions. While snowflake schemas can help save storage space and maintain data integrity, they can also introduce more complex joins and potentially impact query performance.
Benefits of using dimensional model are among the following: it makes it more user friendly for the business users as it is designed to align with their way of thinking, it helps to optimize query performance by reducing the number of data joins (the amount of data sets which have at least one column the same). It offers flexibility as it is easier to adjust to changing business needs, it is a rather simple architecture allowing for quick development, implementation and maintenance.
From the dimensional model the data flows into the presentation area, which hosts the business intelligence (BI) applications such as analytic apps, data mining and models, reports, dashboards. The main advantages of the Kimball architecture are:
Simplicity and speed, it is easy to design and set up. It is understandable, for both the non-technical as well as technical personnel. As it is less technical, there is also less need of engineers with specialist skills to set up and maintain the data warehouse. Lastly, is relevant, as it is a button-up approach it makes all data integrations relevant to the business needs.
This was a short overview of the Kimball architecture, if you want to know more, it is highly suggested to read his book: “The Data Warehouse Toolkit”. In which he from the basics describes the importance of a good data warehouse structure and all the need to know concepts.
