Clicky

Using the Data Warehouse matrix to plan and prioritize development

The Data Warehouse Matrix provides a framework for overall delivery that can be decomposed into rational, prioritized development sprints, while still conforming to an overall design.

This is part 4 of a 6-part series on Data Warehouse Bus Matrix.

Attempting to deliver the Data Warehouse in a single iteration, Waterfall style, would be very difficult. Today's agile development process demands that the tasks be broken down into smaller iterative phases that incrementally deliver value. In the case of a Data Warehouse, the business value is provided when a Fact is delivered with at least a subset of its associated Dimensions.

The Data Warehouse Matrix provides a framework for overall delivery that can be decomposed into rational development sprints which conform to an overall design.

Decisions about which Facts and Dimensions to prioritize are complex. Below are some of the steps to arrive at these decisions:

example data warehouse matrix
  • First identify high-value dimensions. High-value dimensions are those used in many Facts/Business Processes. In the example above, let's say the high-value dimensions include Staff and Calendar. By concentrating on high-value Dimensions, iterative delivery has a cumulative effect. If the Production fact was the subject of the first development iteration, it also contributes to delivering "Sales" and "Time Recording" facts. To deliver "Sales" in a subsequent iteration would only require the delivery of an additional three dimensions.
  • Second, identify Fact/Business Processes that utilize the high-value dimensions.
  • Third, rank these Facts by their business value vs development complexity. High-value, low complexity Facts that utilize high-value dimensions are your best candidates. In our example above, the Production Fact requires only a single other Dimension "Product". It is relatively less complex than the Sales Fact, and therefore an excellent candidate to deliver value to the business in the first iteration quickly.

Measuring Business Value and Complexity

It's essential to engage the business in determining each of the Fact's business value and priorities. For better or worse, the business usually derives business value from the perceived value of a narrow set of reports a Fact supports. Competing interests between different business functions are inevitable and challenging to say the least. The cumulative benefit of delivering high-value dimensions can help business users understand that delivering on others' priorities also moves the needle closer to their requirements.

Designers measure the complexity of Facts in several ways:

  • The number of dimensions associated with the Fact.
  • The number of source systems and source tables required to load the Fact.
  • The data format of the source system.
  • The data quality of the source system.
  • The accessibility of the source system.
  • The complexity of the transformation logic.
  • The number of measures in the Fact.
  • The type of Fact (accumulating/transaction/periodic).

Designers similarly measure the complexity of Dimensions with a few exceptions.

  • The number of attributes in the Dimension.
  • The number of Type 2 attributes on the Dimension.

To help with this prioritization, I use an enhancement to the Data Warehouse Matrix.

In the image below, I have added a measure of complexity to each of the Dimensions and Facts.

This simple enhancement helps in the prioritization process. It also feeds into the estimation process discussed in later sections.

Shown below is a 4-level complexity scale that usually suffices:

There is a complex interplay between business value, complexity, and cumulative delivery. Designers need to consider all these factors when planning iterations. Due to the cumulative nature of the Data Warehouse delivery, the complexity of any given fact may reduce. For example, in a given iteration, you may have staged the required source data or delivered ¾ of the related Dimensions, making some of the remaining Facts less complex. Also, the business value of any Fact may increase relative to the remaining facts. Designers can take advantage of the cumulative delivery and plan a series of iterations to provide the most value quickly. Essentially if you deliver "Fact A", delivering "Fact B" becomes more straightforward.

One of the phenomena experienced when delivering a Data Warehouse is that early Facts and Dimensions appear to take significant effort and are more costly without providing a lot of value. Development tends to accelerate as more of the Data Warehouse is built. It can be a struggle getting over this low-value high-cost "hump", with the business becoming despondent and taking shortcuts. Understanding and communicating progress via the Data Warehouse Matrix can help.

If you like this article, please subscribe for regular Ideas, Frameworks and Playbooks for Data Warehouse Developers.

You've successfully subscribed to SchemaSight
Great! Next, complete checkout to get full access to all premium content.
Error! Could not sign up. invalid link.
Welcome back! You've successfully signed in.
Error! Could not sign in. Please try again.
Success! Your account is fully activated, you now have access to all content.
Error! Stripe checkout failed.
Success! Your billing info is updated.
Error! Billing info update failed.