The largest and most complex aspect of Business Intelligence (BI) is the data warehouse. In this context, the data warehouse is the repository of data, generally fed from many sources, that keeps historical perspectives of an entity’s data. It is a behemoth that is generally expensive, slow to build, complicated in structure and difficult to maintain. How necessary is it? Does a company need the actual, physical data warehouse to have a successful and sustainable BI program?
Many design methodologies take these issues into consideration. Each of them, traditional or non-traditional, has advantages and disadvantages that I do not cover in this post. My goal is to raise points of view on why and when a data warehouse may or may not be used. What I would like to cover is:
- The Corporate Information Factory (CIF), based on the Inmon approach
- The Kimball Style of data warehousing
- BI using no data warehouse at all
Corporate Information Factory
The Corporate Information Factory methodology, in a nutshell, says there is no getting around the need for a data warehouse. To have a successful and sustainable BI program, a data warehouse is needed. Not only is it needed, it must be completely designed, built and populated before any further analysis or BI work can be done. This is because business concepts are intertwined with one another, which necessitates the big-picture view. This style also views the architecture process more from the IT/data perspective than from the business-need point of view.
The Kimball Style
The Kimball methodology of data warehouse design is not as structured and regimented as the Corporate Information Factory. The Kimball data warehouse is the sum of its parts, meaning one area of the business can be designed, developed and deployed to provide BI insight while other areas of the business have not yet been discussed. This approach speeds the development of the data warehouse compared to the CIF, but the underlying data warehouse can become much more complex as more and more is added to it, and rework becomes a possibility. This style views the architecture process from the business-needs point of view rather than the IT/data perspective.
No Data Warehouse?
What about not using a data warehouse? In the new age of Data as a Service (DaaS), Master Data Management and Service Oriented Architecture (SOA), why re-store data from disparate systems? Why not store the metadata of where the data is found and attach the business logic to the SOA call? This can be a very powerful way to gain insight into data. The idea is that the work of developing a data warehouse can be done without the physical data warehouse. There are already tools that will do this. One of them is Qlikview from Qliktech. The basic premise behind this tool is to allow the user to develop the Transform and Load aspects of ETL (Extract, Transform and Load) in memory to deliver very quick analytics in a solid visual manner. This tool is not a methodology, but SOA could be used in a larger context with the same principles. This style views the architecture process as something the business could do, and that IT does not have to do.
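To make the "store the metadata, attach the business logic to the call" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fetch_orders` and `fetch_customers` stand in for real service calls, `CATALOG` is a hypothetical metadata registry, and `revenue_by_region` is an invented piece of business logic. Nothing is copied into a separate store; the answer is computed from the sources at request time.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for SOA/DaaS calls to two separate source systems.
def fetch_orders() -> List[dict]:
    return [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 80.0},
        {"customer_id": 1, "amount": 30.0},
    ]


def fetch_customers() -> List[dict]:
    return [
        {"customer_id": 1, "region": "West"},
        {"customer_id": 2, "region": "East"},
    ]


# Metadata only: where each business entity can be found, not the data itself.
CATALOG: Dict[str, Callable[[], List[dict]]] = {
    "orders": fetch_orders,
    "customers": fetch_customers,
}


def revenue_by_region() -> Dict[str, float]:
    """Business logic applied at call time against the live sources."""
    region_of = {c["customer_id"]: c["region"] for c in CATALOG["customers"]()}
    totals: Dict[str, float] = {}
    for order in CATALOG["orders"]():
        region = region_of[order["customer_id"]]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


result = revenue_by_region()
```

Note that the result always reflects whatever the sources hold at the moment of the call, which is exactly the property that matters later in this post.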
The idea that a data warehouse is necessary for a successful BI implementation is not necessarily true. A data warehouse is not necessary to have analytics or to provide a picture of the data you have. However, I believe it is very questionable to say this approach is sustainable or that it leverages every benefit of BI. The one very important aspect of BI that cannot be overcome by SOA, or by in-memory analytic tools like Qlikview, is the entire reason the data warehouse first came about.
The decision to build or not build a data warehouse is all about the history of the data. This is not the history that is required by law to be kept, like financial data, or what in many cases is considered ‘facts’ in the Kimball style. If this were the only history needed, a data warehouse would be less necessary. The type of history that is important is the history that cannot be reproduced within the source systems: the history of changes that the source system does not keep. In many cases a customer’s address may not be historically important in a transactional/source system, so only the most current record is kept. If that history is not kept somewhere (like a data warehouse), analytics of historical purchases of products will not show a true picture of what actually happened. It will only show the picture of what is in the source system at the current point in time. This situation is the quintessential linchpin for why a data warehouse should be necessary. The ability to track and keep history that is not kept in the source system is something SOA or in-memory BI is not capable of reproducing.
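The address example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `source_system` dictionary stands in for a transactional system that overwrites the address, and `CustomerHistory` is a hypothetical warehouse-side table that closes the old version and adds a new one on every change (the pattern commonly called a Type 2 slowly changing dimension). The customer, cities and dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class AddressVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


class CustomerHistory:
    """Hypothetical warehouse table that keeps every address version."""

    def __init__(self) -> None:
        self.versions: List[AddressVersion] = []

    def record_change(self, customer_id: int, city: str, changed_on: date) -> None:
        # Close the open version for this customer, then append the new one.
        for version in self.versions:
            if version.customer_id == customer_id and version.valid_to is None:
                version.valid_to = changed_on
        self.versions.append(AddressVersion(customer_id, city, changed_on))

    def city_as_of(self, customer_id: int, when: date) -> Optional[str]:
        # Return the city that was valid on the given date, if any.
        for version in self.versions:
            if (
                version.customer_id == customer_id
                and version.valid_from <= when
                and (version.valid_to is None or when < version.valid_to)
            ):
                return version.city
        return None


# The source system keeps only the latest address per customer.
source_system: Dict[int, str] = {}
history = CustomerHistory()

for city, changed_on in [("Denver", date(2015, 1, 1)), ("Austin", date(2017, 6, 1))]:
    source_system[42] = city  # overwrite: the old address is lost here
    history.record_change(42, city, changed_on)

purchase_date = date(2016, 3, 15)
city_from_source = source_system[42]
city_from_history = history.city_as_of(42, purchase_date)
```

For a purchase made in March 2016, the source system can only report the customer's current city, while the history table still returns the city the customer lived in at the time of the purchase.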
If the desired BI capability for the business is operational in nature, a data warehouse will not offer any significant benefit over SOA. This is a short-sighted, tactical way of looking at data and cannot provide strategic insight, but it certainly could be the best way to answer that need for data given the circumstances. This would not be the end-all-be-all for BI, but it certainly can provide a means to start a program.
So does this completely answer the question “Is a data warehouse necessary for BI?” The data warehouse is necessary for a complete and sustainable BI program, but it does not have to be the start of the program. So… of course the answer to that is still…. “It depends…”
CJ · October 14, 2018 at 1:05 pm
This is the theory-based answer, which presumes a technological green field. The reason “building a data warehouse” is a common solution is that for most clients DaaS is not a realistic approach given time, cost, and existing technical constraints.
If business users want information that crosses domains, has integrity, and provides meaningful answers based on all available data, a data warehouse is the least worst option given that data is housed in many existing systems that were not sized for DaaS, not designed or built to integrate seamlessly with other cross-domain systems, and the business can’t wait for those constraints to be addressed.