The Background

In the last installment of Day-to-Day Data, we discussed that we are all swimming through oceans of data and information every day. While every decision and action can seem virtually meaningless, they can also be extremely important. The trick is to have the ability to find the data that is meaningful and use it to your advantage. Keeping this idea in mind, I wanted to attempt to optimize one of my favorite periods of the day: The Morning Commute.

The Research

Without a doubt, the frustration of commuting is something that a great many people share. While many, including myself, enjoy what we do for a living, the morning commute can do wonders to counteract that passion. It is because of this that I wanted to streamline this process. Think of it as an operations management ploy: how many bottlenecks can I avoid in my routine? This process may seem pretty straightforward:

  1. Open Web Browser
  2. Navigate to maps.google.com
  3. Type in start and end location
  4. Select fastest route
  5. Voila!

Fortunately, this is not the only step that I took to analyze this issue. It is essential to identify key data points that can be used to analyze the experience. Here are some of the key data points that I ultimately used.

  1. Route
  2. Start Time
  3. End Time
  4. Date

In order to make this data meaningful, it’s important to identify why these data points matter. To most, it would seem that route and the total time spent driving are the only pieces of information that matter. However, by including Start Time, I gain greater visibility into the depth of my data. With the start time, I can see how the total drive time fluctuates when I leave at a different time. If I wanted to dive deeper, I could compare average speeds with drive times, which would show whether I was hitting any traffic.
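As a rough illustration of how these four data points turn into something comparable, here is a minimal Python sketch. The log rows are made-up placeholders rather than my actual measurements, and the names `commute_log`, `drive_minutes`, and `average_by` are only illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commute log: (route, date, start time, end time).
# These rows are made-up placeholders, not the measurements from the post.
commute_log = [
    ("Route 1", "2012-01-09", "07:15", "07:47"),
    ("Route 1", "2012-01-10", "07:30", "07:55"),
    ("Route 2", "2012-01-16", "07:15", "07:45"),
    ("Route 2", "2012-01-17", "07:30", "07:58"),
]

def drive_minutes(date, start, end):
    """Return the drive time in minutes between start and end on one date."""
    fmt = "%Y-%m-%d %H:%M"
    started = datetime.strptime(f"{date} {start}", fmt)
    ended = datetime.strptime(f"{date} {end}", fmt)
    return (ended - started).total_seconds() / 60

def average_by(key_index):
    """Average drive time grouped by one column of the log (0 = route, 2 = start time)."""
    totals = defaultdict(list)
    for row in commute_log:
        totals[row[key_index]].append(drive_minutes(row[1], row[2], row[3]))
    return {key: sum(values) / len(values) for key, values in totals.items()}

print(average_by(0))  # average drive time per route
print(average_by(2))  # average drive time per start time
```

Grouping by route answers "which way should I go?", while grouping by start time answers "when should I leave?". Both come from the same four columns.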

Below is a basic dashboard created using Tableau. Here I took a sample and tracked this data each work day for 2 work weeks. The first week of the experiment, I followed Route 1, changing my start time each day, and then repeated the same for Route 2 in Week 2.

The Morning Commute Graph

From this dashboard, with this very small sample of data I am able to see a variety of facts concerning my commute. Immediately I have a better view of:

  • How Route 1 compares to Route 2
  • Average time of travel
  • Shortest times of travel
  • How Start Time impacted total drive time.

The Analysis

It is important to note that this sample size is indeed very small. Weeks, months, or years of information would produce a more accurate analysis overall. In the end, the data confirmed what I had thought to be true, with a few added surprises.

Looking at my data I can visually come to a variety of conclusions. First, Route 1 is generally faster than Route 2. Only at a 7:15 a.m. “Start Time” was Route 2 faster than Route 1. The average travel time of Route 1 was also just under 3 minutes faster than that of Route 2.

My data also shows average times for each “Start Time”. The data clearly shows that beginning my morning journey at 7:45 a.m. will in turn present me with the longest drive time. It also tells me that the most efficient time for departure is 7:30 a.m. for both routes. Given that Route 1 is faster, and that 7:30 a.m. is the optimal departure time, strictly for efficiency it is best for me to take Route 1 at 7:30 a.m.

That conclusion, as I mentioned before, fell in line with what I had predicted. However, what I did not expect to see was the higher commute time for the 7:15 a.m. departure time. For both Route 1 and Route 2, commute times were higher. One assumption I made was that an earlier departure time could potentially lead to less traffic on the highways, as displayed by the 7:30 a.m. time. What I did not account for is the local K-12 students traveling to school in the morning. With this information I was able to find a “sweet spot” in my morning commute, where I can attempt to avoid any delays.

The Conclusion

This extremely small data sample provided me with enough information to improve my efficiency each day. With this information I can now save anywhere from 7 to 22 minutes of travel time to work each day, meaning that at minimum I would save over 30 hours a year in travel time.
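The yearly figure is simple arithmetic. A quick sketch, assuming roughly 260 commuting days a year (52 weeks of 5 work days, with no allowance for holidays or vacation; that number is my assumption, not something measured):

```python
# Assumption: about 260 commuting days per year (52 weeks x 5 work days),
# with no allowance for holidays or vacation.
WORK_DAYS_PER_YEAR = 52 * 5

def hours_saved_per_year(minutes_saved_per_day, work_days=WORK_DAYS_PER_YEAR):
    """Convert a daily saving in minutes into hours saved per year."""
    return minutes_saved_per_day * work_days / 60

low = hours_saved_per_year(7)    # the 7-minute end of the range
high = hours_saved_per_year(22)  # the 22-minute end of the range
print(f"{low:.1f} to {high:.1f} hours per year")
```

With fewer commuting days the totals shrink proportionally, so the 260-day assumption is worth adjusting to your own calendar.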

The data is always there. Data can be used to learn new things about your business, or even your personal life and decisions. The trick as always is the ability to identify what is important to you or your business and above all… How to make your data meaningful.

As the evening winds down every night, I like to look back and reflect on the day’s events. Being an analytically minded person, I tend to look back at decisions that were made and how they had an effect on events that transpired throughout the day. As I delved into this nightly process, I started looking at the day in a more granular manner. Upon doing so, I came to the realization that we are all swimming in “Day to Day Data”.

Did I run out of coffee this morning? What time did I get in the car? Are these completely unrelated items of information? If you answered yes, then I would be inclined to say you are wrong. All these granular pieces of information are important to someone or some entity. Tell McDonald’s the answer to these two questions, and they have the perfect time to run an ad on the radio for their “premium roast” coffee.

Take time to think about your day to day data. Think of all the pieces of information that drive your day; you can use this information to accomplish a great number of things. Identify inefficiencies that are a constant annoyance.

The day to day data that we create is astounding. As we all know now, the data is out there. The data is important, but what is more important is identifying what information is vital and to whom. Data can be helpful, but it can also be stressful. Data is the figurative key that can open the door to future success, but what if someone handed you the keys to the entire building instead of the key to the room you want? How do you find the right key? It is important to have help to identify which (if any) of those keys is useful. That way you can make your Day to Day Data Meaningful.

I recently purchased a new car and the process clearly took longer than it needed to.  I am not a person who is typically prone to long, drawn-out decisions, so I began wondering why this one decision took so much longer than previous car purchasing decisions.

I reflected upon one of my earlier car purchases (back in 1997) as a comparison.  When I purchased this car, the process was fairly straightforward.  I drove to a few local dealerships and browsed the showroom until I saw something in my price range that caught my eye.  Over the course of several weeks, I test drove several cars and talked with family, friends, and co-workers to see if anyone had any input on the reliability of the vehicles.  Quickly, I narrowed it down to one vehicle. I didn’t lose any sleep thinking about the decision.  I just went with my “gut feel” and stayed within my budget.  I was very happy with my purchase and kept the car for 140,000 miles.

With the most recent car purchase, the process seemed to take on a life of its own.  I had so much information available to me that I spent months researching cars using every automobile search engine and consumer report that I could find.  I often used my SmartPhone to identify nearby car dealerships just so I could do “drive-bys” and look at cars.  Each car I researched had so much positive and negative input available that it became overwhelming as to what was the “right choice”.  I have to admit that I had so much information coming at me that I started dreaming of cars at night and often awakened in the morning with cars as the first thought on my mind.  I had clearly found myself with information overload and was in “analysis paralysis”.

I’ve seen this happen in business too. There is so much data and information available at every turn that it is hard for Business Leaders to weed through what is truly relevant and what is just “noise” getting in the way of making the decisions needed to move the business forward.

If we want to keep our business from getting stuck in “analysis paralysis” – we need to identify what information is truly necessary for us to achieve our business goals and implement the correct tools to visually present this data in a way that facilitates our decisions and doesn’t hinder them.  The challenge is determining what data is meaningful to the decisions that need to be made.

To have effective Business Analytics, the business goals must be clear. If the data you are looking at doesn’t help you make decisions to achieve these goals, then this data may be just “noise”.  How much data do you look at each day that really isn’t helping you achieve your goals?

If your data is keeping you up at night, maybe you have too much of it.

In my previous post, I proposed that the True Value of Agile is derived from fostering an environment of Continuous Improvement through Communication and Collaboration. To be sure, Communication and Collaboration alone will not bring success. As Vin D’Amico (www.brainslink.com) commented, a critical component to Agile Value and success is a shift in corporate culture to support an environment that encourages risk taking and refrains from seeking to assign blame on any individual when things go awry. The challenge here is that this goes against most traditional Western, and Asian, cultures for that matter. In order to improve we must identify what we do well as well as what needs to be addressed. Sometimes finding a better way requires taking the risk to do something no one has thought of before. That does not always lead to success; however, it provides an opportunity for learning and growth. In “Agile” teams, we do not work and make decisions in isolation only to be the single “neck in the noose.” To be an effective Agile team there is no “You” and there is no “Me”, and definitely no “Them.” To be successful the group must live and die as a team.

I am not suggesting Agile teams are like hippie communes, but there are similarities (remember that Steve Jobs lived in a commune before his 15 minutes of fame). Everyone works for the greater good, which is moving the product forward and delivering business value. There is much work to be done and defined timeframes in which to do it. We have tools, and skills, and ideas about how best to complete those tasks. We are jacks of all trades and masters of a few, not one. Effective agile development teams are made up of generalizing specialists. It is this cross-training that helps mitigate single-threaded development as well as unforeseen events that may remove someone from the team for a period of time, such as attrition or illness. Once we have our list of work items, which becomes our product backlog, we are able to self-select which tasks we can commit to completing. This is another shift in culture. Traditionally, work is assigned by management to the individual specialists. In an Agile culture the people performing the work are trusted to know best how to distribute the load so that they can successfully meet their commitments and provide value quickly. There should always be something for everyone to do to move forward.  To this end, no one should be waiting for work.

We are not alone in our agile commune either. The business product owner is there with us, every day, to prioritize and test, and yes, re-prioritize and add or defer features based on our velocity and changes in scope. This is not normal for the business owner in relationship to traditional requirements planning for software development.   The scrum master, our servant leader, is also with the team, every day, to work with the product owner to find the most valuable work that can be reasonably fit into the sprint. The scrum master is a servant leader in that it is this role that both serves the needs of the development team to remove obstacles, and provide whatever is necessary for the team to be successful and at the same time leads the team by working as a bridge between the product owner and the team, working on the backlog and the schedule. The dynamic is not unlike an orchestra. The Product Owner is the composer. The scrum master is the conductor, and the members of the development team are the musicians.

What I have described is contrary to most traditional organizations where the business and IT only communicate at the beginning and the end of the project, where divided IT teams are comprised of specialists like a relay team that wait on each other to pass the baton. We need to be more like a volleyball team who work together and support each other to move that ball over the net in the most effective way possible.

Changing corporate culture where failure is treated as a cardinal sin to one where it is addressed, reviewed and used as a learning point is challenging and requires buy-in from all levels of the organization. These concepts are taught in Agile training courses; however, they are not often embraced and supported by corporate management culture. The folks sent to class are typically the developers and the project managers, but rarely the upper management and business sponsors that play a key role in making “Agile” successful.

To achieve the true value of agile we must all work together for the greater good of the company by creating and supporting a nurturing environment of continuous improvement and stop pointing fingers.

 

I recently attended an event on Agile Business Intelligence.  One of the speakers made the statement that the core value of “Agile” is to be faster and cheaper. On the surface you may say, “Of course.  We need to deliver value faster and spend less doing it.” Accelerated delivery of value and reduction in cost are the results of being Agile but not the true value.

I propose that the answer is more esoteric and suggest that the true value of Agile lies in the path the people involved follow to reach the state of rapid lower cost delivery. The goal of applying Agile principles should be to create an environment of continuous improvement.

So, what are these principles and how do they help to foster such an environment? The core principles of the Agile Development Methodology are stated in the Agile Manifesto established in February 2001 in the mountains of Utah.

I believe the most important tenets of Agile for Business Intelligence lie in the aspects of Communication, Collaboration, and the willingness to embrace change. There are practices and tools such as Test Driven Development and Continuous Integration that will help to improve the speed and quality of the work we are delivering; but, tools alone will not make us successful. The path to Agility is not a short walk. There are challenges to overcome, both organizationally and personally for each member of the team, business and IT alike.

The first stop on the path to improvement is communication. Communication between the business and IT, between members of the development team, between executive management and front line management; fundamentally, between everyone who contributes or benefits from the work we are doing.

The first point of communication is the understanding of requirements. This does not just mean “what” we are going to deliver, but also “why” it is needed, “how” we are going to test it to make sure we did it right, “what” value it brings to the business, “how” we are going to measure that value, and “what” decisions we are going to make now that we have it. All of this information is gathered through a dialogue, a conversation that elicits questions and reflection which drives to a better understanding of the need. It is during these conversations that the requests must be broken down into manageable units of value that can be successfully delivered in short 2 to 4 week intervals. This is a paradigm shift for most traditional organizations in that the business must learn to embrace receiving small incremental value versus waiting for everything to be completed. For example, the end goal may be to create a Management Dashboard composed of multiple pages of gauges and charts. The business may say “I need it all or it isn’t useful;” however, having one or two of the reports may provide some immediate value and again elicit more dialogue about the true needs. Perhaps the format isn’t quite what the business wanted or perhaps there are alternative contexts in which to view the data that we didn’t consider during the initial discussions. It is also possible that there was an expansion in a line of business or a new distribution channel that may impact what data needs to be captured and displayed.

By keeping the lines of communication fluid, changes will be identified quickly so that needs can be met and true value can be achieved.

Revisiting the presenter’s comment that the true value of Agile is “Cheaper, Faster”, it is my opinion that communication is foundational to achieving efficient and timely delivery of success.

Now that we know what to build, we must explore how best we, as a team, can build it. In my next posting I will discuss the Agile development team dynamic.

One of the great tools for creating reports and dashboards is the bullet graph.  This type of graph was invented by one of the great experts in the data visualization field, Stephen Few.  If you are not familiar with his work then you should check it out at PerceptualEdge.com.  Bullet graphs are great tools for showing lots of information in a small amount of space.  They allow easy comparison between an actual measure and a target, for example actual revenue results to forecast revenue.  They are very useful for any performance measure against a target.

An easy way to show actuals to a target is a simple bar chart with the addition of a target line.  I would highly recommend using these, but this type of chart is really only useful if the target measure is identical across all performance measures.

Here’s an example of a simple bar chart with a target line.

The above example is a great way to show categories against a single target or average.  You can see other examples on our Blog (The Bureau of Labor Statistics Creates an Excellent Graph and Make Category Comparisons Much Easier with these Redesigns).  These are all very well done target lines measuring the actual to some target.  However, in the above example it may not be reasonable to ever sell 16 units of oranges.  Let’s say for example the realistic target for oranges is only 9.  In this case the sales team for oranges may actually be superstars and we’re showing them as missing the target by more than any other category.  This is where the bullet graph comes in. The advantage of a bullet graph is that you can achieve the same visualization but at the same time it allows you to have different targets for the different performance measures.
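To make the oranges point concrete, here is a small Python sketch. Only the shared target of 16 and the realistic oranges target of 9 come from the example above; the unit sales figures and the other categories are made up for illustration.

```python
# Hypothetical unit sales per category. Only the shared target of 16 and the
# realistic oranges target of 9 come from the text; everything else is made up.
actual_units = {"Apples": 15, "Bananas": 17, "Oranges": 10}
shared_target = 16
category_targets = {"Apples": 16, "Bananas": 16, "Oranges": 9}

def percent_of_target(actual, target):
    """Return actual as a percentage of target."""
    return 100 * actual / target

for category, actual in actual_units.items():
    against_shared = percent_of_target(actual, shared_target)
    against_own = percent_of_target(actual, category_targets[category])
    print(f"{category:8} shared: {against_shared:5.1f}%  own: {against_own:5.1f}%")
```

Measured against the shared line, oranges look like the worst performer. Measured against their own target, they are ahead of plan, which is the story a per-measure target lets you tell.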

For today I won’t go into the details on how to make them or how to read them.  There is an excellent description and step-by-step instructions on how to build these wonderful graphs at http://www.exceluser.com/explore/bullet.htm.  Instead, I want to provide an alternative improvement that might be useful to users in certain instances.

Typically the bullet graph is shown in one of two ways, values or percent.  Both of these measures can be useful depending on what result is being measured and compared.

Here’s an example of a bullet graph using actual values (using Microsoft Excel). 

Here’s an example of a bullet graph using a percent as the measure (Created using Microsoft Excel and a different data set).

In both cases you have an actual, the thin, dark blue bar.  That is then measured against a target, which is the dark blue vertical reference line.  The three bands of color in the bar are used as tolerance bands. I have found that in some cases you may want to be able to visualize the actual and target as both the value and the percent.  By adding a dual axis you get the benefit of both (Created using Tableau).
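If it helps to see the pieces separately, the sketch below draws a one-line text version of a bullet graph in Python. It is only an illustration of the anatomy, not how Excel or Tableau build theirs: `#` stands for the actual bar, `|` for the target line, and `.`, `-`, `=` for the three tolerance bands. The numbers passed in are hypothetical.

```python
def render_bullet(actual, target, band_limits, width=40):
    """Draw a one-line text bullet graph.

    band_limits are the upper limits of the three tolerance bands; the last
    one is also the end of the scale.
    """
    scale_max = band_limits[-1]
    band_chars = ".", "-", "="   # one background character per band

    def column(value):
        # Map a value on the scale to a character position in the row.
        return min(width - 1, int(value / scale_max * width))

    row = []
    for position in range(width):
        value_here = (position + 0.5) / width * scale_max
        band = sum(value_here > limit for limit in band_limits[:-1])
        row.append(band_chars[band])
    for position in range(column(actual)):
        row[position] = "#"          # the actual measure
    row[column(target)] = "|"        # the target reference line
    return "".join(row)

print(render_bullet(actual=7500, target=10000, band_limits=(5000, 9000, 12000)))
```

Because each call takes its own target and band limits, every measure on a report can be judged against its own goal while the rows still line up visually.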

One additional note on bullet graphs. I have encountered two issues with using these graphs on business reports.

First, these graphs are not easy to make using the standard tools.  They are not native graph types in Microsoft Excel, so it requires a good bit of Excel skill to create these and requires a dataset for each graph, specifying the various components necessary to build the graph.  In other words, the person building the graph will not be able to simply highlight the data and create a bullet graph.  Tableau is one tool that offers the bullet graphs natively, but this also requires a bit of understanding of Tableau in order to create them.  However, Tableau does allow the user to create them without creating little data sets for each graph.  There is also an issue with creating a dual-axis bullet in Tableau that they are aware of.  This is outlined in the details of How to Create a Dual Axis Bullet Graph below.

More importantly, there is often confusion around the bullet graphs, probably because they aren’t seen or used as much as other graphs that users are used to seeing.  There is a learning curve to understanding what the graph means, how the actual vs. target is represented and, more often, what the color bands represent and how to interpret it all.  However, in my experience, once the users understand them they seem to adopt them relatively easily. This is probably true for any new graph type that has been created over the years.

In my opinion, neither of these issues is insurmountable.  There are enough ways to create these graphs, either by using templates, add-ins, or different software platforms, and once you build a few of them it will get easier and easier.  As it relates to teaching people how to read them, I think the benefit of taking the extra time to educate the user on what they are looking at outweighs the complexity of the graph, both in the creation of it and the user’s comprehension of it.  It may not be a quick understanding, but in this case I think it’s worth the trouble.

However, if you are creating a report that will be read by many and there is no way to explain the graph then it may be best to choose a different way of presenting the data. As an example, in some cases the bands are not necessary to tell the story.  If the goal is to simply show Actual vs. Target then having a single color bar may work just fine (or maybe no shading at all).  In this case it’s simply a bar graph with a reference line for the target. Add a label or two and anyone should be able to decipher this.

Special thanks to Kristofer Still for creating the dual axis bullet graph in Tableau and detailing the instructions below.

How to Create a Dual Axis Bullet Graph in Tableau

You need to start with a set of data with actuals and a target.  In this example the target is a monthly sales quota of widgets and we compare this with sales to date.

First you create a calculated field for the ratio of sales to date to your goal.  This creates your percentage for your first axis. (You could just as easily calculate the percentage as a field in your data source and this might be preferable if you are creating multiple bullets for many targets.)

From there the easiest way to get a bullet graph in Tableau is to use the show me menu.  To do this select your sales to date and sales goal fields and select bullet graph from the show me menu.  You now have a single axis bullet.

Now we have to do some trickery to get Tableau to give us the second axis.

First you drop the percentage field to the Columns tray.  This will create two side by side bar charts.

Next you drag the percent axis and drop it on top of your bullet graph.

Now that we have the dual-axes there are some additional formatting steps to clean things up:

You want to change the color palette to one of the single gradient palettes and make the percent of goal measure a darker shade and sales goal a lighter shade.

You should also change the formatting of the top axis to percentage.

There is also a bug in Tableau that you will notice.  The reference line for the goal of 10,000 units doesn’t cross at 100% of your top axis.  Tableau provided me with a fix for this.  If you set the axis ranges proportional to one another then they will line up.

So in this case if you extend the upper end of the range of the percentage axis to 1.05 this will cause things to line up.  Be careful, though, this axis is now fixed and won’t automatically update if you place the bullet on a dashboard.  This is less than ideal, but unless you have wild swings in the magnitude of your measures you should be fine.
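The reason 1.05 works can be checked with a little arithmetic. The Python sketch below assumes both axes start at zero and that the units axis tops out at 10,500 (my assumption, inferred from the 1.05 figure); the function names are just for illustration and have nothing to do with Tableau itself.

```python
def aligned_percent_axis_max(value_axis_max, goal):
    """Upper end of the percent axis that puts 100% directly over the goal.

    Assumes both axes start at zero, so the two ranges only need to be
    proportional: percent_max / 1.0 == value_axis_max / goal.
    """
    return value_axis_max / goal

def axis_position(value, axis_max):
    """Fraction of the axis length at which a value is drawn (0 = left edge)."""
    return value / axis_max

goal = 10_000            # sales goal from the example
value_axis_max = 10_500  # assumed upper end of the units axis
percent_axis_max = aligned_percent_axis_max(value_axis_max, goal)

# The goal line and the 100% mark land at the same horizontal position.
print(percent_axis_max)
print(axis_position(goal, value_axis_max), axis_position(1.0, percent_axis_max))
```

This also shows why the fixed axis is fragile: if the units axis later rescales, the percent axis maximum has to be recomputed from the new upper end.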

Your final bullet will look something like this:

I am sure most IT professionals, especially the ones dealing with data, have heard of data quality: the idea of monitoring data to ensure that the data fit their intended use with a high level of accuracy. When creating a Business Intelligence (BI) solution, how can data quality be continually monitored through the entire solution? One method is called Data Auditing. The concept might not be new, but the formalization of the process is.

Data Auditing is the process of ensuring data quality from the beginning of the BI process to the final destination in a repeatable and measured way. This includes validation of data as they move from source to staging to star to cube (if one exists). Wherever business logic can be implemented, a data audit can be used to make sure the quality of the data is consistent.

One example of data auditing I have used was for a ‘technical glitch’ with the use of SQL Server Analysis Services (SSAS). The decision was made to not change source data that was of bad quality and load it into the Kimball star schema as it was. (Now this decision can be debated, but one major reason this was done was to easily expose data quality issues to the users, as they believed there would be an inconceivably small number of data quality issues.) What was discovered with using SSAS is that it would not handle dates with a year prior to around 1500. Some of the dates in the source system had the year 200 instead of 2000 and so on. A Data Audit routine was designed to look for these dates in the staging tables and change them to a pre-determined default date. This allowed the SSAS job to complete and the cube to process.
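The routine itself was written against the staging tables, but the idea can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the exact cutoff year, the default date, the column names, and the use of plain dictionaries in place of staging rows.

```python
from datetime import date

# Assumptions for this sketch: the cutoff year and default date are
# placeholders, and staging rows are plain dictionaries.
MIN_SUPPORTED_YEAR = 1500
DEFAULT_DATE = date(1900, 1, 1)

def audit_dates(staging_rows, date_column):
    """Replace out-of-range dates with the default and count how many were caught."""
    caught = 0
    cleaned = []
    for row in staging_rows:
        fixed = dict(row)  # copy so the incoming rows are left unchanged
        if fixed[date_column].year < MIN_SUPPORTED_YEAR:
            fixed[date_column] = DEFAULT_DATE
            caught += 1
        cleaned.append(fixed)
    return cleaned, caught

staging = [
    {"order_id": 1, "order_date": date(2000, 3, 14)},
    {"order_id": 2, "order_date": date(200, 3, 14)},  # year 200 typed instead of 2000
]
cleaned, caught = audit_dates(staging, "order_date")
print(caught, cleaned[1]["order_date"])
```

Returning the count alongside the cleaned rows matters as much as the fix: the count is what makes the audit reportable rather than a silent correction.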

This process seems much like any other data quality process. The true auditing came from how these instances (and the others) were reported. The code developed to catch the data quality issues also entered data into “performance” star schemas designed to provide data on the nightly process. This data was then shown on a dashboard used by the internal IT and BI staff. Every morning we could see how many rows of data were caught by each audit. This allowed us to make quick decisions about how to handle the data: what to change in the source system and even what to change in the data warehouse (including staging tables, to make sure a type 2 slowly changing dimension row was not added by the change to the original source data).

There were other data audits that summed specific counts of rows from the source system and made sure that count was the same in the star and cube due to how the star was loaded. Another data audit allowed us to show the measures in the fact tables and the cube were the same based on logic used. These types of data audits are not used to catch data quality, but to explicitly show that on a day-to-day basis, the results are the same. This audit was extremely necessary to build confidence in the data. Confidence in “the numbers” was extremely important to our clients and this was the simplest way for us to convince our end users they were getting what they were supposed to get.
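A reconciliation audit of this kind boils down to comparing the same number across layers. Here is a minimal Python sketch; the audit names and values are hypothetical, and in practice each number would come from a query against the source, the star, or the cube.

```python
def reconcile(audit_name, values_by_layer):
    """Compare one number (a row count or a measure total) across layers.

    values_by_layer maps a layer name such as "source", "star" or "cube"
    to the value observed there. The audit passes when all layers agree.
    """
    distinct_values = set(values_by_layer.values())
    return {
        "audit": audit_name,
        "passed": len(distinct_values) == 1,
        "values": dict(values_by_layer),
    }

# Hypothetical nightly results; in practice each number would come from a
# query against that layer.
results = [
    reconcile("order row count", {"source": 1200, "star": 1200, "cube": 1200}),
    reconcile("sales units total", {"source": 5400, "star": 5400, "cube": 5390}),
]
failed = [r["audit"] for r in results if not r["passed"]]
print(failed)
```

Keeping the per-layer values in the result, not just the pass or fail flag, lets whoever reads the morning dashboard see immediately which layer drifted.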

The best way to implement a data auditing solution is to use the existing BI tools to build the report, dashboard or any other means to expose the audit. Even the simple use of Excel against the cube and star can be used. Any way that can be easily maintained is preferred. Of course the data needs to be understood in order to make sure this is done correctly. If data is distributed in the star to form a lower level of granularity, then it needs to be summed back to the original level and compared to what is in the source. This could mean there are rounding errors present, but that should only provide around a penny difference.
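The roll-up comparison with a penny of tolerance might look like the following Python sketch. The invoice keys and amounts are made up, and `Decimal` is used so the comparison is not disturbed by floating point noise.

```python
from collections import defaultdict
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # "around a penny", as described above

# Hypothetical data: one source amount per invoice, spread over three
# fact rows each in the star.
source_amounts = {"INV-1": Decimal("100.00"), "INV-2": Decimal("50.00")}
fact_rows = [
    ("INV-1", Decimal("33.33")), ("INV-1", Decimal("33.33")), ("INV-1", Decimal("33.33")),
    ("INV-2", Decimal("16.67")), ("INV-2", Decimal("16.67")), ("INV-2", Decimal("16.67")),
]

def audit_allocation(source, facts, tolerance=TOLERANCE):
    """Sum fact rows back to the source grain and list keys outside tolerance."""
    rolled_up = defaultdict(Decimal)
    for key, amount in facts:
        rolled_up[key] += amount
    return [
        key for key, expected in source.items()
        if abs(rolled_up[key] - expected) > tolerance
    ]

print(audit_allocation(source_amounts, fact_rows))
```

Both invoices here are off by exactly one cent after the split, so they pass. Anything beyond that would be listed by key for follow-up.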

A problem with Data Auditing is the trade-off: time spent developing the data auditing process takes away from time to develop the parts of the BI solution the business needs. One way to incorporate this is to build it into each project. The first project to utilize the concept of data auditing will take longer due to the need to build the underlying data structure and processes. Once this is built, the next project to use the structure will take less time to ‘plug in’, just like the idea of re-using conformed dimensions.

Data Auditing provides a repeatable way to show that data is correct throughout the entire BI process. This concept is part of a good data quality/data governance solution. The ability to ‘watch’ the data as it goes through the entire BI process to make sure it means what it is supposed to mean will provide a security blanket for the end users. The end users can say “This data is from the data warehouse and I am positively sure it is correct.” How much value to the business and reassurance to IT does that grant?

The Bureau of Labor Statistics (BLS) has published some really bad graphs and maps over the years.  Below is an example of a map they publish monthly for the “Unemployment rates by state”.  In this map they are attempting to have a sequential color scheme, going from light to dark to represent low to high unemployment rates, but because of poor color choices it has unintentionally become categorical.  Black, which is the highest rate, seems muted against the other colors.  The bright red, which is a middle value of 5%-5.9% unemployment, seems to dominate the map more than the darker red or purple color which is actually a higher rate.

However, the highlight for today is a refreshingly well done graph on the unemployment rate and median earnings when compared to education attained.

This graph is very well done.  Notice the following characteristics.

  • Simple bar graph used for comparison.  The choice of the bar graph allows the reader to easily compare the categories.
  • Consolidated labeling and diverging horizontal scale allows for combined axis labels in the middle.
  • There are no extra gridlines, no horizontal axis line, no axis scale and no border  around the chart (unfortunately the webpage coding added an unnecessary border on their website at http://www.bls.gov/emp/ep_chart_001.htm)
  • The data points are placed on the bars themselves, providing additional information to the story.
  • The addition of a very clean reference line (in this case the average) gives additional context to the story and provides a context for each bar to be compared against.
  • Formatting is very clean. A single decimal place is consistent for the unemployment rate and the median income is not cluttered with decimal places, but includes a comma for thousands.
  • The use of color is simple.  Someone who is color blind may not be able to distinguish between the red and the green easily, but since the color is not crucial to the story nothing will be lost.
  • Great care was taken to have the negative statistic, in this case the unemployment rate, increase horizontally to the left, while the positive indicator of median weekly earnings increases to the right.

Notice that they utilized some of the same techniques that were discussed in the recent “Make Category Comparisons Much Easier with these Redesigns” post on the Making Data Meaningful blog.  Now some may argue the overall message of this graph, which is, higher education will lead directly to higher income.  This may or may not be the case; however, the BLS has done an excellent job at presenting this data. Congratulations to the Bureau of Labor Statistics for creating an excellent graph.

As it relates to the first unemployment map, simply changing the color scheme would solve the categorical color problem.  Here is the same map using a color blind friendly blue-orange diverging color scheme.  More importantly though, notice how the emphasis now falls on the orange and dark orange states, with very little emphasis on Montana, Kansas, Louisiana and Virginia, which were bright red in the original version.

However, when using this diverging color scheme the blue still attracts attention to the low percentage states.  This kind of color contrast might work well for a political map, for example Republican vs. Democrat, but for a low to high scale this can be confusing.  A better version of this could be achieved by using a single color, light to dark, and removing a few of the bands, for example 5 or 6 bands instead of 8.  Here is the same map using the sequential color palette but only using orange and 6 bands.  This is similar to the original map, but avoids the purple and dark red being interpreted as categorical.

Here is the same map using only gray scale as the sequential coloring.
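As a rough illustration of the single-color, light-to-dark idea, here is a minimal Python sketch that builds a 6-band palette by stepping evenly between two colors. The function name and the hex endpoints are my own assumptions, not the colors used in the maps above, and simple RGB interpolation is only an approximation of a carefully designed sequential palette.

```python
def sequential_palette(light, dark, bands):
    """Return `bands` hex colors stepping evenly from `light` to `dark`.

    Assumes bands >= 2 and colors written as '#rrggbb'.
    """
    def to_rgb(hex_color):
        hex_color = hex_color.lstrip("#")
        return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

    lo, hi = to_rgb(light), to_rgb(dark)
    colors = []
    for step in range(bands):
        t = step / (bands - 1)  # 0.0 at the lightest band, 1.0 at the darkest
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# Illustrative endpoints only: a pale-to-dark orange and a light-to-dark gray.
orange_bands = sequential_palette("#fff5eb", "#7f2704", 6)
gray_bands = sequential_palette("#f0f0f0", "#252525", 6)

for color in orange_bands:
    print(color)
```

Because every band is a shade of one hue, no single band reads as a separate category the way the purple and dark red did.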

Another major issue with these charts though is the difference in scales from month to month and what appears to be an arbitrary grouping of states. Here’s a comparison of the legend for the map in December 2008 and October 2011. Notice the different scale for the two legends as well as the groupings within each color.

Compare the maps side by side.


In the December 2008 version the groupings start with a band of 0-1.9% and then move in 1% increments until the purple band, which covers 3%.  In the October 2011 version the bands are grouped differently.  Below is a straight line band of 1% to outline the color difference.

Changing this color scheme makes it impossible to have an apples-to-apples comparison from one time period to the next.  This is a shame because this type of map would make an excellent trellis chart to compare month by month or year over year.  Also, the color choice and band selection will have a dramatic impact on the visual story.  This inconsistency allows the creator to manage the story. Hopefully, the future graphs of the BLS will continue to follow the good example.
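One way to keep the comparison apples-to-apples is to define the band edges once and reuse them for every map in the series. The Python sketch below shows the idea; the edge values, the function names and the sample rates are all made up for illustration and are not BLS figures.

```python
from bisect import bisect_right

# Hypothetical band edges in percent, shared by every map in the series.
BAND_EDGES = [2.0, 4.0, 6.0, 8.0, 10.0]


def band_index(rate, edges=BAND_EDGES):
    """Return 0 for rates below the first edge, up to len(edges) for rates
    at or above the last edge."""
    return bisect_right(edges, rate)


def band_label(index, edges=BAND_EDGES):
    """Return a legend label for a band index produced by band_index."""
    if index == 0:
        return f"under {edges[0]}%"
    if index == len(edges):
        return f"{edges[-1]}% and over"
    return f"{edges[index - 1]}% to under {edges[index]}%"


# Made-up rates for two periods; the same edges are applied to both.
period_1 = {"State A": 3.4, "State B": 9.1}
period_2 = {"State A": 5.5, "State B": 12.3}

for name, rates in (("Period 1", period_1), ("Period 2", period_2)):
    for state, rate in rates.items():
        print(name, state, band_label(band_index(rate)))
```

With one shared set of edges, a state that changes color between two maps has actually changed bands, rather than being moved by a redrawn legend.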

The largest and most complex aspect of Business Intelligence (BI) is the data warehouse.  In this context, the data warehouse is the repository of data generally fed from many sources to keep historical perspectives of an entity’s data.   It is a behemoth that is generally expensive, slow to build, complicated in structure and difficult to maintain.  How necessary is it?  Does a company need the actual, physical data warehouse to have a successful and sustainable business intelligence (BI) program?

There are many design methodologies that take these issues into consideration.  There are advantages and disadvantages to both traditional and non-traditional methodologies, which I do not cover in this post.  My goal is to bring up points of view of why and when a data warehouse may or may not be used.  What I would like to cover is:

  • The Corporate Information Factory (CIF), based on the Inmon approach
  • The Kimball Style of data warehousing
  • BI using no data warehouse at all

Corporate Information Factory

The Corporate Information Factory methodology, in a nutshell, says there is no way of getting around the need for a data warehouse.  In order to have a successful and sustainable BI program, a data warehouse is needed.  Not only is it needed, it needs to be completely designed, built and populated before any further analysis or BI work can be done.  This is due to the nature of how business concepts are intertwined with each other, necessitating the big picture view.  This style also views the architecture process more from the IT/data perspective compared to the business need point of view.

Kimball Methodology

The Kimball methodology of data warehouse design is not as structured and regimented as the Corporate Information Factory.  The Kimball data warehouse is the sum of its parts; meaning one area of the business could be designed, developed and deployed providing BI insight while other aspects of the business have not been discussed.  This concept will speed the development of the data warehouse compared to the CIF, but the underlying data warehouse can become much more complex as more and more is added to it along with the possibility of rework.  This style views the architecture process from the business needs point of view compared to the IT/data perspective.

No Data Warehouse?

What about not using a data warehouse?  In the new age of Data as a Service (DaaS), Master Data Management along with Service Oriented Architecture (SOA), why re-store data from disparate systems?  Why not store the metadata of where the data is found and attach the business logic to the SOA call?  This can be a very powerful way to gain insight into data.  The idea is that the work of a data warehouse can be done without the physical data warehouse.  There are already tools that will do this.  One of them is Qlikview from Qliktech.  The basic premise behind this tool is to allow the user to develop the Transform and Load aspects of ETL (Extract, Transform and Load) in memory to deliver very quick analytics in a solid visual manner.  This tool is not a methodology, but SOA could be used in a larger context with the same principles. This style views the architecture process as something the business could do, but IT does not have to do.

The idea that a data warehouse is necessary for a successful BI implementation is not necessarily true.  A data warehouse is not necessary to have analytics or provide a picture of the data you have.  However, I believe it is very questionable to say this approach is sustainable or that it can leverage every benefit of BI.  The very important aspect of BI that cannot be overcome by SOA, or in-memory analytic tools like Qlikview, is the entire reason the data warehouse first came about.

The decision for building or not building a data warehouse is all about the history of the data.  Not the history that is required by law to be kept like financial data or what in many cases is considered ‘facts’ in the Kimball style.  If this were the only history needed, a data warehouse would be less necessary.  The type of history that is important is the history that cannot be reproduced within the source systems.  This is the history of changes made that are not kept by the source system.  In many cases a customer’s address may not be historically important in a transactional/source system so only the most current record is kept.  If that history is not kept somewhere (like a data warehouse), analytics of historical purchases of products will not show a true picture of what actually happened.  It will only show the picture of what is in the source system at the current point in time.  This situation is the quintessential linchpin for why a data warehouse should be necessary.  The ability to track and keep history that is not kept in the source system is something SOA or in-memory BI is not capable of reproducing.
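To make the customer address example concrete, here is a minimal Python sketch of keeping every version of an address with the dates it was valid, a pattern often called a Type 2 slowly changing dimension in Kimball-style modeling. The post does not prescribe this design; the class names, fields and sample values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressVersion:
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


class CustomerAddressHistory:
    """Keeps every address version instead of overwriting the old one."""

    def __init__(self):
        self.versions = []

    def _current(self, customer_id):
        for version in self.versions:
            if version.customer_id == customer_id and version.valid_to is None:
                return version
        return None

    def record(self, customer_id, address, effective):
        """Close the current version (if it changed) and open a new one."""
        current = self._current(customer_id)
        if current is not None:
            if current.address == address:
                return  # nothing changed, so no new version is needed
            current.valid_to = effective
        self.versions.append(AddressVersion(customer_id, address, effective))

    def address_on(self, customer_id, when):
        """Return the address that was valid on the given date, if any."""
        for v in self.versions:
            if v.customer_id != customer_id:
                continue
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v.address
        return None


# Hypothetical customer who moves once.
history = CustomerAddressHistory()
history.record(1, "Cincinnati, OH", date(2009, 1, 1))
history.record(1, "Columbus, OH", date(2011, 6, 1))

print(history.address_on(1, date(2010, 3, 15)))
print(history.address_on(1, date(2011, 10, 1)))
```

A source system that stores only the latest address could answer the second lookup but not the first, so a purchase made in 2010 would be attributed to the wrong location.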

If the desired BI capability for the business is operational in nature, a data warehouse will not offer any significant benefit over SOA.  This is a short-sighted tactical means of looking at data and cannot provide strategic insight, but it certainly could be the best way to answer that need for data given the circumstances.  This would not be the end-all-be-all for BI, but it certainly can provide a means to start a program.

So does this completely answer the question “Is a data warehouse necessary for BI?”  The data warehouse is necessary for a complete and sustainable BI program, but it does not have to be the start of the program.  So… of course the answer to that is still…. “It depends…”

… Doug

During the Business Intelligence Symposium presented by Lucrum in conjunction with the University of Cincinnati, College of Business, Filippo Passerini, Group President of Global Business Services and CIO of P&G, promoted the idea of an Information Democracy. He is not the first person to use this phrase, only the latest to try to specifically define what is meant by the term. Providing an Information Democracy to the data consumers enables freedoms similar to those enjoyed by the citizens of the U.S. democracy.
LIFE: the growth of an organization using data driven decisions (a company that is not growing is dying)
LIBERTY: the ability to quickly make the appropriate decisions based on data (a company is less suppressed by competent data driven decision making)
THE PURSUIT OF HAPPINESS: the ability to improve profits (what company is not happier with more profit??)

Since there are many types of democracies, the term Information Democracy is not easily defined. Mr. Passerini discussed his idea of Information Democracy as providing the same information at the same time to all that should view that data. This lateral information exchange has given P&G unprecedented access to data, making their decisions quicker and based on current data.
The purpose of their Information Democracy is to provide not only one version of the truth, but the same version of the truth to everyone. This might sound like the same concept, but there is a subtle difference and it deals with the latency of the data and ability to massage results. It tries to eliminate the “My data shows…” statements made by many because the data is owned and seen by all people at the same time. There is no delay to anyone in receiving data, no standardized reports to be re-issued, no side data to be pulled into Excel to get a different look, just the data received in a dashboard/cockpit environment.
The delivery of the data in Mr. Passerini’s Information Democracy is prolific. The same pieces of information are delivered via mobile devices, traditional PCs or P&G’s Business Sphere environment (a conference room of walls with electronic displays filled with information). The same data provided at the same time to all parties involved using multiple delivery devices allows the entire P&G managerial structure to evaluate data wherever they may be. This pervasive data culture is another example of P&G’s increased ability to adapt their business more quickly in a team environment.
The Information Democracy has not come easily at P&G as they have had to overcome obstacles. It has taken a huge effort to change the culture to embrace data for data driven solutions. Security issues make the delivering of data to all the necessary people difficult. The technology to do this is available, but the governance was generally lacking. These issues must be addressed, as P&G has, prior to successfully implementing the idea of an Information Democracy.
Transparency of the data (showing the same data to all necessary parties), timeliness of the data (getting the data to all parties as early as possible), and transportation of the data (delivering the data in multiple formats for easy consumption) make up the three branches of the Information Democracy, much like the executive, legislative and judicial branches make up our democracy. With these branches and the appropriate data governing processes, there truly can be an Information Democracy allowing data “…of the people, by the people and for the people.”

…Doug