Big Data….what’s this talk all about? I’m finding many technology articles espousing the “next big thing,” including reasons to choose Hadoop and reasons not to. There are plenty of stories about which appliances to use from the likes of IBM, HP, Oracle, and others. Business leaders should want to know what kind of infrastructure it is going to take to do “Big Data,” and this time the brick-and-mortar companies can take lessons from the eTailers and the Googles of the world. Typical Big Data systems are also migrating over to NoSQL ecosystems and other computing approaches that can scale. Other questions remain as well: what really are the limitations of RDBMSs, and what are the new design concepts?
We are now seeing pent-up demand for data through the holiday season, and let me assure you, “Big Data” is everywhere. What is Big Data? Is there a good definition? When in doubt, go to Wikipedia. It is not the “center of truth,” for sure, but it is a reasonable and acceptable starting point. Here’s what they say:
“Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to “spot business trends, prevent diseases, and combat crime.”
They also go on to say:
“One current feature of big data is the difficulty working with it using relational databases and desktop statistics/visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers.”
OK, that’s one definition of “Big Data.” Let’s try to think of Big Data in the context of the business world and what it may mean to our clients and prospective clients. This article doesn’t focus on the industry term “Big Data”; instead, it centers on what Big Data can mean for a company’s strategy and future growth plans.
The larger picture is well-represented by the recent November 2011 McKinsey Quarterly. In the publication, McKinsey’s Michael Chui interviews Bob McDonald, CEO, P&G.
In the interview, Mr. McDonald expertly describes P&G’s future digital strategy as initiating new business models that are supported by the new technologies of analytics, real time processing, and “Big Data.” Key points that I got out of the article include:
- We believe digitization represents a source of competitive advantage.
- There’s new demand for a whole new approach to accounting: activity-based accounting. It’s been around for years. Traditional double-entry bookkeeping, which dates back over 700 years to medieval Italian merchants, focuses on historical analysis rather than real-time profitability. Think of what we could do if we gave each operating unit real-time data and processing. Most businesses are handicapped to some extent by their limited internal agility. Their supply-demand models restrict their ability to flex when they need to and contract when indicators tell them to do so. A simple example we have all known about for years is in healthcare. Let’s get personal: have you or one of your loved ones ever waited two hours for a doctor’s appointment? Did the doctor do that purposely? Certainly not! The ability to match capacity with demand on a real-time basis is where most businesses are going.
- “Data Partners” are referenced in the story. They are the “suppliers” of the critical data needed for real-time analytics and decision-making. Data quality, periodicity, and timeliness are key attributes of this relationship. Perhaps traditional billboard and TV ads will not be as strategic in the future?
- “Have you ever done a Monte Carlo simulation?” McDonald asks, and he goes on to say, “We wanted to find people who had true mastery in computer science…analytical thinking skills have become ever more important to this company…those innovations are always informed by data.” That reclassifies entire finance, accounting, and IT staffs in one fell swoop!
OK, John, what does the future have in store? We keep hearing that innovation is the key. Is innovation rewarded in most companies? Apple is probably the most cherished innovator in today’s economy, and there’s a whole industry and supply chain right behind them!
My reaction to his comments on human capital:
- Business professionals need renewed “personal development plans” that push them to reach higher and challenge themselves to attain greater mastery of their particular skills, as well as broader skills for the future. They will need to balance technology and business acumen to reach a higher level of productivity. Once again, the “Knowledge Worker” is all-powerful and will be able to sustain themselves in a globally competitive marketplace where anything of potential commodity value is outsourced.
- Business professionals will need well-rehearsed business and domain-expertise conversations that they can have with their decision-makers, customers, and suppliers. Intelligent innovation requires risk-taking and guts. After the recent recession, how many businesses are really ready to get out of the proverbial foxhole and get their heads back onto the commercial battlefield? Are the rewards significant enough to encourage this behavior, or is this more of a survival tactic? The job is to identify ways to leverage data to leapfrog the competition with new business models. Will Fortune 500 companies like P&G lead this behavior, or leave it up to the mid-market?
- Attention all finance, HR, and accounting professionals: I love what Bob McDonald says in the article: ineffective systems and cultures are bigger barriers to achievement than the talents of people. If there are ways and approaches to increase productivity and eliminate old processes, McDonald indicates that now is the time. If the opportunity is to give our own organizations a better line of sight into “real-time performance,” then we must make the investments to raise our game. It’s important that all firms look for ways to reduce manual work and increase the automation of their financial and operational reporting.
There it is. These are big ideas, and they are inspired by one of the most successful companies on the planet. I know you will have to do more planning at all levels of your company in order to take advantage of the explosive market growth occurring in our global economy. The irony is that the daily news seems to tell us that gloom and doom are just around the corner. Interestingly enough, the current opportunity reminds me a bit of the late ’90s and the “go-go” years of the Clinton era!
I am sure most IT professionals, especially the ones dealing with data, have heard of data quality: the idea of monitoring data to ensure it fits its intended use with a high level of accuracy. When creating a Business Intelligence (BI) solution, how can data quality be continually monitored through the entire solution? One method is called Data Auditing. The concept might not be new, but the formalization of the process is.
Data Auditing is the process of ensuring data quality from the beginning of the BI process to the final destination in a repeatable and measured way. This includes validating data as it moves from source to staging to star to cube (if one exists). Wherever business logic can be implemented, a data audit can be used to make sure the quality of the data is consistent.
One example of data auditing I have used was for a ‘technical glitch’ involving SQL Server Analysis Services (SSAS). The decision was made not to change source data of bad quality and to load it into the Kimball star schema as it was. (This decision can be debated, but one major reason it was made was to easily expose data quality issues to the users, since they believed there would be only a negligible number of them.) What we discovered with SSAS is that it would not handle dates with a year prior to roughly 1500. Some of the dates in the source system had the year 200 instead of 2000, and so on. A data audit routine was designed to look for these dates in the stage tables and change them to a pre-determined default date, as sketched below. This allowed the SSAS job to complete and the cube to process.
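Here is a minimal sketch of that kind of stage-level audit in T-SQL. The object and column names (stg.SalesOrder, audit.DateCorrectionLog, OrderDate), the year-1900 cutoff, and the default date are hypothetical stand-ins, not the actual objects from the project:

-- Hypothetical names: substitute the warehouse's own staging and audit tables.
DECLARE @DefaultDate date = '1900-01-01';    -- pre-agreed placeholder for bad dates

BEGIN TRANSACTION;

-- 1. Record every row the audit is about to correct, for reporting later.
INSERT INTO audit.DateCorrectionLog (SalesOrderID, OriginalDate, AuditTime)
SELECT SalesOrderID, OrderDate, GETDATE()
FROM   stg.SalesOrder
WHERE  YEAR(OrderDate) < 1900;               -- e.g. a year keyed as 200 instead of 2000

-- 2. Replace the offending dates with the default so the SSAS cube can process.
UPDATE stg.SalesOrder
SET    OrderDate = @DefaultDate
WHERE  YEAR(OrderDate) < 1900;

COMMIT TRANSACTION;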
This process seems much like any other data quality process. The true auditing came from how these instances (and the others) were reported. The code developed to catch the data quality issues also wrote rows into “performance” star schemas designed to provide data about the nightly process. This data was then surfaced on a dashboard used by the internal IT/BI staff. Every morning we could see how many rows of data were caught by each audit. This allowed us to make quick decisions about which data to change in the source system and which to change in the data warehouse (including the staging tables, to make sure a Type 2 slowly changing dimension row was not added by the change to the original source data). A sketch of such an audit log and dashboard query follows.
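The reporting layer can look roughly like the following, assuming a hypothetical audit.FactDataAudit table; the names and grain here are illustrative, not the actual schema we used:

-- One row per audit rule per nightly run (hypothetical audit star).
CREATE TABLE audit.FactDataAudit (
    AuditKey   int IDENTITY(1,1) PRIMARY KEY,
    AuditName  varchar(100) NOT NULL,         -- e.g. 'Stage: implausible order dates'
    RunDate    date         NOT NULL,         -- nightly load date
    RowsCaught int          NOT NULL,         -- rows flagged or corrected by the rule
    LoadedAt   datetime2    NOT NULL DEFAULT SYSDATETIME()
);

-- Each audit routine finishes by logging how many rows it caught tonight.
INSERT INTO audit.FactDataAudit (AuditName, RunDate, RowsCaught)
SELECT 'Stage: implausible order dates', CAST(GETDATE() AS date), COUNT(*)
FROM   audit.DateCorrectionLog
WHERE  CAST(AuditTime AS date) = CAST(GETDATE() AS date);

-- The morning dashboard simply trends rows caught per audit per day.
SELECT AuditName, RunDate, SUM(RowsCaught) AS RowsCaught
FROM   audit.FactDataAudit
GROUP BY AuditName, RunDate
ORDER BY RunDate DESC, AuditName;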
There were other data audits that summed specific row counts from the source system and made sure the counts matched in the star and the cube, given how the star was loaded. Another data audit let us show that the measures in the fact tables and the cube were the same based on the logic used. These types of data audits are not there to catch data quality issues, but to explicitly show that, on a day-to-day basis, the results are the same; a reconciliation sketch follows below. This kind of audit was essential to building confidence in the data. Confidence in “the numbers” was extremely important to our clients, and this was the simplest way for us to convince our end users they were getting what they were supposed to get.
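One way such a reconciliation audit can be expressed, again with hypothetical table and column names (stg.SalesOrder, dw.FactSales, SaleAmount, LoadDate); the cube-side comparison would run an equivalent MDX query against SSAS:

-- Compare row counts and a summed measure between the source extract and the
-- fact table for one load date; a non-zero result means the two disagree.
DECLARE @LoadDate date = CAST(GETDATE() AS date);

WITH SourceTotals AS (
    SELECT COUNT(*) AS RowCnt, SUM(SaleAmount) AS TotalAmount
    FROM   stg.SalesOrder
    WHERE  LoadDate = @LoadDate
),
StarTotals AS (
    SELECT COUNT(*) AS RowCnt, SUM(SaleAmount) AS TotalAmount
    FROM   dw.FactSales
    WHERE  LoadDate = @LoadDate
)
INSERT INTO audit.FactDataAudit (AuditName, RunDate, RowsCaught)
SELECT 'Source vs. star reconciliation',
       @LoadDate,
       CASE WHEN s.RowCnt = d.RowCnt AND s.TotalAmount = d.TotalAmount
            THEN 0                            -- totals agree: nothing caught
            ELSE 1 END                        -- flag the mismatch for investigation
FROM   SourceTotals s
CROSS JOIN StarTotals d;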
The best way to implement a data auditing solution is to use the existing BI tools to build the report, dashboard, or any other means of exposing the audit. Even a simple Excel workbook against the cube and star will do. Whatever can be easily maintained is preferred. Of course, the data needs to be understood in order to do this correctly. If data is allocated in the star down to a lower level of granularity, then it needs to be summed back up to the original level and compared to what is in the source; rounding errors may be present, but the difference should be only around a penny, as in the sketch below.
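A sketch of that kind of allocation check, with hypothetical names (stg.SalesOrder, dw.FactSalesAllocated, SaleAmount, AllocatedAmount) and a one-penny tolerance:

-- Fact rows were split to a lower grain, so sum them back up to the order level
-- and compare against the source amount, allowing roughly a penny of rounding drift.
SELECT s.SalesOrderID,
       s.SaleAmount                          AS SourceAmount,
       SUM(f.AllocatedAmount)                AS StarAmount,
       s.SaleAmount - SUM(f.AllocatedAmount) AS Difference
FROM   stg.SalesOrder s
JOIN   dw.FactSalesAllocated f
       ON f.SalesOrderID = s.SalesOrderID
GROUP BY s.SalesOrderID, s.SaleAmount
HAVING ABS(s.SaleAmount - SUM(f.AllocatedAmount)) > 0.01;   -- anything beyond a penny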
A problem with Data Auditing is the trade-off: time spent developing the auditing process takes away from time spent developing the business-facing parts of the BI solution. One way to manage this is to build it into each project. The first project to utilize the concept of data auditing will take longer because the underlying data structures and processes must be built. Once they are in place, the next project can ‘plug in’ with far less effort, much like the idea of reusing conformed dimensions.
Data Auditing provides a repeatable way to show that the data is correct throughout the entire BI process. The concept is part of a good data quality/data governance solution. The ability to ‘watch’ the data as it moves through the entire BI process, to make sure it means what it is supposed to mean, provides a security blanket for the end users. They can say, “This data is from the data warehouse and I am positive it is correct.” How much value to the business, and reassurance to IT, does that grant?