
Typically, strategic goals start off as high-level initiatives that involve revenue-based targets. Revenue targets are followed up with operational efficiency goals (or ratios) that keep expenses in line and improve profit margins. These goals and ratios serve as the ultimate yardstick in measuring top-end strategic performance. There may also be competitive goals that use different measures, such as market share and product perception. Companies believe they can achieve these results based on internal and external competitive factors. It is important to note that the internal and external factors typically drive the timing and define the tactical activities that will be employed to achieve results.

For example, a change in government regulation (an external factor) may present a significant opportunity for the company that is first to capitalize on the change. An example of an internal factor may be outstanding customer service that can serve as a market differentiator to attract and retain customers.

These competitive factors and performance measures drive the definition of the tactical operations (or plan) needed to achieve strategic goals. Tactical operations are ultimately boiled down to human activities and assigned to managers and their employees.  Human activities impact revenue, profit, and quality.  Even quality activities ultimately impact revenue and profit.

For example, an insurance company may excel at gathering high-quality claims data that results in lower claim expenses and legal costs.

Human activities are incorporated into an individual’s performance plan. Before defining the human activities, though, the goals, competitive factors, and tactical operations need to be gathered into a data repository. Once gathered, they will be used to gain and communicate corporate alignment.

Depending on your role in the organization, you may be called upon to help define and capture the financial performance ratios.  You may also be responsible for gathering and storing external factors such as survey results, industry statistics, etc.

If all goes well, the corporation captures the revenue and performance goals and defines how performance is to be measured.  This is also communicated across the enterprise (gaining alignment).  The performance goals and target financial ratios can be stored in the corporation’s data repository.  The measuring and communicating of progress will be accomplished using a company’s reporting toolset.  The company has to decide the best frequency to communicate actual performance compared to stated goals.  This frequency can be daily, weekly, monthly, or quarterly with the emphasis on providing continual feedback.  Reporting on performance results is the first, and most basic, step in the adoption of BI practices.  Performance reporting answers the question “What happened?” (Davenport & Harris, 2007).  It is very important but only the first step.
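To make the “What happened?” comparison concrete, here is a minimal Python sketch of a goal-versus-actual report. The metric names, targets, and actual values are hypothetical placeholders, not figures from any company; in practice the goals would come from the data repository and the results from the reporting toolset described above.

```python
from dataclasses import dataclass

@dataclass
class Goal:
    metric: str
    target: float
    higher_is_better: bool = True  # e.g. revenue; set False for expense ratios

def performance_report(goals, actuals):
    """Compare actual results with stated goals and flag each metric."""
    rows = []
    for goal in goals:
        actual = actuals[goal.metric]
        variance = actual - goal.target
        # On track when the metric meets or beats its target in the desired direction.
        on_track = variance >= 0 if goal.higher_is_better else variance <= 0
        rows.append((goal.metric, goal.target, actual, variance, on_track))
    return rows

# Hypothetical quarterly goals and actuals (illustration only).
goals = [
    Goal("revenue_musd", 120.0),
    Goal("expense_ratio", 0.65, higher_is_better=False),
    Goal("market_share_pct", 18.0),
]
actuals = {"revenue_musd": 124.5, "expense_ratio": 0.68, "market_share_pct": 17.2}

rows = performance_report(goals, actuals)
for metric, target, actual, variance, on_track in rows:
    status = "on track" if on_track else "behind"
    print(f"{metric:<18} target={target:<7} actual={actual:<7} variance={variance:+.2f} {status}")
```

Running the same report daily, weekly, or monthly is what turns it into the continual feedback loop the paragraph describes.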

Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning (p. 8). Boston: Harvard Business School Press.

Stay tuned for more from Rob Urbanski on Performance Management and Business Intelligence.

It is always best to avoid rotated text when creating data visualizations, yet this seems to be one of the most common problems I see. This might be because tools like Microsoft Excel rotate axis labels automatically in many situations, and people don’t adjust these defaults. Even Tableau, which generally has better practices built into its defaults, rotates axis labels in many situations. In fact, Tableau doesn’t even allow the user to change the orientation of the y-axis title and requires a workaround to show it horizontally.

The most common problem with rotated text is in the x-axis labels. Often, it is simply the length of the labels that forces the software to rotate the text. Consider the bar charts below:

As I tell my data visualization students, the language of bar charts speaks both vertically and horizontally. In this case the easiest thing to do is to rotate the entire chart.  By doing so the axis labels can be read easily, without tilting your head, and the bars have the same function as they did when they were vertical.

The only exception to rotating a chart from vertical bars to horizontal bars is when dealing with time series data.  Time series data is always best on the x-axis (and typically in a line chart). Therefore, do not rotate or reorder time series data.

If you feel you must use vertical bars for some reason then consider other options to avoid rotating text (see the article Exploring All of Your Options for more detail on Time Series data and other options).

Here’s an alternative in this particular example.

I hope this information will help you avoid rotating text on your data visualizations.

The Background

Love it or hate it, the madness is upon us. Every March, the country gets a healthy serving (or three) of college basketball. Each year, approximately 40 million people fill out brackets for the NCAA Men’s Basketball Tournament, and each year every single one of those people swears that they picked everything perfectly. If you were about to Google “What are the odds of completing a perfect bracket?”, I will save you the trouble: it is 1 in 9.2 quintillion. If you were about to Google “What on Earth is a quintillion?”, the answer is a 1 followed by 18 zeros. To put this in perspective, the odds of winning the Powerball are 1 in 175 million. You have a better chance of winning the Powerball multiple times than of picking that bracket correctly.
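The arithmetic behind these figures is easy to check. The sketch below assumes a 64-team bracket with 63 games, each picked with a 50/50 coin flip; real picks are not coin flips, so treat this as the naive baseline that the 9.2 quintillion figure is usually based on. The Powerball odds use the 1 in 175 million figure quoted above.

```python
# Naive perfect-bracket odds: 63 independent games, each a 50/50 guess (assumption).
GAMES = 63
bracket_outcomes = 2 ** GAMES
POWERBALL_ODDS = 175_000_000  # 1 in 175 million, as quoted above

print(f"Possible brackets: {bracket_outcomes:,}")  # 9,223,372,036,854,775,808

# How many consecutive Powerball jackpots are still more likely than a perfect bracket?
wins = 0
while POWERBALL_ODDS ** (wins + 1) < bracket_outcomes:
    wins += 1
print(f"Winning the Powerball {wins} times in a row is still more likely.")
```

Under these assumptions the loop stops at two: two straight jackpots (about 1 in 3.1 × 10^16) are still more likely than a perfect bracket, while three in a row are not.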

These, however, are just numbers. I began to wonder how I could slice and dice tournament history data. Sure, I can find which teams have won or lost the most. But can I dive even further and find out which states, cities, or teams have the most wins or championships? Which teams consistently underperform, and which teams exceed expectations?

The Research

Using a data dump of NCAA Tournament history from 1939 to 2012, I was able to dive in very quickly and start seeing results. I first wanted to see which states produced the most tournament victories. Using Tableau, I visualized the top ten states in terms of victories.

Visual of the Winningest States

Using a filled map, I was able to visualize the number of wins for the top ten states. North Carolina and California are the top two states, no doubt fueled by the powerhouse programs of North Carolina, Duke, and UCLA. I wanted to go even further and see which cities brought the championships home for their respective states. To create this visualization, I used a dual-axis map combining my filled map with a symbol map.

Visual of Winningest Cities within the Winningest States

This visualization shows which cities put their states on my first map. Los Angeles and Lexington are home to the schools that have brought home the most national championships. Instead of using strictly numbers and labels, I represented their success with a circle symbol: the bigger the symbol, the more championships achieved.

I now have a clear picture of which teams succeed, but how can I find out which teams succeed (or don’t) when they are supposed to? To do this, I needed to find out how many upsets occurred over the years. Using the teams’ designated seeds at the beginning of the tournament, I was able to identify every upset in tournament history. I took this data and created visualizations for teams that get upset and teams that create the upset.
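The upset logic itself is simple: a game is an upset when the team with the larger seed number (the lower seed) beats the team with the smaller seed number. Here is a minimal Python sketch, assuming each game record holds both teams, their seeds, and the winner; the `Game` record and the sample games are hypothetical, not rows from the 1939 to 2012 data set.

```python
from collections import Counter
from typing import NamedTuple

class Game(NamedTuple):
    team_a: str
    seed_a: int
    team_b: str
    seed_b: int
    winner: str

def count_upsets(games):
    """Return (times upset as the favorite, upsets pulled off as the underdog) per team."""
    upset_victims = Counter()
    upset_makers = Counter()
    for g in games:
        if g.seed_a == g.seed_b:
            continue  # equal seeds: neither team is the underdog
        # The favorite is the team with the smaller seed number.
        if g.seed_a < g.seed_b:
            favorite, underdog = g.team_a, g.team_b
        else:
            favorite, underdog = g.team_b, g.team_a
        if g.winner == underdog:
            upset_victims[favorite] += 1
            upset_makers[underdog] += 1
    return upset_victims, upset_makers

# Hypothetical games for illustration only.
games = [
    Game("Team A", 2, "Team B", 15, winner="Team B"),  # upset
    Game("Team C", 1, "Team D", 16, winner="Team C"),  # favorite wins
    Game("Team B", 15, "Team E", 7, winner="Team B"),  # upset
    Game("Team F", 1, "Team G", 1, winner="Team G"),   # equal seeds, not an upset
]
victims, makers = count_upsets(games)
print("Upset victims:", dict(victims))
print("Upset makers:", dict(makers))
```

The two counters map directly onto the two charts: one for teams that get upset and one for teams that create the upset.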

Underachieving Teams Visual

Overachieving teams visual

I used a stacked bar chart to show whether teams that were the higher seed were upset more often than not and, conversely, whether teams that were the lower seed were prone to upsetting their opponents. The stacked bars also show that while teams like Duke and North Carolina were upset the most, it was because they had the most opportunities to be upset. The data above shows that Kansas is an overachieving team: 34 times out of 49 possibilities, they upset their opponent in the tournament.

The Analysis

History shows that our top-performing states are North Carolina, California, and Kentucky. The cities that make those states successful are Lexington, Los Angeles, Chapel Hill, and Durham. We can also see that teams such as Brigham Young, Pennsylvania, and Utah State have a habit of underperforming in the tournament, while teams such as Florida, Duke, and North Carolina tend to overperform when they are the underdog.

The Conclusion

March Madness is an event loved by many, and visualization allowed me to recognize these findings very quickly. Imagine this type of data at your fingertips when you are filling out your bracket. I certainly wish I had used it to my advantage. Now imagine these types of visualizations fueled by your company’s data. Replace the “wins” data with company revenue data. You would be able to identify where you are successful and then drill down to see which cities are producing that success. This gives you a quick look at your business. Use sales leads data to fuel your stacked bar charts: see which of your offices are receiving or submitting leads and how well they are closing them. Data is powerful, but visualization tools make data meaningful.

Excerpt:

It is true that blogs and books suggest line charts for time series data. In fact, when teaching data visualization at the University of Cincinnati, I always reinforce to my students that time series data is best shown as a line chart. This is because we, as readers, typically understand time when it is plotted on the x-axis, and we typically want to see a trend over time. This is the biggest advantage of a line chart: it shows trends over time better than any other chart type.

Read the full article: “Exploring All of Your Options: Data Visualization.”

 

ABOUT THE AUTHOR:

Jeffrey A. Shaffer is the Vice President of Information Technology and Analytics at Unifund. Mr. Shaffer joined Unifund in 1996 and has been instrumental in the creation and development of the complex systems, analytics and business intelligence platform at Unifund. Mr. Shaffer holds BM and MM degrees from the University of Cincinnati and an MBA from Xavier University, where he was the winner of the 2006 Graduate Student Scholarly Project in Research. Mr. Shaffer has attended the Harvard Business School’s Executive Education Program, is a Certified Manager of Quality and Organizational Excellence through the American Society for Quality, a Certified Project Management Professional through the Project Management Institute and has completed Six Sigma Green Belt and Black Belt training with the Xavier Consulting Group. Mr. Shaffer is also Adjunct Assistant Professor at the University of Cincinnati in the Carl H. Lindner College of Business teaching Data Visualization in the Graduate Course series for Data Analytics. He is also a regular speaker at business intelligence conferences and symposiums on the topic of data visualization, writes for the data visualization blog at MakingDataMeaningful.com for LÛCRUM, Inc. and was a finalist in the 2011 Tableau Interactive Visualization Competition.

As a business undergraduate at the University of Cincinnati, I recently noticed an article in the Cincinnati Business Courier about P&G’s push into business intelligence and analytics. Why are CIOs of P&G, FedEx and Boeing just now beginning the push “to make business intelligence the way that business gets done”?

Business Intelligence is already a mature market and we’re beginning to see the next “maturity cycle.” Analytics has been a top priority for CIOs for many years, but some have yet to pull the trigger. This leads me to believe that these corporations are testing the waters, waiting to jump in when the analytics market is hot enough that competition rises.

Economically we know that competition ultimately dilutes the market with firms promoting higher quality, better services and lower prices. Firms will be at the mercy of these corporations who are trying to get the lowest price for analytics services, while business intelligence firms are trying to get as much as possible without cutting into their margins.

Gartner estimates that BI software revenue will reach $13.8 billion in 2013, a 7 percent increase from 2012. By comparing BI vendors’ market share and growth in 2010 and 2011, we can get a general idea of where the market is headed.

Company | 2011 Revenue (US$ millions) | 2011 Market Share (%) | 2010 Revenue (US$ millions) | 2010 Market Share (%) | 2010-2011 Growth (%)
SAP | 2,883.5 | 23.6 | 2,413.1 | 23.0 | 19.5
Oracle | 1,913.5 | 15.6 | 1,645.8 | 15.7 | 16.3
SAS Institute | 1,542.8 | 12.6 | 1,386.5 | 13.2 | 11.3
IBM | 1,477.6 | 12.1 | 1,222.0 | 11.6 | 20.9
Microsoft | 1,059.9 | 8.7 | 913.7 | 8.7 | 16.0
Other Vendors | 3,363.8 | 27.5 | 2,931.1 | 27.9 | 14.8
Total | 12,241.0 | 100.0 | 10,512.2 | 100.0 | 16.4
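Both derived columns in the table can be recomputed from the revenue figures: market share is a vendor’s revenue divided by that year’s total, and growth is 2011 revenue divided by 2010 revenue, minus one. A short Python check using the table’s own numbers:

```python
# Revenue in millions of U.S. dollars, copied from the vendor table: (2011, 2010).
revenue = {
    "SAP": (2883.5, 2413.1),
    "Oracle": (1913.5, 1645.8),
    "SAS Institute": (1542.8, 1386.5),
    "IBM": (1477.6, 1222.0),
    "Microsoft": (1059.9, 913.7),
    "Other Vendors": (3363.8, 2931.1),
}
TOTAL_2011, TOTAL_2010 = 12241.0, 10512.2

def share(part, total):
    """Market share as a percentage, rounded to one decimal like the table."""
    return round(100 * part / total, 1)

def growth(new, old):
    """Year-over-year growth as a percentage, rounded to one decimal."""
    return round(100 * (new / old - 1), 1)

results = {}
for vendor, (rev_2011, rev_2010) in revenue.items():
    results[vendor] = (
        share(rev_2011, TOTAL_2011),
        share(rev_2010, TOTAL_2010),
        growth(rev_2011, rev_2010),
    )
    s11, s10, g = results[vendor]
    print(f"{vendor:<14} share 2011={s11:>5}%  share 2010={s10:>5}%  growth={g:>5}%")

print("Market growth:", growth(TOTAL_2011, TOTAL_2010))
```

Every recomputed value matches the table, which is a quick way to confirm the figures were transcribed correctly.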

Will business intelligence be the star that burns the brightest? Will it become just another management methodology that will fade away like all the others?

Six Sigma seemed to work great for Jack Welch at GE, but his protégés who took that methodology to other industries failed. Not every method works in every industry. Analytics has already been very successful in data-intensive industries like banking and insurance, and even logistics, but how will it pan out for consumer packaged goods (CPG) companies and airplane manufacturers? Most CEOs of large corporations are focused on quarter-to-quarter earnings and increasing shareholder value just to keep everyone happy.

Although U.S. corporations are sitting on more cash than ever before, they are very hesitant to spend it. Apple has over $100 billion in cash, but cash won’t make a better iPhone, will it? If these companies opened up their wallets, it would not be in their best interest to simply throw money at their short-term problems by investing in new technology or hiring new people. Unfortunately, throwing money at a problem only provides temporary relief.

On the flip side, when corporations like P&G and FedEx become more transparent with data and, hopefully, more profitable, shareholder value will rise tremendously. Business leaders understand that analytics, if implemented correctly with specific strategies and goals, will add to the business’s bottom line. For investors, only time will tell; it might be a good idea to keep an eye on these companies by measuring their performance for five years before BI implementation and five years after.

 

Source: http://www.bizjournals.com/cincinnati/blog/2013/02/pg-ceo-mcdonald-business.html

Google Trends Graph of BI

Google Trends shows the term “Business Intelligence”, as a web headline topic, has declined since 2004. In the past two years it has been surpassed by the term “Big Data”. “Business Analytics” is emerging as the term some industry thought leaders, such as Gartner and IDC, are using as the catch-all term for software solutions that use data analysis to guide business decisions.

Despite the essential inclusiveness of all three terms, there is no shortage of discussion about the differences among these and a number of other contenders. Are the old terms so limited that they cannot contain the huge new advances in the field? Or have there been so many disappointments with attempts to deliver “Business Intelligence” that we need new, exciting, and “untainted” terms?

It is important that we do not get distracted by new umbrella terms that cover the same mission, the same systems, and the same activities. It is like arguing over whether a Prius is an automobile or a car. The important thing is that there are exciting new technologies that can be applied to achieve the objectives of Analytics, Business Analytics, Business Intelligence, or Big Data. It really does not matter which term is used. Let’s face it: when is BI not BI? If a term refers to ways of making data meaningful and profitable, it’s all BI.

As business travelers and Cincinnatians, we have all witnessed firsthand the ever-increasing airline ticket prices at Cincinnati/Northern Kentucky International Airport (CVG). All companies, big and small, are trying to keep unnecessary costs down. I created this visualization in Tableau Desktop to show how data visualization makes it easier to see trends and patterns in airline costs so that those costs can be cut. This type of analysis and visualization can be done in any industry to track a company’s progress against its goals and performance indicators.

Why are visualizations so useful, and why create one? Visual.ly Blog was recently asked “Why is Data Visualization so Hot?” They responded that “Visualization allows access to challenging data sets, it allows exploration, can be fun, and provides useful information in an efficient way.” I would agree and add that, as humans, we are already trained to recognize trends and patterns in graphs, which is why they are so efficient in translating data.

Some of the questions that I wanted to answer about round trip airline prices were:

How do airline ticket prices fluctuate over time?

Ticket Price by Trip Line Chart

Looking at this graph, it is easy to see that some trip prices vary a lot while others are fairly constant. This led me to another question:

Do prices follow trends by departure airport or arrival airport? 

Price by Departure Airport and Arrival City

Looking at the graph, you can see that for San Francisco the ticket prices seem to follow the same general trend. The same was generally true for the other arrival cities.

Would it be worth it to fly out of the Columbus (CMH) or Dayton (DAY) airports instead of Cincinnati (CVG)?

Average Price by Departure Airport

Looking at the graph above, I found that CVG was on average $50 more expensive than the CMH or DAY airports.
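The comparison behind this chart is a grouped average. Here is a minimal Python sketch, assuming each observation is a (departure airport, destination, price) record; the fares below are made-up placeholders, not the hipmunk data used for this post.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fare observations: (departure airport, destination, round-trip price in USD).
fares = [
    ("CVG", "SFO", 420), ("CMH", "SFO", 365), ("DAY", "SFO", 380),
    ("CVG", "LAS", 310), ("CMH", "LAS", 255), ("DAY", "LAS", 262),
]

def average_price_by_airport(records):
    """Group fares by departure airport and return the mean price for each."""
    by_airport = defaultdict(list)
    for airport, _destination, price in records:
        by_airport[airport].append(price)
    return {airport: mean(prices) for airport, prices in by_airport.items()}

averages = average_price_by_airport(fares)
for airport, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{airport}: ${avg:,.2f}")
```

A design note: the comparison is only fair when each airport is averaged over the same set of destinations; otherwise an airport can look cheaper simply because it serves cheaper routes.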

How competitive are the airlines by round trip? Does the cheapest flight change airlines constantly or do they stay the same? 

Airline Competitiveness

Using this visualization, you can choose a trip and see how often the color (airline) changes. If the color changes a lot, the trip is competitive: the cheapest airline changes often. However, on trips like CVG to Charlotte and CVG to Las Vegas, the cheapest airline stays the same.
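Counting color changes can also be done directly in code. A minimal Python sketch, assuming one record per trip per observation date naming the airline with the cheapest fare; the trips, dates, and airlines here are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical daily observations: (trip, date, airline offering the cheapest fare).
observations = [
    ("CVG-SFO", "2013-03-01", "United"),
    ("CVG-SFO", "2013-03-02", "Delta"),
    ("CVG-SFO", "2013-03-03", "United"),
    ("CVG-CLT", "2013-03-01", "US Airways"),
    ("CVG-CLT", "2013-03-02", "US Airways"),
    ("CVG-CLT", "2013-03-03", "US Airways"),
]

def cheapest_airline_changes(records):
    """Count how often the cheapest airline switches between consecutive observations, per trip."""
    changes = {}
    ordered = sorted(records, key=itemgetter(0, 1))  # groupby needs rows sorted by trip, then date
    for trip, rows in groupby(ordered, key=itemgetter(0)):
        airlines = [airline for _trip, _date, airline in rows]
        changes[trip] = sum(1 for prev, cur in zip(airlines, airlines[1:]) if prev != cur)
    return changes

print(cheapest_airline_changes(observations))
```

A high count marks a competitive trip; a count of zero corresponds to a route where one airline stays cheapest throughout.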

If there is a cheaper airline which one is it and how many flights do they have?

Flights from Cheapest Airlines

The graph above shows the average price of an airline ticket (color) and the total number of flights recorded (size of circle). Overall, Delta had the most round-trip tickets but also the highest average price. At the other end of the scale, AirTran had the cheapest average prices but only a few tickets. In other words, the deals that AirTran does have are very good deals.

Lastly, everyone’s favorite question:

When should I buy my ticket in order to get the cheapest price?

Weekly Airline Ticket Prices

The usual answer to this question is that Tuesday is the best day to buy airline tickets, but my data showed that Wednesday had the cheapest prices. However, as the graph shows, the price difference is not very significant.

Using Tableau made it very easy to see the trends and patterns in airline prices, follow the fluctuations in trip prices, and compare airports. Data visualization is a hot topic, and it can greatly help your company quickly find anomalies and track the progress of performance indicators.

Sources:

http://blog.visual.ly/why-is-data-visualization-so-hot/

This data was collected by finding the cheapest round trip ticket in March regardless of day, length of stay, or airline using www.hipmunk.com.

Airline Dashboard Tableau Visualization

Wikipedia defines Forecasting as the process of making statements about events whose actual outcomes (typically) have not yet been observed.

Examples of forecasting would be predicting weather events, forecasting sales for a particular time period or predicting the outcome of a sporting event before it is played.

 Wikipedia defines Predictive Analytics as an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.

Examples of predictive analytics would be determining customer behavior, identifying patients that are at risk for certain medical conditions or identifying fraudulent behavior.

Based on these definitions, forecasting and predictive analytics seem to be very similar…but are they? Let’s break it down.

Both forecasting and predictive analytics are concerned with predicting something in the future, something that has not yet happened. However, forecasting is more concerned with predicting future events whereas predictive analytics is concerned with predicting future trends and/or behaviors.

So, from a business perspective, forecasting would be used to determine how much of a material to buy and keep in stock based on projected sales numbers. Predictive analytics would be used to determine customer behavior, such as what customers are likely to buy and when, how much they spend when they do buy, and what else they buy when they buy a given product (also known as basket analysis).
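Basket analysis comes down to counting which products appear together in the same purchase. A minimal Python sketch, assuming each transaction is a set of product names (the baskets are hypothetical), counts how often each pair occurs and the share of baskets containing one product that also contain the other, a ratio often called confidence:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"ink", "paper"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print("Most common pairs:", pair_counts.most_common(2))
print(f"Share of printer baskets with ink: {confidence('printer', 'ink'):.2f}")
print(f"Share of ink baskets with a printer: {confidence('ink', 'printer'):.2f}")
```

Note that confidence is not symmetric: every printer buyer here also bought ink, but not every ink buyer bought a printer. Algorithms such as Apriori prune rare item sets so this kind of counting scales to large catalogs; the sketch simply counts every pair.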

Predictive analytics can be used to drive sales promotions targeting certain customers based on the information we know about their buying behavior. Likewise, the information obtained from predictive analytics can be used to influence sales projections and forecasting models.

Both predictive analytics and forecasting use data to achieve their purposes, but how they use that data is very different.

In forecasting, data is used to look at past performance to determine future outcomes. For instance, how much did we sell last month or how much did we sell last year at this time of year. In predictive analytics, we are looking for new trends, things that are occurring now and in the future that will affect our future business. It is more forward looking and proactive.
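The “same time last year” style of forecasting can be written down directly. A minimal Python sketch of two simple backward-looking baselines, a seasonal naive forecast (next month equals the same month last year) and a trailing moving average, using hypothetical monthly sales figures:

```python
def seasonal_naive(history, season_length=12):
    """Forecast the next period as the value from the same period one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    return history[-season_length]

def moving_average(history, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(history) < window:
        raise ValueError("not enough history for the window")
    return sum(history[-window:]) / window

# Hypothetical monthly unit sales for the past 14 months, oldest first.
sales = [120, 115, 130, 140, 160, 175, 180, 170, 150, 135, 125, 190, 128, 121]

print("Seasonal naive (same month last year):", seasonal_naive(sales))
print("3-month moving average:", round(moving_average(sales), 1))
```

Both methods only extrapolate past performance, which is the contrast drawn here with predictive analytics and its search for new trends and behaviors.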

So, although forecasting and predictive analytics are similar and closely related to one another, they are two distinctly different concepts. To be successful at either one, you need the right resources and tools in place to extract, transform, and present the data in a timely and meaningful way.

A common problem in business today is that people spend much more time preparing and presenting information than they do determining what the data is telling them about their business. This is because they don’t have the right resources and tools in place.

 

At LÛCRUM we have the resources, strategies and tools to help businesses access, manage, transform and present their data in a meaningful way. If you would like to learn more about how LÛCRUM can help your business, visit our website or contact us today.

We are coming into the golden age of analytics.  This is the vision that the speaker – CEO of a company that develops data visualization software – illuminated for the audience at a recent customer conference.  “We are putting the power of data into the hands of creative people to explore the worlds of possibility.”

The idea that we can use our talents to solve client data puzzles is like an adventure that makes it fun to come to work every day.  We’re explorers in an unknown land, moving from all the standard questions (and the standard answers) to a place where the questions themselves haven’t been formulated yet.  New thinking, contemporary sensibilities, and breakthrough technologies are disruptive factors in this age.

One area I pay attention to is the continued evolution of business intelligence (BI) in a mobile environment. Mobile BI received a lot of fanfare in the past year, and all the major platforms promote a mobile solution.

Now that some of the hype is starting to settle down, what is the real story?  Here are a few thoughts based on my own observations and experience.

1. Mobile BI deployments will favor use on tablets, rather than smartphones, given current screen sizes.

It’s easy to refer to “mobile” the way we refer to “Europe”: as a single form factor or entity. The reality is more complex, as there is a range of devices from smartphones, with relatively small screens, to tablets, with screens just a bit smaller than you’d see on a typical ultrabook laptop. More screen space gives us more room to place data and provide interactivity. I see organizations prioritizing tablet deployment over smartphone deployment in most cases.

2. Business gains so far have been incremental, providing efficiencies rather than game-changing breakthroughs.

Many mobile BI efforts seem to focus on converting the oodles of reports hanging around every corporate office into a mobile format.  That makes data more portable, which is an improvement.  But what we should seek to do is to make the analysis and decision-making that goes along with running a business happen in a portable way too – where you are, right now, as soon as information becomes available and action is needed.  This concept of “right-time mobile analytics” (not just mobile BI) is where I believe transformational gains will be found.

3. Design principles need to evolve to better anticipate the needs of mobile tablet users.

Most mobile BI efforts seem to be focused on cramming the dashboard or report that was designed for a desktop user into a mobile screen.  There are several problems with this approach.  First, a dizzying array of formats, resolutions, and screen sizes is present in the market.  It’s reminiscent of a challenge with website design, where you don’t know what kind of screen the user will have for their desktop.

Beyond screen size, you quickly discover that dashboards and interactive visualizations that are crammed onto a mobile device are balky to navigate when you substitute fine-point control of a mouse cursor with the more generalized notions of a tap or swipe.  Analysts and designers should invest the time to redesign existing visualizations and reports so that they can be easily and efficiently used with these new human interfaces.

4. Mobile BI tools do best at providing answers to known questions, rather than providing a platform for rich and interactive data exploration and discovery.

Largely due to the factors and challenges related to the interface, we have not seen a good mobile implementation of the interface needed to explore the data and design new visualizations.  Sleek interfaces that enable and facilitate data discovery, such as the forthcoming update to Tableau’s Desktop Professional software (v8), are getting there on the desktop, but are very limited on their mobile implementations.

I’m not sure we should even care, because this may be a square-peg-in-a-round-hole problem.  I don’t see the compelling need right now to port that capability to mobile, when most of the value in mobile will come from deploying effective, efficient visualizations to those who need better information to make right-time decisions, rather than enabling analysts to design new analyses on-the-go (and burning through their data plan in the process).

5. Standardization on a single mobile platform can significantly reduce development timelines.

This will help reduce complexity and allow you to dip your toe into the water with less up-front investment.  Fewer permutations of screen size, operating systems, and wireless carriers will reduce the time needed to deploy your solutions (little secret: this is one of the reasons that vendors originally delivered on Apple devices – you could predict how your software would look to the user!).

Apple, with its line of iOS devices, has the best track record in this area. Apple has smartly limited the iOS hardware platform to a very small number of screen sizes. Several studies have also shown that, historically, users of iOS devices generated a significant majority of all mobile device traffic, more than any other platform. Therefore, I believe that an engaged user base and a streamlined development lifecycle will favor adoption of mobile analytics solutions that run on iOS devices.

6. Support the rollout of cellular-enabled mobile devices throughout the enterprise for knowledge workers that can best leverage mobile analytics applications.

Yes, this makes each tablet more expensive, and yes, cellular data plans are not free. But neither is the time lost fumbling for the client’s guest intranet login, or roaming highway off-ramps looking for a coffee shop with free Wi-Fi. The return on your investment in all of this valuable data and these analytics applications will be reduced if your knowledge workers cannot connect where they work, live, and travel.

 

Looking to the future, I believe we are now well positioned to generate competitive differentiation through mobile, BI-integrated, right-time analytical applications. The growing maturity of mobile BI platforms, and their support for a little-known capability called write back, has the potential to be a turning point for the field. In the context of BI, write back gives users the ability both to consume data through their visualization or BI application and to generate data that is put back into the database. This is the next secret sauce.

The actual capability to do write back has been around for a long time. It’s even built into Excel and can be used with Microsoft Analysis Services OLAP cubes, if they are configured for this purpose. It’s also part of enterprise BI tools like MicroStrategy.

This is a big deal, because now we can combine analytical data (and the processing power of real-time analytics engines) with information that is entered by a user, in context, on site, in the moment while they are working on a particular problem.

Let’s say you’re a medical supply sales representative, and you go on site to visit a hospital client.  Your mobile BI solution provides you with historical purchase patterns.  Then, you conduct an inventory check, inputting the data while you are standing in the supply room.  That information goes back to the database, and the purchasing models apply past history, seasonality, and metadata about trends at your other healthcare clients in that area (think: regional flu outbreak), generating a purchase forecast and preliminary order.  The solution also recommends a product change, from buying individual packages to large count bulk packs, which would save the customer $10,000 this year.  You review with the administrator, make a few adjustments, and you’re done.
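In database terms, write back means the same application that reads analytical data also inserts user-entered data that later queries can use. A minimal Python sketch of the inventory-check step in the vignette above, using the standard-library `sqlite3` module as a stand-in for a real BI data store; the table names, columns, and the simple reorder rule are hypothetical, not how any particular BI product implements write back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the BI database
conn.executescript("""
    CREATE TABLE purchase_history (product TEXT, month TEXT, units INTEGER);
    CREATE TABLE inventory_counts (product TEXT, counted_on TEXT, on_hand INTEGER);
""")
conn.executemany(
    "INSERT INTO purchase_history VALUES (?, ?, ?)",
    [("gloves", "2013-01", 400), ("gloves", "2013-02", 440), ("gauze", "2013-01", 90)],
)

def write_back_count(conn, product, counted_on, on_hand):
    """Write a count entered on site back to the database (the write-back step)."""
    with conn:  # commits the insert, or rolls it back on error
        conn.execute(
            "INSERT INTO inventory_counts VALUES (?, ?, ?)", (product, counted_on, on_hand)
        )

def suggested_order(conn, product):
    """Hypothetical rule: order average monthly usage minus the latest on-hand count."""
    (avg_units,) = conn.execute(
        "SELECT AVG(units) FROM purchase_history WHERE product = ?", (product,)
    ).fetchone()
    (on_hand,) = conn.execute(
        "SELECT on_hand FROM inventory_counts WHERE product = ? "
        "ORDER BY counted_on DESC LIMIT 1",
        (product,),
    ).fetchone()
    return max(0, round(avg_units - on_hand))

write_back_count(conn, "gloves", "2013-03-05", 150)
print("Suggested glove order:", suggested_order(conn, "gloves"))
```

The point of the sketch is the round trip: the count typed in the supply room becomes input to the very next analytical query, which is what separates write back from read-only reporting.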

This simple vignette may seem far off, but it isn’t a dream.  Right-time analytics applications can become critical competitive differentiators for current and future market leaders.   The complexity here is in gathering the data, understanding behavior, and building the analytical models that will help us optimize processes in our daily work.  It’s complex, but definitely within reach for those that are willing to invest the time and effort to see it through.

Let’s start with a little quiz:

Hadoop is

a)     Twitter shortcut for “I HAD it, but OOPS I lost it”?

b)     The latest dance song craze (Macarena, Gangnam, Hadoop)?

c)      A stuffed toy elephant?

d)     A software solution for distributed computing of large datasets?

The correct answers are actually c) and d).  You see, Hadoop is a software solution developed as part of the Apache project sponsored by the Apache Software Foundation, and it was named after a stuffed elephant owned by the son of the framework’s founder, Doug Cutting.

But what exactly is Hadoop and how does it work?

Per the Apache website, “Apache Hadoop is a framework for running applications on large cluster built of commodity hardware.” This open source software framework enables the developer/user to manage large amounts of data (Big Data) using a distributed file system.  The power of Hadoop lies in its ability to leverage distributed clusters of computing hardware.  It does this by leveraging two key technologies.

The first is the Hadoop Distributed File System (HDFS), a distributed, scalable, and portable file system written in Java specifically for the Hadoop framework. A key component of HDFS is the name-node, a single server that tracks all the other nodes (the data nodes) in the cluster. In other words, the name-node is the directory of which data nodes exist and which blocks of which files each one stores. As data nodes and files are added to the cluster, the name-node’s records are updated to point to them.
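To make the name-node’s bookkeeping role concrete, here is a toy Python model. It is not the HDFS API: the `ToyNameNode` class, the tiny block size, and the round-robin placement are simplifying assumptions. It only shows the kind of metadata a name-node keeps, namely which data nodes exist and which nodes hold each block of each file. Real HDFS uses much larger blocks (64 MB or 128 MB by default, depending on the version) and keeps three replicas of each block by default.

```python
from itertools import cycle

class ToyNameNode:
    """Toy directory of data nodes and file-block locations (illustration only)."""

    def __init__(self, block_size=4, replication=2):
        self.block_size = block_size    # bytes per block; tiny for the demo
        self.replication = replication  # copies kept of each block
        self.data_nodes = []
        self.files = {}                 # file name -> list of (block id, [data nodes])

    def register_data_node(self, node):
        self.data_nodes.append(node)

    def add_file(self, name, size):
        if len(self.data_nodes) < self.replication:
            raise ValueError("not enough data nodes for the replication factor")
        num_blocks = -(-size // self.block_size)  # ceiling division
        nodes = cycle(self.data_nodes)            # simple round-robin placement
        self.files[name] = [
            (f"{name}#blk{i}", [next(nodes) for _ in range(self.replication)])
            for i in range(num_blocks)
        ]

    def locate(self, name):
        """Answer the question a client asks first: where are this file's blocks?"""
        return self.files[name]

nn = ToyNameNode()
for node in ("dn1", "dn2", "dn3"):
    nn.register_data_node(node)
nn.add_file("sales.csv", size=10)
for block_id, holders in nn.locate("sales.csv"):
    print(block_id, holders)
```

The name-node never stores the file contents in this model, only the map of where the pieces live, which mirrors its directory role described above.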

The second key technology leveraged within a Hadoop implementation is that of MapReduce. MapReduce is a programming model for processing large datasets. It works by enabling a master node (the node assigned the processing request) to break apart the work request into smaller sub-tasks, and send the sub-tasks out to worker nodes. This is the “Map” aspect, as the master node is mapping out the workload to the worker nodes. As each worker node completes the assigned sub-task, it ships the results back to the master node. The master node then takes all the worker node results and combines them into one result set, thereby completing the assigned request. This is the “Reduce” aspect.
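A single-process Python sketch of the MapReduce model, using the customary word-count example, may help. Real Hadoop jobs run map and reduce tasks on different worker nodes and shuffle the intermediate data across the network; here the “workers” are ordinary function calls, and the list of chunks stands in for the sub-tasks the master node hands out.

```python
from collections import defaultdict

def map_task(chunk):
    """Map: turn one chunk of input into intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Group intermediate values by key so each key can be reduced independently."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# The master splits the request into sub-tasks (here, one chunk per worker).
chunks = ["big data big clusters", "data nodes and name nodes", "big results"]
mapped = [map_task(chunk) for chunk in chunks]  # each call stands in for a worker node
results = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
print(results)
```

Because each map call and each per-key reduce call is independent, the same functions could run on many machines at once, which is where Hadoop gets its scalability.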

 

It is important to note that for very large or complex requests, worker nodes can also MapReduce their assigned tasks into smaller sub-tasks for their own worker nodes. You could refer to this as Big Data outsourcing. When a node determines that another node is better equipped to handle a portion of an assigned request, it delegates that work to the more efficient worker node, while retaining responsibility for getting the completed assignment back to the master node.

Sources:

Webopedia: http://www.webopedia.com/TERM/H/hadoop.html

Wikipedia: http://en.wikipedia.org/wiki/MapReduce

Hadoop Wiki: http://wiki.apache.org/hadoop/

To learn more about Hadoop solutions contact LÛCRUM today.