Posts by: John Panzeca

Wikipedia defines Forecasting as the process of making statements about events whose actual outcomes (typically) have not yet been observed.

Examples of forecasting would be predicting weather events, forecasting sales for a particular time period or predicting the outcome of a sporting event before it is played.

 Wikipedia defines Predictive Analytics as an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.

Examples of predictive analytics would be determining customer behavior, identifying patients that are at risk for certain medical conditions or identifying fraudulent behavior.

Based on these definitions, forecasting and predictive analytics seem to be very similar…but are they? Let’s break it down.

Both forecasting and predictive analytics are concerned with predicting something in the future, something that has not yet happened. However, forecasting is more concerned with predicting future events whereas predictive analytics is concerned with predicting future trends and/or behaviors.

So, from a business perspective, forecasting would be used to determine how much of a material to buy and keep in stock based on projected sales numbers. Predictive analytics would be used to determine customer behavior, such as what customers are likely to buy and when, how much they spend when they do buy, and what else they buy when they buy one product (also known as basket analysis).
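To make the basket analysis idea concrete, here is a minimal Python sketch that counts how often two products show up in the same purchase. The function name `pair_counts` and the sample baskets are made up for illustration; a real analysis would likely use a dedicated tool and far more data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical purchases, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
]

for pair, count in pair_counts(baskets).most_common():
    print(pair, count)
```

Pairs with the highest counts are the "when they buy one product what else do they buy" candidates.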

Predictive analytics can be used to drive sales promotions targeting certain customers based on the information we know about their buying behavior. Likewise, the information obtained from predictive analytics can be used to influence sales projections and forecasting models.

Both predictive analytics and forecasting use data to achieve their purposes, but how they use that data is quite different.

In forecasting, data is used to look at past performance to determine future outcomes. For instance, how much did we sell last month or how much did we sell last year at this time of year. In predictive analytics, we are looking for new trends, things that are occurring now and in the future that will affect our future business. It is more forward looking and proactive.
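The "how much did we sell last year at this time of year" approach can be sketched in a few lines of Python. This is only one very simple forecasting method (often called a seasonal naive forecast), not a description of any particular product; the function name and the sales figures are invented for illustration.

```python
def seasonal_naive_forecast(monthly_sales, season_length=12):
    """Forecast the next month as the value from the same month one season ago."""
    if len(monthly_sales) < season_length:
        raise ValueError("need at least one full season of history")
    return monthly_sales[-season_length]


# Hypothetical units sold per month, oldest first (made-up numbers).
history = [100, 90, 110, 120, 130, 150, 170, 160, 140, 125, 115, 180]

# The forecast for next month is simply last year's figure for that month.
print(seasonal_naive_forecast(history))
```

Setting `season_length=1` turns this into the "how much did we sell last month" version.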

So, although forecasting and predictive analytics are similar and closely related to one another, they are two distinctly different concepts. In order to be successful at either one, you have to have the right resources and tools in place to be able to extract, transform and present the data in a timely manner and in a meaningful way.

A common problem in business today is that people spend much more time preparing and presenting information than they do actually determining what the data is telling them about their business. This is because they don’t have the right resources and tools in place.

At LÛCRUM we have the resources, strategies and tools to help businesses access, manage, transform and present their data in a meaningful way. If you would like to learn more about how LÛCRUM can help your business, visit our website or contact us today.

I was interested to find out how the maps visualization works in Tableau so I decided to give it a try. The first thing I needed was some interesting data that I could download and visualize. I found my data at http://earthquake.usgs.gov/earthquakes/catalogs/. The data was about earthquakes around the globe for a one week period. I began by downloading this data and saving it to an Excel spreadsheet. After connecting to this Excel spreadsheet in Tableau, I began dragging the data into the visualization pane. To my amazement, Tableau immediately recognized that the data I was working with was geocoded data – meaning it contained geographic coordinates (latitude and longitude) – and it chose a map as my visualization without me specifying that’s what I wanted. Pretty cool! But wait a minute, why was there only one mark on my map smack in the middle of Utah?

Looking into this a little closer I noticed that on the columns and rows shelves the latitude and longitude coordinates were being aggregated. This is the default behavior for Tableau. Tableau was showing me the average of all the latitudes and longitudes in my data. I needed to add a lower level of detail to see all of the marks on my map. After dragging the DateTime field to the Level of Detail shelf things began looking better.
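The single mark is easier to understand with a small Python sketch. It is not how Tableau works internally; it only illustrates the difference between averaging all coordinates and keeping one point per row. The coordinates and function names are hypothetical.

```python
from statistics import mean

# Hypothetical (latitude, longitude) pairs, not the actual USGS data.
quakes = [(61.2, -149.9), (36.1, -117.9), (19.4, -155.3)]


def aggregated_mark(points):
    """Average every latitude and longitude: one mark for the whole data set."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (mean(lats), mean(lons))


def detailed_marks(points):
    """Keep one mark per row, as happens once a finer level of detail is added."""
    return list(points)


print(aggregated_mark(quakes))      # a single averaged location
print(len(detailed_marks(quakes)))  # one mark per earthquake
```

The averaged point can land somewhere no earthquake actually occurred, which is why the lone mark looked so odd.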

Now there were many marks all over the map. Wow! I had no idea this many earthquakes occurred in just a one-week period. I was curious to find out just how many earthquakes the marks on my map represented because it appeared that they overlapped and were not all visible. I determined the best way to figure this out would be to look at the data at the detail level using the Text Table visualization. I added a measure called “Count” which I set to a value of 1 for each row of data. I added a “Grand Total” to my Text Table visualization so I could see the total number of earthquakes in my data. While dragging the latitude and longitude to the Text Table visualization I realized that since these fields were numeric, Tableau was treating them as measures and aggregating them. Even though aggregating these coordinates did not cause an issue at this level of detail, I decided it was best to treat them as dimensions since that is what they truly are. In order to make them dimensions I simply dragged them from the Measures pane to the Dimensions pane.

I was shocked to find that the total number of earthquakes during this week-long period was 865! Certainly they could not all be earthquakes of a high magnitude since you rarely hear about earthquakes in the news. I continued my analysis by analyzing the magnitude and depth of these earthquakes. I went back to my map visualization and added some formatting to help with this.

I dragged and dropped the Depth measure to the Color shelf so that the depth of the earthquakes would determine the shade of color shown on the map. Similarly, I placed the Magnitude measure on the Size shelf so that the earthquakes of higher magnitude would show as larger marks on the map. I also changed the color of my marks from blue to red just because I thought this was more visually appealing.

This was helpful but I still could not see all the earthquakes because the marks on my map were overlapping. I decided to use a technique I learned and mentioned in my blog post Local Warming?, in which you use the Pages shelf to break down the visualization by a dimension such as date or time so you can view each page one at a time in succession. This provides a historical view of the data. I placed the DateTime field on the Pages shelf to accomplish this. Now I could see all the earthquakes one day at a time as they occurred over the week-long period.

Now I could see all the earthquakes but I still was not getting a clear enough picture of the depth and magnitude of the earthquakes. I decided to create a scatter plot diagram which plotted these two measures.

Now I could clearly see where the majority of the earthquakes fell in relation to depth and magnitude. By selecting the large clump of points near the bottom left corner of the scatter plot chart I saw that 786 of the 865 total earthquakes (about 91%) were in this area. This meant these earthquakes all had low depth and magnitude. This is what I would have expected.

I also wondered about the correlation between magnitude and depth. In other words, did high magnitude equal high depth and low magnitude equal low depth? So I created an area chart that plotted magnitude and depth over time. The chart clearly shows that there is a correlation between these two measures. In most or all cases high magnitude was coupled with high depth and low magnitude with low depth. While hovering over the colored area of the chart with my mouse, and moving my mouse from left to right, the points for each measure were visually moving up and down in the same pattern.
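A visual impression like this can also be checked numerically. The Python sketch below computes the Pearson correlation coefficient between two lists of numbers. It was not part of the original analysis, and the sample values are invented rather than taken from the earthquake data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical magnitude and depth readings, made up for illustration.
magnitudes = [1.2, 2.5, 3.1, 4.8, 5.6]
depths = [3.0, 10.0, 8.0, 40.0, 35.0]

print(round(pearson_r(magnitudes, depths), 3))
```

A value near 1 means the two measures rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is little linear relationship.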

After listening to a weekly Tableau learning series put on by Ross Perez in which he spoke about adding actions to a visualization I decided to try it for myself. I created a dashboard which included a few of the visualizations I had previously created.

Now by adding filter actions to each of the visualizations on my dashboard, I could select any item in any visualization and the other visualizations would automatically be filtered as well. This was extremely helpful in isolating certain data points or groups of data points and seeing them in 3 different ways. For instance, by selecting the data point in the top right hand corner of the Graph view, I could immediately see where this earthquake occurred on the map as well as the data associated with this earthquake in the Data view.

Or, by selecting the Alaskan Peninsula region in the Data section of the dashboard, I could automatically see on the map and the scatter plot chart the data associated with this region.

So once again in a relatively short period of time I was able to create some compelling visualizations that helped me better understand the data I was working with. These visualizations can easily be shared with others as well. No matter what type of data you are working with, visualizations such as these are truly powerful and provide further insight and understanding. They help us to answer questions that are important to us and cause us to ask additional questions that can be answered quickly and easily and provide even deeper understanding about the topic we are interested in.  They say a picture is worth a thousand words and in this case it is true.

You can view my earthquake visualizations at…

http://public.tableausoftware.com/views/EarthquakesVisualized/Dashboard?:embed=y

To find out more about visualization or how LÛCRUM can help your business by making your data meaningful, contact us.

I was having a conversation with some LÛCRUM colleagues recently and the topic of warmer temperatures in Cincinnati came up.  My two friends believed that the temperatures we are experiencing lately are warmer than the temperatures we experienced earlier in our lives.  I wasn’t so sure, so I decided to find out for myself.  Once again this seemed like a great opportunity to get some additional experience with Tableau and have something meaningful at the end of the learning process.  See my blog Tableau & Fantasy Baseball where I spoke about making the learning process more fun and interesting by working on a practical application that is useful in the end.

First I began looking for a site that contained the data I would need to do my analysis.  I found a site that had Average Daily Temperatures for a number of cities from 1995 to 2012.  I downloaded this data for Cincinnati and loaded it into an Excel spreadsheet.  After connecting to this data in Tableau, I was off and running.

One of the first things I noticed as I began dragging my data into my visualizations was there were some years where the average temperature was much lower than the rest of the years.  After doing some further analysis, I determined this was due to a handful of average daily temperatures that were very low, negative 99 degrees to be exact.  Obviously this was a data issue because I’m pretty sure I would remember a day that cold, especially in August!  I decided that rather than skew the data with such extremely low temperatures, I would just NULL (blank) these temperatures out in the data.
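The effect of those placeholder values is easy to demonstrate in Python. The sketch below assumes, as described above, that -99 marks a missing reading; the function names and sample temperatures are invented for illustration.

```python
MISSING = -99.0  # placeholder value that appeared in place of a real reading


def clean_temperatures(readings, sentinel=MISSING):
    """Replace the placeholder value with None (a blank)."""
    return [None if t == sentinel else t for t in readings]


def mean_ignoring_missing(values):
    """Average only the readings that are actually present."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Hypothetical August readings with one bad value (made-up numbers).
raw = [78.5, -99.0, 80.1, 79.4]

print(sum(raw) / len(raw))                             # skewed by the -99
print(mean_ignoring_missing(clean_temperatures(raw)))  # bad value ignored
```

A single -99 among four readings drags the average from roughly 79 degrees down to under 35, which is why a handful of them made whole years look colder.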

The next thing I noticed was the maximum temperature in my data was 87.70 degrees.  I knew we had had hotter days than that over the last 17 years.  I began to think about what average daily temperature really means and how it would be calculated.  I realized it wasn’t just the high temperature for that day and that it must be an average of temperature readings taken throughout the day.  But how many readings would need to be taken to arrive at this calculation?  A quick internet search on How to Calculate Average Daily Temperature led me to the answer…24.

I began to wonder if average daily temperature was an accurate way of confirming what my colleagues believed to be true.  My thinking was since average daily temperature takes into account hourly temperatures for a 24 hour period, it may be an accurate way of determining whether overall temperatures are truly increasing over time.  However, it may not be an accurate measure of how a particular person would experience these temperatures since average daily temperature factors in the low temperatures throughout the night when most people are asleep.  For example, if a particular day was extremely hot during the day but cooled down considerably during the night this would affect the average daily temperature but not affect how an individual experienced the temperature during the day.  An individual would still consider that day to be extremely hot even though the average daily temperature was lowered by the night time temperatures.  So despite the errors in the data and my uncertainty about whether the data would answer the question I wanted to answer, I decided to continue with my analysis.
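Here is a small Python sketch of that gap between the daily high and the average daily temperature. It assumes the average is a plain mean of 24 hourly readings, as described above; the hourly values are invented and deliberately simplified.

```python
def daily_average(hourly_readings):
    """Average daily temperature as the mean of 24 hourly readings."""
    if len(hourly_readings) != 24:
        raise ValueError("expected exactly 24 hourly readings")
    return sum(hourly_readings) / 24


# Hypothetical day: cool overnight, very hot afternoon, mild evening.
hot_day = [70] * 8 + [95] * 10 + [75] * 6

print(max(hot_day))                      # what people remember about the day
print(round(daily_average(hot_day), 1))  # what the data set records
```

Anyone outside that afternoon felt 95 degrees, yet the recorded average sits well below it because the cooler night hours are included.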

The first visualization I created was a line graph showing the average daily temperature per year for the months of June, July and August.  I also filtered out 1995 and 2012 data since I did not have full years of data for these years.  There did not appear to be a dramatic trend towards warmer temperatures over the past 15 years.

I began to wonder if possibly there were a string of hotter days during recent years that would lead one to believe that temperatures are truly hotter recently.  My next visualization was a heat map that showed the concentration of number of days at a certain temperature.  I created a calculation of average daily temperature multiplied by the number of days at this temperature and used this as my measure in the heat map.

Once again, I did not notice anything in my visualization that would lead me to believe that temperatures are dramatically warmer in recent years.

I then decided I wanted to see which years of the past 15 were actually the hottest.  I created a simple bar chart showing the average daily temperatures by year.

This was somewhat helpful but in order to make things a little clearer I created a calculated item called Recency.  This field would divide up the 15 years into 3 even sections Most Recent, Recent and Least Recent.  I then created another simple bar chart to show the average daily temperature by Recency.  Now I was getting somewhere.
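The logic behind a field like Recency can be sketched in Python. This is not the Tableau calculation itself, only one possible way to split a range of years into three equal groups. The function name and the example range of 1997 to 2011 are assumptions chosen to give exactly 15 years.

```python
def recency_label(year, first_year, last_year):
    """Assign a year to one of three equal-width groups within a range."""
    if not first_year <= year <= last_year:
        raise ValueError("year is outside the range being analyzed")
    span = last_year - first_year + 1
    third = span / 3
    offset = year - first_year
    if offset < third:
        return "Least Recent"
    if offset < 2 * third:
        return "Recent"
    return "Most Recent"


# Assumed 15-year range, for illustration only.
for year in (1997, 2001, 2002, 2006, 2007, 2011):
    print(year, recency_label(year, 1997, 2011))
```

Averaging the temperatures within each label then gives the three bars described above.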

This visualization clearly shows that the average daily temperature in Cincinnati has gone up about 1 degree for each 5-year period over the last 15 years.

Now for some real fun!  I wanted to try the animation functionality in Tableau.  This functionality will allow you to animate trends over time by pressing a Play button.

I created a scatter plot chart that showed Average Daily Temperature and Number of Days. By dragging the Year dimension to the Pages shelf I was able to animate the changes over time by pressing the Play button.

Another neat feature I discovered was the Show Caption feature which describes the visualization (worksheet) and can be displayed as part of the visualization.  Also, you can use the Describe Sheet option to give a more extensive description of the worksheet.  This description has helpful information about the various properties of the worksheet.

So I was able to learn more about how to use Tableau to explore and visualize data in a way that brings insight and understanding. Analyzing the data in this way is a very logical approach because it follows your train of thought. Each visualization sparks thoughts or questions that can be explored further in the next visualization. All of your visualizations can be saved as worksheets that can be updated, re-used and easily re-ordered to show a thought progression, leading to a conclusion, about the data you are analyzing. This type of analysis is useful and valuable no matter what kind of data you are analyzing and can lead to better decision making. Better decision making in business is vitally important to operating your business efficiently and gaining a competitive advantage in the marketplace.

You can view my visualizations at…
Local Warming?

Recently, I read some analysis from Eric Duell, LÛCRUM’s Client Services Partner, around data discovery tools. His analysis concluded that Tableau is one of the top data discovery tools available today. I decided it was time for me to learn this tool. Whenever I learn a new tool, I like to combine my learning with some practical application that will be useful to me or someone else. This makes the learning more interesting and fun and actually gives me something I can use at the end. In this instance, I decided to use Major League Baseball data to help me gain a competitive advantage in my fantasy baseball league. At the time of writing this, the baseball season was at the half-way point, also known as the “All-Star Break.” I was in first place for most of the first half of the season, but on the last day of the first half of the season I dropped into second place…definitely time for action!

I analyzed where my specific needs were if I really wanted to win this thing…and I do! It was clear I was in need of more stolen bases and home runs. I first looked for a reliable website where I could download some data for free. After finding one, FANGRAPHS, and exporting the data to an Excel spreadsheet, I was well on my way to a championship!

I was easily able to connect to the data in Tableau and began the data discovery process.  After getting used to the tool and trying out a few different types of graphs, I found one that would propel me to greatness in the world of fantasy baseball…the scatter plot chart.

Remember, I need stolen bases and home runs…especially stolen bases.  I decided I wanted to find one player who excelled at both.  My scatter plot chart showed me clearly that the rookie Mike Trout was just the player I was fishing for…pun intended.

                 SB+HR Analysis 2012

He had 26 stolen bases and 12 home runs for the year. Hmmm…but what about recently? I quickly created another scatter plot chart for the last 30 days. I found he had performed increasingly better than the competition during this time period. He had 15 stolen bases and 7 home runs in the last 30 days.

           SB+HR Analysis Last 30 Days

I knew now that I needed this guy on my team. The problem was he belonged to the team that just passed me up in the standings. If I wanted him I would need to trade for him. After doing some additional analysis, I decided I would offer Melky Cabrera (AKA “The Milk Man”) in exchange for Mike Trout. Only time will tell whether this trade will be accepted.

I have used other tools like Tableau in the past but found that Tableau was easier and more intuitive to use. I was able to learn the tool quickly and create about 15 visualizations in just a few hours. Since I was able to connect to the Excel data directly, I will be able to update my Excel spreadsheets later in the season and re-use the graphs I have created. Now that I am comfortable with the tool, developing new visualizations will go much quicker. The graphs are visually appealing and flexible. The Undo button is extremely helpful when making changes. I love the fact that all the graphs I created can be shown as worksheets (tabs) like in Excel or as a filmstrip.

I created a dashboard quickly and easily by dragging and dropping a few worksheets on to the dashboard pane.

I even published the workbook I created on the Tableau Public web site.

So, in just a few short hours, I learned a great new tool and improved my chances of winning my fantasy baseball league.   If Tableau can help me be successful in fantasy baseball, just think how it could help businesses be more successful.  The ability to access data and visually represent it in a way that makes it more meaningful is truly a valuable thing.

View my visualizations at
MLB Analysis – 1st Half

 
