Love it or hate it, the madness is upon us. Every March, the country gets a healthy serving (or three) of college basketball. Each year, approximately 40 million people fill out brackets for the NCAA Men’s Basketball Tournament, and each year, every single one of those people swears that they picked everything perfectly. If you were about to Google, “What are the odds of completing a perfect bracket?” I will save you the trouble: treating every game as a coin flip, it is 1 in 9.2 quintillion. If you were about to Google, “What on Earth is a quintillion?” the answer is a 1 with 18 zeros behind it. To put this in perspective, the odds of winning the Powerball are 1 in 175 million. You have a better chance of winning the Powerball twice in a row than of picking that bracket correctly.
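For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a 64-team bracket with 63 games, ignores play-in games, and treats every game as a 50/50 coin flip, which real games are not. The Powerball figure is the one quoted above.

```python
from fractions import Fraction

GAMES = 63  # a 64-team single-elimination bracket has 63 games (assumption: no play-ins)

# Assumption: every game is a coin flip, so there are 2**63 equally likely brackets.
possible_brackets = 2 ** GAMES
perfect_bracket = Fraction(1, possible_brackets)

# Powerball jackpot odds as quoted in the post.
powerball = Fraction(1, 175_000_000)

print(f"Possible brackets: {possible_brackets:,}")
# Winning the Powerball twice is still more likely than a perfect bracket...
print("Two jackpots more likely than a perfect bracket:", powerball ** 2 > perfect_bracket)
# ...but winning it three times is not.
print("Three jackpots more likely than a perfect bracket:", powerball ** 3 > perfect_bracket)
```

Exact fractions are used instead of floats so the comparisons do not depend on rounding.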
These, however, are just numbers. I began to wonder how I could slice and dice tournament history data. Sure, I can find which teams have won the most or lost the most. But can I dive even further and find out which states, cities, or teams have the most wins or championships? Which teams consistently underperform, and which teams exceed expectations?
Using a data dump of NCAA Tournament history from 1939 to 2012, I was able to dive in very quickly and start seeing results. I first wanted to see which states produced the most tournament victories. Using Tableau, I was able to visualize the top ten states in terms of victories.
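The same top-ten ranking can be approximated outside Tableau. The sketch below is only an illustration: the row layout (year, winning team, winner's home state) and the sample rows are hypothetical placeholders, not the actual data dump or real results.

```python
from collections import Counter

# Hypothetical sample rows: (year, winning_team, winner_state).
# Placeholder values for illustration only, not real tournament results.
games = [
    (2001, "Team A", "NC"),
    (2001, "Team B", "CA"),
    (2002, "Team A", "NC"),
    (2002, "Team C", "KY"),
    (2003, "Team D", "NC"),
    (2003, "Team B", "CA"),
]

def top_states(rows, n=10):
    """Return the n states with the most wins as (state, wins) pairs."""
    wins = Counter(state for _year, _team, state in rows)
    return wins.most_common(n)

for state, wins in top_states(games):
    print(state, wins)
```

Each row is one tournament win, so counting rows per state gives the number a filled map would shade by.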
Using a filled map, I was able to visualize the number of wins for the top ten states. North Carolina and California are the top two states, no doubt fueled by the powerhouse schools of North Carolina, Duke, and UCLA. I wanted to go even further and see which cities brought the championships home for their respective states. To create this visualization, I used a dual-axis map combining my filled map with a symbol map.
Using this visualization, you can see which cities put their states on my first map. Los Angeles and Lexington are home to the schools that have brought home the most national championships. Instead of using strictly numbers and labels, I was able to represent their success using a “Circle” symbol: the bigger the symbol, the more championships won.
I have a clear picture of which teams succeed, but how can I find out which teams succeed, or don’t, when they are supposed to? To do this, I needed to find out how many upsets occurred over the years. Using the teams’ designated seeds at the beginning of the tournament, I was able to identify every upset in tournament history. I took this data and created visualizations for teams that get upset and teams that create the upset.
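As a rough sketch of that step, the snippet below flags upsets from seeds and tallies them per team. It assumes an upset is any game won by the team with the worse seed (the larger seed number), and that every row carries both seeds. The row layout and team names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical rows: (winner, winner_seed, loser, loser_seed). Illustrative only.
games = [
    ("Team A", 12, "Team B", 5),   # 12 seed beats 5 seed: upset
    ("Team C", 1, "Team D", 16),   # favorite wins: not an upset
    ("Team A", 12, "Team E", 4),   # 12 seed beats 4 seed: upset
    ("Team B", 5, "Team A", 12),   # favorite wins: not an upset
]

def is_upset(winner_seed, loser_seed):
    """Assumed definition: the winner had the worse (numerically larger) seed."""
    return winner_seed > loser_seed

upsets_caused = Counter()    # times a team won as the underdog
upsets_suffered = Counter()  # times a team lost as the favorite

for winner, winner_seed, loser, loser_seed in games:
    if is_upset(winner_seed, loser_seed):
        upsets_caused[winner] += 1
        upsets_suffered[loser] += 1

print("Caused:", dict(upsets_caused))
print("Suffered:", dict(upsets_suffered))
```

A stricter definition, such as requiring a minimum gap between the two seeds, would only change `is_upset`.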
I used a stacked bar chart to show two things: how often a team was upset when it was the higher seed, and how often it pulled off the upset when it was the lower seed. The stacked bars also make clear that while teams like Duke and North Carolina were upset the most, that is because they had the most opportunities to be upset. The data shows that Kansas is an overachieving team: in 34 of 49 opportunities (about 69%), they upset their opponent in the tournament.
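Because raw upset counts favor teams that simply play more tournament games, a rate is the fairer comparison. Here is a small sketch using the Kansas figures quoted above; the function name is my own, not something from Tableau or the data set.

```python
def upset_rate(upsets, opportunities):
    """Share of opportunities that ended in an upset; 0.0 if there were none."""
    if opportunities == 0:
        return 0.0
    return upsets / opportunities

# Kansas figures from the post: 34 upsets in 49 opportunities.
kansas_rate = upset_rate(34, 49)
print(f"Kansas upset rate: {kansas_rate:.1%}")  # prints "Kansas upset rate: 69.4%"
```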
History shows that our top-performing states are North Carolina, California, and Kentucky. The cities that make those states successful are Lexington, Los Angeles, Chapel Hill, and Durham. We can also see that teams such as Brigham Young, Pennsylvania, and Utah State have a habit of underperforming in the tournament, while teams such as Florida, Duke, and North Carolina tend to overperform when they are the underdog.
March Madness is an event loved by many, and visualization let me recognize these findings very quickly. Imagine having this type of data at your fingertips when you are filling out your bracket. I certainly wish I had used it to my advantage. Now, imagine these types of visualizations fueled by your company’s data. Replace the “wins” data with company revenue data. You would be able to identify where you are successful, and then drill down to see which cities are producing that success. This gives you a quick look at your business. Use sales leads data to fuel your stacked bar charts. See which of your offices are receiving or submitting leads, and see how well they are closing them. Data is powerful, but visualization tools make data meaningful.