Posts by: Jeff Shaffer

It is always best to avoid rotated text when creating data visualizations, yet this seems to be one of the most common problems I see.  This might be because tools like Microsoft Excel rotate axis labels automatically in many situations and people don’t make any adjustments to these defaults.  Even Tableau, which generally has better practices built into the defaults, rotates axis labels in many situations.  In fact, Tableau doesn’t even allow the user to rotate the y-axis title and requires a workaround to show the y-axis title horizontally.

The most common problem with rotated text is in the x-axis labels.  Often, it’s simply the length of the labels that forces the software to rotate the text.  Consider these bar charts below:

As I tell my data visualization students, the language of bar charts speaks both vertically and horizontally. In this case the easiest thing to do is to rotate the entire chart.  By doing so the axis labels can be read easily, without tilting your head, and the bars have the same function as they did when they were vertical.

The only exception to rotating a chart from vertical bars to horizontal bars is when dealing with time series data.  Time series data is always best on the x-axis (and typically in a line chart). Therefore, do not rotate or reorder time series data.

If you feel you must use vertical bars for some reason then consider other options to avoid rotating text (see the article Exploring All of Your Options for more detail on Time Series data and other options).

Here’s an alternative in this particular example.

I hope this information will help you avoid rotating text on your data visualizations.

Excerpt:

It is true about blogs and books suggesting line charts for time series data.  In fact, when teaching data visualization at the University of Cincinnati I always reinforce to my students that time series data is best as a line chart.  This is because we, as readers, typically understand time when plotted on the x-axis and we typically want to see a trend over time.  This is the biggest advantage of a line chart as it shows trend over time better than any other chart type.

Click here to read Exploring All of Your Options: Data Visualization

 

ABOUT THE AUTHOR:

Jeffrey A. Shaffer is the Vice President of Information Technology and Analytics at Unifund. Mr. Shaffer joined Unifund in 1996 and has been instrumental in the creation and development of the complex systems, analytics and business intelligence platform at Unifund. Mr. Shaffer holds a BM and MM degree from the University of Cincinnati and an MBA from Xavier University where he was the winner of the 2006 Graduate Student Scholarly Project in Research. Mr. Shaffer has attended the Harvard Business School’s Executive Education Program, is a Certified Manager of Quality and Organizational Excellence through the American Society for Quality, a Certified Project Management Professional through the Project Management Institute and has completed Six Sigma Green Belt and Black Belt training with the Xavier Consulting Group. Mr. Shaffer is also Adjunct Assistant Professor at the University of Cincinnati in the Carl H. Lindner College of Business teaching Data Visualization in the Graduate Course series for Data Analytics. He is also a regular speaker at business intelligence conferences and symposiums on the topic of data visualization, writes for the data visualization blog at MakingDataMeaningful.com for LÛCRUM, Inc. and was a finalist in the 2011 Tableau Interactive Visualization Competition.

The NBA playoffs are quickly coming to an end. The other night the Miami Heat tied the series at 3-3 with the Boston Celtics to bring it to a game 7. The winner will go on to play the Oklahoma City Thunder. Recently we finished our first Data Visualization class at the University of Cincinnati and the course was a great success. We had 45 students work very hard in a short period of time and I was very impressed as I watched them progress, learning the basics of data visualization, moving to interactive visualizations and ultimately creating their own Tableau Viz for their final project.

The visualization below was created by 3 students at the University of Cincinnati, Dilip Kotlapati, Swetha Vemuri and Sanghavi Iyer. The students did not have much exposure to Tableau prior to taking this class and they created this visualization in just a few weeks. With their permission, I’ve adapted their Tableau Viz into Tableau Public so that I could publish it out.

The “Game Summary” tab embeds an Excel chart that they created and some images, team logos and scores to create a nice summary page of the game. The “Game Details” tab uses an image of a basketball court and the students plotted the game on the court with x/y coordinates and color coding the various types of shots, misses and assists. They made it interactive, one of the requirements for the final project, based on selecting players, teams and quarters. While there are certainly ways to improve this visualization, it was great to see what the students learned and were able to accomplish in such a short period of time.

 

One of the great tools for creating reports and dashboards is the bullet graph.  This type of graph was invented by one of the great experts in the data visualization field, Stephen Few.  If you are not familiar with his work then you should check it out at PerceptualEdge.com.  Bullet graphs are great tools for showing lots of information in a small amount of space.  They allow easy comparison between an actual measure and a target, for example actual revenue results to forecast revenue.  They are very useful for any performance measure against a target.

An easy way to show actuals to a target is a simple bar chart with the addition of a target line.  I would highly recommend using these, but this type of chart is really only useful if the target measure is identical across all performance measures.

Here’s an example of a simple bar chart with a target line.

The above example is a great way to show categories against a single target or average.  You can see other examples on our blog (The Bureau of Labor Statistics Creates an Excellent Graph and Make Category Comparisons Much Easier with these Redesigns).  These are all very well done target lines measuring the actual to some target.  However, in the above example it may not be reasonable to ever sell 16 units of oranges.  Let’s say for example the realistic target for oranges is only 9.  In this case the sales team for oranges may actually be superstars and we’re showing them as missing the target by more than any other category.  This is where the bullet graph comes in. The advantage of a bullet graph is that you can achieve the same visualization while using different targets for the different performance measures.
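To make the point concrete, here is a minimal Python sketch. The unit sales figures are hypothetical and chosen only for illustration; the shared target of 16 and the realistic orange target of 9 are the only numbers taken from the text.

```python
# Hypothetical unit sales; only the targets 16 and 9 come from the text.
sales = {"Apples": 15, "Bananas": 14, "Oranges": 10}

shared_target = 16
per_category_target = {"Apples": 16, "Bananas": 16, "Oranges": 9}


def percent_of_target(actual, target):
    """Return the actual value as a fraction of its target."""
    return actual / target


for fruit, units in sales.items():
    shared = percent_of_target(units, shared_target)
    own = percent_of_target(units, per_category_target[fruit])
    print(f"{fruit:<8} vs. shared target: {shared:6.1%}   vs. own target: {own:6.1%}")
```

Against the single target line, oranges look like the weakest category at 62.5%.  Measured against their own target of 9 they are above 100%, which is the story a bullet graph with per-category targets can tell.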

For today I won’t go into the details on how to make them or how to read them.  There is an excellent description and step-by-step instructions on how to build these wonderful graphs at http://www.exceluser.com/explore/bullet.htm.  Instead, I want to provide an alternative improvement that might be useful to users in certain instances.

Typically the bullet graph is shown in one of two ways, values or percent.  Both of these measures can be useful depending on what result is being measured and compared.

Here’s an example of a bullet graph using actual values (using Microsoft Excel). 

Here’s an example of a bullet graph using a percent as the measure (Created using Microsoft Excel and a different data set).

In both cases you have an actual, the thin, dark blue bar.  That is then measured against a target, which is the dark blue vertical reference line.  The three bands of color in the bar are used as tolerance bands. I have found that in some cases you may want to be able to visualize the actual and target as both the value and the percent.  By adding a dual axis you get the benefit of both (Created using Tableau).

One additional note on bullet graphs. I have encountered two issues with using these graphs on business reports.

First, these graphs are not easy to make using the standard tools.  They are not native graph types in Microsoft Excel, so it requires a good bit of Excel skill to create these and requires a dataset for each graph, specifying the various components necessary to build the graph.  In other words, the person building the graph will not be able to simply highlight the data and create a bullet graph.  Tableau is one tool that offers the bullet graphs natively, but this also requires a bit of understanding of Tableau in order to create them.  However, Tableau does allow the user to create them without creating little data sets for each graph.  There is also an issue with creating a dual-axis bullet in Tableau that they are aware of.  This is outlined in the details of How to Create a Dual Axis Bullet Graph below.

More importantly, there is often confusion around the bullet graphs, probably because they aren’t seen or used as much as other graphs that users are used to seeing.  There is a learning curve to understanding what the graph means, how the actual vs. target is represented and, more often, what the color bands represent and how to interpret it all.  However, in my experience, once the users understand them they seem to adopt them relatively easily. This is probably true for any new graph type that has been created over the years.

In my opinion, neither of these issues is insurmountable.  There are enough ways to create these graphs, either by using templates, add-ins, or different software platforms, and once you build a few of them it will get easier and easier.  As it relates to teaching people how to read them, I think the benefit of taking the extra time to educate the user on what they are looking at outweighs the complexity of the graph, both in the creation of it and the user’s comprehension of it.  It may not be a quick understanding, but in this case I think it’s worth the trouble.

However, if you are creating a report that will be read by many and there is no way to explain the graph then it may be best to choose a different way of presenting the data. As an example, in some cases the bands are not necessary to tell the story.  If the goal is to simply show Actual vs. Target then having a single color bar may work just fine (or maybe no shading at all).  In this case it’s simply a bar graph with a reference line for the target. Add a label or two and anyone should be able to decipher this.

Special thanks to Kristofer Still for creating the dual axis bullet graph in Tableau and detailing the instructions below.

How to Create a Dual Axis Bullet Graph in Tableau

You need to start with a set of data with actuals and a target.  In this example the target is a monthly sales quota of widgets and we compare this with sales to date.

First you create a calculated field for the ratio of sales to date to your goal.  This creates your percentage for your first axis. (You could just as easily calculate the percentage as a field in your data source and this might be preferable if you are creating multiple bullets for many targets.)

From there the easiest way to get a bullet graph in Tableau is to use the Show Me menu.  To do this, select your sales to date and sales goal fields and select bullet graph from the Show Me menu.  You now have a single axis bullet.

Now we have to do some trickery to get Tableau to give us the second axis.

First you drop the percentage field to the Columns tray.  This will create two side by side bar charts.

Next you drag the percent axis and drop it on top of your bullet graph.

Now that we have the dual-axes there are some additional formatting steps to clean things up:

You want to change the color palette to one of the single gradient palettes and make the percent of goal measure a darker shade and sales goal a lighter shade.

You should also change the formatting of the top axis to percentage.

There is also a bug in Tableau that you will notice.  The reference line for the goal of 10,000 units doesn’t cross at 100% of your top axis.  Tableau provided me with a fix for this.  If you set the axis ranges proportional to one another then they will line up.

So in this case if you extend the upper end of the range of the percentage axis to 1.05 this will cause things to line up.  Be careful, though, this axis is now fixed and won’t automatically update if you place the bullet on a dashboard.  This is less than ideal, but unless you have wild swings in the magnitude of your measures you should be fine.
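The proportional relationship between the two axes can be checked with a little arithmetic.  The sketch below is plain Python, not anything Tableau runs; it assumes the value axis spans 0 to 10,500 units (the upper bound is inferred from 1.05 × 10,000 and is not stated above), and `percent_axis_range` is a hypothetical helper name.

```python
def percent_axis_range(value_axis_min, value_axis_max, goal):
    """Scale a value-axis range into the matching percent-of-goal range."""
    return value_axis_min / goal, value_axis_max / goal


goal = 10_000  # sales goal from the example above

# Assumed value-axis range; 10,500 is what a 1.05 upper bound implies.
low, high = percent_axis_range(0, 10_500, goal)
print(f"Fix the percent axis from {low:.2f} to {high:.2f}")

# With proportional ranges, the goal sits at the same relative position
# on both axes, so the reference line crosses the top axis at 100%.
goal_position_on_value_axis = (goal - 0) / (10_500 - 0)
goal_position_on_percent_axis = (1.0 - low) / (high - low)
print(goal_position_on_value_axis, goal_position_on_percent_axis)
```

If the value axis range changes, the same division by the goal gives the new fixed range for the percent axis.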

Your final bullet will look something like this:

The most criticized tool in Data Visualization is the pie chart. There are many areas of debate in the world of Data Visualization, but there is little debate among the experts about the pie chart. The number one rule about pie charts is “Don’t Use Pie Charts”. Personally, I’m not offended by them. I understand that it has been the tool of choice and that it has become ingrained into society and business. However, I am in complete support of the expert opinions. Pie charts are deficient in displaying and comparing data. There are a few acceptable uses for them, but in most cases a simple bar chart would be a better tool overall and provide a much better visual comparison.

I have heard people argue that pie charts take up less space or that they are easier to understand, but even these arguments are not valid. There are just too many fundamental problems with pie charts and this is why I advocate that they should not be used. Let’s examine a very simple data set and compare. Here is a table of The Twelve Days of Christmas.

 

Below is a pie chart of the Twelve Days of Christmas and basically the default view from Excel. To help this visual I’ve followed a common rule of pie charts which is to start at noon and move clockwise from the largest to smallest. The other common practice, as described by Dona Wong in The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures, is to place the largest slice at noon and the second largest slice to the left of noon and then clockwise with the remaining largest to smaller. I find this practice to be even more confusing, unless the last category is “Other” or “Misc.” and therefore an aggregation of the remaining smaller categories. Also, I added the data to the legend and resized it as large as reasonably possible to make the text readable.

Note the following problems with the pie chart:
• To visually compare the reader must go back and forth from the pie chart to the legend to determine which present matches which color. It would be impossible to list the labels within each slice because the text would be too long. Another popular option is to create lines from the pie chart pointing to each label and place the labels around the pie chart. This creates a very busy chart and clutters the chart with extra lines.
• The use of many different colors is required to create a categorical comparison color scheme. This makes it difficult to see the difference in colors from the shades of blue, red and purple.
• The comparison between the categories is very difficult. The eye cannot easily discern between the size of the “Drummers Drumming” and the “Pipers Piping”. This is because the size of the pie slice is not easily calculated.
• The beginning of one category starts at the end of the previous category. This means that you cannot compare multiple categories from the same baseline, because the baseline shifts from one category to the next.
• Finally, to generate a pie chart it is necessary to calculate the percentage of the categories; after all, a pie chart by nature shows 100% and not 78 total gifts. This may be done manually, but that is not necessary as the software used to create the pie chart will do this automatically (these example charts were built in Microsoft Excel). Now in some cases a percentage might be the correct measure, but in other cases the values may be more appropriate. Below are the calculated fields for what the pie chart is actually showing.
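The percentages behind the pie chart can be recomputed with a few lines of Python.  This sketch assumes the traditional gift counts of 1 through 12 from the song, which sum to the 78 gifts mentioned above.

```python
# Gift counts per category, assuming the traditional lyrics (1 through 12).
gifts = {
    "Partridge in a Pear Tree": 1,
    "Turtle Doves": 2,
    "French Hens": 3,
    "Calling Birds": 4,
    "Golden Rings": 5,
    "Geese a Laying": 6,
    "Swans a Swimming": 7,
    "Maids a Milking": 8,
    "Ladies Dancing": 9,
    "Lords a Leaping": 10,
    "Pipers Piping": 11,
    "Drummers Drumming": 12,
}

total = sum(gifts.values())
shares = {name: count / total for name, count in gifts.items()}

# Largest slice first, matching the clockwise ordering of the pie chart.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<26} {gifts[name]:>2}  {share:6.1%}")
```

The two largest slices differ by only one gift out of 78, which is why they are so hard to tell apart by eye.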

There is nothing wrong mathematically with the pie chart. There are twice as many Geese a Laying than there are French Hens and three times as many Ladies Dancing as French Hens. However, the comparison between these is exactly the point. The pie chart does not make it easy to see that comparison. It’s hard enough to tell which slice is bigger. It would be impossible to discern twice as much or three times as much.
Here is the same data graphed using a simple bar chart.

This chart solves all of the problems mentioned above.
• Comparisons are made easily from one category to the other because the baseline is now the same for each category. Turtle Doves is clearly twice as many as the Partridge in the Pear Tree. There is no question if there are more Ladies Dancing or Maids a Milking.
• Color is easily managed. There is no color requirement to discern between categories. In fact, this graph could be done in gray scale and printed on a black and white printer or copy machine and it would still be usable.
• The axis labels are now adjacent to the data and the bar. This allows for a very compact chart and is easy to read.
• Finally, unless the pie chart is shrunk to a tiny graphic, for example as a data layer on top of a map, then there is no real space savings. In fact, the bar chart takes up less room on the page and is more readable than the pie chart.

Hopefully this holiday example illustrates the problems associated with using pie charts and the better alternatives. Best wishes for a safe and happy holiday season and please keep checking back for more on Data Visualization.

The Bureau of Labor Statistics (BLS) has published some really bad graphs and maps over the years.  Below is an example of a map they publish monthly for the “Unemployment rates by state”.  In this map they are attempting to have a sequential color scheme, going from light to dark to represent low to high unemployment rates, but because of poor color choices it has unintentionally become categorical.  Black, which is the highest rate, seems muted against the other colors.  The bright red, which is a middle value of 5%-5.9% unemployment, seems to dominate the map more than the darker red or purple color which is actually a higher rate.

However, the highlight for today is a refreshingly well done graph on the unemployment rate and median earnings when compared to education attained.

This graph is very well done.  Notice the following characteristics.

  • Simple bar graph used for comparison.  The choice of the bar graph allows the reader to easily compare the categories.
  • Consolidated labeling and a diverging horizontal scale allow for combined axis labels in the middle.
  • There are no extra gridlines, no horizontal axis line, no axis scale and no border around the chart (unfortunately the webpage coding added an unnecessary border on their website at http://www.bls.gov/emp/ep_chart_001.htm)
  • The data points are placed on the bars themselves, providing additional information to the story.
  • The addition of a very clean reference line (in this case the average) gives additional context to the story and provides a context for each bar to be compared against.
  • Formatting is very clean. A single decimal place is consistent for the unemployment rate and the median income is not cluttered with decimal places, but includes a comma for thousands.
  • The use of color is simple.  Someone who is color blind may not be able to distinguish between the red and the green easily, but since the color is not crucial to the story nothing will be lost.
  • Great care was taken to have the negative statistic, in this case the unemployment rate, increase horizontally to the left, while the positive indicator of median weekly earnings increases to the right.

Notice that they utilized some of the same techniques that were discussed in the recent “Make Category Comparisons Much Easier with these Redesigns” post on the Making Data Meaningful blog.  Now some may argue the overall message of this graph, which is, higher education will lead directly to higher income.  This may or may not be the case; however, the BLS has done an excellent job at presenting this data. Congratulations to the Bureau of Labor Statistics for creating an excellent graph.

As it relates to the first unemployment map, simply changing the color scheme would solve the categorical color problem.  Here is the same map using a color blind friendly blue-orange diverging color scheme.  More importantly though, examine the difference in the emphasis on the orange and dark orange states and very little emphasis on Montana, Kansas, Louisiana and Virginia which were bright red in the original version.  

However, when using this diverging color scheme the blue still attracts attention to the low percentage states.  This kind of color contrast might work well for a political map, for example Republican vs. Democrat, but for a low to high scale this can be confusing.  A better version of this could be achieved by using a single color, light to dark, and removing a few of the bands, for example 5 or 6 bands instead of 8.  Here is the same map using the sequential color palette but only using orange and 6 bands.  This is similar to the original map, but avoids the purple and dark red being interpreted as categorical.

Here is the same map using only gray scale as the sequential coloring.

Another major issue with these charts though is the difference in scales from month to month and what appears to be an arbitrary grouping of states. Here’s a comparison of the legend for the map in December 2008 and October 2011. Notice the different scale for the two legends as well as the groupings within each color.

Compare the maps side by side.


In the December 2008 version the groupings start with a band of 0-1.9% and then move in 1% increments until the purple band, which spans 3%.  In the October 2011 version the bands are grouped differently.  Below is a straight line band of 1% to outline the color difference.

Changing the color bands like this makes it impossible to have an apples-to-apples comparison from one time period to the next.  This is a shame because this type of map would make an excellent trellis chart to compare month by month or year over year.  Also, the color choice and band selection will have a dramatic impact on the visual story.  This inconsistency allows the creator to manage the story. Hopefully, the future graphs of the BLS will continue to follow the good example.
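One way to keep the maps comparable is to define the bands once and reuse them for every period.  The following is a minimal Python sketch of that idea; the band edges and the sample rates are assumptions for illustration and are not the BLS values.

```python
from bisect import bisect_right

# Assumed band edges in percent: fixed 1-point steps reused for every month.
# Five edges produce six bands.
BAND_EDGES = [4.0, 5.0, 6.0, 7.0, 8.0]


def band_index(rate, edges=BAND_EDGES):
    """Return the 0-based band for a rate; band 0 is below the first edge."""
    return bisect_right(edges, rate)


def band_label(index, edges=BAND_EDGES):
    """Return a legend label for a band index."""
    if index == 0:
        return f"below {edges[0]:.1f}%"
    if index == len(edges):
        return f"{edges[-1]:.1f}% and above"
    return f"{edges[index - 1]:.1f}% to {edges[index] - 0.1:.1f}%"


# Hypothetical rates for one state in two different months.
for month, rate in [("Month A", 5.4), ("Month B", 7.2)]:
    print(month, rate, band_label(band_index(rate)))
```

Because the edges never change, a given shade means the same range of rates on every map in the series.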

Data Tables are great tools for adding context to a chart or graph for data visualization. As an example, in the redesign of the Hamilton County Auditor’s chart the addition of the data table at the bottom of the chart provides detailed data to augment their compelling story. See “Design Issues Distort a Compelling Story for the Hamilton County Auditor”. However, be careful when using these data tables in Excel and remember to always check your data!
Below are 2 columns of data. Column A has a date and column B has number of units.

 To create this simple line chart with this data and add a data table:
1.) Highlight cells A1:B5
2.) Click “Insert”, select “Line” and then select the first 2-D Line Chart option
3.) Click “Layout” under Chart Tools, select “Data Tables”, then select “Show Data Table”

By default the chart will look something like this:

After some chart clean up and reformatting, the chart looks like this:
Data labels added to showcase the discrepancy between the actual data and the data table values.

Excel automatically places the dates in the proper order, from September to December. Also, the line chart and the data labels show correctly and the values are clearly decreasing at a rate of 100 units per month. However, the values in the data table are reversed. This is not an axis problem and this cannot be fixed by reversing the order on the axis. This is a critical error in the way Excel handles the data within the data table. The first cell, either at the top of vertical data or on the left of horizontal data, will be the first cell in the data of the data table, even if the dates are automatically sorted by Excel.

The easiest solution for this is to simply reorder the data by date from oldest to most recent. Once the dates are resorted then the data in the data table will be correct.
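The same fix can be expressed outside of Excel as a sort on the date column before charting.  Here is a minimal Python sketch; the year and the unit values are hypothetical, chosen only to match the pattern described above (September through December, dropping 100 units per month, stored newest first).

```python
from datetime import date

# Hypothetical worksheet rows, newest first, decreasing by 100 units per month.
rows = [
    (date(2011, 12, 1), 100),
    (date(2011, 11, 1), 200),
    (date(2011, 10, 1), 300),
    (date(2011, 9, 1), 400),
]


def is_chronological(data):
    """Return True when the dates run from oldest to most recent."""
    dates = [day for day, _ in data]
    return dates == sorted(dates)


# Sort from oldest to most recent so the table order matches the axis order.
rows_sorted = sorted(rows, key=lambda row: row[0])

print(is_chronological(rows), is_chronological(rows_sorted))
for day, units in rows_sorted:
    print(day.isoformat(), units)
```

A check like `is_chronological` is also a cheap way to catch the problem before the chart is built.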

In further testing, it appears this problem is only associated with Excel’s “Date” format. If Column A were just a 4 digit year and set to Excel’s “General” format instead of the “Date” format then the data in the data table would match the appropriate year. However, in that case both date and units would be in the reverse order and then the horizontal axis would need to be reversed.

The data table offered in Excel 2007 and 2010 is a terrific tool, but there are some issues to look for when using it. An important lesson is outlined here as well: always double check the output. Always approach the data on the assumption that there could be a problem with the data and never trust the data or any tool to do the work automatically and without error.

The 100% stacked bar chart is a useful chart when compressing lots of data into a small area.  However, it’s really only useful when the whole is more important as a comparison than the parts.  In this case, it is more important to compare the parts to one another and each part to the average, i.e. the “All” category.  There are some other things that make this chart very difficult to read.

  • The bars are so widely spaced that they create another set of bars in between them.
  • The gridlines are close together (at 10% interval) creating a moiré effect.
  • The 10% axis interval requires 10 labels to show an approximation for 7 data bars.
  • The smallest value of “% Good” is 86%, making all of the bars very high. Therefore the differences in the “% Rejected” and “unknown” categories are made much smaller.
  • There is no logical ranking to this chart.
  • One goal is to compare to “All”, but that category blends in with the others as the 2nd bar.
  • Labels on the x-axis are hard to read because they are vertical.

This version solves the ranking issues.

 

However, it still has the other issues as before.

  • The bars are so widely spaced that they create another set of bars in between them.
  • The gridlines are close together (at 10% interval) creating a moiré effect.
  • The 10% axis interval requires 10 labels to show an approximation for 7 data bars.
  • The smallest value of “% Good” is 86%, making all of the bars very high. Therefore the differences in the “% Rejected” and “unknown” categories are made much smaller.
  • One goal is to compare to “All”, but that category blends in with the others and is now in the middle of the chart.
  • Labels on the x-axis are hard to read because they are vertical.

Here is a better solution for presenting this data.

  • The x-axis and the gridlines have been removed completely.  There are only 6 data points now, so it’s better to just list the actual values at the point vs. listing multiple values on the axis and adding gridlines.  If this were 10 or more data points it would probably be better the other way, but this may not always be the case.
  • The focus of comparison is now on a single group, “% Rejected”.
  • Comparing to “All” is now easier as well with the addition of a reference line for that point.
  • Labels on the x-axis are now on the y-axis making them easy to see.  This is typically a better solution for long labels and a bar graph. This wouldn’t be possible with other data, ex. a line chart with time series data.
  • One improvement here might be a solid reference line vs. a dotted line, as it might be a cleaner, less busy look. However, by making it black and dotted it is now set off from the y-axis.

Here’s a slight variation that allows the reader to have the data value immediately next to the vendor name.  This eliminates the need for the eye to go back and forth. Also, in this case the labels on the outside would also interfere with the reference line, specifically Vendor I and I-F, so manual adjustments would have been required to avoid this.

When combined together it creates a small multiple that allows the reader to see the data together and quickly compare the categories across outcomes.

This chart below is a fairly good presentation of the data.  However, there are a few things that would clean this chart up a bit.

  • There is no need for 4 decimal places on the y-axis when the data only goes to 1 decimal.
  • The x-axis repeats “month” over and over again and forces the text to angle.
  • “Liquidation Rates” is both the title of the chart and the label for the x-axis.  The label for the x-axis should be “month”.  Also, the title isn’t very descriptive.  What liquidation rate?
  • The square points are a bit heavy for the lines. Using smaller circles makes them less obtrusive and by adding a data point there is now a context for the highest point.

Also there is an interesting data question that comes up in this graph. Does the liquidation rate really go to zero in month 12? Or is this just bad or missing data? This is something to investigate.

These charts are attempting to tell a compelling story on how the auditor’s office has decreased staff consistently from 1990 through 2011 while Hamilton County increased staff until 2008 before finally cutting staff in 2010 due to budget issues.  However, these charts have some serious design issues that hide the data and do not allow the reader to understand the story in an easy manner.

 

Here are the primary issues with these charts.

  • In general, line charts should be used to show time series data.  Bar charts are good for comparing categorical information and allow the reader to easily compare, but line charts are a much better tool for graphing data over time.
  • 3D effects serve no purpose for this chart and should be avoided in general.  It makes it more difficult for the reader to understand the data and there is no data in this case graphed on a third axis.
  • The y-axis in both cases does not start at zero.  This distorts the comparisons that the bar chart makes.  For example, on the Auditor graph 84 out of 192 employees equals 43.75%.  To show this properly the last bar with 84 employees should be nearly half the height of the first bar at 192.  This chart makes the 84 people look like only a handful of people because the y-axis begins at 75.  The height of the bars has thus become meaningless in comparing the information.
  • Each data point is labeled with its value and the axis scale is labeled as well.  The purpose of showing scale on an axis is to give a relative idea of the data points.  If each point is listed then there is no need for axis labels.  They serve no purpose.  In this case labeling 22 different data points also clutters the graph, so labeling only the axis scale might be a better choice.
  • The story of these graphs is the 56.5% reduction vs. the 7.5% reduction.  These data points are listed at the bottom of each graph in small font as if it’s a footnote to the graph.  The reader may not even glance in the bottom corners and could easily miss this information entirely.
  • The x-axis labels are aligned at 45 degrees to make them fit.  It is always best to avoid rotating too much text.  If labels can fit horizontally it is easier for the reader.
  • Gridlines can be helpful, but it is best to mute them and reduce the number of them so that they don’t create a moiré effect for the reader.
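The distortion caused by the truncated y-axis described above can be quantified.  This short Python sketch uses the figures cited in the list (192 and 84 employees, axis starting at 75); `drawn_height_ratio` is a hypothetical helper name.

```python
def drawn_height_ratio(value, reference, axis_start):
    """Ratio of two bar heights as drawn when the axis starts at axis_start."""
    return (value - axis_start) / (reference - axis_start)


first_bar, last_bar = 192, 84  # employee counts cited above

true_ratio = drawn_height_ratio(last_bar, first_bar, axis_start=0)
truncated_ratio = drawn_height_ratio(last_bar, first_bar, axis_start=75)

print(f"Axis starting at 0:  last bar is {true_ratio:.2%} of the first bar")
print(f"Axis starting at 75: last bar is {truncated_ratio:.2%} of the first bar")
```

With a zero baseline the last bar is 43.75% of the height of the first.  With the axis starting at 75 it is drawn at under 8% of that height, which is why the remaining staff look like a handful of people.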
This redesign solves these issues:
However, the purpose of this is a direct comparison of the Auditor’s office vs. Hamilton County.  By putting both graphs together on a dual axis and setting the scale appropriately it is much easier to compare the auditor’s office to Hamilton County.  Notice that in both cases the scales start at zero.  The secondary y-axis scale is set so that it lines up exactly with the start of the primary axis line.  Now that they begin at the same point and are scaled to zero, their values can be compared to each other as if they were percentages.  Comparing the data on a dual axis with the original color coding makes the story much more apparent and visual.  The data points that were originally labeled are still available in a data table below the graph for reference, but the highlight of the story, 7.5% vs. 56.5% reduction in employees, has now been placed prominently on the graph.  The primary and secondary y-axis labels are rotated to conserve space, which is not ideal, but this is also redundant information since it’s listed in the data table as well, corresponding to the appropriate color of the line.  Finally, by adding a dotted reference line it is easier to see the number of employees in 2011 vs. the baseline set in 1990.  It’s now clear that the number of Hamilton County employees did not decrease for a 20-year period, until 2010 (and then only slightly).
Sample charts from www.HamiltonCountyAuditor.org: