Wednesday, February 11, 2009

On obscuring information in a chart

Greg Mankiw's blog post, Comparing Recessions, links to another post, Comparing this recession to the last five, which has a line chart showing how payroll employment fell during the current recession and the previous five.

The good thing about the chart is that it uses percentages instead of absolute figures, which makes a direct comparison between the recessions possible.

By plotting the number of months elapsed on the X-axis instead of using a regular time axis, the chart lets all the series start from the same point, which makes it visually easier to compare both the steepness of the job losses and the length of each recession.
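To make those two ideas concrete, here is a minimal sketch in Python. I do not have the chart's raw data, so the payroll numbers below are made up, and I am assuming the percentages are measured as change from the first month of each recession.

```python
# Hypothetical monthly payroll levels (thousands of jobs), keyed by the
# month each recession started. These are made-up numbers for illustration,
# not the data behind the chart.
payrolls = {
    "Jul 1981": [91000, 90800, 90500, 90100],
    "Dec 2007": [138000, 137900, 137600, 137000, 136300],
}


def percent_change_from_start(levels):
    """Express each month's level as a percent change from month 0."""
    base = levels[0]
    return [100.0 * (level - base) / base for level in levels]


# Every series now starts at 0% in month 0, so the X value of each point
# is simply its index in the list (months since the recession began).
aligned = {start: percent_change_from_start(levels)
           for start, levels in payrolls.items()}

for start, series in aligned.items():
    print(start, [round(p, 2) for p in series])
```

Because every series is now on the same percentage scale and the same months-elapsed axis, recessions from very different decades can share one chart.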

But this chart has at least a few weaknesses too... and there are ways in which someone, especially visualization vendors, could improve it.

Firstly, you want to be able to see clearly the job losses that started in Dec 2007. That series is the light blue colored line. Difficult to make out, right? You have to search for it... It is also overwritten in places by the lines of the other series where they overlap.

How can we improve this chart? Several possibilities come to mind.
Before that, I should say I am drawing on some of the best practices in effective information visualization that I read about in Information Dashboard Design: The Effective Visual Communication of Data, which I reviewed last month, and in several other books.

I do not have the raw data on which this chart is based, so I used a basic screen capture tool and its flood fill feature to tweak the same chart. Doesn't look pretty, but does get the point across IMO.

Firstly, let's use unsaturated colors for the other series, and a bright saturated color like red for the Dec 2007 series. So now you can see the red series, corresponding to Dec 07, very clearly. It stands out instead of getting obscured by other equally bright colors screaming for attention.

Secondly, we could use a thicker line for the 'Dec 07' series than the rest.
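If you did have the data, these first two tweaks could be expressed as a small style table. This is only a sketch: the function name, the pale hex colors and the line widths are my own picks, and the dictionary keys (color, linewidth, zorder) are named after matplotlib's line keyword arguments on the assumption that matplotlib is the plotting library. The higher zorder is there so the highlighted line is drawn on top instead of being overwritten by the others.

```python
# Arbitrary pale colors for the background series; any low-saturation
# palette would do.
MUTED_COLORS = ["#c6c6c6", "#b8c4d6", "#c9d6b8", "#d6c4b8", "#c4b8d6"]


def emphasis_styles(names, focus):
    """Return one style dict per series, making `focus` stand out."""
    styles = {}
    muted_index = 0
    for name in names:
        if name == focus:
            # Saturated color, thicker line, drawn on top of the others.
            styles[name] = {"color": "red", "linewidth": 3.0, "zorder": 3}
        else:
            color = MUTED_COLORS[muted_index % len(MUTED_COLORS)]
            styles[name] = {"color": color, "linewidth": 1.0, "zorder": 1}
            muted_index += 1
    return styles


series_names = ["1974", "1980", "1981", "1990", "2001", "Dec 07"]
styles = emphasis_styles(series_names, focus="Dec 07")
print(styles["Dec 07"])
```

With matplotlib, each dict could then be unpacked into a plot call, along the lines of ax.plot(months, values, **styles[name]).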

Thirdly, if there are more than five or six series, the chart will quickly start to look very crowded, and the series' colors will no longer look different from each other. So what you could, and should, do is use different line styles in addition to the colors: dotted, dashed and other styles.
Take a look at the example below, which uses random data (yes, it is random, because I used Excel's RANDBETWEEN() function. Okay, so it is not truly random but pseudo-random, if you have to nitpick). Even though two series, 1974 and 1980, have the exact same color, you can tell them apart, right? That's because they use different line styles.
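Here is one way the color-plus-line-style idea could be sketched in Python. The color names, the function name and the series labels are my own choices for illustration. The line style strings ("-", "--", ":", "-.") are the ones matplotlib accepts, again assuming that is the charting library. Which series end up sharing a color depends on the order of the names, so the pairing here is not the same as in my Excel example.

```python
from itertools import product

# A deliberately short color list, so that colors have to repeat.
COLORS = ["blue", "orange", "green"]
# Solid, dashed, dotted, dash-dot.
LINESTYLES = ["-", "--", ":", "-."]


def assign_styles(names):
    """Give every series a unique (color, line style) combination."""
    # All colors with the first line style, then all colors with the
    # second line style, and so on.
    combos = list(product(LINESTYLES, COLORS))
    if len(names) > len(combos):
        raise ValueError("more series than color/line style combinations")
    return {name: {"color": color, "linestyle": linestyle}
            for name, (linestyle, color) in zip(names, combos)}


line_styles = assign_styles(["1974", "1980", "1981", "1990", "2001", "2007"])
for name, style in line_styles.items():
    print(name, style)
```

With three colors and four line styles you get twelve distinguishable combinations, which is more than a readable chart should carry anyway.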

Now, if you wanted to put in a little more effort, and thought, some more ideas spring to mind (well... the phrase really ought to be 'spring from the mind' since they originate from the mind and not from some ethereal ether or orifice).

1. Series dimming / series highlighting / series unselection
If I am interested in, say, the 1991 series, I should be able to click on the series label in the legend, and have the other series dimmed, or entirely hidden, in addition to optionally highlighting the 1991 series.
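The logic behind such a legend click is small. The sketch below keeps only the state handling and leaves out the actual drawing and event wiring, which depend on the charting tool. The function names and the 0.2 dimming level are assumptions of mine, not any vendor's API.

```python
def toggle_selection(current, clicked):
    """Clicking the already selected label clears the selection."""
    return None if clicked == current else clicked


def series_display(names, selected, hide_others=False, dim_alpha=0.2):
    """Return opacity and visibility for each series given the selection."""
    display = {}
    for name in names:
        if selected is None or name == selected:
            # Nothing selected, or this is the selected series: show fully.
            display[name] = {"alpha": 1.0, "visible": True}
        elif hide_others:
            display[name] = {"alpha": 1.0, "visible": False}
        else:
            display[name] = {"alpha": dim_alpha, "visible": True}
    return display


legend_names = ["1974", "1980", "1981", "1991", "2001", "Dec 07"]
selected = toggle_selection(None, "1991")        # first click selects 1991
dimmed = series_display(legend_names, selected)  # others fade to 20%
hidden = series_display(legend_names, selected, hide_others=True)
selected = toggle_selection(selected, "1991")    # second click deselects
restored = series_display(legend_names, selected)
print(dimmed["1991"], dimmed["1974"])
```

Keeping the selection as a single value, with None meaning "nothing selected", makes deselection just another click on the same label.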

2. Animation for a selected series.
Dancing monkey animations are not good. However, animation, if well done, helps convey change. As long as it is not so fast that the user misses it completely, or so slow that it frustrates the heck out of the user, and as long as it focuses on one thing at a time, it can work well. So what kind of animation am I talking about? Well, one option is to animate one series at a time.
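As a rough sketch of animating one series at a time, the snippet below produces the frames for a single series by revealing one more month per frame, and spreads them over a fixed total duration so the pace does not depend on how long the series is. The three-second duration and the sample values are assumptions of mine; the right speed would need testing with real users.

```python
def reveal_frames(series):
    """Yield progressively longer slices of one series, one per frame."""
    for end in range(1, len(series) + 1):
        yield series[:end]


def frame_interval_ms(n_frames, total_seconds=3.0):
    """Spread the frames evenly over a fixed total duration."""
    return 1000.0 * total_seconds / n_frames


# Hypothetical percent changes in payroll employment for one recession.
dec_07 = [0.0, -0.1, -0.3, -0.7, -1.2, -1.9]

frames = list(reveal_frames(dec_07))
interval = frame_interval_ms(len(frames))
print(len(frames), "frames,", interval, "ms apart")
```

Each frame would be drawn on top of the static, dimmed lines of the other series, so that only one thing moves at a time.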