Abhinav Agarwal: junk viz


Tuesday, April 17, 2012

Lying with Charts - Google Finance

Here's a short post on visualizations and distortions, unintentional but still there.

There was an article on the web that remarked on the rather steep fall-off in Apple's stock over the past week or so. I went over to Google Finance to take a look. What I found was interesting. I took some screenshots and have added them to this blog post.

I wanted to find out how much the stock had actually fallen, which is easily done, and how much the line chart was portraying as the fall in the stock, also fairly easily done.

Let us do some math now. Simple math, the kind I like, the only kind I can probably do now.

First, let us calculate how much the stock has actually fallen. The Google Finance page on Apple, http://www.google.com/finance?q=aapl , tells us that on April 16, the stock closed at $580.13 - which we will round off to $580. Next, we find that its 52-week trading high was $644.

So, you can see that the stock has fallen $64 from its peak, which translates into a 9.9% fall from its peak (64/644).

So, the first number of significance is 9.9% - we will format it bold to make it noticeable.

Next, take a look at the first chart. Even with a linear scale, the problem is that the axis does NOT begin from zero. Notice the first number on the vertical axis is $420 - Google is using a broken axis, which is useful for highlighting the magnitude of changes, as in this graph, but misleading because of its very nature; it inaccurately magnifies increases and decreases. By how much? Let's calculate.

If you were to take a ruler and measure the height of the stock chart from the base to its maximum, i.e. $644, you would find it measures approximately 4" from top to bottom.

Next, you measure the fall from the peak of $644 to the current trough of $580. It measures approximately 0.95".

So, in this chart, a peak of $644 equates to 4".

A drop of $64 measures 0.95".

Therefore, the drop appears on the chart as a 23.75% drop (0.95/4) - we will bold it to make it noticeable.

There you have it - an actual drop of 9.9% looks, note, looks, like a 23.75% drop.

To put that in perspective, had the stock actually fallen by 23.75%, it would have sunk by $153. Yes, and it would have been trading at $491.
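The arithmetic above is easy to check with a few lines of Python. This is only a sketch: the function names are mine, the inch values are the approximate ruler readings quoted above, and `apparent_drop_on_axis` is a simplified geometric model of a linear axis that does not start at zero, not a description of how Google Finance draws its charts.

```python
def actual_drop(peak, current):
    """Fractional fall in price, measured from the peak."""
    return (peak - current) / peak


def apparent_drop(drop_height, chart_height):
    """Fractional fall as it looks on the chart: drop height over plotted height."""
    return drop_height / chart_height


def apparent_drop_on_axis(peak, current, axis_floor):
    """Same idea derived from the axis alone: the drop relative to the visible
    height of the peak above the axis floor (assumes a linear scale)."""
    return (peak - current) / (peak - axis_floor)


peak, current = 644, 580           # 52-week high and rounded close, from the post
real = actual_drop(peak, current)
seen = apparent_drop(0.95, 4.0)    # approximate ruler measurements, in inches

print(f"actual drop:   {real:.2%}")
print(f"apparent drop: {seen:.2%}")
print(f"exaggeration:  {seen / real:.1f}x")
print(f"price after a real {seen:.2%} fall: ${peak * (1 - seen):.0f}")

# With the axis starting at zero, the picture and the data agree.
assert apparent_drop_on_axis(peak, current, axis_floor=0) == real
```

The ratio of the two fractions shows that the chart makes the fall look well over twice as large as it is. Raising `axis_floor` above zero in the last helper inflates the apparent drop further.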

Apple Stock Graph in 2012

Even when you change the time-scale to 5 years, it does not help completely, because the vertical axis is STILL a broken axis. The inaccuracy as displayed on the chart is a lot less, but it is still there.

Apple Stock Price over 5 Years

It is only the 10-year plot that gives a true, non-distorted picture of the stock. But on the 10-year plot, the recent rise and steep 10% fall are not very visible. If you zoom only into the current year, 2012, then the distortions creep right back into the graph.

Apple Stock Price over 10 Years

What Google Finance needs to do is add an option, a checkbox, in their Settings panel to allow a user to select whether they want an unbroken axis or not - i.e., to let the charting engine plot a broken axis when it sees fit, or to always display an unbroken axis.

Saturday, June 18, 2011

Designing With The Mind In Mind

Designing with the Mind in Mind: Simple Guide to Understanding User Interface Design Rules


This is another excellent book to add to your shelf, after giving it a careful read, alongside other excellent books on information visualization.
I claim an interest in the field of data visualizations. And not just the Lego blocks in colorful arrangements type of visualizations, though that is not to gainsay their utility or ability to entertain. This interest in meaningful information and data visualizations goes back at least 8 years, to 2003, when I first started working as the product manager for visualizations in the Discoverer product from Oracle (sort of tautological - working for Oracle would presuppose that the products I worked on would be Oracle products too...). Starting in 2004 my interest in visualizations took a more detailed turn when I started haranguing people about the utility of having interactive visualizations. Some of what I have written in my capacity as a product manager for data visualizations in Oracle BI since 2006 has made its way into the product, much more is making its way into the product, and there is much that will eventually, I hope, make its way into the product.

Therefore, it is but natural that I also have an interest in literature on data visualizations. To that end I have read some books and papers and blogs on the topic over the years, including Information Dashboard Design: The Effective Visual Communication of Data, The Visual Display of Quantitative Information, Envisioning Information, A Tour through the Visualization Zoo - ACM Queue, and more... There is always the humorous yet educational blog Junk Charts. Then there is the often acerbic yet valuable blog by Stephen Few (whose post first led me to this book), Visual Business Intelligence. And so on...

This year, the first book I have read on the topic, Designing with the Mind in Mind: Simple Guide to Understanding User Interface Design Rules by Jeff Johnson, is not really a book on data visualizations per se. This book will not tell you the utility of a bar graph versus a line graph. It will not tell you what decorations to apply or not apply to graphs, whether 3D effects look good on a graph (they don't), what chart junk is (see Edward Tufte's books for that), etc... The author, Jeff Johnson, is an authority in this field, has been active in the field of HCI (Human-Computer Interaction) for more than 30 years, and has worked at Xerox, Sun Microsystems, HP Labs, etc...

His latest work is more a book about the theory of how the mind perceives information, of how humans understand what they read, and how our eyes are attuned to paying attention to not just what's happening in front of us but also at the periphery of our vision. This is a "design" book - "Design rules often describe goals rather than actions. They are purposefully very general to make them broadly applicable, but that means that their exact meaning and their applicability to specific design situations is open to interpretation." It is a book that informs us how some of the perceptual hard-wiring in our brains has evolved because of very sound reasons, and why information systems that tend to ignore or force their way against these perceptual conduits often fail. That you have a vast proliferation of interfaces that are designed so as to violate these fundamental precepts of cognition is an indication of how far we still have to go in this field.

In every book on user interface design, whether specific or general, you will find the usual suspects - the Gestalt principles: Proximity, Similarity, Continuity, Closure, Symmetry, Figure/Ground, and Common Fate. The author says these provide a useful basis for guidelines for graphic and user interface design.

... Several Gestalt principles describe our visual system’s tendency to resolve ambiguity or fill in missing data in such a way as to perceive whole objects. The first such principle, the principle of Continuity, states that our visual perception is biased to perceive continuous forms rather than disconnected segments.
Slider controls are a user-interface example of the Continuity principle. We see a slider as depicting a single range controlled by a handle that appears somewhere on the slider, not as two separate ranges separated by the handle.

A recommended practice, after designing a display, is to view it with each of the Gestalt principles in mind—Proximity, Similarity, Continuity, Closure, Symmetry, Figure/Ground, and Common Fate—to see if the design suggests any relationships between elements that you do not intend.

If we are to read and understand what's written, like instructions on a screen, or tooltips, or the like, then it stands to reason that the typeface be easy-to-read. But the author goes beyond that, deeper, into the roots of how we read and understand, and how, therefore, poorly designed interfaces can interrupt the process by which we understand what we understand.

In other words, the most efficient way to read is via context-free, bottom-up, feature-driven processes that are well learned to the point of being automatic. Context-driven reading is today considered mainly a backup method that, although it operates in parallel with feature-based reading, is only relevant when feature-driven reading is difficult or is insufficiently automatic.

... reading can be disrupted by hard-to-read scripts and typefaces. Bottom-up, context-free, automatic reading is based on recognition of letters and words from their visual features. Therefore, a typeface with difficult-to-recognize features and shapes will be hard to read.

Visual noise in and around text can disrupt recognition of features, characters, and words and therefore drop reading out of automatic feature-based mode into a more conscious and context-based mode.

The same goes for colors too. Color patch size and separation for example are used by our visual system to make out one color from another.

Color patch size: The smaller or thinner objects are, the harder it is to distinguish their colors
Separation: The more separated color patches are, the more difficult it is to distinguish their colors...

Color patches in chart legends should be large to help people distinguish the colors

“Change blindness” is what sometimes causes people to not pay attention to a message of possible importance flashed by an application. Therefore, "Don’t require people to remember system status or what they have done, because their attention is focused on their primary goal and progress toward it."

There have been at least three books I have read this year that have ended up talking about the concept, history, and neurology of memory (Moonwalking with Einstein: The Art and Science of Remembering Everything, Talent Is Overrated: What Really Separates World-Class Performers from Everybody Else, and The Shallows: What the Internet Is Doing to Our Brains), and this book is a fourth one, if you add to the list books that cover only peripherally the topic of memory. In this book, the author describes the workings of memory, how memories are formed, and what implications this has when it comes to helping users remember previously performed actions in a graphical user interface.

memories, like perceptions, consist of patterns of activation of large sets of neurons. Related memories correspond to overlapping patterns of activated neurons.
...

However, it has many weaknesses: it is error-prone, impressionist, free-associative, idiosyncratic, retroactively alterable, and easily biased by a variety of factors at the time of recording or of retrieval.
...
One implication of this pattern is that interactive systems should indicate what users have done versus what they have not yet done.
...
A new face stimulates a pattern of neural activity that has not been activated before, so no sense of recognition results. Of course, a new face may be so similar to a face we have seen that it triggers a misrecognition, or it may be just similar enough that the neural pattern it activates triggers a familiar pattern, causing a feeling that the new face reminds us of someone we know.
...
In contrast, recall is long-term memory reactivating old neural patterns without immediate similar perceptual input.
...
Whatever the evolutionary reasons, our brain did not evolve to recall facts.
...
Because people are bad at recall, they develop methods and technologies to help them remember facts and procedures
...
The relative ease with which we can recognize things rather than recall them is the basis of the graphical user interface (GUI) (Johnson et al., 1989). The GUI is based on two well-known user interface design rules:

See and choose is easier than recall and type.

Use pictures where possible to convey function.

And what does the author mean here?

Even insects, mollusks, and worms, without even an old brain—just a few neuron clusters—can learn from experience. However, only creatures with a cortex or brain structures serving similar functions[2] can learn from the experiences of others.
...
caveat is that some birds can learn from watching other birds.

The mind just races with the possibilities. A student peering over the shoulder of another at an exam is sure learning from the experience of others, on a lighter note.

As with other tasks, consistency within an application's interface is critical. This is also one of the primary tasks of a user interface design engineer - to ensure that different screens, different parts of an application all have the same vocabulary of interface and action. Different parts of an application are worked upon by different engineers, and this can often enough cause those parts to be inconsistent in how they look and feel (the classic problem that look-and-feel (LAF) standards seek to minimize). Even with the benefit of guidelines and look-and-feel standards that are in place at most large software development companies, it is inevitable that inconsistencies will creep into the UI of an application. This is where the importance of a user-interface and user-experience design team cannot be stressed enough.

To reduce the time it takes for people to master your application, Web site, or appliance, so that using it becomes automatic or nearly so, don’t force them to learn a whole new vocabulary
....
Same name, same thing; different name, different thing. (FormsThatWork.com) This means that terms and concepts should map strictly 1:1. Never use different terms for the same concept, or the same term for different concepts. Even terms that are ambiguous in the real world should mean only one thing in the system. Otherwise, the system will be harder to learn and remember.
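
The 1:1 rule in the last quotation lends itself to a mechanical check. What follows is a minimal sketch and not anything from the book: the function name, the `(term, concept)` pair format, and the sample labels are all hypothetical, made up here for illustration.

```python
from collections import defaultdict


def vocabulary_conflicts(labels):
    """Find violations of "same name, same thing; different name, different thing".

    `labels` is an iterable of (term, concept) pairs collected from an
    interface. Returns two dicts: terms used for several concepts, and
    concepts referred to by several terms.
    """
    concepts_by_term = defaultdict(set)
    terms_by_concept = defaultdict(set)
    for term, concept in labels:
        concepts_by_term[term].add(concept)
        terms_by_concept[concept].add(term)

    overloaded_terms = {t: c for t, c in concepts_by_term.items() if len(c) > 1}
    synonyms = {c: t for c, t in terms_by_concept.items() if len(t) > 1}
    return overloaded_terms, synonyms


# Hypothetical labels gathered from the screens of an imaginary application.
labels = [
    ("Folder", "container"),
    ("Directory", "container"),   # a second name for the same thing
    ("Save", "write to disk"),
    ("Save", "bookmark item"),    # the same name for a different thing
]
overloaded, synonyms = vocabulary_conflicts(labels)
print("terms with several meanings:", overloaded)
print("concepts with several names:", synonyms)
```

An interface that follows the rule strictly would leave both dictionaries empty.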

Performance and the perception of responsiveness are different beasts altogether, related only by the often contentious thread of individual experiences. Personal experiences can differ widely. What one considers slow is considered acceptable by someone else. In my life and times as a product manager, there have been several occasions where discussions about performance, the expectation of performance, and what can be considered as responsiveness on the part of an application and what should be considered as 'slow' have ranged from the pleasant, the cordial, to the contentious even.

Responsiveness is related to performance, but it is different. Performance is measured in terms of computations per unit of time. Responsiveness is measured in terms of compliance with human time requirements and, as described above, user satisfaction.
...
Time lag between a visual event and our full perception of it: 100 milliseconds (0.1 seconds)
...
Our brain compensates by extrapolating the position of moving objects by 0.1 second. Therefore, as a rabbit runs across your visual field, you see it where your brain estimates it is now, not where it was 0.1 second ago

To be perceived by users as responsive, interactive software must follow these guidelines:
Acknowledge user actions instantly, even if returning the answer will take time; preserve users’ perception of cause and effect
Let users know when the software is busy and when it isn’t
...
Animate movement smoothly and clearly
Allow users to abort (cancel) lengthy operations they don’t want
...
Interactive systems should avoid lengthy gaps on their side of the conversation. Otherwise, the human user will wonder what is happening. Systems have about 1 second to either do what the user asked or indicate how long it will take.
...
It is true that meeting those deadlines on the Web is difficult—often impossible. However, it is also true that those deadlines are psychological time constants, wired into us by millions of years of evolution, governing our perception of responsiveness.
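
The first two guidelines, together with the one-second deadline, can be sketched in code. This is an illustration in Python under assumptions of my own: `run_responsively` and the `notify` callback are hypothetical names, and a real GUI toolkit would rely on its own event loop rather than a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_responsively(task, notify, deadline_s=1.0):
    """Run task() without leaving the user guessing.

    The action is acknowledged at once; if the task has not finished within
    deadline_s seconds, the user is told that the software is still busy.
    """
    notify("working")                      # acknowledge the action instantly
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task)
        try:
            result = future.result(timeout=deadline_s)
        except FutureTimeout:
            notify("still busy")           # the deadline passed: say so
            result = future.result()       # then wait for the real answer
    notify("done")
    return result


def slow_task():
    time.sleep(0.2)                        # stands in for a lengthy operation
    return 42


messages = []
# The deadline is shortened here only so that the demonstration runs quickly.
answer = run_responsively(slow_task, messages.append, deadline_s=0.05)
print(answer, messages)
```

The sketch leaves out cancellation, which the guidelines also ask for; a task that the user can abort needs cooperation from the task itself.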

Every book on memory and cognition will also talk about the two kinds of memory that exist. One is the long-term memory, which consists of the things we remember for a long time, often as long as our lives. The other is short-term or working memory, to which is attributed the magic number of seven, plus or minus two, which is the average number of objects a person can hold in their working memory. It turns out that while this number may not appear to be impressively high, in reality it is even lower!

This breaking down of tasks into subtasks ends with small subtasks that can be completed without a break in concentration, with the subgoal and all necessary information either held in working memory or directly perceivable in the environment. These bottom-level subtasks are called “unit tasks” (Card et al., 1983).
...
Unit tasks have been observed in activities as diverse as editing documents, entering checkbook transactions, designing electronic circuits, and maneuvering fighter jet planes in dogfights, and they always last somewhere in the range of 6 – 30 seconds.

In conclusion, this is not the book to pick up in the middle of a time-sensitive project to get guidance on user-interface doubts. No. The time to pick up this book and go through it is before. Or in-between deadline-driven assignments.


Thursday, November 26, 2009

Visualizations - The Pie Chart

The Telecom Regulatory Authority of India (TRAI) put up a press release dated November 21, 2009, "Telecom subscribers growth for the month of October 2009", that has information on the telecom subscription data in India for the month of October 2009. Apart from the quite amazing piece of news that India added 16.67 million (that is 16,670,000) new wireless subscribers, and that the total telephone subscriber base now stands at 525.65 million (that is more than half a billion), the notable thing as far as this blog post is concerned is the depressing use of visualizations in the note.

A few things are obvious at first glance:
- It is a pie chart with a 3D effect.
- This is an Excel generated chart.
- There is redundancy in the chart: the slice labels contain the operator name, and then the legend at the bottom repeats the same information.
- The data is not sorted, so even if you could somehow compare these 3D slices, you would have a tough time finding which is the largest slice, which is the second largest slice, and so on.
- To find the largest slice, you are better off simply comparing the numbers. Which makes the chart itself quite unnecessary.
- The color scheme is very Excel-ish, which is to say, quite unpleasing to the eye. Excel 2007 is an improvement, for sure.
- There are black borders around the slices, which do not make the chart any better.
How to improve this?
Here are some examples:

Example 1:
You cannot really go wrong with a bar chart. This one displays the same data as the pie chart. Straight off you can tell from a visual inspection that "Tata" added the most subscribers, close to 25% of the net additions in October 2009.

Example 2:
I have now added data labels at the top of each bar. This makes it possible to see the precise values for each operator.

Example 3:
By now, it is clear that sorting the bars would make the data a lot more easily digestible. So what insights are now possible with this example? For one, that Reliance, Aircel, and even Idea are three operators that added almost the same number of subscribers. Not very obvious from the above examples. Aircel is a relatively new operator, but seems to be growing quite fast, thanks to its aggressive advertising.
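
The sorting and labelling steps in the three examples can be sketched in a few lines of Python. The figures below are invented placeholders, not the TRAI numbers (the chart values are not reproduced in this post), and `sorted_bars` is a hypothetical helper name.

```python
def sorted_bars(net_additions, width=40):
    """Return one text line per category, largest first, with a bar,
    a comma-formatted data label, and the share of the total.
    Assumes all values are positive."""
    total = sum(net_additions.values())
    largest = max(net_additions.values())
    ranked = sorted(net_additions.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, value in ranked:
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<12}{bar:<{width}} {value:>10,} ({value / total:.0%})")
    return lines


# Placeholder data for illustration only.
sample = {"Operator A": 1_200_000, "Operator B": 3_000_000, "Operator C": 1_800_000}
for line in sorted_bars(sample):
    print(line)
```

Sorting puts the largest bar first, and the label carries both the exact value and its share, so the reader never has to estimate either from the picture.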

Second Chart:

The table above shows "Category wise Net Additions during the Month of October 2009".
Notwithstanding the fact that the data here would be a lot easier to understand if it had been formatted with commas, let us see how it may be visualized as a chart:

This chart does one thing well. It gives a sense of the difference in scale between the wireline and wireless segments. The wireless segment is growing by millions, in every circle, while the wireline segment is in decline. The decline is however minuscule. And without labels, it is difficult to gauge even the approximate values.

So, if I plot this now as a percent stacked bar chart, it looks like an improvement. What I have done is added labels to each stack. I can now see that the Metro segment showed a rise, while the other three segments showed a decline in the wireless segments.
However, this chart is sort of misleading, because it makes the wireline and wireless segments appear equal. Which, as we saw, is most certainly not the case.
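
Why the percent stacked chart misleads can be shown with a little arithmetic. The sketch below uses made-up numbers rather than the TRAI data, and `percent_stack` is a hypothetical helper: once each bar is rescaled to 100%, two series that differ in size by orders of magnitude are drawn at exactly the same height.

```python
def percent_stack(values):
    """Rescale one bar's segments so that they sum to 100 (assumes positive values)."""
    total = sum(values.values())
    return {name: 100 * value / total for name, value in values.items()}


# Invented magnitudes, chosen only to differ widely in scale.
small_series = {"Metro": 30, "Circle A": 50, "Circle B": 20}
large_series = {"Metro": 3_000_000, "Circle A": 5_000_000, "Circle B": 2_000_000}

print(percent_stack(small_series))
print(percent_stack(large_series))
print(sum(small_series.values()), "versus", sum(large_series.values()))
```

Both calls return the same segment percentages, which is exactly the information a percent stacked chart plots; the totals printed on the last line are what it throws away.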

As the third example, I have now plotted the same data as a stacked vertical bar chart. Not as a percent stacked chart, but simply taken the absolute values and stacked them.

The vertical chart brings out quite nicely the difference in magnitude between the wireline and wireless segments.
A problem exists with this chart too: the values for the wireline segment are so small that the individual stacks are barely visible, even on a chart as tall as this one. So, I have added data labels, and then manually moved the labels so that they don't overlap.

© 2009, Abhinav Agarwal. All rights reserved.

Friday, July 3, 2009

Junk Viz - Search

Search Engine Land has a post, Michael Jackson’s Death: An Inside Look At How Google, Yahoo, & Bing Handled An Extraordinary Day In Search, on how web traffic spiked at some of the web's leading properties like Google, Facebook, and Wikipedia, as a result of Michael Jackson's death.

All good and fine, and a sad day for fans of Michael Jackson, the King of Pop as he was known, but a sad day also for data visualizations.

The chart above is a time-series area graph, and you can see that on the 25th of June 2009, around 14:00 hours, traffic to Google querying "Michael Jackson", or combinations of words thereof, began to spike. But by how much? Where is the scale? What does each increment of the gridline indicate? 1 million searches? 10 million searches? 100 searches?
Secondly, the area chart could instead have been replaced with a line graph, thus minimizing non-data pixels.

The bar chart above does a better job, in that you can actually see what the vertical scale represents. However, there are at least three problems with this chart:

- The color scheme makes it tough to see the data clearly. Of course there are only two bars, so it is not that difficult.
- The X-axis labels are gibberish. There is no sub-title or explanation of what these numbers mean. What does "6.4k" mean? And what do the zeros at the end signify?
- The location of the vertical scale on the right is non-standard. Most often a scale is placed on the right edge when there are two axes on the graph, as in a dual-Y bar/line graph, and the left and right edges both have different scales. For example, you might plot sales and units on the same chart, using the left axis for sales and the right axis for units.
Adding a fourth quibble: time series data is best visualized by a line graph.

A better graph than the first one, but with the same problem of having no vertical scale. Otherwise, it is the best of the three examples.

Cross-posted to the Oracle BI Blog at Junk Viz - Web Searches.


Tuesday, June 30, 2009

Junk Viz Examples

I have obtained all three examples from Paul Kedrosky's blog, Infectious Greed.

http://rebis.reidin.com/home.html

A fake 3D bar chart. And a gradient effect. Two egregious errors in one chart.
http://paul.kedrosky.com/archives/2009/06/crude_oil_price_1.html

An otherwise useful chart that is marred by distracting gridlines that overpower the data plotted. The obtrusive gridlines are non-data pixels.
http://paul.kedrosky.com/archives/2009/06/the_young_entre.html

Fake 3D charts, i.e., charts that plot data in two dimensions only, but have the artifact of a three-dimensional effect added, make it difficult to understand the data plotted. The line chart above is one such example.

Wednesday, June 24, 2009

Junk Viz Example - Yahoo Mail

Getting people to think is a good thing. However, getting them to think that your ad has a chart that just does not make sense is not a good thing.

The more people see this chart, if you can call it a chart, the more they will have questions.

- Who are these so-called 'Other Guys'? Is Google Gmail one of them? Is Hotmail there? What about the great local email provider from my country?
- These features are not listed in alphabetical order. Does that mean something?
- Is Tab View the most important feature? Is it the least important? Do the other features listed on the Y-axis build upon the Tab View?
- Does it mean that none of the vendors, 'The Other Guys', offer 'Chat', or 'Unlimited Storage'? Their bars do not go up that high.
- Are these the only features to look for in an online email service? I don't see an entry for 'Calendar'. Surely that's important.
- Why not include other useful features like Labels, Threaded conversations view, Integrated Calendar, Post to Blogs, Facebook / Twitter integration, Rich text editor, Missing Attachment Detector, Address Suggestion, Integrated attachment viewer, Mobile support, SMS integration, and so on... ?

This chart just does not make sense.
A shining example of a junk chart.

A simpler and obvious way of showing such a comparison would be to simply use a table:

This at least gives a more honest picture of the features that the 'Other Guys' have and don't have.

Cartoony charts that serve no other purpose than to convey an illusion of geekiness should be avoided. Who anyway compares email providers today? Don't most people today have accounts on two or more of Hotmail, Yahoo Mail, Google Mail, Rediffmail, IndiaTimes mail, AOL, etc... ?

In the world of Web 2.0 you create a buzz for your products through netizens, who blog, twitter, digg, and post on Facebook, Orkut, MySpace about your products.

Also cross-posted to oraclebi.blogspot.com


Monday, June 22, 2009

Junk Viz Example

An article, "Home loan rates go down", from SiliconIndia, posted on Monday June 22 2009, has a small graphic on the left of the article. It is a good example of a junk visualization. It cannot even be called an example of a junk chart since there is no data at all. It only shows a pseudo-3D chart with small houses perched on top of each bar, and with a line arrow trending downwards, to ostensibly signify that something is going down. To make things worse, there is a reflection effect added.
Is the chart conveying any additional information that is not in the article? No.
Is the chart conveying information included in the article in a better, more understandable manner? No.

Not good.

Cross posted to the Oracle BI Blog - Junk Viz Example
