Saturday, November 14, 2009

Visual Display of Quantitative Information

The Visual Display of Quantitative Information, 2nd edition 


My Amazon.com review - "Data decorators, data-ink, instant chartjunk, and naked women"

Perhaps the single most accessible book on data visualizations. You are given a tour of the history of visualizations, the seminal contributions of people such as Playfair, Tukey, and others, a rogues' gallery of sorts of awful visualizations, a peek into small-multiples visualizations, and an exposition of the principles of good graphic design and visualizations. A must-have book for anyone interested in good data visualizations.

Tufte's contention is that a lack of adequate knowledge and expertise and a mistaken notion about numbers are to blame for bad visualizations. The principles of good visualizations, on the other hand, are few and simple. The book is all about exposing bad examples and enunciating these good principles, beautifully illustrated with examples, and printed on excellent quality paper.

The rest of the review can be best told, in my opinion, through quotes from the book:
"The theory of the visual display of quantitative information consists of principles that generate design options... The principles should not be applied rigidly or in a peevish spirit... and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism." [page 191]

On the seemingly trivial matter of whether charts should be tall or wide, Tufte states that "Graphics should tend toward the horizontal, greater in length than height..." and "Many graphics plot, in essence (cause and effect) and a longer horizontal helps to elaborate the workings of the causal variable in more detail." [pages 186, 187]

Time-series displays are at their best for big data sets with real variability. [page 30]

Chapter 2, "Graphical Integrity", contains a stunning collection of graphs that distort, lie, deceive, and exhibit all manner of skills other than those required for data visualizations.
"Much of twentieth-century thinking about statistical graphics has been preoccupied with the question of how some amateurish chart might fool the naive viewer. ... At the core of the preoccupation with deceptive graphics was the assumption that data graphics were mainly devices for showing the obvious to the ignorant. ... The assumption led down two fruitless paths in the graphically barren years from 1930 to 1970: First, that graphics had to be "alive", "communicatively dynamic," overdecorated and exaggerated... Second, that the main task of graphical analysis was to detect and denounce deception." [page 53]

"A graphic does not distort if the visual representation of the data is consistent with the numerical representation." [page 55]
Which leads to his definition of the "Lie Factor": the "size of effect shown in graphic" divided by the "size of effect in data".
"Another way to confuse data variation with design variation is to use areas to show one-dimensional data" [page 69]
An example cited is the depiction of "the rate of inflation", for which, "graphs show currency shrinking on two dimensions, even though the value of money is one-dimensional." [page 70]
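
To make the Lie Factor concrete, here is a minimal Python sketch. The function names and the numbers are my own, made up for illustration and not taken from the book; I am assuming that "size of effect" means the relative change between two values. It compares a drawing where only the length of a currency note shrinks with its value against one where both width and height shrink, so that the drawn area falls with the square of the value.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from `first` to `second`, as a fraction of `first`."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (effect_size(graphic_first, graphic_second)
            / effect_size(data_first, data_second))


# Hypothetical data: a currency note that keeps half of its value.
value_before, value_after = 1.00, 0.50

# Honest drawing: only the length of the note is scaled by the value.
honest = lie_factor(value_before, value_after, value_before, value_after)

# Distorting drawing: width AND height are scaled by the value,
# so the drawn area is proportional to the value squared.
area_before, area_after = value_before ** 2, value_after ** 2
inflated = lie_factor(area_before, area_after, value_before, value_after)

print(f"length-only drawing: lie factor = {honest:.2f}")    # 1.00
print(f"area drawing:        lie factor = {inflated:.2f}")  # 1.50
```

A Lie Factor of 1 means the graphic shows exactly the change that is in the data; in this made-up case the two-dimensional drawing overstates a 50% loss as a 75% loss.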

A very important observation quoted in Chapter 3 comes from Howard Wainer - "Perhaps the reason is an increase in the perceived need for graphs ... without a concomitant increase in training in their construction." [page 79]
Tufte elaborates:
"Nearly all those who produce graphics for mass publication are trained exclusively in the fine arts and have had little experience with the analysis of data. ..." "... many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data." [page 79]
And
"The doctrine of boring data serves political ends, helping to advance certain interests over others in bureaucratic struggles for control of a publication's resources. ... as the art bureaucracy grows, style replaces content. And the word people, having lost space in the publication to data decorators, console themselves... " [page 80]

Tufte defines "data-ink" in Ch 4 ("Theory of Data Graphics") as
"the non-erasable core of a graphic, the non-redundant ink arranged in response to variations in the numbers represented
                               data-ink
Data-ink ratio = ------------------------------------
                 total ink used to print the graphic
" [page 93]
So, it should not come as a surprise, when Tufte takes a single bar with a value label at the top of the bar, and states that "the labeled, shaded bar of the bar chart, for example, unambiguously locates the altitude in six separate ways." [page 96]. Yes: the label, the two vertical lines of the bar, the top line of the bar, the vertical axis marker - all inform us.
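
The ratio is simple enough to sketch in a few lines of Python. The ink amounts below are hypothetical (think of them as counts of dark pixels in a rendered chart), and the function names are mine, not Tufte's. The sketch assumes you can somehow separate the ink that carries data from the ink that does not, which is the hard part in practice.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that presents data (between 0 and 1)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_share(data_ink: float, total_ink: float) -> float:
    """Share of the ink that could be erased without losing data."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical measurements: the same bar chart before and after
# stripping the frame, heavy grid, and redundant shading.
before = data_ink_ratio(data_ink=1200, total_ink=4000)
after = data_ink_ratio(data_ink=1200, total_ink=1500)

print(f"before redesign: {before:.2f}")  # 0.30
print(f"after redesign:  {after:.2f}")   # 0.80
```

The data-ink stays the same in both cases; only the non-data ink is removed, which is why the ratio rises.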

Chapter 5 - "Chartjunk: Vibrations, Grids, and Ducks" is perhaps the most humorous chapter, a bit sadly so, as the title itself suggests. A quote from Jonathan Swift, indicting 17th-century cartographers, says it all - "With savage pictures fill their gaps, And o'er unhabitable downs, Place elephants for want of towns." [page ] Ouch!

"This may well be the worst graphic ever to find its way into print:" [page 118] refers to a "series of weird three-dimensional displays appearing in the magazine American Education in the 1970s (that) delighted connoisseurs of the graphically preposterous. Here five colors report, almost by happenstance, only five pieces of data..." [page 118]
This is a screenshot of the offending graphic, taken from Amazon.com's Look Inside feature.

You may not agree, and I certainly did not, with Tufte's suggestions for maximizing the data-ink efficiency of the box-plot, in the chapter on "Data-ink Maximization", but they are worth examining nonetheless. However, his redesign of the bar chart, with a border and other accouterments, on pages 126-128, is excellent.

Many examples of bad visualizations cited in the book are from the "New York Times", so it is reassuring to see that the quality of visualizations in the NYT has improved a lot; they are frequently the objects of animated discussion and are generally well regarded. There may be hope, after all.

The review title explained, at least in part
Data Decorators:
"... as the art bureaucracy grows, style replaces content. And the word people, having lost space in the publication to data decorators, console themselves... " [page 80]
Instant chartjunk:

"... now the computer produces instant chartjunk..." [page 111]
Naked Women:
And what about that slightly inappropriate word in the title of the review?
Tufte writes that an art director with overall responsibility for the design of over 3,000 graphics annually had this to say -
"graphics are intended more to lure the reader's attention away from the advertising than to explain the news in any detail. 'Unlike the advertisements,' he said, 'at least we don't put naked women in our graphics.' " [page 80]
We must all be thankful for small mercies, I suppose. Though there are a depressing number of vendors in the market whose software manages to produce vulgar visualizations.

Links to Edward Tufte's Books from his web site

© 2009, Abhinav Agarwal. All rights reserved.