Thursday, November 26, 2009

Visualizations - The Pie Chart

The Telecom Regulatory Authority of India (TRAI) put up a press release dated November 21, 2009, "Telecom subscribers growth for the month of October 2009", that has information on the telecom subscription data in India for the month of October 2009. Apart from the quite amazing piece of news that India added 16.67 million (that is, 16,670,000) new wireless subscribers, and that the total telephone subscriber base now stands at 525.65 million (that is, more than half a billion), the notable thing as far as this blog post is concerned is the depressing use of visualizations in the note.

A few things are obvious at first glance:
- It is a pie chart with a 3D effect.
- This is an Excel generated chart.
- There is redundancy in the chart: the slice labels contain the operator name, and then the legend at the bottom repeats the same information.
- The data is not sorted, so even if you could somehow compare these 3D slices, you would have a tough time finding which is the largest slice, which is the second largest slice, and so on.
- To find the largest slice, you are better off simply comparing the numbers. Which makes the chart itself quite unnecessary.
- The color scheme is very Excel-ish, which is to say, quite unpleasing to the eye. Excel 2007 is an improvement, for sure.
- There are black borders around the slices, which do not make the chart any better.
How to improve this?
Here are some examples:

Example 1:
You cannot really go wrong with a bar chart. This one displays the same data as the pie chart. Straight off you can tell from a visual inspection that "Tata" added the most subscribers, close to 25% of the net additions in October 2009.

Example 2:
I have now added data labels at the top of each bar. This makes it possible to see the precise values for each operator.

Example 3:
By now, it is clear that sorting the bars would make the data a lot more easily digestible. So what insights are now possible with this example? For one, that Reliance, Aircel, and even Idea are three operators that added almost the same number of subscribers. This was not very obvious from the above examples. Aircel is a relatively new operator, but seems to be growing quite fast, thanks to its aggressive advertising.
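
The sorting step in Example 3 can be sketched in a few lines of Python. This is only an illustration: the operator names and values below are hypothetical placeholders, not the TRAI figures, and the function names `sorted_bars` and `render` are my own.

```python
# Hypothetical net additions in millions; NOT the TRAI figures.
net_additions = {
    "Operator A": 2.1,
    "Operator B": 3.9,
    "Operator C": 1.2,
    "Operator D": 2.0,
}

def sorted_bars(data):
    """Return (name, value, share_percent) tuples, largest value first."""
    total = sum(data.values())
    ranked = sorted(data.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, 100.0 * value / total) for name, value in ranked]

def render(rows, width=40):
    """Draw a plain-text horizontal bar chart with a data label on each bar."""
    largest = max(value for _, value, _ in rows)
    lines = []
    for name, value, share in rows:
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<12}{bar} {value:.1f}M ({share:.0f}%)")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(sorted_bars(net_additions)))
```

Once the rows are sorted, the largest bar is always the first one, and near-equal operators end up next to each other, which is what makes the comparison easy.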

Second Chart:

The table above shows "Category wise Net Additions during the Month of October 2009".
Notwithstanding the fact that the data here would be much easier to understand if it had been formatted with commas, let us see how it may be visualized as a chart:
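
As an aside, adding thousands separators is a one-liner in most tools. A minimal Python sketch, using the two subscriber totals quoted earlier in this post (the helper name `with_commas` is just for illustration):

```python
def with_commas(n):
    """Format an integer with thousands separators."""
    return f"{n:,}"

print(with_commas(16670000))   # 16,670,000
print(with_commas(525650000))  # 525,650,000
```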

This chart does one thing well. It gives a sense of the difference in scale between the wireline and wireless segments. The wireless segment is growing by millions, in every circle, while the wireline segment is in decline. The decline is however minuscule. And without labels, it is difficult to gauge even the approximate values.

So, if I plot this now as a percent stacked bar chart, it looks like an improvement. What I have done is added labels to each stack. I can now see that the Metro segment showed a rise, while the other three segments showed a decline in the wireless segments.
However, this chart is sort of misleading, because it makes the wireline and wireless segments appear equal. Which, as we saw, is most certainly not the case.
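
Why the percent stacked chart misleads can be seen from the arithmetic. In the sketch below the numbers are hypothetical, not the TRAI table, and I have assumed that each stack is sized by the magnitude of its value. Every bar is scaled to 100%, so two segments that differ by orders of magnitude end up the same height.

```python
# Hypothetical net additions (subscribers) by category; NOT the TRAI table.
# Negative values are declines.
net_additions = {
    "Wireless": {"Metro": 2_000_000, "Circle A": 6_000_000,
                 "Circle B": 5_000_000, "Circle C": 3_000_000},
    "Wireline": {"Metro": 10_000, "Circle A": -40_000,
                 "Circle B": -30_000, "Circle C": -20_000},
}

def magnitude(values):
    """Total size of one bar, ignoring the sign of each category."""
    return sum(abs(v) for v in values.values())

def percent_stack(values):
    """Scale one bar's categories so that their magnitudes sum to 100."""
    total = magnitude(values)
    return {name: 100.0 * abs(v) / total for name, v in values.items()}

for segment, values in net_additions.items():
    shares = {name: round(s) for name, s in percent_stack(values).items()}
    print(f"{segment}: stacks {shares}, total magnitude {magnitude(values):,}")
```

Both bars come out at 100%, even though in this made-up data the wireless total is 160 times the wireline total.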

As the third example, I have now plotted the same data as a stacked vertical bar chart: not as a percent stacked chart, but with the absolute values simply stacked.

The vertical chart brings out quite nicely the difference in magnitude between the wireline and wireless segments.
This chart has a problem too: the values for the wireline segment are so small that the individual stacks are barely visible, even on a chart as tall as this one. So, I have added data labels, and then manually moved the labels so that they don't overlap.

© 2009, Abhinav Agarwal. All rights reserved.

Tuesday, November 24, 2009

Future of Hedge Fund Investing

The Future of Hedge Fund Investing: A Regulatory and Structural Solution for a Fallen Industry (Wiley Finance) 

This book covers the history of hedge funds, some spectacular failures of hedge funds over the years (including the one run by Bernie Madoff), the types of hedge funds and their investing strategies, the fees and redemption structure, the skills required of a hedge fund manager, the due diligence expected of a hedge fund investor, what a fund-of-hedge-funds does, and the regulatory mechanisms in place, and adds a healthy dose of prescriptive remedies for the hedge fund industry.

The book is written in a conversational style, contains no mathematical equations (save for one on the CAPM, the Capital Asset Pricing Model: E(Ra) = Rf + Beta(E(Rm) - Rf)), is short, and is a very approachable primer to the world of hedge funds. It is not an investigative work but a descriptive one that walks the reader through almost all aspects of a hedge fund: the investors, the administrators, the managers, the regulators (or the lack thereof), and the markets.
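
For readers who have not met the CAPM before, the one equation mentioned above is simple enough to try out. A minimal sketch, with hypothetical inputs that are not taken from the book:

```python
def capm_expected_return(risk_free, beta, market_return):
    """E(Ra) = Rf + Beta * (E(Rm) - Rf), with all rates given as decimals."""
    return risk_free + beta * (market_return - risk_free)

# Hypothetical inputs: 3% risk-free rate, beta of 1.5, 8% expected market return.
expected = capm_expected_return(0.03, 1.5, 0.08)
print(f"{expected:.1%}")  # 10.5%
```

A beta above 1 scales up the market's premium over the risk-free rate; with a beta of exactly 1 the formula simply returns the expected market return.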
"As we take a look at the following scandals, we will see that blind greed, a herd mentality to belong to an exclusive club, and lack of proper due diligence has often led to financial ruin." - while Monty states this with reference to the uber-rich community of Palm Beach Island, the same could be said of the lesser rich too. More often, the need to conform and follow-the-herd takes precedence over exercising one's gray matter.

"In the month of September 2006, Amaranth lost $6 billion or 65 percent of the fund's capital on a single natural gas trade."
Why? How??? Simple - leverage. As When Genius Failed: The Rise and Fall of Long-Term Capital Management so eloquently describes, leverage was the tool used to multiply small returns into large profits. When it works it is spectacular. When it fails, it usually brings down entire companies, or as the financial sub-prime crisis demonstrated, entire economies can be brought to their knees through hyper-leveraged speculative frenzies.
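
The arithmetic of leverage is worth seeing once. The sketch below is a simplified illustration with made-up numbers, not a model of the Amaranth or LTCM trades: it assumes the fund holds assets worth `leverage` times its own capital and borrows the rest.

```python
def return_on_equity(asset_return, leverage, borrowing_cost=0.0):
    """Return on the fund's own capital when assets = leverage * equity.

    The borrowed portion is (leverage - 1) * equity and is charged
    borrowing_cost over the same period.
    """
    return leverage * asset_return - (leverage - 1) * borrowing_cost

for move in (0.02, -0.02):
    for leverage in (1, 10, 25):
        result = return_on_equity(move, leverage)
        print(f"asset move {move:+.0%}, leverage {leverage:>2}x -> equity return {result:+.0%}")
```

At 25 times leverage, a 2% gain on the assets becomes a 50% gain on capital, and a 4% loss wipes the capital out entirely.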
"Hedge fund strategies like relative value arbitrage, convertible and fixed-income arbitrage rely very heavily on the past relationships between various bonds and their derivative instruments to hold into the future. But, in 2008 these decade-long relationships broke down, and therefore the hedge fund strategies that relied on mean reversion of these relationships performed the worst."

Two hedge fund failures - KL Financial and Amaranth Advisors - are described initially. The point being that while one failed because of outright fraud, and the other because of an overleveraged position gone horribly wrong, the other near-collapse was because of an un-diversified strategy that went wrong. The lesson is the same - investor greed.

The lessons are fairly common-sensical, but paradoxically, all too un-common:
Lesson 1: Relationships Do Not Trump Due Diligence
Lesson 2: When Investing In Hedge Funds, Hire Experts
Lesson 3: "We Did Not Know What We Were Investing In" Is Not An Excuse
Conclusion: So Called Experts, Fund Of Funds, Have Failed

"The skill set requires a deep knowledge of the investment banking industry and a robust Rolodex of industry contacts."
To the extent that the Rolodex is used to learn about the industry from the people who actually work in it, that is good; to the extent that you have the Rajaratnams who use the Rolodex to prise insider information from their contacts, it is not.

While a hedge fund manager and trader himself, Monty does point out the role that hedge funds played in the global financial crisis as well as the need for better regulation:
"Hedge funds, however, are not completely disconnected from the crisis. They have been blamed for violating short-selling rules and rumor mongering, as well as creating systemic risk due to their derivatives portfolios.... "
"This is clearly an area where the SEC needs to strengthen and enforce regulation. Capital markets will self-govern effectively as long as the rules of the game are being applied uniformly and followed diligently."

Investor protection through government intervention or regulation is largely diminished in the world of hedge fund investing, except to the extent that the government mandates that only "accredited investors" and "qualified purchasers" are allowed to invest in hedge funds.

Maybe the need is for tighter and more sweeping regulations; after all, over-regulation never caused a financial crisis. On the other hand, lack of adequate regulation has been a contributing factor behind almost every financial crisis in the last 100+ years.

Disclosure: Monty is my brother, and to the extent that has biased my review, it is probably inevitable.


Saturday, November 14, 2009

Visual Display of Quantitative Information

The Visual Display of Quantitative Information, 2nd edition 

My review - "Data decorators, data-ink, instant chartjunk, and naked women"

Perhaps the single most accessible book on data visualizations. You are given a tour of the history of visualizations, the seminal contributions of people such as Playfair, Tukey, and others, a rogues' gallery of sorts of awful visualizations, a peek into small-multiples visualizations, and an exposition of the principles of good graphic design and visualizations. A must-have book for anyone interested in good data visualizations.

Tufte's contention is that a lack of adequate knowledge and expertise and a mistaken notion about numbers are to blame for bad visualizations. The principles of good visualizations, on the other hand, are few and simple. The book is all about exposing bad examples and enunciating these good principles, beautifully illustrated with examples, and printed on excellent quality paper.

The rest of the review can be best told, in my opinion, through quotes from the book:
"The theory of the visual display of quantitative information consists of principles that generate design options... The principles should not be applied rigidly or in a peevish spirit... and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism." [page 191]

On the seemingly trivial matter of the shape of charts, whether they should be tall or wide, Tufte states that "Graphics should tend toward the horizontal, greater in length than height..." and "Many graphics plot, in essence (cause and effect) and a longer horizontal helps to elaborate the workings of the causal variable in more detail." [pages 186, 187]

Time-series displays are at their best for big data sets with real variability. [page 30]

Chapter 2, "Graphical Integrity", contains a stunning collection of graphs that distort, lie, deceive, and exhibit all manner of skills other than those required for data visualizations.
"Much of twentieth-century thinking about statistical graphics has been preoccupied with the question of how some amateurish chart might fool the naive viewer. ... At the core of the preoccupation with deceptive graphics was the assumption that data graphics were mainly devices for showing the obvious to the ignorant. ... The assumption led down two fruitless paths in the graphically barren years from 1930 to 1970: First, that graphics had to be "alive", "communicatively dynamic," overdecorated and exaggerated.. Second, that the main task of graphical analysis was to detect and denounce deception." [page 53]

"A graphic does not distort if the visual representation of the data is consistent with the numerical representation." [page 55]
Which leads to his definition of the term "Lie Factor": the "size of effect shown in graphic" divided by the "size of effect in data".
"Another way to confuse data variation with design variation is to use areas to show one-dimensional data" [page 69]
An example cited is the depiction of "the rate of inflation", for which, "graphs show currency shrinking on two dimensions, even though the value of money is one-dimensional." [page 70]
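
The Lie Factor is easy to compute once each "size of effect" is taken as a relative change. A minimal sketch follows; the function names are my own and the inputs are illustrative, not figures quoted in this review.

```python
def relative_change(first, second):
    """Size of effect: change from first to second, relative to first."""
    return (second - first) / first

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in graphic divided by size of effect in data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual

# Illustrative: the data rises from 18 to 27.5, but the drawn lines
# grow from 0.6 to 5.3 inches.
print(round(lie_factor(0.6, 5.3, 18, 27.5), 1))  # 14.8

# A value that doubles, drawn as a picture doubled in both width and
# height, quadruples in area.
print(lie_factor(1.0, 4.0, 1.0, 2.0))  # 3.0
```

A value of 1 means the graphic is faithful to the numbers. The second case is the area problem from the inflation example above: scaling a picture in two dimensions to show a one-dimensional change inflates the effect.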

A very important observation quoted in Chapter 3 comes from Howard Wainer - "Perhaps the reason is an increase in the perceived need for graphs ... without a concomitant increase in training in their construction." [page 79]
Tufte elaborates:
"Nearly all those who produce graphics for mass publication are trained exclusively in the fine arts and have had little experience with the analysis of data. ..." "... many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data." [page 79]
"The doctrine of boring data serves political ends, helping to advance certain interests over others in bureaucratic struggles for control of a publication's resources. ... as the art bureaucracy grows, style replaces content. And the word people, having lost space in the publication to data decorators, console themselves... " [page 80]

Tufte defines "data-ink" in Ch 4 ("Theory of Data Graphics") as
"the non-erasable core of a graphic, the non-redundant ink arranged in response to variations in the numbers represented."

                               data-ink
Data-ink ratio = ------------------------------------
                  total ink used to print the graphic

[page 93]
So, it should not come as a surprise when Tufte takes a single bar with a value label at the top of the bar, and states that "the labeled, shaded bar of the bar chart, for example, unambiguously locates the altitude in six separate ways." [page 96]. Yes: the label, the two vertical lines of the bar, the top line of the bar, the vertical axis marker - all inform us.

Chapter 5 - "Chartjunk: Vibrations, Grids, and Ducks" is perhaps the most humorous chapter, a bit sadly so, as the title itself suggests. A quote from Jonathan Swift, indicting 17th-century cartographers, says it all - "With savage pictures fill their gaps, And o'er unhabitable downs, Place elephants for want of towns." [page ] Ouch!

"This may well be the worst graphic ever to find its way into print:" [page 118] refers to a "series of weird three-dimensional displays appearing in the magazine American Education in the 1970s (that) delighted connoisseurs of the graphically preposterous. Here five colors report, almost by happenstance, only five pieces of data..." [page 118]
This is a screenshot of the offending graphic, taken from's Look Inside feature.

You may not, and I certainly did not, agree with Tufte's suggestions for maximizing the data-ink efficiency of the box-plot, in the chapter on "Data-ink Maximization", but they are worth examining nonetheless. However, his redesign of the bar chart, with a border and other accouterments, on pages 126-128, is excellent.

Many examples of bad visualizations cited in the book are from the "New York Times", so it is sort of reassuring to see that the quality of visualizations in the NYT has improved a lot, and that they are now frequently the objects of animated discussion and generally well regarded. There may be hope, after all.

The review title, explained, at least part thereof
Data Decorators:
"... as the art bureaucracy grows, style replaces content. And the word people, having lost space in the publication to data decorators, console themselves... " [page 80]
Instant chartjunk:

"... now the computer produces instant chartjunk..." [page 111]
Naked Women:
And what about that slightly inappropriate word in the title of the review?
Tufte writes that an art director with overall responsibility for the design of over 3,000 graphics annually had this to say -
"graphics are intended more to lure the reader's attention away from the advertising than to explain the news in any detail. 'Unlike the advertisements,' he said, 'at least we don't put naked women in our graphics.' " [page 80]
We must all be thankful for small mercies, I suppose. Though there are a depressing number of vendors in the market whose software seems designed to produce vulgar visualizations.



Friday, November 13, 2009

Buy, Buy, Baby

Buy, Buy Baby: How Consumer Culture Manipulates Parents and Harms Young Minds

Marketers market at children. Advertisers use and target children. Makers of toys, cartoon shows, and more know that children are a lucrative market. And that if you cannot get to the parents, get to their wallets via children. It may appear to be wrong, and it may militate against our sense of right and wrong, but it happens, all the time. This book attempts to document, expose, and reveal how, when, and possibly why this happens.

You can look at the material in this book as basically covering two topics. One is the part that dwells on how children learn, whether watching television lets them learn any better (or worse). The author also dwells on what drives parents when deciding what to spend on children. The second part is numerous examples of toys, cartoon shows, and franchises that were built into billion dollar businesses through careful branding, advertising, and marketing. Examples are Disney Princesses, Winnie the Pooh, Care Bears, Clifford the Dog, Elmo, and lots more.

One of the most useful and fascinating chapters of the book is on the theoretical and psychological insights into how children learn (Ch 3 - 'It's Like Playschool on TV'). The work of such people as Piaget and Vygotsky serves as the basis for much of the material here.

To a great extent this book works.
But... It is not a masterpiece of muckraking like 'Fast Food Nation'. There is also the sense that the author is unwilling to take a strong stand on this targeting of children, relying instead on the reader to come to that conclusion. Thirdly, the material in the book could have been better organized - themes too often intermingle in the chapters. Lastly, given that marketing to children is built on advertising, a crucial piece missing from this book is any mention of actual advertisements that targeted children, which would have given the reader an idea of the kind of advertising aimed at children over the years.

By the way, did you know that
"In 1978 the FTC issued a report contending that commercials targeting children under the age of eight were intrinsically unethical, since children of that age were developmentally unable to discern the subtle differences between fact and fantasy." Unsurprisingly, depressingly, "The investigation and the report were quashed by lobbying efforts on behalf of the advertising industry." [both page 55].

"Exploiting nostalgia was the advertisers' chief ploy. ... the toy industry understood the draw for parents of revisiting their own youth through their children." [page 56] It would have been very useful here if the author had actually cited advertisements that did this.

"Babies, especially, and very young children are concrete thinkers. The classic separation anxiety that an eight-month-old baby feels when his primary caregiver leaves the room is rooted in the absolute certainty that she is really gone." [page 72]

Marketers and makers of children's toys all go for the 'acceptability halo' - "a marketer who establishes 'educational credit' can get away with anything." [page 3]

GenX-ers are dealt with in some detail in the book, across several chapters.
"Though the Gen-X mother may say she doesn't care how smart her children are, her spending patterns tell a different story. ... will even lie about fast-tracking their babies." [page 65]

"When asked, as late as 2004, what guiding principles its producers used to design developmentally appropriate TV for infants, a spokeswoman for Baby Einstein replied, 'We're just really good at seeing the world from a baby's point of view.'" [page 86]

A member of the communications committee of the AAP (American Academy of Pediatrics) had this to say:
"There is no excuse for targeting children under two. They should not be watching television, and to target them with a show is immoral." [page 87]

"No legitimate academic research ever showed that Teletubbies was developmentally sound for babies or toddlers." [page 89]

"... toddlers were able to learn from events easily through live demonstrations, or what they believed were live demonstrations, but not when they knowingly viewed the same event via a symbolic medium, such as television." [page 92]

"... watching Sesame Street was 'negatively related' to expressive language use... Teletubbies was negatively related to both vocabulary size and expressive language use ... Barney and Friends was positively related to expressive language use and positively related to vocabulary size ... "

In 2005 an article published in the American Behavioral Scientist by Anderson, a PhD in developmental psychology, had this to say:
"... With the exception of [one finding], there is very little evidence that children under two learn anything from television. The evidence indicates that learning from television by very young children is poor and that exposure to television is associated with relatively poor outcomes." [page 101]

Feeding children into the advertising and marketing grinder has consequences also, as one would expect. Take the 'Disney Princess' brand for example.
"One reason for launching Disney Princess was, naturally, to extend the retail life of each character." [page 137]
"... toddler girls didn't want to look like just any fancy Cinderella; they wanted to look like the Disney Cinderella." [page 139]
".. irony attended the marriage of KGOY and Cinderella ... Disney's Cinderella was emerging as the polar opposite of the original ... Cinderella causes some young devotees to behave more like her wicked stepsisters. ... two and three-year old girls competed on the basis of who had the prettier or greater number of accessories..." [page 140]

Finally, sample this:
"It's good for kids to learn how to manipulate - that's how you get ahead in this world." - per Rachel Geller, big honcho at the Geppetto Group, a New York based marketing firm.

20 pages of notes, and 10 pages of bibliography - fodder for further reading if one is so inclined, ending with a 12 page index.


Saturday, November 7, 2009

Envisioning Information

Envisioning Information
My review

Passionate exposition on effective visualizations. Key takeaways are small-multiples, use of color, and use of details.
However, while mostly good, it is also distractingly didactic.
While a must-have in any collection on data visualizations, for people looking for only one book on effective data visualizations, this is not it.
This book is the poetry of visualizations; you will need to supplement it with books that are the prose of visualizations.
3 Stars???
I initially gave this book four stars, but then changed it to three stars. This may seem harsh, but hear me out. There is lots that is good in the book. However, this book's focus is more on cartography and maps. And this is where it falls short. It does not address the issue of map based visualizations in any sort of depth. Not much space is devoted to the different types of map based visualizations - dot plots, qualitative and quantitative choropleths (color patches), heatmaps, proportional bars, 3D maps, maps with variable sized markers, isopleths, flow maps, dot-location maps, graduated symbol maps, and much, much, more. The other reason for deducting two stars is the fact that this book, in 2009, does read a bit dated. It is a beautifully laid out book, that almost falls into the coffee-table book category, but looking beyond that, the material does show its age. 10 or 15 years ago the rating would have been 4 or 5 stars. Perhaps unfair on my part...

Excerpts from the book:
"All communication between the readers of an image and the makers of an image must now take place on a two-dimensional surface. Escaping this flatland is the essential task of envisioning information." [page 12]

Given the inherent multi-dimensionality of data (a measure that represents value or values over time, region, and other dimensions - e.g. number of employees by year, by country, and by line-of-business), Tufte states that we should
"... increase (1) the number of dimensions that can be represented on plane surfaces and (2) the data density (amount of information per unit area)." [page 13]

This focus on data density finds resonance throughout the book:
"Simplicity of reading derives from the context of detailed and complex information, properly arranged. A most unconventional design strategy is revealed: to clarify, add detail." [page 37]

Tufte is especially harsh on charts that feature "chart junk", what he describes as
"... display apparatus and ornamentation" that "... seek to attract and divert attention...", and that "Lurking behind chart junk is contempt both for information and for the audience. ... designing as if readers were obtuse and uncaring... " [page 33, 34]

On the topic of spatial maps, Tufte highlights a problem that may emerge with conventional choropleths (blot maps): "(they)... paint over areas formed by given geographic or political boundaries ...", resulting in non-uniform sizes, and "historical changes in political boundaries disrupt continuity of statistical comparisons." The solution? Or at least one solution: "Mesh maps finesse these problems." Taking the example of a map of Japan, "... the whole country of Japan was divided up in 379,000 equal-sized units and then, in a heroic endeavor, census data and addresses were collated to match the new grid squares." [page 40, 41]

"The struggle between maintenance of context and enforcement of comparison... " [page 77]

Excessive or wanton use of color can be very damaging to the visualization. Eduard Imhof enumerates four rules of minimizing such color damage:
"First rule: Pure, bright colors or very strong colors have loud, unbearable effects when they stand unrelieved over large areas adjacent to each other, but extraordinary effects can be achieved when they are used sparingly on or between dull background tones. ...
Second rule: The placing of light, bright colors mixed with white next to each other usually produces unpleasant results..." [page 82]

Tufte lists "... the fundamental uses of color in information design: to label (color as noun), to measure (color as quantity), to represent or intimate reality (color as representation), and to enliven or decorate (color as beauty)." [page 81]

In Closing:
Consider this: while you may use other books more frequently to learn and reference when creating visualizations, charts, or dashboards, you will want to keep this book handy to remind yourself of the bigger picture and the historical context of visualizations.


Notes on the Upanishads

The Upanishads

These are excerpts from the 51 page endnote of the book, "Reading The Upanishads".

To be Hindu means in some sense to accept their authority, and since Hinduism, uniquely among the major religions of the world, is a decentralized system with no formal institutional controls, there is almost no other criterion. [page 251]

A second meaning of Vedic includes three classes of texts which are soon attached to, and preserved with, their respective Samhitas. The first are the Brahmanas, lengthy descriptions of the Vedic rituals in a prose which is nearly that of classical Sanskrit .... Second is a smaller and more intriguing group of texts known as Aranyakas or "forest manuals," continuations of the Brahmanas but "dealing with the speculations and spirituality of forest dwellers ... those who have renounced the world." And third are the earliest Upanishads or "confidential sessions." [page 252]

... because they are handed down at the end of the Vedic collections and are meant to be learned and recited last by Vedic students, the Upanishads are classified as vedanta, "the end of the Vedas." [page 253]

At a period when Hinduism was losing its bearings, the great mystic and philosopher Shankara (A.D. ca. 788-820), knowing that only mystical experience could re-invigorate the tradition, composed remarkable commentaries on ten of the Upanishads, giving them as it were a secondary canonization by his authority, labor, and vast intellectual achievement - and renewing Hinduism in the process. These ten Upanishads are listed by Indian tradition in the following order: Isha, Kena, Katha, Prashna, Mundaka, Mandukya, Taittiriya, Aitareya, Chandogya, Brihadaranyaka. [page 255]

... that the Vedas (inclusively defined) were created eons before mankind. ... They mean the truths embodied in these forms lie so deep they constitute templates of reality; they are, as it were, evolution's plan. Therefore these four Vedas were "given at the dawn of time"; in the Gita and other texts they are identified with Brahma (the Lord of Creation) and throughout the tradition they are classified as shruti, "heard" - as we would say, a directly revealed literature, contrasting with a more indirect but still not secular which was not revealed but smriti or "recollected" by human beings (smriti also means tradition). The Upanishads are revered as shruti along with the Samhitas. [page 256]

Besides, the Samhitas and especially the oldest, the Rig Veda Samhita, contain impressive profundity in speculation about the nature of being, time, and the universe, as in the famous Nasadiya Sukta (X.129.1.4), sometimes called a "basis of the Upanishads":
At first there was neither Being nor Nonbeing.
There was not air nor sky beyond.
What was its wrapping? Where? In whose protection?
Was water there, unfathomable and deep?
In the beginning Love arose,
Which was the primal germ cell of the mind.
The seers, searching in their hearts with wisdom,
Discovered the connection of Being in Nonbeing.
Who really knows? Who can presume to tell it?
When was it born? Whence issued this creation?
Even gods came after its emergence.
Then who can tell from whence it came to be?
[page 256, 257]

At an early period, one great commentary on the Upanishads emerged as authoritative: the Brahma Sutras (also called Vedanta Sutras) of Badarayana. Indian tradition identified Badarayana with none other than Vyasa, traditional compiler of the Vedas and the Mahabharata, which is by way of acknowledging his immense importance for the cultural tradition; for the Brahma Sutras and its commentaries, serving as a kind of intellectual access to the vision of the Upanishads... [pages 278, 279]

Book Details:
  • Paperback: 311 pages
  • Publisher: Nilgiri Press; 1 edition (June 1987)
  • Language: English
  • ISBN-10: 0915132397
  • ISBN-13: 978-0915132393
  • Product Dimensions: 7.5 x 4.5 x 1 inches
