Sunday, February 5, 2012

The Filter Bubble by Eli Pariser - Review

The Filter Bubble: What the Internet Is Hiding From You, by Eli Pariser - my review (Amazon, Kindle, Flipkart, Infibeam, my user review on Amazon.com)
4 stars
My review: A City of Ghettos or Mosaic of Subcultures? Excellent book, but its tendency at times to skim the surface is a missed opportunity.

The book makes us aware of the risks that the increasingly pervasive and invisible personalization of content on the Internet poses to innovation and creativity, not to mention privacy and liberty. An in-depth look at the technology behind personalization and data laundering would have elevated the book from very good to truly outstanding.

The premise of the Internet was to open up worlds of information and bring them to our homes via our computers - information that had long remained inaccessible to most people for reasons of cost and access. While that promise has been realized to a large extent, the increasing personalization of content - done by regular news websites, shopping sites, social media, and search engines - results in our being shown more of what we already know or like, while hiding what these sites think and decide we would not want to view.

This personalization is more pervasive, and in most cases more invasive of our privacy, than most people realize. Web sites track our clicks, our pageviews, where we come from, where we go, how much time we spend on different sites, what keywords we have searched for, details about our physical location, the types of devices we access these sites from - the type of computer, the browser, the operating system, and more. This rich trove of information allows sites and companies to build detailed dossiers on hundreds of millions of users. If done by governments, this would be deemed intolerably intrusive and something done only by totalitarian regimes. However, such gathering of highly personal data, when done in the commercial world of the Internet, is par for the course. "...here’s what Acxiom knows about 96 percent of American households and half a billion people worldwide: the names of their family members, their current and past addresses, how often they pay their credit card bills, whether they own a dog or a cat (and what breed it is), whether they are right-handed or left-handed, what kinds of medication they use (based on pharmacy records) ... the list of data points is about 1,500 items long." [location 593]
Scary? Well, here is an example of what even a short visit to a nondescript site like www.dictionary.com can do to your computer:
Search for a word like “depression” on Dictionary.com, and the site installs up to 223 tracking cookies and beacons on your computer so that other Web sites can target you with antidepressants. [location 140]
...
BlueCava is compiling a database of every computer, smartphone, and online-enabled gadget in the world, which can be tied to the individual people who use them. [location 1420]
In my own experience, visiting a respectable site like LinkedIn.com resulted in 30 cookies being placed on my computer (see screenshots below). I did not count the beacons, but I suspect more than a few of those were placed on my computer as well.
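
What does such a beacon actually do? Here is a minimal sketch in Python of how a tracking pixel could work; it is not the code of any real ad network, and the cookie name, port, and logging are illustrative only. The essential trick is that a third-party server, embedded invisibly on many sites, gets to set and read its own cookie on every page you visit:

```python
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 transparent GIF -- the classic "web bug" payload
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff\x00\x00\x00"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01"
         b"\x00\x00\x02\x02D\x01\x00;")

class Beacon(BaseHTTPRequestHandler):
    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        if "uid=" in cookie:
            uid = cookie.split("uid=")[1].split(";")[0]
            new_cookie = None
        else:
            uid = uuid.uuid4().hex          # first sighting: assign a permanent ID
            new_cookie = f"uid={uid}; Max-Age=31536000; Path=/"
        # The Referer header tells the tracker which page embedded the pixel
        page = self.headers.get("Referer", "(unknown page)")
        print(f"user {uid} is reading {page}")
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        if new_cookie:
            self.send_header("Set-Cookie", new_cookie)
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    # Any page that embeds <img src="http://localhost:8000/pixel.gif">
    # now reports its visitors to this server.
    HTTPServer(("localhost", 8000), Beacon).serve_forever()
```

Because the browser volunteers the Referer header with each request, the tracker learns not just who you are but which page you were reading when the pixel loaded.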

You can run, but you can't hide from the tracking that happens on the Internet.
Say you check out a pair of running sneakers online but leave the site without springing for them. If the shoe site you were looking at uses retargeting, their ads—maybe displaying a picture of the exact sneaker you were just considering—will follow you around the Internet, showing up next to the scores from last night’s game or posts on your favorite blog. And if you finally break down and buy the sneakers? Well, the shoe site can sell that piece of information to BlueKai to auction it off to, say, an athletic apparel site. Pretty soon you’ll be seeing ads all over the Internet for sweat-wicking socks. [location 609]
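
The mechanics behind such retargeting can be illustrated with a toy sketch; all the names and the matching logic here are invented, and real ad exchanges run real-time auctions over far richer data:

```python
# interest: tracking-cookie ID -> products viewed but not (yet) bought
interest = {}

def record_view(uid, product):
    interest.setdefault(uid, []).append(product)

def record_purchase(uid, product):
    if product in interest.get(uid, []):
        interest[uid].remove(product)
    # In the book's scenario, the purchase itself is then auctioned off
    # (e.g. via an exchange) so apparel sites can target "sneaker buyers".

def pick_ad(uid):
    # Called by ANY site that embeds the ad network, not just the shoe store
    viewed = interest.get(uid)
    return f"ad: that {viewed[-1]} you looked at..." if viewed else "ad: generic"

record_view("uid-42", "running sneaker X9")   # browsed, didn't buy
print(pick_ad("uid-42"))                      # sneaker ad follows you around
record_purchase("uid-42", "running sneaker X9")
print(pick_ad("uid-42"))                      # back to generic ads
```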

Tracking by itself may not be very palatable to users and consumers. But it has the potential to turn decidedly ominous when you consider the uses to which such information could be put.
In some cases, algorithmic sorting based on personal data can be even more discriminatory than people would be. [location 1645]
Banks are beginning to use social data to decide to whom to offer loans: [location 1681]
...
...LinkedIn, the social job-hunting site, offers a career trajectory prediction site; [location 1686]
...As a service to customers, it’s pretty useful. But imagine if LinkedIn provided that data to corporate clients to help them weed out people who are forecast to be losers. [location 1690]
As we traverse the web, these sites (Google, Facebook, and Amazon are the author's main examples) track our clicks and our pageviews over a period of time, and then show us a personalized version of the content, stripping out news stories that are contrary to our political views - as determined by these trackers and personalizers. We end up living in an invisible echo chamber that shows us what the chamber thinks we like seeing. This stifles creativity and innovation. "Creativity is often sparked by the collision of ideas from different disciplines and cultures. Combine an understanding of cooking and physics and you get the nonstick pan and the induction stovetop."
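
The filtering itself need not be sophisticated to create an echo chamber. A deliberately crude sketch, assuming a simple tag-matching model rather than whatever Google or Facebook actually use:

```python
# What the user has clicked on before, as inferred topic tags
click_history = ["economy", "cricket", "economy", "technology"]

stories = [
    ("Markets rally on budget news",  {"economy"}),
    ("New phone model released",      {"technology"}),
    ("Protest movement gathers pace", {"dissent"}),   # the contrary story
]

def score(tags, history):
    # How often has this user clicked on this story's topics?
    return sum(history.count(tag) for tag in tags)

# Keep only stories resembling what the user already reads; the
# "dissent" story scores zero and silently vanishes from the feed.
feed = [title for title, tags in stories if score(tags, click_history) > 0]
print(feed)
```

The feedback loop is the insidious part: clicks shape the feed, and the feed shapes the next clicks.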

It seems our brains are forever balancing on a cognitive tightrope between the conflicting tendencies of "learning too much from the past" and "incorporating too much new information from the present" [location 1085].

However, "personalized filters can upset this cognitive balance" by surrounding "us with ideas with which we’re already familiar (and already agree), making us overconfident in our mental frameworks." - perpetuating a confirmation bias of sorts. "Second, it removes from our environment some of the key prompts that make us want to learn ... It can block what researcher Travis Proulx calls “meaning threats,” the confusing, unsettling occurrences that fuel our desire to understand and acquire new ideas. [locations 1088, 1157]

Or to put it another way: suppose all you kept seeing were white swans, some system determined that you were not interested in seeing black swans, and when a black swan did in fact appear, it was hidden from your view. That would only end up reinforcing the view that the world is inhabited by ONLY white swans. And we know the perils of ignoring black swans.

Furthermore, innovation is sparked by curiosity. According to psychologist George Loewenstein, curiosity is aroused when we’re presented with an “information gap.”
Democracy requires citizens to see things from one another’s point of view, but instead we’re more and more enclosed in our own bubbles. Democracy requires a reliance on shared facts; instead we’re being offered parallel but separate universes. [location 128]
But the filter bubble isn’t tuned for a diversity of ideas or of people. It’s not designed to introduce us to new cultures. [location 1309]
Listening to a radio station, reading a newspaper, or watching a news channel, you are aware, to some degree at least, of the point of view that the newspaper, radio station, or TV channel holds. The choice to switch is yours. Not so with the web.

"On the Internet, personalized filters could promote the same kind of intense, narrow focus you get from a drug like Adderall" - which works by increasing levels of the neurotransmitter norepinephrine, which, for one, "reduces our sensitivity to new stimuli."

As Cropley points out in Creativity in Education and Learning, the physicist Niels Bohr famously demonstrated this type of creative dexterity when he was given a university exam at the University of Copenhagen in 1905. One of the questions asked students to explain how they would use a barometer (an instrument that measures atmospheric pressure) to measure the height of a building. Bohr clearly knew what the instructor was going for: Students were supposed to check the atmospheric pressure at the top and bottom of the building and do some math. Instead, he suggested a more original method: One could tie a string to the barometer, lower it, and measure the string—thinking of the instrument as a “thing with weight.” The unamused instructor gave him a failing grade—his answer, after all, didn’t show much understanding of physics. Bohr appealed, this time offering four solutions: You could throw the barometer off the building and count the seconds until it hit the ground (barometer as mass); you could measure the length of the barometer and of its shadow, then measure the building’s shadow and calculate its height (barometer as object with length); you could tie the barometer to a string and swing it at ground level and from the top of the building to determine the difference in gravity (barometer as mass again); or you could use it to calculate air pressure. [location 1282]
On the topic of fitting information to suit the user, a natural and ominous extension is the area of censorship and Big Brother - governmental surveillance of the citizenry. When talking of censorship, the foremost country that comes to mind is China. Interestingly, the author points out that China does not need to wield a heavy hammer and censor anything and everything it deems inappropriate. Injecting sufficient distortions can be just as effective, with perhaps better results. Censorship, in other words, need not be absolute. It can be subtle, it can be voluntary, and it can be almost completely invisible, discernible only after careful scrutiny.
China’s objective isn’t so much to blot out unsavory information as to alter the physics around it—to create friction for problematic information [location 1760]
Rather than decentralizing power, as its early proponents predicted, in some ways the Internet is concentrating it. [location 1786]
As long as a database exists, it’s potentially accessible by the state. That’s why gun rights activists talk a lot about Alfred Flatow. [location 1823]
When Amazon booted the activist Web site WikiLeaks off its servers under political pressure in 2010, the site immediately collapsed—there was nowhere to go. [location 1842]
Because of the economies of scale in data, the cloud giants are increasingly powerful. And because they’re so susceptible to regulation, these companies have a vested interest in keeping government entities happy. [location 1848]
Just as black holes can be detected only by the absence of light, censorship can sometimes be detected only by noting the absence of terms and words that are otherwise found in freer societies.
In December 2010, researchers at Harvard, Google, Encyclopædia Britannica, and the American Heritage Dictionary announced the results of a four-year joint effort. The team had built a database spanning the entire contents of over five hundred years’ worth of books—5.2 million books in total, in English, French, Chinese, German, and other languages. Now any visitor to Google’s “N-Gram viewer” page can query it and watch how phrases rise and fall in popularity over time, [location 2526]
And, they argued, the tool could provide “a powerful tool for automatically identifying censorship and propaganda” by identifying countries and languages in which there was a statistically abnormal absence of certain ideas or phrases. [location 2533, emphasis mine]
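
The underlying statistical idea is simple enough to sketch. Assuming, purely for illustration, two word-frequency tables - one from a freer society's corpus, one from a censored one, with invented counts - a term that is dramatically under-represented in the second stands out:

```python
# Invented counts standing in for two large book corpora
corpus_free     = {"harvest": 900, "famine": 120, "protest": 300, "the": 90000}
corpus_censored = {"harvest": 950, "famine": 115, "protest": 2,   "the": 91000}

def rel_freq(counts, term):
    return counts.get(term, 0) / sum(counts.values())

for term in ("harvest", "famine", "protest"):
    expected = rel_freq(corpus_free, term)
    observed = rel_freq(corpus_censored, term)
    ratio = observed / expected
    if ratio < 0.1:   # arbitrary threshold for "statistically abnormal absence"
        print(f"'{term}' appears at {ratio:.3f}x its expected frequency -- flagged")
```
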
The constant, unending flow of Internet users' personal data from one server to another, from one company's database to another's, enriched along the way by merging with yet more personal information, is made possible because data can move over the web with little friction, and because it is so difficult to trace. Like money laundering, data laundering becomes not just possible but possible on a massive scale when done over the Internet.
Data are uniquely suited to gray-market activities, because they need not carry any trace of where they have come from or where they have been along the way. Wright calls this data laundering, and it’s already well under way: [location 2700]
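
A toy sketch of what such laundering might look like in code; the broker chain, field names, and records are all invented for illustration:

```python
# A record as it leaves its original source, provenance still attached
record = {
    "source": "pharmacy-records.example",
    "email_hash": "a1b2c3",
    "medication": "antidepressant",
}

def launder(rec, enrichment):
    clean = dict(rec)
    clean.pop("source", None)   # strip any trace of where the data came from
    clean.update(enrichment)    # merge in data bought from another broker
    return clean

step1 = launder(record, {"zip": "560001"})              # broker A
step2 = launder(step1, {"household_income": "high"})    # broker B
print(step2)   # an enriched profile whose origins are untraceable
```
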
The author comes down harshly on both Google and Facebook for trying to have it both ways...
Too often, the executives of Facebook, Google, and other socially important companies play it coy: They’re social revolutionaries when it suits them and neutral, amoral businessmen when it doesn’t. And both approaches fall short in important ways.
...
Facebook describes itself as a “social utility,” as if it’s a twenty-first-century phone company. But when users protest Facebook’s constantly shifting and eroding privacy policy, Zuckerberg often shrugs it off with the caveat emptor posture that if you don’t want to use Facebook, you don’t have to.
...
Google’s founders also sometimes play a get-out-of-jail-free card.
In conclusion, a very common-sensical proposal, almost forty years old and yet mostly valid even today, lies unenforced and mostly forgotten, because enforcing it is not in the interests of those who profit from collecting information on their users.
In 1973, the Department of Health, Education, and Welfare under Nixon recommended that regulation center on what it called Fair Information Practices:
- You should know who has your personal data, what data they have, and how it’s used.
- You should be able to prevent information collected about you for one purpose from being used for others.
- You should be able to correct inaccurate information about you.
- Your data should be secure.
Nearly forty years later, the principles are still basically right, and we’re still waiting for them to be enforced.

This is an important and very timely book. In that sense it is a must-read for everyone who spends time on the Internet and has invested in creating a social identity on the net.

On the other hand, the book falls short in at least one important area. I was expecting at least some in-depth look at how personalization on the Internet actually works. This is a lost opportunity, in my opinion. As with most books intended for a broad audience, the inclination is to keep the book non-technical and jargon-free. However, that comes at a price: the reader gets little in-depth understanding of the issues involved. We know that personalization exists. But how does it actually work? Yes, there are massive data sets involved. There is data crunching at massive scale - think of Hadoop, Bigtable, MapReduce, NoSQL, and the like, working on clusters of tens of thousands or even hundreds of thousands of computers. There are data mining algorithms at work, finely tuned to extract the most insightful correlations between seemingly disparate pieces of information. Yes. But take a specific case of a user browsing or searching for a keyword, and then follow the user and the personalization that attaches to that piece of information. Show the user that cookie, the tags associated with it, and the path the information takes as it follows us from a Google search page to a news page where a contextual, personalized ad is served up.
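
To make that complaint concrete, here is the kind of walk-through I had in mind, reduced to a toy simulation. All the domains and matching logic are invented; the point is only the mechanism: the same third-party tracker is embedded on both the search page and the news page, so its cookie links the two visits, and the keyword leaked from the first page selects the ad on the second:

```python
tracker_db = {}   # tracker's cookie ID -> everything it has observed

def tracker_request(cookie_id, page, leaked_context=None):
    # Every page embedding the tracker fires a request that carries the
    # SAME third-party cookie, plus a Referer identifying the page.
    observations = tracker_db.setdefault(cookie_id, [])
    observations.append((page, leaked_context))
    return observations

# Visit 1: the search results page embeds the tracker; the query leaks
tracker_request("uid-42", "search.example/results?q=mortgage",
                leaked_context="mortgage")

# Visit 2: an unrelated news page embeds the same tracker's ad slot
history = tracker_request("uid-42", "news.example/sports")
keywords = [ctx for _, ctx in history if ctx]
print(f"serving ad about: {keywords[-1]}")   # 'mortgage' has followed you
```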

Or take another case. The author notes that it is "becoming more important to develop a basic level of algorithmic literacy." No issue there; I wholeheartedly agree with the author. But then what? How? There is a suggestion that we learn basic programming - quite a radical suggestion for a book aimed at the masses, and a laudable one. But where does that take the average Internet user? How does knowing programming, and having a basic level of "algorithmic literacy," help me become better aware of the way cookies and persistent Flash objects work, or prevent them from tracking me? These are questions the book could have, and should have, answered. But it doesn't.
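
To be fair to the author's suggestion, even a little programming does buy some awareness. A few lines can audit who has planted cookies in your browser. The sketch below reads Firefox's cookie store; the file path shown is only an example, and the schema varies across browsers and versions, so treat it as a starting point rather than a recipe:

```python
import sqlite3
from collections import Counter

# Adjust to your own profile directory; this path is an example.
DB = "/home/you/.mozilla/firefox/xyz.default/cookies.sqlite"

conn = sqlite3.connect(DB)
hosts = [row[0] for row in conn.execute("SELECT host FROM moz_cookies")]
conn.close()

# Hosts you have never knowingly visited are usually third-party trackers.
for host, count in Counter(hosts).most_common(20):
    print(f"{count:3d} cookies set by {host}")
```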

That is missing from this book, and that is a pity, in my opinion. It would have elevated the book from the very good and timely to the truly outstanding.

In closing, I would say that this book is an important addition to the literature that seeks to provide a counter-argument to the wholly uncritical and absolutist way in which the Internet is viewed by technophiles. There are at least two other books I would recommend to people interested in this topic:

Some other books and articles referenced in the book:

© 2012, Abhinav Agarwal. All rights reserved.