Reading outside the field is both fun and very informative. As the title of this blog implies, I’m curious. I like to find out about new things, learn new skills and the like. To that end, I make a habit of reading outside the library/information field on a regular basis. You might think that a book on statistical analysis would be boring, but Super Crunchers is anything but. Statistics is not my strong suit, but this book has inspired me to work more on learning these skills. My first encounter with the book’s ideas was an EconTalk podcast in 2007 in which the author, Ian Ayres, was interviewed. Ayres is both a lawyer and an economist; he is the William K. Townsend Professor at Yale Law School and a Professor at Yale’s School of Management. The book explores how statistical techniques such as regression and randomization are being applied in business, medicine and government. The math per se is only half the story though. The other half is cheap, powerful computing combined with huge amounts of data (e.g. “Acxiom, which has been called ‘one of the biggest companies you’ve never heard of,’ manages twenty billion customer records (more than 850 terabytes of raw data – enough to fill a 2,000 mile tower of one billion diskettes).” – p. 146). There are almost no equations or math anywhere in the book, which might explain how it reached best seller status.
The central story of the book is the clash between expert opinion and data, and the fact that data is often better. The debate between the two is explored in depth in the chapter on evidence-based medicine [EBM] (a concept that was invented at a Canadian university – McMaster – in 1992 by two physicians, Gordon Guyatt and David Sackett), where Ayres chronicles the story of Don Berwick and his 100,000 Lives campaign. Berwick was inspired to get into the battle by a 1999 Institute of Medicine study estimating that about 98,000 Americans died each year in hospitals “as a result of preventable medical errors,” and by the suffering of his wife, who was treated poorly and slowly in hospitals. Berwick proposed that U.S. hospitals could eliminate 100,000 deaths in a year by following practices mandated by EBM. What does that entail exactly? One big killer was infections caused by central-line catheters. Solution: improve hand-washing and enforce stricter hygiene. This wasn’t just a good idea; it was also backed up by solid data. That’s the good news. Alas, it is not all good. There is evidence that doctors often use procedures and methods simply by virtue of tradition. There also appears to be some evidence that physicians aren’t keeping up with the latest research (e.g. the belief that “Vitamin B12 deficiencies must be treated with shots because vitamin pills are ineffective,” which is apparently misleading). How is this being fixed? Ayres refers to some specialized search tools that quickly provide access to data on procedures, including their chance of success. Oddly, librarians are not mentioned anywhere in this story.
There are many other examples in the book as well, not all of which are positive. There are companies – such as WalMart and Capital One – using data analysis to evaluate employees and prospects. Insofar as this worsens information asymmetry, it is a cause for concern. There are a few consumer-facing services (though not many citizen-driven efforts yet), such as FareCast, which uses statistics to recommend whether an air fare is likely to go up or down in the future (and, supposedly, how likely that is). Several governments are also experimenting with using data more, with major success stories in Mexico (a program that makes welfare payments to mothers conditional on their children attending school, seeking regular medical care etc.; this has actually made a big difference) and the United States. While there are some scary aspects of using data to predict purchases and the like, there are also cases of it being used to improve government and medical care. The book’s only weakness – and it is a small one – is that one struggles to figure out what the next step is. How can one learn to employ the statistical techniques described here?
The application of this to the information profession is clear and challenging. My impression is that many in the field are not highly skilled in math. Math and statistics aren’t everything, but they offer a lot that is being missed. Assuming we sort through the privacy issues (which should be possible to do fairly), what might librarians and information professionals offer? Predictive search results could be improved. How? Let’s say that you have five years of search data for a scientific database service like BioMedCentral or Web of Science at an academic library. You then look for patterns in the searches and link them to time. There are likely patterns as students prepare for specific assignments – these could be used to create better resources to help students. Here’s another application: a public library could plot borrowing geographically across a city and look for patterns, maybe even by topic. Both of these are quite basic applications, but I think there is a lot of potential here.
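To make the search-log idea a little more concrete, here is a minimal Python sketch of what "look for patterns and link them to time" might mean in practice. Everything in it is an assumption for illustration: the `search_log` entries are invented, and the helper names `weekly_counts` and `peak_week` are hypothetical, not taken from the book or from any database vendor's tools. It pools searches by week of the year across several years and reports the busiest week for a term.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical, invented search log: (date of search, search term).
# A real log would come from the database service and be anonymized first.
search_log = [
    (date(2008, 10, 6), "protein folding"),
    (date(2008, 10, 8), "protein folding"),
    (date(2009, 10, 7), "protein folding"),
    (date(2008, 3, 3), "protein folding"),
    (date(2008, 2, 4), "zebrafish"),
    (date(2009, 2, 3), "zebrafish"),
]


def weekly_counts(log):
    """Count searches per term and ISO week number, pooling all years."""
    counts = defaultdict(Counter)
    for day, term in log:
        week = day.isocalendar()[1]  # ISO week of the year, 1 to 53
        counts[term][week] += 1
    return counts


def peak_week(counts, term):
    """Return the ISO week in which a term was searched most often."""
    week, _ = counts[term].most_common(1)[0]
    return week


counts = weekly_counts(search_log)
for term in sorted(counts):
    print(term, "is searched most in ISO week", peak_week(counts, term))
```

If a term peaks in the same week every year, that is a hint that an assignment is driving it, and a subject guide could be prepared ahead of time. The same counting approach would work for the borrowing example by swapping the week number for a branch or postal code.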