I was a research assistant at two universities in the 1990s. So I spent a lot of time loading and crunching data on mainframes with SAS, then I’d take stacks of greenbar fanfold printouts to my bosses (professors and/or policy institute types with PhDs) and we’d pore over summary statistics, plots, regression output, etc. I enjoyed seeing the way their minds worked as they interpreted the results. They had a way of questioning what appeared to be strong results, framing stories about what was actually happening, and identifying the next steps in the analysis for me to go off and do.
As I read current books on data analysis, data science, big data, whatever, there always seems to be a hand-wave when it comes to the steps of output interpretation, framing, questioning, discrimination, etc. Something like “This step is beyond the scope of this book, go figure it out somewhere else, now let’s get back to R and Python code.” I know that the “somewhere else” comes in part from experience with subject matter, but to me there should be some structured, if not formal, ways to approach building this experience.
So over the past year I accumulated a list of books towards this goal. If I read an article that mentioned a book relevant to thinking about interpreting data, I’d put that book on the list. Christmas came last month and brought some sweet Amazon gift certificates my way, so I bought the books on my list and dug into them. They’ve been great! By now, I’m pretty sure people around me are already tired of hearing me say things like “You know, according to Kahneman…” or “Nate Silver has a related story to that” or “Nassim Taleb would tell you that…” Anyway, the books are listed below from most favorite to least, but each of them was good enough that I’d buy and read it again.
The Signal and the Noise: Why So Many Predictions Fail but Some Don’t – by Nate Silver. This was probably the first book that started me down the rabbit hole of this list since I regularly read Silver’s 538 blog.
Fooled by Randomness – by Nassim Nicholas Taleb. This is older than his popular Black Swan book but I liked it more. This book’s theme is that events we think have a cause may just be due to chance.
Thinking, Fast and Slow – by Daniel Kahneman. If you have a background in psychology or economics (or like me, both) then you’ve probably heard “Kahneman & Tversky” muttered by professors several times. This book deals with fast and slow thinking, aka System 1 and System 2 thinking, or reacting to a bear versus deciding whether to get a data science degree.
The Black Swan: The Impact of the Highly Improbable – another Nassim Nicholas Taleb book, this one on how to not be a turkey.
How We Know What Isn’t So: The Fallibility of Human Reason in Everyday Life – by Thomas Gilovich. Gilovich worked with Kahneman but there’s almost no overlap with Kahneman’s book above, so get them both.
Thinking in Time: The Uses of History for Decision-Makers – by Richard Neustadt & Ernest May. The authors intended this as a book for policy makers and government employees, but I think the material generalizes to any situation.
I’d enjoy hearing your recommendations on other similar books I could add to my reading list. One that I’ve been considering is Thinking with Data by Max Shron, so if you’ve read it please let me know what you thought about it.