Showing posts with label Book Review. Show all posts

Monday, July 29, 2013

Book Review: The Signal and the Noise: Why So Many Predictions Fail – but Some Don't. By Nate Silver

How Politics, Sports, and Microbial Ecology are very much alike:


I thought this might be a topical post in light of Nate Silver's announcement that he will be moving his operation to ESPN. Nate is one of my favorite people, as his interests (Sports, Politics, Big Data) match my own in many ways.





In the field of microbial ecology we are increasingly dealing with mounds upon mounds of data. This is due to the advent of DNA sequencing technologies that can count millions of pieces of DNA and match them against databases that tell us which microbe each piece came from. Sure, there is signal in these mounds, but there can also be lots of noise. When you make so many observations, there are bound to be some that happen by chance. Even if you are 95% certain that any single observation isn't a coincidence, it only takes 20 observations before you would expect one of them to be spurious (19/20 = 95%).
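To make that arithmetic concrete, here is a minimal sketch. It assumes every observation is an independent test with a 5% false-positive rate, which is a simplification I am making for illustration; the function names are my own and not from the book.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of spurious 'hits' among n_tests independent tests
    when there is no real effect (assumption: tests are independent)."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent tests looks
    significant purely by coincidence."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Compare a single test, the 20-observation case, and a larger data set.
    for n in (1, 20, 1000):
        print(n,
              expected_false_positives(n),
              round(prob_at_least_one_false_positive(n), 3))
```

With 20 observations the expected count is one spurious result, and the chance of seeing at least one is roughly 64%. With the millions of sequences we deal with, chance hits are essentially guaranteed.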


When I started analyzing my own sequencing data, I realized I needed a much better understanding of statistics to be able to grok what my results meant. I had very little formal statistical training, which is a sad reflection on my high school (where all the smart kids should take calculus, I was assured) and undergrad (required a "calculus for business majors" class, but no stats) programs. Ask me when was the last time I formally took a derivative or integral (my undergrad calculus class-- 10 years ago). Ask me when was the last time I used any statistics (yesterday). I think there is a fundamental disconnect between which math skills are actually needed by the majority of people, and which are taught in schools.

Because I didn't have a good foundation in things like Bayes' Theorem, I started looking around for a book that would teach me some fundamentals so I could develop a good feeling for which statistical tests would be most appropriate for my data. I didn't want to read a dry textbook. I heard about this book in an interview that Nate did on some TV show and thought it might be an interesting way to learn some statistics. I knew about Nate from his work in predicting elections (the 2012 US elections in particular) and some of his work in sports as well.

This book talks about the advances in predictions in fields ranging from earthquakes and weather to sports, gambling, and politics. Many of these fields have large data sets to draw from, just like microbial ecology. If you think about it, we have been keeping records in baseball for a very long time. If you wanted to ask how left-handed pitchers do against left-handed batters in the 9th inning of tied games, there is probably a decent sample size to look at. 

As a long-time fantasy football and basketball player (one of my hobbies) I have played around with sports statistics for a while to try to make better decisions about who to draft when and what trades to make. (Gotta fill that all-important virtual trophy case!) There is a similar problem in fantasy sports: lots of data, lots of noise. Some people swear that 3rd-year wide receivers are the most likely to break out, since it takes players that long to learn an NFL offense. People said similar things about quarterbacks for a long time, but then Cam Newton, Andrew Luck, and Robert Griffin III came along and blew away the avoid-rookie-quarterbacks meme. When making sit-start decisions in fantasy basketball, "experts" say that all else being equal, you should always start the player who is playing in a game where the teams are worst at defense, since you get more possessions per game to pile up stats. In actual sports games (not fantasy) there is some debate on whether things like "momentum" are real (is a team/player on a winning streak or a hot scoring streak within a game more likely to perform better than they otherwise would?). I assume one of the reasons ESPN wanted Nate was to help viewers/readers figure out which of these "mechanisms" are real and which are noise. The data is there; it just takes a trained person to analyze it.

Politics also has large datasets going back many years. With this data people try to answer questions such as: Are local elections predictive of national trends? When is the state of the economy a predictor of presidential elections? Will a candidate's race play into the outcome of an election? It takes careful analysis to separate signal from noise. (See the Redskins Rule -- when the Washington Redskins of the NFL win their last home football game prior to the U.S. Presidential Election, the incumbent party wins the electoral vote for the White House; when the Redskins lose, the non-incumbent party wins).
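A rough way to see why something like the Redskins Rule can turn up with no mechanism behind it: if you sift through enough unrelated yes/no "predictors", one of them is likely to match a run of elections by luck alone. The sketch below assumes each candidate predictor is an independent fair coin flip, and the numbers of elections and predictors are made up for illustration, not real counts.

```python
def chance_some_rule_matches(n_elections: int, n_candidate_rules: int) -> float:
    """Probability that at least one of n_candidate_rules coin-flip
    'predictors' agrees with every one of n_elections outcomes.

    Assumption: each rule is an independent 50/50 guess per election.
    """
    p_single_rule = 0.5 ** n_elections  # one rule matching every election
    return 1.0 - (1.0 - p_single_rule) ** n_candidate_rules


if __name__ == "__main__":
    # Hypothetical numbers: a 15-election streak, and a growing pile of
    # sports results, weather quirks, etc. that someone could have checked.
    for n_rules in (1, 1_000, 100_000):
        print(n_rules, chance_some_rule_matches(15, n_rules))
```

A single coin-flip rule almost never survives 15 elections, but once the pool of candidate rules gets large enough, finding at least one "perfect" predictor becomes the likely outcome.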

Microbial ecology is similar in that we can get large datasets around which to make hypotheses about the way communities work. We can try to see if they are real by breaking down the numbers and testing our theories about how mechanisms work. Instead of altered run/pass ratios in games with inclement weather, we look at altered Bacteroidetes/Firmicutes ratios (different bacterial groups) in obese people. Some correlations end up being real (the proposed mechanism actually influences the outcome) and some end up being the microbial version of the Redskins Rule (no plausible way for the outcome of a football game to affect the outcome of the election). The real mental work comes in proposing likely mechanisms for the correlations you observe and designing further tests to see if those mechanisms hold true. This takes "subject-matter expertise." Instead of proposing that 3rd-year wide receivers break out due to learning an offense, we propose that the physiological effects of pH cause shifts in soil communities.

Anyway, I really enjoyed this book. It keeps a light tone and is a pretty easy read, even for the statistically uninitiated like me. I recommend it for anyone who may want to work with "big data." I give it 5/5 Petri dishes!

Wednesday, June 26, 2013

Book Review: Guns, Germs, and Steel

Guns, Germs, and Steel by Jared Diamond




Okay, so it's been around for a while, but I just recently finished reading it. This book seeks to answer the question "Why did some nations/cultures survive and conquer, while others failed and disappeared?" Dr. Diamond narrows the reasons down to three main developments that enabled some nations/cultures to beat out others. These developments lend the book its title:
  • The development of firearms
  • Immunity to diseases, particularly ones found among groups living in higher population densities
  • The ability to make steel, useful for weapons and other purposes
But this only gets us one step down. Why did some nations/cultures develop guns and steel and resistance to diseases they could then pass to others who weren't resistant? The author traces the chain of causality down to the basic structure of the Earth, with different parts having different types and numbers of useful plant and animal species depending on the lay of the land. The meat of this book is taking the reader on that journey with heaps of well-explained evidence along the way. As an ecologist working on symbiotic bacterial species, I appreciate his survey of "macrobial" species that influenced human evolution. As someone whose work involves looking at the evolution of human milk across the world, I liked the anthropological information he gave as well. And a book that ties the fate of nations to microbes... well, that's just right up my alley.

This book makes my list because not only is it interesting, it also makes some important points. I know of some fundamentalist religious and political groups that try to justify institutionalized racism by saying that God favors one group of people over others, and cite as evidence the history of specific groups being conquered by others. They say that a nation's/culture's current economic state is evidence of some blessing or punishment given by God for past obedience or disobedience. This book gives readers the scientific ammunition to support a (to me) less odious explanation of how the world ended up the way it is. The idea that one race of people is somehow (genetically?) superior to another can be fought with information about the evolutionary history of the area they live in.

Overall, this is one of my better reads from the last few years. I give it 5/5 Petri dishes!
