Tuesday, December 18, 2018

Big data and the curse of dimensionality

By Luke Froeb

I just finished a fabulous book, Everybody Lies, written by Seth Stephens-Davidowitz. From the Amazon description of the book:
Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?

I particularly liked the metaphors that Stephens-Davidowitz uses to describe his results. For example, to explain why it is easy to find variables that correlate with the stock market but hard to find ones that predict it accurately, he uses the metaphor of coin flipping:
Suppose your strategy for predicting the stock market is to find a lucky coin -- but one that will be found through careful testing. Here's your methodology: You label one thousand coins - 1 to 1,000. Every morning, for two years, you flip each coin, record whether it came up heads or tails, and then note whether the Standard & Poor's Index went up or down that day. You pore through all your data. And voila! You've found something. It turns out that 70.3 percent of the time when Coin 391 came up heads the S&P Index rose. The relationship is statistically significant! Highly so! You have found your lucky coin!
Just flip Coin 391 every morning and buy stocks whenever it comes up heads. Your days of Target T-shirts and ramen noodle dinners are over. Coin 391 is your ticket to the good life!

Every statistics user should know that when you run 1,000 hypothesis tests at the 5% significance level, about 50 of them will, on average, show statistically significant results even when no relationship exists. That 5% is the size of the test: the probability of a Type I error (a false positive) in classical hypothesis testing.
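
To make the arithmetic concrete, here is a minimal simulation sketch of the coin experiment. It rests on my own assumptions, not on anything in the book: 500 trading days stands in for "two years," the market is modeled as a fair coin itself, and significance is judged with a two-sided z-test (normal approximation) at the 5% level. The function name `count_lucky_coins` is made up for illustration.

```python
import random


def count_lucky_coins(n_coins=1000, n_days=500, z_crit=1.96, seed=0):
    """Count coins whose flips 'significantly' track a purely random market."""
    rng = random.Random(seed)
    # Market direction each day (True = up), independent of every coin.
    market_up = [rng.random() < 0.5 for _ in range(n_days)]
    lucky = 0
    for _ in range(n_coins):
        # A "match" is heads on an up day or tails on a down day.
        matches = sum((rng.random() < 0.5) == up for up in market_up)
        # Under the null, matches ~ Binomial(n_days, 0.5); use a z-score.
        z = (matches - n_days / 2) / (n_days / 4) ** 0.5
        if abs(z) > z_crit:
            lucky += 1
    return lucky


if __name__ == "__main__":
    print("coins that look 'lucky':", count_lucky_coins())
```

Because every coin is unrelated to the market by construction, any coin the function counts is a false positive. The count should land in the neighborhood of 50 out of 1,000, though the exact number varies with the seed.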

Instead, split your sample in two: use one half of the data to "find" (estimate) a lucky coin, and the other half to test it. A coin that looked lucky only by chance will usually fail in the second half.
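
The split-sample idea can be sketched the same way. This is an illustration under assumptions of my own (500 simulated trading days, a market that is a fair coin, and a hypothetical helper named `split_sample_test`), not a recipe from the book.

```python
import random


def split_sample_test(n_coins=1000, n_days=500, seed=0):
    """Pick the best coin on the first half of the data, test it on the second."""
    rng = random.Random(seed)
    market_up = [rng.random() < 0.5 for _ in range(n_days)]
    # flips[c][d] is True when coin c came up heads on day d.
    flips = [[rng.random() < 0.5 for _ in range(n_days)] for _ in range(n_coins)]
    half = n_days // 2

    def match_rate(coin, start, stop):
        # Share of days in [start, stop) on which the coin agreed with the market.
        days = range(start, stop)
        return sum(flips[coin][d] == market_up[d] for d in days) / len(days)

    # Estimation half: search all coins for the one that best tracks the market.
    best = max(range(n_coins), key=lambda c: match_rate(c, 0, half))
    # Test half: evaluate only that one coin on data it has not seen.
    return best, match_rate(best, 0, half), match_rate(best, half, n_days)


if __name__ == "__main__":
    coin, in_sample, out_of_sample = split_sample_test()
    print(f"coin {coin + 1}: {in_sample:.1%} in sample, {out_of_sample:.1%} out of sample")
```

The in-sample match rate of the winning coin looks impressive because it is the best of 1,000 tries. The out-of-sample rate comes from a single test on fresh data, so it should fall back toward 50%.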
