Thursday, June 21, 2018
Should-Read: Lots to think about regarding how statistics and economics should be taught these days: Drew Conway (2013): The Data Science Venn Diagram

Should-Read: Lots to think about regarding how statistics and economics should be taught these days: Drew Conway (2013): The Data Science Venn Diagram: “The primary colors of data: hacking skills, math and stats knowledge, and substantive expertise…

…On Monday we spent a lot of time talking about “where” a course on data science might exist at a university. The conversation was largely rhetorical, as everyone was well aware of the inherent interdisciplinary nature of these skills; but then, why have I highlighted these three? First, none is discipline specific, but more importantly, each of these skills is on its own very valuable, but when combined with only one other is at best simply not data science, or at worst downright dangerous.

For better or worse, data is a commodity traded electronically; therefore, in order to be in this market you need to speak hacker…. Being able to manipulate text files at the command-line, understanding vectorized operations, thinking algorithmically; these are the hacking skills that make for a successful data hacker. Once you have acquired and cleaned the data, the next step is to actually extract insight from it. In order to do this, you need to apply appropriate math and statistics methods, which requires at least a baseline familiarity with these tools. This is not to say that a PhD in statistics is required to be a competent data scientist, but it does require knowing what an ordinary least squares regression is and how to interpret it.
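To make "knowing what an ordinary least squares regression is and how to interpret it" concrete, here is a minimal sketch in plain Python. It is an illustration added here, not code from Conway's post; the function name `ols_fit` and the tiny `hours`/`score` dataset are invented for the example. It fits a line with one predictor by minimizing the sum of squared residuals, using the textbook closed-form solution.

```python
# Minimal sketch: ordinary least squares with a single predictor.
# The data and names below are hypothetical, chosen only for illustration.

def ols_fit(x, y):
    """Return (intercept, slope) of the line minimizing squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical hours studied
score = [52.0, 55.0, 61.0, 64.0, 68.0]  # hypothetical test scores

intercept, slope = ols_fit(hours, score)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The interpretation is the part that matters: in this made-up sample, each additional hour is associated with a score about `slope` points higher on average, and `intercept` is the fitted score at zero hours. Neither number by itself says anything about causation.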

The third critical piece, substance, is where my thoughts on data science diverge from most of what has already been written on the topic. To me, data plus math and statistics only gets you machine learning…. [But] science is about discovery and building knowledge, which requires some motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods….

Finally, a word on the hacking skills plus substantive expertise danger zone. This is where I place people who “know enough to be dangerous,” and it is the most problematic area of the diagram. In this area are people who are perfectly capable of extracting and structuring data, likely related to a field they know quite a bit about, and probably even know enough R to run a linear regression and report the coefficients; but they lack any understanding of what those coefficients mean. It is from this part of the diagram that the phrase “lies, damned lies, and statistics” emanates, because either through ignorance or malice this overlap of skills gives people the ability to create what appears to be a legitimate analysis without any understanding of how they got there or what they have created. Fortunately, it requires near willful ignorance to acquire hacking skills and substantive expertise without also learning some math and statistics along the way. As such, the danger zone is sparsely populated; however, it does not take many to produce a lot of damage.”
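One small illustration of reporting a coefficient without knowing what it means, offered as a hedged sketch rather than anything from Conway's post: the size of a regression slope depends on the units of the predictor. The helper `slope_of` and the `years`/`pay` figures are invented for this example. Measuring the same experience in months instead of years shrinks the slope by a factor of 12, even though the fitted relationship is unchanged.

```python
# Minimal sketch: the same data, two units, two very different-looking slopes.
# All names and numbers here are hypothetical.

def slope_of(x, y):
    """OLS slope for one predictor: covariance of x and y over variance of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

years = [1.0, 2.0, 3.0, 4.0, 5.0]       # experience in years
pay = [40.0, 44.0, 47.0, 52.0, 57.0]    # pay in thousands of dollars
months = [12.0 * t for t in years]      # same experience, in months

slope_per_year = slope_of(years, pay)
slope_per_month = slope_of(months, pay)
print(f"per year: {slope_per_year:.3f}, per month: {slope_per_month:.3f}")
```

Someone who reports only the bare number, without its units or the comparison it encodes, can make the same effect look large or negligible.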

Bradford DeLong
J. Bradford DeLong is Professor of Economics at the University of California at Berkeley and a research associate at the National Bureau of Economic Research. He was Deputy Assistant US Treasury Secretary during the Clinton Administration, where he was heavily involved in budget and trade negotiations. His role in designing the bailout of Mexico during the 1994 peso crisis placed him at the forefront of Latin America’s transformation into a region of open economies, and cemented his stature as a leading voice in economic-policy debates.