The Data Scientist Against Data Science

You heard it, I’m against it. In most cases, a data scientist’s purview is more limited than we often admit.

Call them “predictive modeling specialists”, “analytics experts”, “expert statisticians”, or “data mining monsters”: they are around. They go in and out of your business, your newspaper, your favorite blog, all the time.

Legend has it that they work in mysterious ways to collect and crunch data. They use science, algorithms, things like neural networks, support vector machines and, behold, deep learning.

Here are a few reasons why they might be overrated:

1. Still, no algorithm beats the human brain. As a firm believer in Malcolm Gladwell’s “Blink”, I believe the human brain works in mysteriously powerful ways. This means a “domain expert” with vast hands-on experience can drive better decisions with intuition, without investing a load of cash in collecting and interpreting data.

2. Domain knowledge is underrated. You can’t crunch bank data without knowing what a bank is, and you can’t crunch it well without being a banker. In many business cases, the data scientist’s job starts with eliciting the problem, then formulating it, and only then analyzing it. We usually can’t wrap our minds around high-level concepts before we see them at work. This is why a “data science generalist” starts a step behind, both in grasping the idea and in generating value for the client organization.

3. Data is biased. When you conduct a survey, you get the data of people willing to submit their data (a biased sample). When you collect your website’s clickstream, you only get the people who already visit your website, skewed towards those who visit it more often. By the time you interpret and take action on the data, the population you captured may have (and probably will have) changed.

4. Data is prone to all sorts of fallacies. If you really want to draw a conclusion from data, chances are you will find a subset or view of the data to support it (the Texas sharpshooter fallacy). If you find a correlation, you will probably jump to causality (false cause). Combine it with business arguments and it’s not hard to stride into slippery slopes, moving goalposts or straw men. Branding an argument “data-driven” does not mean it’s not fallacious.

5. Data science talent is not abundant, and sub-standard expertise in the field can land you in a worse position than the one you started in. Every single day I run into stupid mistakes: mismanagement of bias and variance, over-sensitive assumptions and predictive modeling overkill. Even if there is a miraculous and arcane art of making data work, we’re just not that good at it yet.
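
To make the Texas sharpshooter point from item 4 a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the function name `segment_lifts`, the customer count, the 10% conversion rate and the 20 random “segments” are all assumptions, not real data. Conversions are pure noise and the segments have nothing to do with them, yet if you look at enough segments, one of them will tend to show a gap that looks worth a slide.

```python
import random


def segment_lifts(n_customers=2000, n_segments=20, base_rate=0.1, seed=0):
    """Return the absolute conversion-rate gap for each random segment.

    Hypothetical setup: conversions are independent coin flips, and each
    segment is an independent 50/50 split of the same customers, so any
    gap between "inside" and "outside" a segment is noise by construction.
    """
    rng = random.Random(seed)
    converted = [rng.random() < base_rate for _ in range(n_customers)]

    lifts = []
    for _ in range(n_segments):
        # Assign each customer to the segment at random, ignoring conversions.
        in_segment = [rng.random() < 0.5 for _ in range(n_customers)]
        inside = [c for c, s in zip(converted, in_segment) if s]
        outside = [c for c, s in zip(converted, in_segment) if not s]
        rate_in = sum(inside) / len(inside)
        rate_out = sum(outside) / len(outside)
        lifts.append(abs(rate_in - rate_out))
    return lifts


lifts = segment_lifts()
print(f"typical gap between segments: {sum(lifts) / len(lifts):.3f}")
print(f"largest gap (the one that gets reported): {max(lifts):.3f}")
```

The largest gap is what ends up in the presentation, and the other nineteen are quietly forgotten. That is drawing the target after the shots were fired.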

Whether you are making a decision or managing your business, if there is a trusted expert around, just ask for their opinion. If you need to forecast the return on a campaign, just run a pilot instead of torturing your data. Data science is not magic: it runs on the same machine you use to read e-mails every day, and it is done by the same guy who would otherwise have worked for the census bureau. Neither can work miracles.


About Caner Turkmen
