Big Data


I've noticed some recent discussion on the correct use of the buzzword "big data". It has been used to refer to distributed data-intensive technologies (Hadoop, MapReduce), predictive analytics of various sorts (customer analytics, SCM analytics, financial applications), and business intelligence. An article I recently read credits publications like The McKinsey Quarterly and HBR with diluting the term until it means practically anything to do with data in management.

Drawing on experience in the field, here's what I would call “big data”, in two bullet points:

* It doesn't have to call for racks of servers and distributed computing, but “big data” has to be big enough that scale is an issue. Anything a modern PC can easily handle in main memory (~4 GB now, ~24 GB for a reasonable server box) is SMALL data. Big data comes in the scale of hundreds of GBs, where the O/S, the DBMS, and your skills and gut as a data scientist and developer matter.

* Your method of extracting insight from the data must call for more than rudimentary SQL manipulation. If all that is being done is sums over groups and high-level reporting, chances are much of the value is lost in the detail. Big data work, in the sense of the modern buzzword, requires in-depth explanatory analyses and statistics magic simply not possible with your out-of-the-box BI tool.
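
The size rule of thumb in the first bullet can be written down as a rough sketch in Python. The function name `classify_scale`, the "in-between" label for data that is larger than RAM but under 100 GB, and the exact cutoffs are my own assumptions for illustration; the post only gives approximate figures (~4 GB, ~24 GB, hundreds of GBs).

```python
GB = 1024 ** 3  # bytes in one gigabyte (binary)

def classify_scale(dataset_bytes, ram_bytes=4 * GB):
    """Label a dataset by the rule of thumb above (hypothetical helper).

    - "small": fits comfortably in main memory
    - "big": 100 GB or more, where scale becomes an issue
    - "in-between": assumed label for everything else
    """
    if dataset_bytes <= ram_bytes:
        return "small"
    if dataset_bytes >= 100 * GB:
        return "big"
    return "in-between"

# A 2 GB extract on a desktop, and a 500 GB warehouse dump
desktop_label = classify_scale(2 * GB)
warehouse_label = classify_scale(500 * GB)
# 50 GB on a 24 GB server box: too large for RAM, short of "big"
server_label = classify_scale(50 * GB, ram_bytes=24 * GB)
```

Size is only half of the definition: a dataset labelled "big" here still has to call for more than group sums and reporting to count under the second bullet.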


About Caner Turkmen
