We are seeing the term ‘Big Data’ tossed around today the way ‘e-commerce’ was tossed around in the late ’90s. So what is Big Data, and what’s all the fuss about? Some history is in order to set the stage.
Statistics evolved as a science of using samples of data to draw conclusions about the larger, or ‘total’, population. As an example, a survey company might question a group of people shopping in a mall on a given day: what they are buying, how long they plan to stay, or what brought them out that day. The company would first decide that it needed answers from a ‘sample size’ equal to some percentage of the total foot traffic it expected in the mall that day. From that sample, it could extrapolate what the answers would have been had it polled 100% of the mall’s shoppers.
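To make the mall example concrete, here is a minimal Python sketch of sampling and extrapolation. Everything in it is hypothetical: the 20,000 shoppers, the 30% who came for a sale, and the 2% sample rate are invented for illustration. The confidence interval uses the common normal (Wald) approximation for a proportion, which is one choice among several.

```python
import math
import random


def estimate_proportion(sample):
    """Estimate a population proportion from a simple random sample of 0/1 answers.

    Returns the point estimate and an approximate 95% confidence interval
    based on the normal (Wald) approximation.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    # Standard error of a sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = 1.96  # two-sided ~95% critical value of the standard normal
    return p_hat, (p_hat - z * se, p_hat + z * se)


# Hypothetical mall: 20,000 shoppers, 30% of whom (1 = yes) came in for a sale.
random.seed(42)
population = [1] * 6_000 + [0] * 14_000

# Survey 2% of the day's foot traffic (400 shoppers), chosen at random.
sample = random.sample(population, k=400)
p_hat, (low, high) = estimate_proportion(sample)

# Extrapolate the sample result to everyone in the mall that day.
print(f"Sample estimate: {p_hat:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"Projected shoppers there for the sale: ~{p_hat * len(population):,.0f}")
```

One design note: the width of the interval depends mainly on the number of people surveyed, not on what fraction of the total they represent, which is why a few hundred well-chosen responses could stand in for a whole day’s foot traffic.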
For most of history, this was the only way to look at data. Analyzing ever-larger samples wasn’t feasible given the tabulating capacity, and later the computing capacity, of the systems of the day.
Enter Big Data
Big Data is, at its simplest, a data set where the sample size n = all. There is NO sampling: all, or nearly all, of the data is analyzed. What we find is remarkable. In addition to far more detailed information, correlations appear that would have been invisible in a sample.
In one interesting example of this, Walmart looked at some of its purchase data a few years ago.
Walmart captures and stores every one of its customer transactions. In a study of what products people purchased in the run-up to major forecast storms (hurricanes, tornadoes, etc.), it found the items one would expect: water, batteries, and so on. What it also found, unexpectedly, was Pop-Tarts. Purchasers of storm-related items were far more likely than expected to also buy Pop-Tarts!
No effort was made to study the ‘why’ behind this data point, just the ‘what’. Big Data can’t tell us why something is so, only that it is. Walmart began reconfiguring its stores in storm paths, placing end caps of Pop-Tarts next to the other storm supplies and increasing its stock of them, and sales soared.
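Retailers commonly quantify a co-purchase pattern like this with association-rule measures such as lift: the probability of buying item B given a basket containing A, divided by the overall probability of buying B. The Python sketch below is illustrative only. The baskets are invented toy data, not Walmart’s, and `lift` is a hypothetical helper written for this example, not a library API.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    lift = P(consequent | antecedent) / P(consequent).
    A value above 1 means the items show up together more often than they
    would if purchases were independent; it says nothing about *why*.
    """
    with_antecedent = [t for t in transactions if antecedent <= t]
    p_consequent = sum(consequent in t for t in transactions) / len(transactions)
    if not with_antecedent or p_consequent == 0:
        raise ValueError("rule has no supporting transactions")
    p_given = sum(consequent in t for t in with_antecedent) / len(with_antecedent)
    return p_given / p_consequent


# Hypothetical baskets: the first four look like storm preparation.
baskets = [
    {"water", "batteries", "pop-tarts"},
    {"water", "batteries", "flashlight", "pop-tarts"},
    {"water", "batteries"},
    {"water", "batteries", "pop-tarts"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "pop-tarts"},
    {"coffee"},
]

value = lift(baskets, {"water", "batteries"}, "pop-tarts")
print(f"Lift: {value:.2f}")  # prints "Lift: 1.50"
```

Here 3 of the 4 water-and-batteries baskets contain Pop-Tarts (75%), versus 4 of 8 baskets overall (50%), so storm shoppers are 1.5 times as likely to buy them. Like the finding above, the number describes the ‘what’ and is silent on the ‘why’.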
Coming up next: the truth is in the noise, or working with messy data.
TxMQ’s Business Intelligence practice helps companies manage large data sets and derive actionable information from them. Contact an account executive or [email protected] today for a free initial consultation.
Contact us today for more information or assistance on getting your business on the right track with IBM® Cognos®.