‘Big’ Data Can Be 99.98% Smaller Than It Appears

A Harvard statistics professor warns about non-random sources of data.

It seems obvious that the opinions of 2.3 million people would be more representative than the opinions of a randomly selected 400. In reality, it depends entirely on how the bigger data set was put together.
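
A small simulation can make the point concrete. The sketch below is illustrative only: the population size, the 50% true share, and the response rates of 60% and 40% are assumptions chosen for the example, not figures from the article or from Meng's work, and `simulate` is a hypothetical helper name. It compares a large self-selected sample, in which people holding an opinion are more likely to respond, with a simple random sample of 400.

```python
import random


def simulate(population_size=1_000_000, true_share=0.5,
             respond_if_yes=0.6, respond_if_no=0.4,
             random_sample_size=400, seed=0):
    """Compare a large self-selected sample with a small random one.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # 1 = holds the opinion, 0 = does not.
    n_yes = int(population_size * true_share)
    population = [1] * n_yes + [0] * (population_size - n_yes)

    # Self-selected sample: whether a person responds depends on
    # the very opinion being measured, so the sample is not random.
    big_sample = [
        x for x in population
        if rng.random() < (respond_if_yes if x else respond_if_no)
    ]

    # Simple random sample: everyone is equally likely to be chosen.
    small_sample = rng.sample(population, random_sample_size)

    return {
        "truth": n_yes / population_size,
        "big_n": len(big_sample),
        "big_estimate": sum(big_sample) / len(big_sample),
        "small_n": len(small_sample),
        "small_estimate": sum(small_sample) / len(small_sample),
    }


result = simulate()
print(result)
```

Under these assumptions the self-selected sample contains roughly half a million responses, yet its estimate settles near 60% rather than the true 50%, and collecting more responses of the same kind does not shrink that gap. The random sample of 400 is noisier from draw to draw, but its error is typically a few percentage points and is not systematically tilted in either direction.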

Hoping that high quantity can compensate for low quality is a classic mistake in the burgeoning field of big data, says Xiao-Li Meng, a professor of statistics at Harvard who’s the founding editor-in-chief of the 2-year-old Harvard Data Science Review.