Remarks

‘Big’ Data Can Be 99.98% Smaller Than It Appears

A Harvard statistics professor warns about non-random sources of data.


March 8, 2021 at 10:17 AM EST


It seems obvious that the opinions of 2.3 million people would be more representative than the opinions of a randomly selected 400. In reality, it depends entirely on how the bigger data set was put together.

Hoping that high quantity can compensate for low quality is a classic mistake in the burgeoning field of big data, says Xiao-Li Meng, a professor of statistics at Harvard who’s the founding editor-in-chief of the 2-year-old Harvard Data Science Review.
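A small simulation can make the point concrete. The sketch below is illustrative only: the population size, the true level of support, and the response rates are hypothetical assumptions chosen for the example, not figures from Meng's work or from any real survey. It compares a large sample in which supporters are more likely to respond against a simple random sample of 400.

```python
import random


def simulate(population_size=200_000, true_support=0.5, small_n=400, seed=0):
    """Compare a large self-selected sample with a small random one.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # 1 = holds the opinion, 0 = does not.
    population = [1 if rng.random() < true_support else 0
                  for _ in range(population_size)]
    truth = sum(population) / population_size

    # Large, non-random sample: assume people holding the opinion
    # respond 70% of the time, everyone else only 30% of the time.
    respond_prob = {1: 0.7, 0: 0.3}
    big_sample = [x for x in population if rng.random() < respond_prob[x]]
    big_estimate = sum(big_sample) / len(big_sample)

    # Small simple random sample: every person is equally likely to be drawn.
    small_sample = rng.sample(population, small_n)
    small_estimate = sum(small_sample) / small_n

    return {
        "truth": truth,
        "big_n": len(big_sample),
        "big_estimate": big_estimate,
        "small_n": small_n,
        "small_estimate": small_estimate,
    }


if __name__ == "__main__":
    result = simulate()
    for name, value in result.items():
        print(f"{name}: {value}")
```

Under these assumed response rates, the large sample's estimate settles near 70% no matter how many people answer, because collecting more responses does nothing to remove the selection effect. The random sample of 400 is noisier, but its error shrinks as the sample grows.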
