I WAS ATTRACTED to finance because it promised some order amid chaos. Here was this market, with billions of transactions a day — and yet it managed to set a price for each asset, a price that put a literal number on the value of future risks, or, more precisely, on how much people value those risks in the present day. The world is inundated with information — about individual companies, about the macro economy, about geopolitical risk, about (not to get too meta) prices themselves — and this price incorporated all that, almost instantaneously. This is the definition of market efficiency.

Except for one small thing: This number, this price, has always been a little wrong. Data, as it turns out, has issues.

A roiling controversy in finance is a reminder that any certainty anyone ever had was an illusion. It concerns an academic paper that questions the benefits of factor investing, in which investors make decisions based on “factors” such as a company’s size or how its share price compares to the value of its assets. The theory is that such investments can deliver better returns than the market as a whole.

The paper argues that the data collected and made public by the fathers of factor investing, Kenneth French and Eugene Fama, changed over time — and when the numbers changed, so did the estimates of the factors and their value to your portfolio. True, both the new and old data suggest there are benefits to factor investing, but how much depends on which data are used.

This is not just an academic argument. The factor model is taught in business schools and often used to assess market performance and the price of capital. Fama and French are also affiliated with Dimensional Fund Advisors (DFA), a mutual fund company that offers funds that overweight the factors. DFA staff assist with the Fama/French data in ways that are not transparent to outsiders.

Full disclosure: I worked at DFA more than 10 years ago with Fama and French on another, unrelated data project. One lesson that has stuck with me is that all financial data, no matter the source, is very noisy. And by noisy, I mean unreliable. Most estimates made from financial data are extremely sensitive to the time frame that is selected and any assumptions that are made (and assumptions must always be made). No one should ever take an estimate of a financial variable as an actual fact.

Constructing a data set requires making many judgment calls, and data are often revised as more information becomes available or as regulations and measurement practices change. And factor data are especially noisy because they require assumptions about which calculations define “value stocks” or small companies. It’s a process that is very much open to interpretation.
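To make that concrete, here is a minimal, purely illustrative sketch in Python, run on simulated data rather than anything from the actual Fama/French files. It builds a toy value factor two ways — once with 30/70 percentile breakpoints on book-to-market and once with a simple median split — and measures the premium over the full sample and over the last decade only. Every name and number in it is a made-up assumption, meant only to show how sensitive such estimates are to construction choices.

import numpy as np

rng = np.random.default_rng(0)

# Simulate monthly returns for 1,000 hypothetical stocks over 40 years.
n_stocks, n_months = 1000, 480
book_to_market = rng.lognormal(mean=0.0, sigma=0.5, size=n_stocks)

# Toy return model: a small built-in "value" premium plus a lot of noise.
value_tilt = 0.001 * (book_to_market - book_to_market.mean())
returns = value_tilt + rng.normal(0.008, 0.06, size=(n_months, n_stocks))

def value_premium(returns, btm, lo_pct, hi_pct):
    """Average return of high book-to-market stocks minus low ones,
    with the buckets defined by the given percentile breakpoints."""
    lo_cut, hi_cut = np.percentile(btm, [lo_pct, hi_pct])
    value = btm >= hi_cut    # "value" stocks: high book-to-market
    growth = btm <= lo_cut   # "growth" stocks: low book-to-market
    return returns[:, value].mean(axis=1) - returns[:, growth].mean(axis=1)

# Two reasonable-looking definitions of "value" ...
hml_30_70 = value_premium(returns, book_to_market, 30, 70)   # top/bottom 30%
hml_median = value_premium(returns, book_to_market, 50, 50)  # median split

# ... measured over two different sample windows.
for label, series in [("30/70 breakpoints", hml_30_70),
                      ("median split", hml_median)]:
    full_sample = 12 * series.mean()
    last_decade = 12 * series[-120:].mean()
    print(f"{label}: {full_sample:.2%}/yr full sample, {last_decade:.2%}/yr last decade")

Even with a fixed “true” premium baked into the simulation, the four estimates typically disagree noticeably — which is the whole point: change the definition or the window, and the number changes.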

Noise is not only a fact of life for financial data; it is present in economic data, health data, even data about Reddit comments. As the great Fischer Black said in his prescient article on data noise, published almost 40 years ago: “I think almost all markets are efficient almost all of the time. ‘Almost all’ means at least 90%.”

Even 90% makes for a lot of noise.

Data not only paints a murky picture of the past; it also clouds our vision of the future. The data can’t hide the fact that value investing has had a bad few years, or that the outlook for small firms may not be much better as large technology firms continue to dominate the market.

It could be that these factors are just going through a rough patch, as they do from time to time. Or it could indicate deeper changes. A market that values intangible capital — intellectual property instead of machines — could mean value stocks will have lower returns. An economy in which the ability to scale and dominate markets is more important could mean small companies will be less valuable. A changing world means less reliable and more volatile estimates.

As the replication crisis in the social sciences continues, it’s important to note that few academics have been found to be dishonest. Many of the discrepancies simply reflect the arbitrariness of working with data, and whatever assumptions the researcher had to make. The age of big data should make for more consistent and reliable work — but a lot of data can also include a lot of noise.

This controversy in finance is instructive. The amount of data is just staggering, not only about market transactions but also about such things as medical interventions. It’s often noisy, and not even the latest tools — I’m talking about you, AI — can make it completely reliable. As we move into a world in which data is far more accessible, we should be far more aware of its limitations.

I am not being post-modern here. Data is still an incredibly valuable tool that helps people make more rational and informed choices. Black’s 90% estimate is about right — and 90% is much better than nothing. Even with all the noise, the data still show factor investing can be a sound strategy and a good way of understanding the market. But in a world of Big Data, we all need to be prepared for some Big Noise. That means never assuming precision, and tempering what the data tells us with our own good old-fashioned human judgment.

BLOOMBERG OPINION