Below, Mark Prell shares five key insights from his new book, What Are the Odds?: A Statistical Guide to Certainty in an Uncertain World.
Mark has been a college professor and a researcher for a federal statistical agency. He was a senior economist at the U.S. Department of Agriculture, served as Co-Chair of the Federal Committee on Statistical Methodology, and taught economics at Johns Hopkins University.
What’s the big idea?
If you understand statistical thinking, you can better navigate the uncertainties of daily life. It teaches us how to use data carefully, question evidence thoughtfully, and understand how science builds knowledge over time.
Listen to the audio version of this Book Bite—read by Mark himself—in the Next Big Idea App, or buy the book.
1. Data and their quality are important.
You might analyze data for a business, a hospital, or a school district, or this evening you might simply read about, say, the latest dietary advice on a webpage. Along the way, you will want to ask two key questions: “What do the data say?” and “Are the data any good?” All data have strengths and limitations. You want to understand the data’s quality, which involves accuracy along with several other dimensions, such as relevance, granularity, and timeliness.
It may be intuitive that poor data can affect a study’s conclusions. What is more subtle is how data quality can be diminished or enhanced by how the data are collected. One story of a threat to data quality with national implications emerged in the early 2000s, when households began giving up their landline phone service in favor of cell phones.
Back then, most organizations that conducted phone surveys still relied on old-fashioned methods that dialed only landline numbers. As a result, these surveys tended to miss young adults, who disproportionately relied on cell phones. Statistical adjustments can correct for such gaps to a degree, but national estimates can still be affected by them. Fortunately, by the end of the 2010s, most survey organizations had added cell phone numbers to the mix.
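To see why undercoverage matters, here is a minimal sketch of how missing one group biases a survey estimate, and how reweighting toward known population shares (a simple form of the adjustments mentioned above) can help. All of the numbers are invented for illustration, not drawn from any actual survey.

```python
# A minimal sketch of survey undercoverage, with made-up numbers.

# Suppose 30% of adults are "young" and 70% are "older," and the true
# rate of some behavior differs between the groups.
population_share = {"young": 0.30, "older": 0.70}
true_rate = {"young": 0.60, "older": 0.20}

# A landline-only survey reaches young adults far less often, so they
# make up only 10% of the sample instead of 30%.
sample_share = {"young": 0.10, "older": 0.90}

# Naive estimate: average the sample as-is.
naive = sum(sample_share[g] * true_rate[g] for g in true_rate)

# Adjusted estimate: reweight each group to its known population share
# (a simple form of poststratification).
adjusted = sum(population_share[g] * true_rate[g] for g in true_rate)

print(f"Naive landline-only estimate: {naive:.2f}")    # 0.24, biased low
print(f"Reweighted estimate:          {adjusted:.2f}") # 0.32, the true rate
```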
In data about people, it is a good idea to consider which demographic groups are captured well, and which might be underrepresented or even missing altogether in the data. There is an old saying: If the strands of your fishing net are six inches apart, you will think that all fish in the lake are bigger than six inches.
2. Statistics is about finding patterns in our data.
What does it mean to find a pattern in the data? A striking example comes from the dawn of statistics, and it shows why data can be so valuable and why statistics can be so engaging.
Beginning in 1603, an account of burials and christenings in London was published each week, along with an annual summary at the end of each year. The data were collected from individual Anglican churches, totaled up for the city, and published in what was called the Bills of Mortality. Through continuous monitoring, the weekly Bills could serve as an early warning system to indicate when a plague was gathering force. But the Bills of Mortality had other uses too.
“It is a good idea to consider which demographic groups are captured well, and which might be underrepresented or even missing altogether in the data.”
John Graunt was a little-known cloth merchant in London who had a knack for numbers. Graunt is widely considered to be the father of statistics due to the one book he wrote, first published in 1662. Graunt was the first person to assemble the many years’ worth of dusty Bills of Mortality, and then use the data to study the patterns that they revealed.
Notably, he discovered evidence that the official counts of plague deaths in the Bills undercounted how many deaths were actually due to plague. The statistical approach he developed to make that discovery has modern applications. In 2022, using a method that rests on John Graunt’s pioneering work, the World Health Organization estimated the excess mortality due to COVID-19 for various countries.
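The core of the excess-mortality idea can be shown in a few lines of Python. This is a minimal sketch of the arithmetic only, with invented figures rather than WHO or Bills of Mortality data.

```python
# Excess mortality: observed deaths minus a baseline of expected deaths.
# All figures are invented for illustration.

expected = [1000, 1020, 980, 1010]   # baseline deaths per period, e.g. a pre-event average
observed = [1150, 1300, 1250, 1100]  # deaths actually recorded in the same periods

excess = [obs - exp for obs, exp in zip(observed, expected)]
print("Excess deaths per period:", excess)     # [150, 280, 270, 90]
print("Total excess mortality:", sum(excess))  # 790
```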
3. Bayes’ Theorem.
Bayes’ Theorem shows the connection between observable effects and hidden underlying causes. The theorem is not new; it is simply very valuable. It was developed by Thomas Bayes, an English minister who died in 1761. His theorem was nearly lost, but a friend reviewed Bayes’ papers and published it.
One use of Bayes’ Theorem is for the detection of cancer. A mammogram can be used as a test to assess whether a woman shows signs of breast cancer. Like other medical tests, it is not 100 percent conclusive because sometimes it can be mistaken—in two opposite ways.
- A test can miss detecting cancer that is present.
- A test can falsely say that cancer is present when the patient is free of cancer.
So, the medical and statistical question becomes: when a mammogram indicates that cancer is present, what is the probability that the patient has cancer?
Bayes’ Theorem answers the question and, fortunately, the solution can be worked out with a pen-and-paper diagram, which many people find more intuitive and interpretable than a complex equation. The test result, along with Bayes’ Theorem, informs and empowers both the patient and the doctor. If a mammogram says that cancer is present, they may decide to proceed with a biopsy for stronger evidence. A benefit of using a mammogram as an initial, non-invasive test is that it spares many women from getting biopsies.
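Here is a minimal sketch of that calculation in Python. The prevalence, sensitivity, and false-positive rates below are illustrative assumptions chosen for round numbers, not actual clinical figures.

```python
# Bayes' Theorem for a positive screening test, with illustrative numbers.

prevalence = 0.01       # P(cancer): 1% of the women screened
sensitivity = 0.90      # P(positive | cancer): the test catches 90% of cancers
false_positive = 0.08   # P(positive | no cancer): 8% false alarms

# P(positive) by the law of total probability.
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' Theorem: P(cancer | positive test).
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"P(cancer | positive test) = {p_cancer_given_positive:.3f}")  # about 0.102
```

Under these assumptions, a positive result means roughly a 1-in-10 chance of cancer, not a near-certainty. That is the kind of counterintuitive answer the pen-and-paper diagram makes visible: out of 10,000 women, about 90 of the 100 with cancer test positive, but so do about 792 of the 9,900 without it.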
4. Ethics are essential for, and come from, scientific and statistical practice.
In 1965, the biologist Jacob Bronowski published the book Science and Human Values. Bronowski’s central proposition was that certain values or ethics come from within science itself because they are conditions for its practice. Even for a simple verification of a fact, any one of us needs to rely on others and, specifically, to trust their word.
Bronowski wrote much about trust and connected it to the ethical principle of truthfulness. That principle is obligatory for each scientist. Truthfulness includes honesty—that we do not lie. Moreover, truthfulness also demands that we not leave out a relevant part of the truth. That is what cherry-pickers do: phony experts who pick through data, quotes, or the body of evidence to select only the points that make their arguments seem strong.
“Truthfulness also demands that we not leave out a relevant part of the truth.”
To Bronowski, the search for truth creates a web of interrelated values beyond truthfulness, including independence and dissent. However, if honesty and dissent were the only values practiced by scientists, nobody would be listening to anybody else. Science would fail. In response, the community of scientists works hard to instill additional values of mutual respect and tolerance. The Ethical Guidelines of the American Statistical Association provide helpful details on how researchers should conduct themselves.
Today, our society can benefit by learning more about the ethics of science and statistics. The community of truthfulness and respect is not exclusively for scientists. The community knows no bounds, and it helps sustain civil society. It is open to any of us.
5. The scientific method is about disproof rather than proof.
In the mid-1900s, a new perspective emerged that has influenced how scientists think about what they are doing as they conduct experiments and statistical tests. This view holds that science cannot “prove” a hypothesis or theory to be a final, unchanging, irrefutable “proven” truth. Instead, experiments can disprove a hypothesis. The process of disproof is called falsification.
As the tools for statistical testing developed in the 1900s, they made use of and contributed to the methodology of falsification. In his monumental 1937 book on designing experiments, Ronald Fisher, a British statistician and geneticist, wrote that the null hypothesis is never proved or established, but is possibly disproved by the experiment.
In Fisher’s language, the “null hypothesis” is the hypothesis or theory that is potentially “nullified”—that is, falsified—by the experiment. And he wrote that every experiment exists only to give the facts a chance of disproving the null hypothesis.
“The process of disproof is called falsification.”
When observed data do fit a hypothesis—when the hypothesis is not falsified—then the data are said to be consistent with the hypothesis or theory, or to support or corroborate it. But that confirmation of the theory does not prove the theory to be true, forever and always.
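Fisher’s logic can be illustrated with a toy simulation. This is a minimal sketch, not Fisher’s own procedure: the null hypothesis is that a coin is fair, and we ask how often a truly fair coin would produce data at least as lopsided as what we observed.

```python
# A toy illustration of null hypothesis testing: data can discredit the
# null hypothesis ("the coin is fair"), but can never prove it.
import random

random.seed(0)

def p_value_fair_coin(heads_observed: int, flips: int, trials: int = 100_000) -> float:
    """Simulate a fair coin and estimate a two-sided p-value: how often
    chance alone produces a result at least as extreme as the one observed."""
    center = flips / 2
    observed_gap = abs(heads_observed - center)
    extreme = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if abs(heads - center) >= observed_gap:
            extreme += 1
    return extreme / trials

# Suppose we observe 62 heads in 100 flips.
p = p_value_fair_coin(62, 100)
print(f"p-value = {p:.3f}")  # around 0.02 for these inputs

# A small p-value is evidence against the null hypothesis; a large one
# merely fails to disprove it. Neither outcome proves any hypothesis true.
```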
For example, in the 1900s, Einstein’s general theory of relativity superseded Newton’s theories about gravity and the laws of motion by elaborating and improving on what Newton accomplished. Although tests and applications of Newtonian theory had been successful for generations, even such extensive data did not prove Newtonian theory to be unimprovably true. While today data are consistent with Einstein’s theory, sooner or later that theory may face disparities with data that require it to be modified. The process never ends: there will always be more experiments to run, more anomalies to find, and better theories to imagine.
Enjoy our full library of Book Bites—read by the authors!—in the Next Big Idea App: