Scientists have a perverse incentive to publish papers, regardless of their validity, because publications are the main benchmark through which they receive grants and build reputations. And to publish in scientific journals, one's findings must be novel, original, and unexpected. These two pressures, journals rewarding originality while overlooking rigour and validity, and scientists being driven to find new and groundbreaking results, create a breeding ground for a kind of statistical manipulation called p-hacking, which lets researchers distort their data to fit the conclusions they need.
This creates bad science, and here’s how it works.
When scientists run an experiment, they typically frame two hypotheses. The first, the null hypothesis, says that there is no relationship between the variables being tested; the second, the alternative hypothesis, says that there is one. Evidence strong enough to reject the null hypothesis is what supports the alternative.
Oftentimes, scientists are working in less than ideal conditions, and the data they collect could show a relationship between the variable they are manipulating and the variable they are measuring purely by chance. How do they decide what is statistically significant and what is just random noise? They use a standard metric called a p-value.
The p-value is a statistical metric for guarding against false positives, where a scientist incorrectly rejects a null hypothesis that is actually true, perhaps because the data showed a relationship purely by chance. Formally, the p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. To properly reject the null hypothesis, a scientist's p-value needs to be statistically significant: the smaller the p-value, the less plausible it is that the results arose from chance alone. A small p-value does not prove the alternative hypothesis, but it gives grounds to reject the null.
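To make this concrete, here is a minimal sketch in Python of how a p-value comes out of an actual test. The data are simulated for illustration: both groups are drawn from the same distribution, so the null hypothesis is true by construction and any "significant" result would be exactly the kind of false positive described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated measurements: a control group and a treatment group drawn
# from the SAME distribution, so the null hypothesis (no difference
# between the groups) is true by construction.
control = rng.normal(loc=70.0, scale=10.0, size=30)
treatment = rng.normal(loc=70.0, scale=10.0, size=30)

# A two-sample t-test yields the p-value: the probability of seeing a
# difference at least this large if the null hypothesis were true.
t_stat, p_value = stats.ttest_ind(control, treatment)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("'Statistically significant', but here it would be a false positive.")
else:
    print("Not significant: we fail to reject the null hypothesis.")
```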
Here’s an example of how easily this can be gamed to pass off research as rigorous. A study claimed that eating dark chocolate accelerates weight loss. It had many problems, but a major one was the number of dependent variables: the researchers measured 18 different outcomes, such as cholesterol levels, blood pressure, and weight. Each additional outcome raised the probability of a false positive turning up somewhere in the data. With 18 outcomes each tested at the 0.05 level, the chance of at least one false positive is 1 - 0.95^18, roughly 60 percent, assuming the tests are independent. And even if weight had shown no effect, the researchers could simply have published whichever relationship happened to come out significant.
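To see how quickly the odds deteriorate, here is a short simulation. It assumes the 18 outcomes are tested independently, which the real study's measurements were not, so treat it as an illustration of the mechanism rather than a model of that particular study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000  # simulated replications of the whole study
n_outcomes = 18     # dependent variables measured, as in the chocolate study
alpha = 0.05

# Under a true null hypothesis, a p-value is uniformly distributed on
# [0, 1], so we can simulate each outcome's p-value directly.
p_values = rng.uniform(size=(n_trials, n_outcomes))

# A study "finds something" if ANY of its 18 outcomes comes out significant.
at_least_one_hit = (p_values < alpha).any(axis=1).mean()

print(f"Simulated P(at least one false positive): {at_least_one_hit:.2f}")
print(f"Analytical 1 - (1 - alpha)**18:          {1 - (1 - alpha)**n_outcomes:.2f}")
# Both come out near 0.60: with 18 outcomes, finding at least one false
# positive is more likely than not.
```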
The study was eventually debunked when others tried to reproduce it, but not before it went viral, as such findings tend to do, and significant damage was done.
Here’s where it gets interesting though. The scientific journals that publish research typically use a benchmark of p-values under 0.05. Initially, that threshold may seem strict enough to keep out unrigorous research, but it can cause some huge problems. A threshold of 0.05 means that roughly 5 in every 100 tests of a true null hypothesis will produce a false positive, and because journals preferentially publish positive results, those false positives end up making up a far larger share of the published literature than 5 percent.
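A rough simulation shows why. The statistical power and the fraction of tested hypotheses that are actually real are made-up numbers chosen for illustration, but the mechanism holds for any plausible values: once journals filter on significance, false positives are overrepresented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_studies = 100_000
alpha = 0.05
power = 0.8             # assumed chance of detecting a real effect
true_effect_rate = 0.2  # assumed fraction of tested hypotheses that are real

# Which studies test a real effect, and which come out "significant"?
is_real = rng.uniform(size=n_studies) < true_effect_rate
significant = np.where(
    is_real,
    rng.uniform(size=n_studies) < power,  # real effects: detected with given power
    rng.uniform(size=n_studies) < alpha,  # true nulls: false positives at rate alpha
)

# Journals favoring positive results publish only the significant studies.
published = significant
false_positive_share = (~is_real & published).sum() / published.sum()
print(f"Share of published results that are false positives: {false_positive_share:.2f}")
# With these (hypothetical) numbers, about 1 in 5 published findings is a
# false positive, far more than the naive 5 in 100.
```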
To conclude: if you like to read scientific studies, be aware of publication bias, and dig deeper into a study's context before trusting it. And secondly, there are not many outlets dedicated to publishing replications or negative results, the very work that could dispel these distortions of the scientific method. But a movement is growing in the scientific community, often under the name of registered reports, in which scientists submit their hypotheses and methods for peer review before conducting the experiment, with a guarantee that the results will be published no matter how they turn out, thereby eliminating publication bias.