In the first of a new trilogy, we look at the statistical science behind why most new medical research could well mean nothing.
Clearly when L P Hartley wrote his memorable opening line to The Go-Between – “The past is a foreign country: they do things differently there” – he was not referring to scientific research. However, the way scientists approached an experiment back in the old days – which, as we shall see, is not actually all that long ago – and the way they do so now are very different indeed.
It used to be that scientists would come to a view as to why a theory or a drug might work and then put that idea to the test. At the turn of the 19th century, for example, Edward Jenner had the idea that cowpox could be used to vaccinate against smallpox and, while his experiments on children to prove it might jar with modern-day ethicists, the science of inoculation was thus born.
Since the introduction of computers and the consequent ‘industrialisation’ of science, however, the research cart has as often as not gone before the horse. Compared with even a few decades ago, the ease with which scientists can screen many thousands of genes – or financial analysts many thousands of stocks – and then crunch their numbers means it is the experiments that are giving rise to new theories.
While this may be convenient for writers of some of the scarier health-related headlines in tabloid newspapers – surely it can only be a matter of time before scientists show that excessive reading of the Daily Mail can itself cause cancer – this effective reversal of the scientific process has implications for the reliability of what we know (or think we know).
The big question is whether any new research finding can be said to be true – or could its conclusions actually stem from a quirk of the analysis or underlying numbers? To return to the example of Jenner, after his subjects were exposed to smallpox, their survival could be attributed with some confidence to their prior inoculation, rather than to random chance, as the disease has about a 30% fatality rate.
Over the next two centuries, however, the larger statistical effects – science’s ‘low-hanging fruit’, such as the link between smoking and cancer – have been picked off. Yet, understandably enough, scientists still want to do research – and still want to be employed – and so they end up focusing on factors with much smaller impacts: if you do X, say, your chances of Y are 5% or 10% higher.
The thing is, if enough people are looking for these kinds of smaller effect – spurred on by a natural desire to hang on to their jobs and backed by ever-increasing computer firepower that lets them repeat their tests many thousands of times – then inevitably some of them will find results that look real when in fact they are mere coincidence.
The inherent flaws in all this are explored by John Ioannidis, a US professor of health research, in his 2005 paper, unambiguously entitled ‘Why most published research findings are false’. Broadly speaking, for a result to count as statistically significant, scientists must show there is less than a 1-in-20 chance it would have arisen by chance alone. The more tests you run, though, the more of these ‘false positives’ you will find.
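To see how quickly those false positives pile up, here is a minimal Python sketch (our own illustration, not taken from Ioannidis’s paper): if every test is run at the conventional 5% significance level and nothing real is going on, the chance of at least one fluke ‘finding’ grows rapidly with the number of independent tests.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha,
    assuming no real effect exists in any of them."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 20, 100):
    print(f"{n:>3} tests -> {chance_of_false_positive(n):.0%} "
          "chance of at least one fluke 'finding'")
```

Run 20 tests and the figure is already around 64%; run 100 and a spurious result is all but guaranteed.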
If you would like a detailed explanation of the underlying science, please follow the link above to the paper; here, though, we will limit ourselves to a sweet-based illustration. You can actually read a cartoon version here but the thrust of it is that a team of researchers sets out to test – to that standard 1-in-20 level of statistical significance – the idea that a particular colour of jellybean causes acne.
The first colour, red, receives the all-clear, as does the second, orange – and so on, all the way up to the 19th colour, that odd-looking brown with pale spots. But guess what – after testing the 20th colour, the scientists are able to announce, “with 95% confidence” and “only a 5% chance of coincidence”, that green jellybeans cause acne.
You will have to take our word for it that this story has them rolling in the aisles at statistical conferences but the point is, the offending jellybean came up green purely by chance. Furthermore, had the researchers tested 20,000 different colours of jellybean rather than 20, it would have looked as if a great many more colours could cause acne – when in reality all they did was press a button on their huge computer. As we will see in Part II of The Jellybean Trilogy, the implications of this for research in any sphere, including finance, are profound.
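The jellybean experiment is easy to mimic on a computer. In the hypothetical sketch below (ours, not the researchers’), no colour truly causes acne, so under the null hypothesis each test’s p-value is uniformly distributed and has a 5% chance of dipping below the significance threshold – yet ‘significant’ colours still turn up, and at 20,000 colours you should expect something in the region of a thousand of them.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def fluke_findings(n_colours: int, alpha: float = 0.05) -> int:
    """Simulate n_colours independent jellybean tests in which no
    colour truly causes acne: each test's p-value is uniform on
    [0, 1], so each has an alpha chance of a spurious 'hit'."""
    return sum(random.random() < alpha for _ in range(n_colours))

print(fluke_findings(20))      # typically a handful of spurious colours at most
print(fluke_findings(20_000))  # typically somewhere near 1,000 of them
```

Nothing in the simulation does anything but generate random noise – which is rather the point.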