How Accurate is Your New Tool or Technology?

By Michael Kattan, PhD


Every day we are introduced to a new medical discovery that is promoted as being more accurate than current technology. Examples include new blood tests, imaging modalities and gene panels that claim to be 97 percent accurate. This sounds great. But is it, really? Let’s review some basic accuracy measures to really understand what these claims mean.

Never trust percent accurate

Percent accurate is an extremely popular measure of performance because everyone understands what it means. If a new technology makes 100 predictions, the percent accuracy is how many of those 100 predictions were correct. While it is very easy to understand, this method has poor properties as an accuracy measure.

First, measuring percent accuracy does not adjust for the base rate. Let’s say a new blood test claims to be 93 percent accurate for predicting some interesting outcome. How do we know if that is good? Well, if the outcome occurs 50 percent of the time, the test is probably pretty good. However, if the outcome occurs 95 percent of the time, the blood test is awful. Because the outcome happens so frequently, one could simply ignore the test results and predict positive every time. That strategy would be right 95 percent of the time, which beats the blood test's 93 percent.
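The base-rate arithmetic above can be checked with a short Python sketch. The cohort of 100 patients and the function name `percent_accuracy` are hypothetical, chosen only to mirror the 93 percent and 95 percent figures in the example.

```python
def percent_accuracy(predictions, outcomes):
    """Share of predictions that match the observed outcomes, in percent."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return 100.0 * correct / len(outcomes)

# Hypothetical cohort: the outcome occurs in 95 of 100 patients.
outcomes = [1] * 95 + [0] * 5

# Ignore the blood test entirely and predict positive for everyone.
always_positive = [1] * len(outcomes)
baseline = percent_accuracy(always_positive, outcomes)

claimed_test_accuracy = 93.0  # the figure touted for the blood test

print(baseline)                          # 95.0
print(baseline > claimed_test_accuracy)  # True
```

With a 50 percent base rate, the same do-nothing strategy would score only 50 percent, which is why the 93 percent claim cannot be judged without knowing how often the outcome occurs.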

In these cases, a prediction tool/blood test could be manipulated to predict positive more often than it should, which would falsely improve the percent accuracy of the test. Because of this possibility, percent accuracy is not a proper scoring rule. If a new technology touts its percent accuracy, move on to something with a higher quality measure.

Area under the receiver operating characteristic (ROC) curve is a better measure

Area under the ROC curve is also reported often, for good reason. First, it is a proper scoring rule, meaning that manipulating the test result should not improve the test performance. Second, area under the ROC curve properly adjusts for how often the outcome occurs naturally (called base rate).

Area under the ROC curve measures the probability that, if you were to randomly select two patients, one with a positive outcome and one with a negative outcome, the patient with the positive outcome scores higher on your new predictive blood test. This interpretation is not very practical in a clinical setting, however. The clinician wants to know how accurate the new test will be in an individual patient, not in a pair of patients in which one is truly positive and one is truly negative.
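The pairwise interpretation can be made concrete with a minimal Python sketch that compares every positive patient with every negative patient. The scores and outcomes below are made up for illustration, and counting ties as half a correct ordering is a common convention assumed here rather than something stated in this article.

```python
from itertools import product

def pairwise_auc(scores, outcomes):
    """Fraction of (positive, negative) patient pairs in which the
    positive patient has the higher test score; ties count as half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            wins += 1.0
        elif pos_score == neg_score:
            wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical test scores for six patients (1 = outcome occurred).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
outcomes = [1, 1, 1, 0, 0, 0]

auc = pairwise_auc(scores, outcomes)
print(round(auc, 3))  # 0.889: 8 of the 9 pairs are ordered correctly
```

The only misordered pair is the positive patient scoring 0.4 against the negative patient scoring 0.7.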

Area under the ROC curve is measured on a problematic scale

Since area under the ROC curve is a probability, it ranges from 0 to 1. Realistically, though, it ranges from 0.5 to 1. Recall that this measure is the probability that the positive patient tests higher than the negative patient. So, a value of 0 would mean that, in all possible pairs of patients, where one is positive and one is negative, the positive patient had a lower predicted probability of being positive.

In other words, a value of 0 is perfect, but in the wrong direction, and never occurs in real life. A coin toss, where heads is positive and tails is negative, would produce an area under the ROC curve of 0.5 because you would guess correctly which one is positive half of the time.
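The endpoints of the scale can be illustrated with a small Python sketch. The four patients and their scores are invented, and a constant score is used as a deterministic stand-in for the coin toss (every pair ties, so each counts as half correct, matching the coin's expected value of 0.5).

```python
def auc(scores, outcomes):
    """Pairwise area under the ROC curve; ties count as half correct."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    total = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return total / (len(positives) * len(negatives))

outcomes = [1, 1, 0, 0]            # hypothetical: two positive, two negative
backwards_scores = [0.1, 0.2, 0.8, 0.9]

perfect = auc([0.9, 0.8, 0.2, 0.1], outcomes)        # every pair ordered correctly
uninformative = auc([0.5, 0.5, 0.5, 0.5], outcomes)  # every pair tied
backwards = auc(backwards_scores, outcomes)          # every pair ordered wrongly

# Reversing a perfectly wrong test turns it into a perfect one.
flipped = auc([1 - s for s in backwards_scores], outcomes)

print(perfect, uninformative, backwards, flipped)  # 1.0 0.5 0.0 1.0
```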

It is possible that a new test is not only inaccurate, but potentially harmful. If a test tells a physician that a patient has a zero or 100 percent chance of a particular outcome, and that prediction is wrong, it could lead to the wrong course of action, like unnecessary treatment or withholding necessary treatment. That is why it is so important to have a solid method for assessing the accuracy of medical decision-making tools.

A new measure, the index of prediction accuracy (IPA)

To overcome these issues with existing accuracy measures, my lab has produced a new measure, the Index of Prediction Accuracy (IPA). The IPA is a proper scoring rule and improves upon area under the ROC curve by scoring a harmful test lower than a useless test. Also, the IPA reports the accuracy averaged across individual cases, not across pairs of cases as the area under the ROC curve does. The IPA scale is reasonably easy to understand: 100 percent is a perfect test; 0 is useless; and anything below 0 is harmful. Details regarding the derivation of IPA are available here. Anyone trying to evaluate the performance of a new technology that purports to predict or diagnose an outcome accurately should demand to know the IPA value of the technology.
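The three regions of the IPA scale can be sketched in Python. This is a minimal illustration, not the derivation itself: it assumes the binary-outcome form of the IPA, one minus the ratio of the model's Brier score (mean squared error of the predicted probabilities) to the Brier score of a null model that predicts the base rate for every patient. The function names and the four-patient data are hypothetical.

```python
def brier_score(predicted_probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    squared_errors = [
        (p - y) ** 2 for p, y in zip(predicted_probabilities, outcomes)
    ]
    return sum(squared_errors) / len(outcomes)

def ipa(predicted_probabilities, outcomes):
    """Assumed form: 1 - Brier(model) / Brier(null model).
    The null model predicts the observed base rate for everyone.
    Undefined when every patient has the same outcome (null Brier is 0)."""
    base_rate = sum(outcomes) / len(outcomes)
    null_brier = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(predicted_probabilities, outcomes) / null_brier

outcomes = [1, 1, 0, 0]  # hypothetical cohort, base rate 50 percent

perfect = ipa([1.0, 1.0, 0.0, 0.0], outcomes)  # confident and right
useless = ipa([0.5, 0.5, 0.5, 0.5], outcomes)  # no better than the base rate
harmful = ipa([0.0, 0.0, 1.0, 1.0], outcomes)  # confident and wrong

print(perfect, useless, harmful)  # 1.0 0.0 -3.0
```

Multiplying by 100 gives the percent scale used above. The confidently wrong test lands well below zero, while the uninformative test sits at exactly zero, which is the distinction the area under the ROC curve does not express on its 0.5 to 1 scale.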

Dr. Kattan is the Dr. Keyhan and Dr. Jafar Mobasseri Endowed Chair for Innovations in Cancer Research and Chair of the Department of Quantitative Health Sciences.