Explanation of Sensitivity and Specificity and How to Use Them

Introduction

Sensitivity and specificity define the accuracy of a given diagnostic test (physical exam finding, lab value, etc.). In order to calculate these values, you need to do a study in a relevant population, with healthy and diseased individuals, and you need to compare your test of interest to a 'gold standard'. For example, with appendicitis, you would look at patients presenting to an emergency department with abdominal pain and judge accuracy against the gold standard of pathology found at surgery. Let's take fever as an example: there will be some patients with appendicitis with fever [true positives (TP)] and some without [false negatives (FN)]. Conversely, there will be some patients who have abdominal pain that is not appendicitis but still have a fever (for example, Pelvic Inflammatory Disease) [false positives (FP)]. Finally, there are patients who have another cause of abdominal pain and are afebrile [true negatives (TN)].

Sensitivity and Specificity

We can thus define some statistics to quantify how good a test is at picking out patients with our disease of interest. The two most commonly reported numbers are sensitivity and specificity. Sensitivity is the proportion of patients with the disease who test positive. Specificity is the proportion of patients without the disease who test negative, i.e. one minus the false positive rate.

Note that sensitivity and specificity do not depend on disease prevalence. They do, however, depend greatly on the population you are studying (i.e. what spectrum of disease you consider, and what your 'healthy' control population is). For example, the ACR Criteria for SLE were evaluated against patients with other rheumatic disorders - so the specificity reflects how well the criteria distinguish lupus from things like RA or temporal arteritis, but does not necessarily reflect how they perform in populations of truly healthy people.

The definitions of sensitivity and specificity are as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{TP}{\text{diseased}}$$

$$\text{Specificity} = \frac{TN}{TN + FP} = \frac{TN}{\text{healthy}}$$
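As a minimal sketch of these two definitions, the snippet below computes both values from the four cells of a 2x2 table. The function name and the patient counts are made up for illustration; they do not come from any real study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    diseased = tp + fn   # everyone the gold standard calls diseased
    healthy = tn + fp    # everyone the gold standard calls healthy
    if diseased == 0 or healthy == 0:
        raise ValueError("need at least one diseased and one healthy patient")
    return tp / diseased, tn / healthy


# Hypothetical counts, invented for this example
sens, spec = sensitivity_specificity(tp=60, fn=40, tn=150, fp=50)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```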

So a sensitive test picks up most of the patients with the disease but may also pick up patients without disease. On the other hand, a specific test is positive almost only in patients with the disease, but may miss many patients who have it. Sensitivity and specificity are intertwined, and you often must trade off one for the other; you can play with the interactive simulation below to see this in action. For more details, read our discussion of ROC Curves.

Interactive simulation of sensitivity and specificity. The graph displays the distributions of healthy and diseased patients on a certain hypothetical test (e.g. fasting blood sugar values for the diagnosis of diabetes). You can adjust the separation between the two distributions as well as their spreads (i.e. how much variability there is within each distribution). The simulation calculates the sensitivity and specificity as well as the area under the ROC Curve (AUC).
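If you cannot run the simulation, the same trade-off can be sketched in a few lines of code. This is only an approximation of the idea: it assumes both groups follow normal distributions and that values at or above the cutoff count as a positive test, and the means and spreads are invented numbers rather than real clinical data.

```python
from math import sqrt
from statistics import NormalDist


def sens_spec_at_cutoff(healthy, diseased, cutoff):
    """Sensitivity and specificity when values >= cutoff count as positive."""
    sensitivity = 1 - diseased.cdf(cutoff)  # diseased patients above the cutoff
    specificity = healthy.cdf(cutoff)       # healthy patients below the cutoff
    return sensitivity, specificity


# Hypothetical distributions (illustrative numbers only)
healthy = NormalDist(mu=90, sigma=10)
diseased = NormalDist(mu=130, sigma=20)

for cutoff in (100, 110, 120):
    sens, spec = sens_spec_at_cutoff(healthy, diseased, cutoff)
    print(f"cutoff {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")

# For two normal distributions, the AUC is the chance that a random diseased
# patient scores higher than a random healthy one.
auc = NormalDist().cdf(
    (diseased.mean - healthy.mean) / sqrt(healthy.stdev**2 + diseased.stdev**2)
)
print(f"AUC = {auc:.2f}")
```

Raising the cutoff in this sketch lowers sensitivity and raises specificity, which is the trade-off described above.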

However, sensitivity and specificity can trick you - try out a test with sensitivity of 10% and specificity of 90% in the calculator. You will see that this test has positive and negative likelihood ratios (see below) of exactly 1, or in other words, getting a positive or negative result from this test means nothing! Why is this? Let's say you have a positive result; you might think that the disease is more likely, since the specificity is 90%. In fact, the specificity tells you that 10% of healthy people will have a positive result - but by looking at the sensitivity, we see that only 10% of diseased patients will have a positive result. In other words, the rate of positive results is the same in diseased and healthy people, so a positive result makes you no more or less likely to be diseased, i.e. the test is worthless. (You can follow the same logic for a negative result.) This is why likelihood ratios are more helpful than simply looking at sensitivity and specificity.
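As a quick check of the arithmetic, plugging these numbers into the likelihood ratio formulas from the Likelihood Ratios section gives:

$$\text{LR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}} = \frac{0.10}{1 - 0.90} = 1, \qquad \text{LR}^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}} = \frac{1 - 0.10}{0.90} = 1$$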

Likelihood Ratios

Alternatively, the accuracy of a test can be expressed with likelihood ratios. Likelihood ratios express a bit more directly how we should interpret positive and negative test results, while sensitivity and specificity are more intuitively used in selecting a 'rule in' or 'rule out' test to apply. In particular, the likelihood ratio tells us how much a positive (or negative) test result multiplies the odds that a person has the disease. Given a pre-test probability (see below), we can most easily use likelihood ratios to figure out what our post-test probability is. Note that likelihood ratios and sensitivity/specificity are equivalent: namely, one can easily be transformed into the other. Neither of them depends on the prevalence of disease, though again likelihood ratios depend greatly on the populations studied.

$$\text{Positive Likelihood Ratio} = \frac{TP/(TP+FN)}{FP/(TN+FP)} = \frac{\text{probability of a diseased individual testing positive}}{\text{probability of a healthy individual testing positive}} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$

$$\text{Negative Likelihood Ratio} = \frac{FN/(TP+FN)}{TN/(TN+FP)} = \frac{\text{probability of a diseased individual testing negative}}{\text{probability of a healthy individual testing negative}} = \frac{1 - \text{sensitivity}}{\text{specificity}}$$
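The conversion from sensitivity and specificity to likelihood ratios might be sketched as follows. The function name and the example values (90% sensitivity, 80% specificity) are hypothetical, chosen only to illustrate the two formulas above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a test.

    Both inputs are proportions between 0 and 1. A specificity of exactly 1
    (or 0) makes one of the ratios undefined, so it is rejected here.
    """
    if not (0 < specificity < 1) or not (0 <= sensitivity <= 1):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


# Hypothetical test: 90% sensitive, 80% specific
plr, nlr = likelihood_ratios(0.90, 0.80)
print(f"positive LR = {plr:.2f}, negative LR = {nlr:.3f}")
```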

As mentioned above, the nice thing about likelihood ratios is that we can use them to ballpark how much a test result will change our probability. If the LR = 1, then the test does not change the probability at all. Generally if the positive LR < 2 (or, similarly, negative LR > 0.5), the test is not very discriminatory.

Predictive Values

Somewhat related to the sensitivity and specificity terms are the predictive values of a test. Unlike the above statistics, predictive values depend greatly on the prevalence of disease as well as the accuracy of the test. Positive predictive value is the probability that a person in your population with a positive test has the disease. Similarly, negative predictive value is the probability that a negative test result in your population means the patient does not have disease.

$$\text{PPV} = \frac{TP}{TP + FP} = \frac{\text{true positive}}{\text{all positive}}$$

$$\text{NPV} = \frac{TN}{TN + FN} = \frac{\text{true negative}}{\text{all negative}}$$

In essence, the PPV or NPV represents the post-test probability, i.e. if you get a certain result, this is the probability that the patient has (for the PPV) or does not have (for the NPV) the disease. See below for more.
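To see how strongly predictive values depend on prevalence, here is a small sketch that applies the same test (90% sensitivity, 90% specificity) to two populations. The counts are invented for illustration and the function name is an assumption, not something from the calculator.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from the four cells of a 2x2 table."""
    ppv = tp / (tp + fp)   # true positives among all positive results
    npv = tn / (tn + fn)   # true negatives among all negative results
    return ppv, npv


# Same test accuracy, two hypothetical populations:
# 100 diseased and 100 healthy patients (50% prevalence)
high_prev = predictive_values(tp=90, fp=10, tn=90, fn=10)
# 10 diseased and 990 healthy patients (1% prevalence)
low_prev = predictive_values(tp=9, fp=99, tn=891, fn=1)

print(f"50% prevalence: PPV = {high_prev[0]:.3f}, NPV = {high_prev[1]:.3f}")
print(f" 1% prevalence: PPV = {low_prev[0]:.3f}, NPV = {low_prev[1]:.3f}")
```

Even though the test itself is unchanged, most positive results in the low-prevalence population are false positives.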

Pre- and Post-Test Probabilities

Now that you know the accuracy of your test, the next obvious question is "How do I apply it?" The starting point should be the prevalence of the disease in your population, or, put another way, "What is the chance the patient has the disease?" This is called the pre-test probability, i.e. the probability of disease before you do the test. Using the sensitivity and specificity or positive and negative likelihood ratios, you can then calculate the post-test probability. That is simply the chance the patient has the disease, given the test result you obtained. As you might realize, this is exactly the positive (or negative, if your test is negative) predictive value! Unfortunately, there is no elegant-appearing equation for this calculation from sensitivity / specificity or likelihood ratios.

Calculate

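$$\begin{aligned}
TP &= \text{sensitivity} \times \text{pretest} & FP &= (1 - \text{specificity}) \times (1 - \text{pretest}) \\
FN &= (1 - \text{sensitivity}) \times \text{pretest} & TN &= \text{specificity} \times (1 - \text{pretest})
\end{aligned}$$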

Then,

$$\text{Positive Post-test Probability (PPV)} = \frac{TP}{TP + FP} \quad \text{or} \quad \text{Negative Post-test Probability (NPV)} = \frac{TN}{TN + FN}$$
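The steps above can be sketched in code as follows. The function name and the example inputs (90% sensitivity, 80% specificity, 20% pre-test probability) are hypothetical. Here TP, FP, FN and TN are fractions of the whole population rather than patient counts.

```python
def post_test_probabilities(sensitivity, specificity, pretest):
    """Return (PPV, NPV) given test accuracy and a pre-test probability."""
    # Fractions of the whole population falling in each cell of the 2x2 table
    tp = sensitivity * pretest
    fp = (1 - specificity) * (1 - pretest)
    fn = (1 - sensitivity) * pretest
    tn = specificity * (1 - pretest)

    ppv = tp / (tp + fp)   # probability of disease after a positive result
    npv = tn / (tn + fn)   # probability of NO disease after a negative result
    return ppv, npv


# Hypothetical test and pre-test probability
ppv, npv = post_test_probabilities(sensitivity=0.90, specificity=0.80, pretest=0.20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```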

You can also use our calculator, which will do all of this for you and even show you the steps!

Alternatively, if you wish to use likelihood ratios, then calculate pretest odds from pretest probability:

$$\text{Pretest odds} = \frac{\text{pretest}}{1 - \text{pretest}}$$

Then,

Post-test odds = (pretest odds) * LR [i.e. use the positive LR if the test is positive, or the negative LR if the test is negative]

Finally, you get Post-test Probability:

$$\text{Post-test Probability} = \frac{\text{post-test odds}}{\text{post-test odds} + 1}$$
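The odds-based route can be sketched the same way. The function name and the numbers are hypothetical: a 20% pre-test probability with a positive LR of 4.5 and a negative LR of 0.125.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Probability of disease after a test result with the given LR."""
    if not (0 <= pretest < 1):
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest / (1 - pretest)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)


# Hypothetical values: use the positive LR for a positive result,
# the negative LR for a negative result
after_positive = post_test_probability(pretest=0.20, likelihood_ratio=4.5)
after_negative = post_test_probability(pretest=0.20, likelihood_ratio=0.125)
print(f"after a positive test: {after_positive:.3f}")
print(f"after a negative test: {after_negative:.3f}")
```

Note that both results are probabilities of having the disease, so the value after a negative test corresponds to one minus the NPV.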

Luckily, GetTheDiagnosis will perform this calculation for you. Just click the link to Switch to Calculator mode on the left nav-bar, and you can apply individual or combinations of tests to see how they affect post-test probabilities. Note that the numbers calculated using a combination of multiple tests may not be accurate if the tests are not independent. For example, a positive dipstick for leukocyte esterase and white cells seen microscopically on urine come from the same process. Thus, having both of those tests positive will in reality offer less weight to your diagnosis than you might think from simply adding their accuracies.
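For illustration, one naive way to combine several test results is to multiply their likelihood ratios before converting back to a probability. This sketch is not a description of how the GetTheDiagnosis calculator works; it assumes the tests are independent of each other given disease status, which is exactly the assumption the warning above is about. The function name and the numbers are made up.

```python
from math import prod


def combined_post_test_probability(pretest, likelihood_ratios):
    """Chain several LRs, ASSUMING the tests are independent of each other."""
    if not (0 <= pretest < 1):
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest / (1 - pretest)
    post_test_odds = pretest_odds * prod(likelihood_ratios)
    return post_test_odds / (post_test_odds + 1)


# Hypothetical: 10% pre-test probability, two positive tests with LRs of 3 and 2
combined = combined_post_test_probability(0.10, [3.0, 2.0])
print(f"post-test probability = {combined:.2f}")
```

If the two tests measure the same underlying process, the true post-test probability will be lower than this figure suggests.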