12 Dec 2021 - tsp

Last update 18 Dec 2021

This blog post is a short summary of Bayes’ rule applied to the simplest form of
diagnostic test outcome estimation and contains a simple JavaScript implementation
to evaluate the expressions. I’ve decided to write this short article
due to popular demand and many requests of the type “if the prevalence were XY,
how large is the probability of a false positive / false negative?”. This topic
is currently taught at school level, but it nevertheless seems to be something
like black magic to many people. Please keep in mind **this is no medical advice,
interpretation suggestion for medical tests, etc.** This article is just here to
communicate the basic idea. Correct medical diagnosis is much more
complicated, especially when done correctly. As a side comment: like many of
my other math articles, this post is neither a formal description built from
fundamental definitions nor formally rigorous. It should provide a short summary
of the idea in an easy and understandable way.

First let’s define some terms:

- The prevalence is the proportion of a population that is affected by a given
  condition. In epidemiology, for example, this is the number of people who
  are infected by a given pathogen or who have a given medical condition.
  In other sciences this could be the percentage of people who are left-handed,
  who smoke, etc. In medicine it is usually specified per 100000 people.
  Prevalence is *not* incidence! The two are related via the duration $t_D$ of
  the condition: $\text{prevalence} = t_D \cdot \text{incidence}$. (The currently
  popular 7 day incidence is already the sum of all new cases during 7 days, so
  one has to take that into account.)
- The sensitivity of a test is the probability that the test detects the
  condition when the condition is known to be present (i.e. how probable it is
  that the test is correct in this case). One can also call this the
  *true positive rate*.
- The specificity, on the other hand, is the probability that the test is
  negative when the condition is not present (i.e. how reliably the test is
  positive only when the condition really exists). One can also call this the
  *true negative rate*.
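To make these three definitions concrete, here is a minimal sketch that derives them from raw counts. The counts are made up purely for illustration (they do not come from any study), and the names `counts`, `sensitivity`, `specificity` and `samplePrevalence` are my own choices.

```javascript
// Hypothetical validation counts, invented for illustration only.
const counts = {
  truePositive: 97,   // condition present, test positive
  falseNegative: 3,   // condition present, test negative
  trueNegative: 180,  // condition absent, test negative
  falsePositive: 20,  // condition absent, test positive
};

// Sensitivity (true positive rate): share of affected subjects that test positive.
function sensitivity(c) {
  return c.truePositive / (c.truePositive + c.falseNegative);
}

// Specificity (true negative rate): share of unaffected subjects that test negative.
function specificity(c) {
  return c.trueNegative / (c.trueNegative + c.falsePositive);
}

// Prevalence inside this sample: affected subjects divided by all subjects.
function samplePrevalence(c) {
  const affected = c.truePositive + c.falseNegative;
  const total = affected + c.trueNegative + c.falsePositive;
  return affected / total;
}

console.log("sensitivity:", sensitivity(counts));
console.log("specificity:", specificity(counts));
console.log("prevalence in sample:", samplePrevalence(counts));
```

Note that the prevalence inside a validation sample usually has nothing to do with the prevalence in the general population, which is why it enters the later calculations as a separate input.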

So what is Bayes’ theorem? It provides a way to reverse the order of conditional probabilities. A conditional probability $P(A \mid B)$ is the probability that $A$ is true given that $B$ is already known to be true. Mathematically one can say:

$$
\begin{aligned}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \\
\to P(A \mid B) \cdot P(B) &= P(A \cap B)
\end{aligned}
$$

In this case $P(A \cap B)$ is the probability that $A$ and $B$ are both true. One can also formulate this the other way round:

$$
\begin{aligned}
P(B \mid A) &= \frac{P(B \cap A)}{P(A)} \\
\to P(B \mid A) \cdot P(A) &= P(B \cap A)
\end{aligned}
$$

Since $P(A \cap B) = P(B \cap A)$, setting both equations equal yields:

$$
\begin{aligned}
P(B \mid A) \cdot P(A) &= P(A \mid B) \cdot P(B) \\
P(A \mid B) &= \frac{P(B \mid A) \cdot P(A)}{P(B)}
\end{aligned}
$$

The last line is the typical way Bayes’ theorem is presented. One can of course define the theorem for continuous variables in a similar way using their conditional probability densities.
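The reversal can be checked numerically. The following sketch uses a small, made-up joint distribution of two events (the numbers and the helper name `bayes` are assumptions for illustration) and compares $P(A \mid B)$ computed directly from the definition with the value obtained via Bayes’ theorem.

```javascript
// Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
function bayes(pBGivenA, pA, pB) {
  if (pB <= 0) {
    throw new RangeError("P(B) must be positive");
  }
  return (pBGivenA * pA) / pB;
}

// Hypothetical joint probabilities of two events A and B (they sum to 1).
const joint = {
  aAndB: 0.10,
  aAndNotB: 0.20,
  notAAndB: 0.30,
  notAAndNotB: 0.40,
};

const pA = joint.aAndB + joint.aAndNotB; // marginal P(A)
const pB = joint.aAndB + joint.notAAndB; // marginal P(B)

// Both conditional probabilities straight from the definition.
const pAGivenB = joint.aAndB / pB;
const pBGivenA = joint.aAndB / pA;

// Reversing the condition with Bayes' theorem reproduces P(A|B).
const pAGivenBViaBayes = bayes(pBGivenA, pA, pB);
console.log(pAGivenB, pAGivenBViaBayes);
```

Both printed values agree up to floating point rounding, which is all the theorem claims.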

The following are two simple examples that one often looks at when talking about Bayes’ rule. They show in an impressive way why one has to be somewhat cautious with the interpretation of such test results, at least outside of emergency situations where there is only limited capacity for a proper check of medical history, risk factor weighting, etc. One should not panic when a test comes back positive and should of course never use such a test without consulting proper medical advisors (this might also answer the question why one is usually not advised to simply go to a lab and request such a check).

The typical example used at school is the HIV test. Let’s assume the prevalence of the disease is around $0.1\%$ (or 100 per 100000). This gives the first pair of probabilities:

$$
\begin{aligned}
P(\text{infected}) &= 0.001 \\
P(\text{not infected}) &= 0.999
\end{aligned}
$$

The sensitivity of such tests is around $97\%$ to $99\%$. The specificity varies broadly depending on region and technology used, usually between $45\%$ and $86\%$ (see DOI 10.4314/eamj.v85i10.9666). Let’s assume $99\%$ for sensitivity and $86\%$ for specificity. This can be written as the following conditional probabilities:

$$
\begin{aligned}
P(\text{positive test} \mid \text{infected}) &= 0.99 \\
P(\text{positive test} \mid \text{not infected}) &= 0.14 \\
P(\text{negative test} \mid \text{infected}) &= 0.01 \\
P(\text{negative test} \mid \text{not infected}) &= 0.86
\end{aligned}
$$

So how large is the probability of an infection if one really only does the laboratory test and the result is positive or negative? One can simply insert that into Bayes’ rule from above after calculating the total probabilities for positive and negative tests:

$$
\begin{aligned}
P(\text{positive test}) &= P(\text{positive test} \mid \text{infected}) \cdot P(\text{infected}) + P(\text{positive test} \mid \text{not infected}) \cdot P(\text{not infected}) \\
&= 0.99 \cdot 0.001 + 0.14 \cdot 0.999 = 0.14085 \\
P(\text{negative test}) &= P(\text{negative test} \mid \text{infected}) \cdot P(\text{infected}) + P(\text{negative test} \mid \text{not infected}) \cdot P(\text{not infected}) \\
&= 0.01 \cdot 0.001 + 0.86 \cdot 0.999 = 0.85915
\end{aligned}
$$

Now inserting into Bayes’ rule yields the four expressions:

$$
\begin{aligned}
P(\text{infected} \mid \text{positive test}) &= \frac{P(\text{positive test} \mid \text{infected}) \cdot P(\text{infected})}{P(\text{positive test})} = \frac{0.99 \cdot 0.001}{0.14085} = 0.00703 \approx 0.703\% \\
P(\text{infected} \mid \text{negative test}) &= \frac{P(\text{negative test} \mid \text{infected}) \cdot P(\text{infected})}{P(\text{negative test})} = \frac{0.01 \cdot 0.001}{0.85915} = 1.16394 \cdot 10^{-5} \approx 0.001\% \\
P(\text{not infected} \mid \text{positive test}) &= \frac{P(\text{positive test} \mid \text{not infected}) \cdot P(\text{not infected})}{P(\text{positive test})} = \frac{0.14 \cdot 0.999}{0.14085} = 0.99297 \approx 99.3\% \\
P(\text{not infected} \mid \text{negative test}) &= \frac{P(\text{negative test} \mid \text{not infected}) \cdot P(\text{not infected})}{P(\text{negative test})} = \frac{0.86 \cdot 0.999}{0.85915} = 0.99999 \approx 99.999\%
\end{aligned}
$$

As one can see, the probability that one is not infected even with a positive test is larger than $99\%$. Thus the test alone is not used for clinical diagnosis. Usually these results are weighted together in a chain of Bayesian classifiers with risk analysis (lifestyle, etc.) as well as other indicators such as symptoms.

So this means:

- In case a test is positive:
  - A probability of only $0.703\%$ that one is really infected
  - A probability of $99.3\%$ that one is not infected in reality (false positive)
- In case a test is negative:
  - A probability of about $0.001\%$ that one is still infected (false negative)
  - A probability of more than $99.9\%$ that one is really not infected
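The whole calculation for this example fits into a few lines of code. The following is a minimal sketch, not the script linked from this page. The function name `diagnosticPosterior` and the property names are my own, and the inputs are the values assumed above (prevalence $0.1\%$, sensitivity $99\%$, specificity $86\%$).

```javascript
// Posterior probabilities for a binary test via Bayes' rule.
// All three arguments are probabilities between 0 and 1.
function diagnosticPosterior(prevalence, sensitivity, specificity) {
  const pInfected = prevalence;
  const pNotInfected = 1 - prevalence;

  // Total probabilities of a positive and a negative test result.
  const pPositive = sensitivity * pInfected + (1 - specificity) * pNotInfected;
  const pNegative = (1 - sensitivity) * pInfected + specificity * pNotInfected;

  return {
    pPositive,
    pNegative,
    infectedGivenPositive: (sensitivity * pInfected) / pPositive,
    notInfectedGivenPositive: ((1 - specificity) * pNotInfected) / pPositive,
    infectedGivenNegative: ((1 - sensitivity) * pInfected) / pNegative,
    notInfectedGivenNegative: (specificity * pNotInfected) / pNegative,
  };
}

// Values assumed in the HIV example.
const hivExample = diagnosticPosterior(0.001, 0.99, 0.86);
for (const [name, value] of Object.entries(hivExample)) {
  console.log(name.padEnd(26), (100 * value).toFixed(4) + " %");
}
```

The two posteriors for each test outcome always sum to one, which is a quick sanity check for hand calculations as well.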

The second example is the one that many people are currently asking about. So let’s look at the quality of the test here. I’ve taken the numbers from DOI 10.1038/s41598-021-94196-3. Please be aware that this is an arbitrary choice and that these numbers vary widely between laboratories, technologies and exact procedures. I’ve just taken them to give a rough idea.

- Sensitivity has been shown in the linked study to be at least $98.2\%$
- Specificity is not so easy: they specify $100\%$, but their confidence interval spans $90.4\%$ to $99.7\%$, so I’m going to take the midpoint of that interval, $95.05\%$, as a point estimate. This would be pretty good for such a type of test anyway.

Thus:

$$
\begin{aligned}
P(\text{positive test} \mid \text{infected}) &= 0.982 \\
P(\text{positive test} \mid \text{not infected}) &= 0.0495 \\
P(\text{negative test} \mid \text{infected}) &= 0.018 \\
P(\text{negative test} \mid \text{not infected}) &= 0.9505
\end{aligned}
$$

Again using some arbitrary value from the time of writing this article, let’s
say an incidence of 367.5 per 100000 people. Note that the incidence is *not* the same as the prevalence
(you can look up the Wikipedia article on this).
With an average infection period of 14 days this would be a prevalence of $367.5 \cdot 14 = 5145$ per 100000 people,
i.e. $P(\text{infected}) = 0.05145$ and $P(\text{not infected}) = 0.94855$.
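The conversion from incidence to prevalence can be sketched as follows. The helper `prevalenceFromIncidence` and its `windowDays` parameter are assumptions of this sketch: `windowDays` is the number of days the incidence figure already sums over (1 for a daily value, 7 for a 7 day incidence). The calculation in the text corresponds to `windowDays = 1`.

```javascript
// Convert an incidence (new cases per 100000 people per reporting window)
// into a prevalence estimate using prevalence = duration * incidence.
// windowDays is an assumption of this sketch: 1 for a daily incidence,
// 7 for a 7 day incidence that already sums one week of new cases.
function prevalenceFromIncidence(incidencePer100k, durationDays, windowDays = 1) {
  const incidencePerDay = incidencePer100k / windowDays;
  return incidencePerDay * durationDays; // cases per 100000 people
}

// The calculation from the text: 367.5 per 100000 times 14 days.
const prevalencePer100k = prevalenceFromIncidence(367.5, 14);
const pInfected = prevalencePer100k / 100000;
console.log(prevalencePer100k, pInfected); // 5145 0.05145

// If the same figure were a 7 day incidence, the estimate would be smaller.
console.log(prevalenceFromIncidence(367.5, 14, 7)); // 735
```

Which of the two readings applies depends on how the incidence figure was reported, so the resulting prevalence should be treated as a rough input rather than a measured quantity.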

Now we can start calculating the overall probabilities for positive and negative tests again:

$$
\begin{aligned}
P(\text{positive test}) &= P(\text{positive test} \mid \text{infected}) \cdot P(\text{infected}) + P(\text{positive test} \mid \text{not infected}) \cdot P(\text{not infected}) \\
&= 0.982 \cdot 0.05145 + 0.0495 \cdot 0.94855 = 0.097477 \approx 9.75\% \\
P(\text{negative test}) &= P(\text{negative test} \mid \text{infected}) \cdot P(\text{infected}) + P(\text{negative test} \mid \text{not infected}) \cdot P(\text{not infected}) \\
&= 0.018 \cdot 0.05145 + 0.9505 \cdot 0.94855 = 0.902523 \approx 90.25\%
\end{aligned}
$$

Now moving on and inserting into Bayes’ rule again:

$$
\begin{aligned}
P(\text{infected} \mid \text{positive test}) &= \frac{P(\text{positive test} \mid \text{infected}) \cdot P(\text{infected})}{P(\text{positive test})} = 0.518315 \approx 51.83\% \\
P(\text{infected} \mid \text{negative test}) &= \frac{P(\text{negative test} \mid \text{infected}) \cdot P(\text{infected})}{P(\text{negative test})} = 0.001026 \approx 0.10\% \\
P(\text{not infected} \mid \text{positive test}) &= \frac{P(\text{positive test} \mid \text{not infected}) \cdot P(\text{not infected})}{P(\text{positive test})} = 0.481685 \approx 48.17\% \\
P(\text{not infected} \mid \text{negative test}) &= \frac{P(\text{negative test} \mid \text{not infected}) \cdot P(\text{not infected})}{P(\text{negative test})} = 0.998974 \approx 99.9\%
\end{aligned}
$$

So this means:

- In case a test is positive:
  - A probability of $51.83\%$ that one is really infected
  - A probability of $48.17\%$ that one is not infected in reality (false positive)
- In case a test is negative:
  - A probability of $0.10\%$ that one is still infected (false negative)
  - A probability of $99.9\%$ that one is really not infected
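The difference between the two examples comes mostly from the prevalence. As a rough illustration, the following sketch keeps the test parameters of the second example fixed and only varies the prevalence. The values 100 and 5145 per 100000 are the ones used in this article, the other two are hypothetical values I added for comparison, and the function name `infectedGivenPositive` is my own.

```javascript
// Probability of being infected given a positive test result.
function infectedGivenPositive(prevalence, sensitivity, specificity) {
  const truePositives = sensitivity * prevalence;
  const falsePositives = (1 - specificity) * (1 - prevalence);
  return truePositives / (truePositives + falsePositives);
}

// Test parameters from the second example.
const sens = 0.982;
const spec = 0.9505;

// Prevalences per 100000: 100 and 5145 appear in the text, 1000 and 20000
// are hypothetical values added for comparison.
const prevalencesPer100k = [100, 1000, 5145, 20000];

const rows = prevalencesPer100k.map((per100k) => ({
  per100k,
  posterior: infectedGivenPositive(per100k / 100000, sens, spec),
}));

for (const row of rows) {
  console.log(String(row.per100k).padStart(6), (100 * row.posterior).toFixed(2) + " %");
}
```

The same test becomes far more informative as the condition gets more common, which is why the prevalence has to be stated whenever such percentages are quoted.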

If your browser supports JavaScript, the following form allows one to play around with the prevalence and different test parameters. Keep in mind that the prevalence is the incidence multiplied by the average period of infection, and that the measured incidence might be far off the real value depending on test coverage and testing pattern. Just enter the numbers you desire and then press enter or switch to another field so the values get updated. In case of invalid values no results are shown.

Input parameters | Unit
---|---
Prevalence | per 100000
Sensitivity | percent
Specificity | percent

Intermediate values | Unit
---|---
Probability of condition present | percent
Probability of condition not present | percent
Probability of positive | percent
Probability of negative | percent

Results | Unit
---|---
Correct positive | percent
False positive | percent
Correct negative | percent
False negative | percent
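For readers who want to reproduce the form’s fields without a browser, here is a minimal sketch of such a calculator. It is not the script linked from this page. The function name `evaluateTest`, the property names and the exact validation rules are my own assumptions. Inputs use the units listed above and must already be numbers, and invalid input returns `null` so that no results would be shown.

```javascript
// Sketch of a calculator with the same inputs and outputs as the form.
// Inputs: prevalence per 100000, sensitivity and specificity in percent.
// Returns null for invalid input, otherwise all values in percent.
function evaluateTest(prevalencePer100k, sensitivityPercent, specificityPercent) {
  const inputs = [prevalencePer100k, sensitivityPercent, specificityPercent];
  if (!inputs.every((v) => typeof v === "number" && Number.isFinite(v))) return null;
  if (prevalencePer100k < 0 || prevalencePer100k > 100000) return null;
  if (sensitivityPercent < 0 || sensitivityPercent > 100) return null;
  if (specificityPercent < 0 || specificityPercent > 100) return null;

  const pPresent = prevalencePer100k / 100000;
  const pAbsent = 1 - pPresent;
  const sens = sensitivityPercent / 100;
  const spec = specificityPercent / 100;

  // Total probabilities of the two test outcomes.
  const pPositive = sens * pPresent + (1 - spec) * pAbsent;
  const pNegative = (1 - sens) * pPresent + spec * pAbsent;
  if (pPositive === 0 || pNegative === 0) return null; // posterior undefined

  const percent = (p) => 100 * p;
  return {
    conditionPresent: percent(pPresent),
    conditionNotPresent: percent(pAbsent),
    positive: percent(pPositive),
    negative: percent(pNegative),
    correctPositive: percent((sens * pPresent) / pPositive),
    falsePositive: percent(((1 - spec) * pAbsent) / pPositive),
    correctNegative: percent((spec * pAbsent) / pNegative),
    falseNegative: percent(((1 - sens) * pPresent) / pNegative),
  };
}

console.log(evaluateTest(5145, 98.2, 95.05)); // second example from the text
console.log(evaluateTest(5145, 120, 95.05));  // null: sensitivity above 100 percent
```

Here "correct positive" and "false positive" are the probabilities of the condition being present or absent given a positive result, matching how the terms are used in the two examples.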

Keep in mind, though, that this is not a substitute for a correct medical interpretation of test results. It should just allow one to get a feeling for what tests can do and under what conditions they are able to do so.

The really simple script is available as a GitHub GIST


Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)

This webpage is also available via TOR at http://jugujbrirx3irwyx.onion/