---
title: "Background"
author: "Bob Verity"
date: "Last updated: `r format(Sys.Date(), '%d %b %Y')`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Background}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(tidyverse)
```

### What are *pfhrp2/3* deletions?

The majority of malaria diagnosis worldwide relies on rapid diagnostic tests (RDTs). A large proportion of these tests work by detecting the HRP2 protein in the blood of an infected individual, which reacts with the test to give a positive result. However, *Plasmodium* parasites that carry a deletion of the *pfhrp2* gene have been found in many countries. Without this gene the parasites do not express the HRP2 protein, rendering them invisible to HRP2-based RDTs. This presents a major clinical problem if these gene deletions rise to high frequencies in the population, as patients presenting with symptoms of malaria may be misdiagnosed, resulting in treatment delays or missed cases.

### What are we doing about it?

Many ongoing malaria molecular surveillance (MMS) efforts are trying to establish where in the world *pfhrp2/3* deletions are found, where they have already reached high frequencies, and how patterns are changing in space and time. The spread of *pfhrp2/3* deletions is recognised as a global threat to malaria control by the World Health Organization (WHO). The [WHO malaria threats map](https://apps.who.int/malaria/maps/threats/) contains a page on *pfhrp2/3* deletions where one can browse current patterns or download raw data. The WHO have also periodically released guidance on how to run a study looking for *pfhrp2/3* deletions and how to respond appropriately given the results.
One such document that is highly relevant here is the [Master protocol for surveillance of *pfhrp2/3* deletions and biobanking to support future research](https://iris.who.int/handle/10665/331197), published in 2020. This document gives guidance on a wide range of issues related to *pfhrp2/3* studies, including data collection and fieldwork, data storage and laboratory analysis, and templates such as informed consent forms for participants.

### What does the 2020 Master Protocol recommend?

The recommended approach in the 2020 Master Protocol begins with estimating the prevalence of deletions and constructing a 95% confidence interval (CI) around this estimate. There are three possible outcomes:

- Outcome 1: The upper limit of the 95% CI is below the 5% threshold (*conclusion = low prevalence*).
- Outcome 2: The lower limit of the 95% CI is above the 5% threshold (*conclusion = high prevalence*).
- Outcome 3: The 95% CI spans the 5% threshold (*inconclusive*).

Examples of these three outcomes are shown in the plot below:

```{r, echo=FALSE, fig.width=5, fig.height=4}
# Illustrative point estimates (%) with a fixed +/- 2 interval, one row per outcome.
# Rows are plotted bottom to top, so the labels run from Outcome 3 up to Outcome 1.
data.frame(y = as.factor(1:3),
           x = c(6, 8, 2.5)) |>
  ggplot() + theme_bw() +
  geom_errorbar(aes(x = x, y = y, xmin = x - 2, xmax = x + 2), width = 0.1) +
  geom_point(aes(x = x, y = y)) +
  # dashed line marks the 5% prevalence threshold
  geom_vline(xintercept = 5, linetype = "dashed") +
  xlim(c(0, 15)) +
  xlab("Prevalence of deletions (%)") + ylab("") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  annotate(geom = "text", x = c(6, 8, 2.5) + 3, y = 1:3, hjust = 0,
           label = sprintf("Outcome %s", 3:1))
```

If outcome 2 is found in any domain within a country then the country as a whole is recommended to switch RDTs to a brand that does not rely exclusively on the HRP2 protein. If outcome 1 or 3 is found in every domain then some form of periodic monitoring is still advised, but the urgency is lower.

### How does the *DRpower* method fit in?
Publication of the Master Protocol was a major step forward in specifying what a well-designed *pfhrp2/3* study might look like. However, there were also some issues around power and sample size calculations in this document. Some of these issues were corrected in later [corrigenda to the original protocol](https://cdn.who.int/media/docs/default-source/malaria/diagnosis/9789240002050-corrigenda.pdf?sfvrsn=d67857_3), but other statistical issues remain (described in detail later).

The *DRpower* tool was developed to take a fresh approach to power and sample size calculation for *pfhrp2/3* deletion studies, thereby solving some of the issues left by the 2020 Master Protocol. In some aspects we tried to align closely with the original study design, for example by assuming a multi-centre study that attempts to draw inference about the prevalence of *pfhrp2/3* deletions at a higher geographic level than any single health facility. We also use the same "threshold analysis" approach, in which we want to determine if the prevalence of deletions is above 5%. But in other ways we redesigned the statistics from the bottom up. For example, we took a Bayesian statistical approach, and also changed the way we think about intra-cluster correlation. The hypothesis that we are testing is also subtly different from that presented in the 2020 Master Protocol.

Finally, one of our objectives in developing *DRpower* was to provide greater flexibility at the design stage, while also making the process as simple as possible. This motivated the development of the [pfhrp2/3 Planner](https://shiny.dide.ic.ac.uk/DRpower-app/), which runs the *DRpower* model in a simple web-based interface.

### Defining our terms

The next pages go into detail about the issues with the original 2020 Master Protocol, the rationale behind the *DRpower* Bayesian model, how we estimate power under this model, and what we obtain in terms of minimum sample sizes.
However, before jumping into this we should define some terminology.

First, we assume that the study is run at multiple distinct locations, not just a single location. There are many possible terms used for these locations, for example centre, site, cluster, or health facility. We opt for the name *site*, as it is general enough to allow for different types of facility while also not being a purely statistical term like "cluster". However, we may occasionally move between terms; for example, we refer frequently to the "intra-cluster correlation coefficient".

We assume that our intention is to pool results over sites in order to estimate the prevalence at a higher geographic level. We refer to this higher geographic level as a *domain*. This term was chosen to be deliberately vague. The actual meaning of a "domain" may vary from one study to the next, and one country to the next. For example, it may be a "region" in one country, a "province" in the next, or an "ADMIN1 unit" in another. We do not talk explicitly about the ways that choice of domain may impact a study, although it almost certainly will, particularly with reference to the level of intra-cluster correlation.

Finally, we refer to *pfhrp2/3* deletions to capture the fact that researchers may be interested in either the *pfhrp2* or *pfhrp3* locus. The *pfhrp3* gene expresses the HRP3 protein, which is structurally similar enough to HRP2 that it can occasionally restore the ability of the test to detect malaria. Therefore, if you are infected with a parasite strain that carries the *pfhrp2* deletion but not the *pfhrp3* deletion then you *may* still be positive by HRP2-based RDT, although the probability of detection is lower. However, *DRpower* does not explicitly account for joint analysis of multiple loci and so we assume you are powering for just one set of deletion counts.
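
To tie these terms together, here is a minimal sketch in base R. It is not the *DRpower* analysis. It pools hypothetical *pfhrp2* deletion counts from three sites in one domain, computes an exact binomial 95% CI with `binom.test()`, and sorts the interval into the three Master Protocol outcomes. The site names, the counts, and the helper `classify_outcome()` are all assumptions made up for illustration. The pooling deliberately ignores intra-cluster correlation, so it should be read only as an illustration of the terminology and of the comparison against the 5% threshold.

```r
# Hypothetical deletion counts for a single locus (pfhrp2) at three sites
# within one domain. All values are invented for illustration.
site_data <- data.frame(
  site      = c("Site A", "Site B", "Site C"),
  n_deleted = c(2, 5, 1),       # samples carrying the deletion
  n_tested  = c(100, 100, 100)  # samples successfully genotyped
)

# Hypothetical helper: sort a 95% CI (in %) into the three outcomes
# described in the 2020 Master Protocol.
classify_outcome <- function(ci_lower, ci_upper, threshold = 5) {
  if (ci_upper < threshold) {
    "Outcome 1: low prevalence"
  } else if (ci_lower > threshold) {
    "Outcome 2: high prevalence"
  } else {
    "Outcome 3: inconclusive"
  }
}

# Naive pooling over sites: treats all samples in the domain as independent,
# i.e. it assumes no intra-cluster correlation.
pooled <- binom.test(sum(site_data$n_deleted), sum(site_data$n_tested))

# Exact (Clopper-Pearson) 95% CI, converted to percent
ci_percent <- 100 * as.numeric(pooled$conf.int)

classify_outcome(ci_percent[1], ci_percent[2])
```

The three example intervals in the plot of Master Protocol outcomes correspond to `classify_outcome(0.5, 4.5)`, `classify_outcome(6, 10)` and `classify_outcome(4, 8)`, which fall under outcomes 1, 2 and 3 respectively.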