--- title: "Mathematical Details" author: "Bob Verity" date: "Last updated: `r format(Sys.Date(), '%d %b %Y')`" output: html_document vignette: > %\VignetteIndexEntry{Mathematical Details} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, echo=FALSE, message=FALSE, warning=FALSE} library(kableExtra) ``` ## 1. The Bayesian hierarchical model We start by defining the following terms: ```{r, echo=FALSE, message=FALSE, warnings=FALSE} rbind.data.frame(list(Parameter = "$p$", Definition = "Prevalence at the domain level. This is usually the main thing we want to estimate")) %>% rbind.data.frame(list(Parameter = "$r$", Definition = "Intra-cluster correlation coefficient (between 0 and 1)")) %>% rbind.data.frame(list(Parameter = "$\\alpha$, $\\beta$", Definition = "Shape parameters of beta prior on site-level prevalence. Defined based on other model parameters as follows:
$\\alpha = p(1/r - 1)$
$\\beta = (1 - p)(1/r - 1)$")) %>% rbind.data.frame(list(Parameter = "$c$", Definition = "Number of sites")) %>% rbind.data.frame(list(Parameter = "$x_i$", Definition = "Prevalence in site $i$")) %>% rbind.data.frame(list(Parameter = "$n_i$", Definition = "Sample size in site $i$")) %>% rbind.data.frame(list(Parameter = "$k_i$", Definition = "Observed pfhrp2/3 deletions in site $i$")) %>% kable(format = "html", escape = F) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>% row_spec(0, extra_css = "border-bottom: 1px solid; border-top: 1px solid") %>% row_spec(2, extra_css = "border-bottom: 1px solid") %>% scroll_box(width = "800px", height = "380px") ``` We assume a multilevel model in which $p$ is the overall prevalence at the domain level, but the individual site-level prevalence varies around this value according to a beta distribution. The parameters of this beta distribution are chosen such that $r$ represents the level of intra-cluster correlation. Data, in the form of counts of observed *pfhrp2/3* deletions, are assumed to be binomial within each site. The expected value of this distribution is the site-level prevalence. This results in counts following a beta-binomial distribution, with the following probability distribution: $Pr(k_i | n_i, p, r) = {n_i \choose k_i}\frac{B(k_i + \alpha, n_i - k_i + \beta)}{B(\alpha, \beta)}$ where $B(x, y)$ is the beta function. The overall likelihood is the product of this distribution over all sites: $Pr(\mathbf{k} | \mathbf{n}, p, r) = \prod_{i=1}^c Pr(k_i | n_i, p, r)$ We assume beta priors on the two free parameters, $p$ and $r$. By default, the shape parameters of the prior on $p$ are both equal to 1, meaning this simplifies to the uniform distribution. In contrast, based on [historical data](articles/historical_analysis.html) we assume shape parameters of 1 and 9 for the prior on $r$, giving a mean of 0.1 and entertaining values in the plausible range [0, 0.3]: The joint posterior probability is obtained (up to a constant of proportionality) by multiplying the likelihood by both priors: $Pr(p, r | \mathbf{k}, \mathbf{n}) \propto Pr(\mathbf{k} | \mathbf{n}, p, r)Pr(p)Pr(r)$ Finally, the marginal posterior distribution of either $p$ or $r$ can be obtained by integrating over the other free parameter: $Pr(p | \mathbf{k}, \mathbf{n}) \propto \int_0^1 Pr(p, r | \mathbf{k}, \mathbf{n}) dr$ $Pr(r | \mathbf{k}, \mathbf{n}) \propto \int_0^1 Pr(p, r | \mathbf{k}, \mathbf{n}) dp$ This integration is performed inside *DRpower* using an adaptive quadrature-based method. The marginal posterior distributions can be returned in full, or can be summarised using various [measures of central tendency](articles/summarise_prevalence.html) and/or credible intervals (CrI). There are multiple methods of calculating CrIs - we use the high density region (HDI) by default as this avoids the possibility of the maximum a posteriori (MAP) estimate being outside the CrI, which is possible by the more commonly used equal-tailed interval (ETI).