We start by defining the following terms:
Parameter | Definition |
---|---|
p | Prevalence at the domain level. This is usually the main thing we want to estimate |
r | Intra-cluster correlation coefficient (between 0 and 1) |
α, β |
Shape parameters of beta prior on site-level prevalence. Defined based
on other model parameters as follows: α = p(1/r − 1) β = (1 − p)(1/r − 1) |
c | Number of sites |
xi | Prevalence in site i |
ni | Sample size in site i |
ki | Observed pfhrp2/3 deletions in site i |
We assume a multilevel model in which p is the overall prevalence at the domain level, but the individual site-level prevalence varies around this value according to a beta distribution. The parameters of this beta distribution are chosen such that r represents the level of intra-cluster correlation.
Data, in the form of counts of observed pfhrp2/3 deletions, are assumed to be binomial within each site. The expected value of this distribution is the site-level prevalence. This results in counts following a beta-binomial distribution, with the following probability distribution:
$Pr(k_i | n_i, p, r) = {n_i \choose k_i}\frac{B(k_i + \alpha, n_i - k_i + \beta)}{B(\alpha, \beta)}$
where B(x, y) is the beta function. The overall likelihood is the product of this distribution over all sites:
$Pr(\mathbf{k} | \mathbf{n}, p, r) = \prod_{i=1}^c Pr(k_i | n_i, p, r)$
We assume beta priors on the two free parameters, p and r. By default, the shape parameters of the prior on p are both equal to 1, meaning this simplifies to the uniform distribution. In contrast, based on historical data we assume shape parameters of 1 and 9 for the prior on r, giving a mean of 0.1 and entertaining values in the plausible range [0, 0.3]:
The joint posterior probability is obtained (up to a constant of proportionality) by multiplying the likelihood by both priors:
Pr(p, r|k, n) ∝ Pr(k|n, p, r)Pr(p)Pr(r)
Finally, the marginal posterior distribution of either p or r can be obtained by integrating over the other free parameter:
Pr(p|k, n) ∝ ∫01Pr(p, r|k, n)dr
Pr(r|k, n) ∝ ∫01Pr(p, r|k, n)dp
This integration is performed inside DRpower using an adaptive quadrature-based method. The marginal posterior distributions can be returned in full, or can be summarised using various measures of central tendency and/or credible intervals (CrI). There are multiple methods of calculating CrIs - we use the high density region (HDI) by default as this avoids the possibility of the maximum a posteriori (MAP) estimate being outside the CrI, which is possible by the more commonly used equal-tailed interval (ETI).