The method of constructing CIs described in section 2.1 is a major improvement over the naive approach described in section 2, as it deals with overdispersion. But there are still some issues with this approach that motivate the need for something better:
CIs are symmetric. The CI formula in section 2.1 is based on the central limit theorem, which states that the sampling distribution of the statistic is approximately normal when the number of sites is large and the true prevalence is somewhere near 50%. Unfortunately, in our case both of these assumptions tend to be violated - the number of sites can be very small and prevalence tends to be close to 5%. One consequence is that CIs are forced to be symmetric when in fact they should be asymmetric, typically narrower on the side of small values. This makes us more likely to fail to reject the 5% threshold when in fact there is strong evidence to reject it.
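To see the symmetry problem concretely, here is a minimal sketch in Python using made-up numbers, treating the data as a single binomial sample and so ignoring the site-level structure; it compares a symmetric normal-approximation interval with an exact binomial interval at a prevalence near 5%.

```python
from scipy import stats

# Made-up numbers: 6 deletions out of 120 samples (5% observed prevalence),
# treated as one binomial draw and ignoring site-level clustering.
x, n = 6, 120
p_hat = x / n

# Normal-approximation (Wald) interval: forced to be symmetric around p_hat.
se = (p_hat * (1 - p_hat) / n) ** 0.5
z = stats.norm.ppf(0.975)
print(f"Wald:  ({p_hat - z * se:.3f}, {p_hat + z * se:.3f})")

# Exact (Clopper-Pearson) interval: asymmetric, with the shorter arm towards zero.
exact = stats.binomtest(x, n).proportion_ci(confidence_level=0.95, method="exact")
print(f"Exact: ({exact.low:.3f}, {exact.high:.3f})")
```

The exact interval extends further above the estimate than below it, which a symmetric interval cannot capture.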
Error is not propagated. The CI formula in section 2.1 uses the sample variance as a fixed value, but this statistic is itself subject to random variation and so should be estimated with uncertainty. Ideally, this uncertainty would be propagated through to our CI on the prevalence, but this is not done here. Normally, we could switch to a method like bootstrapping to propagate error, but this is also not appropriate here as it relies on a large number of observations (i.e. sites).
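As a rough illustration of why a bootstrap over sites struggles here (a sketch with hypothetical numbers, not a method proposed in this document): with only a handful of sites, every bootstrap replicate is built from the same few site-level estimates, so the resulting percentile interval can never reach beyond their observed range.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical site-level prevalence estimates from just 5 sites.
site_prev = np.array([0.00, 0.00, 0.02, 0.05, 0.03])

# Percentile bootstrap over sites: resample sites with replacement, recompute the mean.
boot_means = np.array([
    rng.choice(site_prev, size=len(site_prev), replace=True).mean()
    for _ in range(10_000)
])
print(np.percentile(boot_means, [2.5, 97.5]))
# Every bootstrap mean is confined to [0, 0.05], the range of the observed site
# estimates, so with this few sites the interval tends to understate our uncertainty.
```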
Cannot deal with 0 counts. It is quite common in historical pfhrp2/3 datasets to see zero deletions in some sites, and sometimes even in the study as a whole. This is to be expected when looking for deletions at relatively low prevalence. If there are zero deletions in the study then the sample variance will also equal zero, and so the CI is centred at 0% prevalence and has zero width. This gives the false impression that we are extremely confident that prevalence is 0%, when in fact this is just an artefact of our analysis. It is particularly problematic when the sample size is very small, as smaller samples are even more likely to contain zero deletions, creating the uncomfortable situation that we are most likely to confidently conclude 0% prevalence when we have the least evidence to support this claim!
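Here is a minimal sketch of this failure mode, assuming the section 2.1 interval takes the familiar form mean ± t × SE, with the SE derived from the sample variance of the site-level prevalences (the exact formula may differ), and using made-up data.

```python
import numpy as np
from scipy import stats

# Hypothetical study: 5 sites of roughly 60 samples each, zero deletions anywhere.
numerators = np.array([0, 0, 0, 0, 0])
denominators = np.array([60, 55, 62, 58, 60])
site_prev = numerators / denominators   # every site-level estimate is exactly 0

# Assumed section-2.1-style interval: mean of site prevalences +/- t * SE,
# where the SE comes from the sample variance across sites.
mean_prev = site_prev.mean()
se = site_prev.std(ddof=1) / np.sqrt(len(site_prev))
t_crit = stats.t.ppf(0.975, df=len(site_prev) - 1)

lower, upper = mean_prev - t_crit * se, mean_prev + t_crit * se
print(f"({lower:.3f}, {upper:.3f})")   # (0.000, 0.000): zero width, wrongly implying total certainty
```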
Can estimate prevalence less than 0%. Similar to the criticism of symmetric CIs above, the fact that we are assuming normally distributed error means that CIs can sometimes extend into negative values. This is completely nonsensical, and can create confusion when reporting results. The typical approach is simply to cut these intervals off at 0%, but this is really just a quick fix that ignores the underlying issue.
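Continuing the same assumed mean ± t × SE sketch with made-up site-level estimates, a few sites with low but non-zero counts are enough to push the lower bound below zero.

```python
import numpy as np
from scipy import stats

# Hypothetical site-level prevalence estimates from 5 sites with low deletion counts.
site_prev = np.array([0.00, 0.00, 0.02, 0.05, 0.03])

# Same assumed mean +/- t * SE form as in the previous sketch.
mean_prev = site_prev.mean()
se = site_prev.std(ddof=1) / np.sqrt(len(site_prev))
t_crit = stats.t.ppf(0.975, df=len(site_prev) - 1)

lower, upper = mean_prev - t_crit * se, mean_prev + t_crit * se
print(f"({lower:.3f}, {upper:.3f})")   # the lower bound is negative, impossible for a prevalence
```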
Shouldn’t this be a hypothesis test? So far, we have assumed that our plan is to construct CIs and then accept or reject the 5% threshold based on this CI. But this conflates prevalence estimation with hypothesis testing, when in fact there may be advantages to keeping them separate. For example, one could argue that we need a one-tailed hypothesis test, as we only really care whether prevalence is above this threshold, while at the same time we want a two-sided CI for estimating the prevalence. Mixing these concepts means we may have lower statistical power than we could ideally achieve.
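To make the power point concrete, here is a toy comparison using a plain binomial with made-up numbers (clustering again ignored): a one-sided exact test against the 5% threshold can reject even though the two-sided 95% CI still straddles 5%.

```python
from scipy import stats

# Hypothetical pooled data: 10 deletions out of 100 samples.
x, n, threshold = 10, 100, 0.05

# One-sided exact test of H0: prevalence <= 5% against H1: prevalence > 5%.
test = stats.binomtest(x, n, p=threshold, alternative="greater")
print(f"one-sided p-value: {test.pvalue:.3f}")   # about 0.03, rejects at the 5% level

# Two-sided 95% exact CI: its lower bound sits just below 5%, so a rule based on
# comparing this CI to the threshold would fail to reject.
ci = stats.binomtest(x, n).proportion_ci(confidence_level=0.95, method="exact")
print(f"two-sided 95% CI: ({ci.low:.3f}, {ci.high:.3f})")
```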
Sometimes these issues will be very minor, and sometimes they may even cancel each other out, but sometimes they won’t! We need to be confident that our analysis plan is robust no matter what our data end up looking like.