| Title: | Bayesian Nonparametric Resolution of Multi-Strain Infections |
|---|---|
| Description: | Implementation of SNP-Slice, a Bayesian nonparametric method for resolving multi-strain infections using slice sampling with stick-breaking construction. The algorithm simultaneously estimates strain haplotypes and links them to hosts from sequencing data. Supports multiple observation models including categorical, Poisson, binomial, and negative binomial distributions. |
| Authors: | Nianqiao Ju [aut], Maxwell Murphy [cre] |
| Maintainer: | Maxwell Murphy <[email protected]> |
| License: | GPL-3 + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-03 22:02:50 UTC |
| Source: | https://github.com/PlasmoGenEpi/snp.slicer |
calculate_allele_frequencies( results, snp_indices, use_map = TRUE, n_samples = 100, interval = 0.95, allele_sep = "|" )calculate_allele_frequencies( results, snp_indices, use_map = TRUE, n_samples = 100, interval = 0.95, allele_sep = "|" )
results |
A |
snp_indices |
A vector of SNP indices to treat as a single allele |
use_map |
Logical, whether to use MAP estimates (TRUE) or sample from MCMC (FALSE) |
n_samples |
Number of MCMC samples to use if use_map = FALSE (default: 100) |
interval |
Numeric in (0, 1). Credible interval width when using MCMC (e.g. 0.95). |
allele_sep |
Separator for allele strings (default: "|") |
The structure depends on use_map.
use_map = TRUE)Data frame with columns: allele (string representation of the allele, e.g. "ref|alt|ref" for 3 SNPs), frequency (proportion of total parasites with this allele; sums to 1), count (number of parasites with this allele in the MAP allocation), total_parasites (total parasites in the MAP allocation; same for every row).
use_map = FALSE)Data frame with columns: allele, frequency (posterior mean proportion), frequency_sd (posterior SD of proportion across samples), frequency_lower and frequency_upper (credible interval, e.g. 2.5\
Calculates allele frequencies for a collection of SNPs treated as a single allele. The function takes SNP indices and calculates the frequency of each possible allele combination across all individuals based on their strain allocations.
With use_map = FALSE, counts are summarized as per-sample means rather than sums,
so mean_count and mean_total_parasites are interpretable regardless of n_samples.
Frequency uncertainty (frequency_sd, frequency_lower, frequency_upper) is
computed from the distribution of allele frequencies across MCMC samples.
# Load example results result <- load_example_results()
# Calculate allele frequencies for SNPs 1, 5, and 10 (MAP) allele_freqs <- calculate_allele_frequencies(result, c(1, 5, 10)) print(allele_freqs)
# With MCMC: posterior mean, SD, credible interval, and per-sample mean count if (!is.null(result$mcmc_samples)) allele_freqs_mcmc <- calculate_allele_frequencies(result, c(1, 5, 10), use_map = FALSE, n_samples = 50) print(allele_freqs_mcmc)
For each target set, computes allele frequencies from MCMC/MAP results and returns a list of frequency tables (one per set). Each set is a vector of target indices or target names.
calculate_allele_frequencies_by_sets( results, target_sets, use_map = TRUE, n_samples = 100, interval = 0.95, allele_sep = "|" )calculate_allele_frequencies_by_sets( results, target_sets, use_map = TRUE, n_samples = 100, interval = 0.95, allele_sep = "|" )
results |
A |
target_sets |
List of vectors; each element is target indices (integer) or target names (character) defining one set. If the list is named, those names are used for the returned list. |
use_map |
Logical; use MAP estimates (TRUE) or sample from MCMC (FALSE). |
n_samples |
Number of MCMC samples to use if |
interval |
Credible interval width when |
allele_sep |
Separator for allele strings (default: "|"). |
A named list of data frames, one per target set. List names come from
names(target_sets) or "set_1", "set_2", etc. Each data frame
has the same structure as the return value of calculate_allele_frequencies:
with MAP, columns allele, frequency, count, total_parasites;
with MCMC, columns allele, frequency, frequency_sd,
frequency_lower, frequency_upper, mean_count, n_samples,
and attribute mean_total_parasites. See that function's help for the meaning
of each column.
result <- load_example_results() target_sets <- list(locus_a = c(1, 5), locus_b = c(10)) freqs <- calculate_allele_frequencies_by_sets(result, target_sets) print(freqs$locus_a)result <- load_example_results() target_sets <- list(locus_a = c(1, 5), locus_b = c(10)) freqs <- calculate_allele_frequencies_by_sets(result, target_sets) print(freqs$locus_a)
Returns per-host complexity of infection (COI). With use_map = TRUE
you get a point estimate (MAP); with use_map = FALSE and MCMC samples,
posterior mean, SD, and credible interval are computed.
calculate_individual_coi( results, use_map = TRUE, n_samples = 100, interval = 0.95 )calculate_individual_coi( results, use_map = TRUE, n_samples = 100, interval = 0.95 )
results |
A |
use_map |
If |
n_samples |
When |
interval |
Numeric in (0, 1). Credible interval width when using MCMC (e.g. 0.95 for 2.5 and 97.5 percent quantiles). |
A data frame with one row per host: host_index, host_id, coi_estimate, coi_sd, coi_lower, coi_upper. Uncertainty columns are NA when using MAP or when no MCMC samples are available.
result <- load_example_results() coi_map <- calculate_individual_coi(result, use_map = TRUE) head(coi_map) if (!is.null(result$mcmc_samples)) { coi_post <- calculate_individual_coi(result, use_map = FALSE, n_samples = 50) head(coi_post) }result <- load_example_results() coi_map <- calculate_individual_coi(result, use_map = TRUE) head(coi_map) if (!is.null(result$mcmc_samples)) { coi_post <- calculate_individual_coi(result, use_map = FALSE, n_samples = 50) head(coi_post) }
Clear performance log
clear_performance_log()clear_performance_log()
Calculate effective sample size for MCMC diagnostics
effective_sample_size( results, parameter = "logpost", method = "autocorrelation" )effective_sample_size( results, parameter = "logpost", method = "autocorrelation" )
results |
SNP-Slice results object |
parameter |
Parameter to analyze. Options include:
|
method |
Method for ESS calculation. Options:
|
Effective sample size(s) and diagnostic information
Example reference allele read count data for testing and demonstration purposes.
example_read0example_read0
A matrix of reference allele read counts with 200 hosts and 96 SNPs
Simulated data for package testing
Example alternate allele read count data for testing and demonstration purposes.
example_read1example_read1
A matrix of alternate allele read counts with 200 hosts and 96 SNPs
Simulated data for package testing
Example SNP data for testing and demonstration purposes.
example_snp_dataexample_snp_data
A list containing:
Matrix of reference allele read counts
Matrix of alternate allele read counts
Simulated data for package testing
Extract allocation information from SNP-Slice results
extract_allocations(results)extract_allocations(results)
results |
SNP-Slice results object |
List containing allocation information
Extract strain information from SNP-Slice results
extract_strains(results)extract_strains(results)
results |
SNP-Slice results object |
List containing strain information
Get performance summary
get_performance_summary()get_performance_summary()
Performance summary data frame
Loads pre-computed SNP-Slice analysis results from the example data. These results were generated using the negative binomial model with 2000 MCMC iterations.
load_example_results()load_example_results()
A snp_slice_results object containing the analysis results
# Load the pre-computed results result <- load_example_results() print(result)# Load the pre-computed results result <- load_example_results() print(result)
Utility functions for tracking performance and identifying bottlenecks
performance_envperformance_env
An object of class environment of length 0.
Plot convergence diagnostics
plot_convergence(results, type = "logpost")plot_convergence(results, type = "logpost")
results |
SNP-Slice results object |
type |
Type of plot ("logpost", "kstar", "n_strains") |
Plot object (if ggplot2 is available)
Print performance summary
print_performance_summary()print_performance_summary()
Print comprehensive ESS results for all parameters
## S3 method for class 'ess_all_results' print(x, digits = 2, ...)## S3 method for class 'ess_all_results' print(x, digits = 2, ...)
x |
List of ESS results |
digits |
Number of digits to display |
... |
Additional arguments |
Formatted ESS results
Print effective sample size results
## S3 method for class 'ess_result' print(x, digits = 2, ...)## S3 method for class 'ess_result' print(x, digits = 2, ...)
x |
ESS result object |
digits |
Number of digits to display |
... |
Additional arguments |
Formatted ESS results
Print SNP-Slice results
## S3 method for class 'snp_slice_results' print(x, ...)## S3 method for class 'snp_slice_results' print(x, ...)
x |
SNP-Slice results object |
... |
Additional arguments |
Print information
SNP-Slice is a Bayesian nonparametric method for resolving multi-strain infections using slice sampling with stick-breaking construction. The algorithm simultaneously unveils strain haplotypes and links them to hosts from sequencing data.
snp_slice( data, model = "negative_binomial", n_mcmc = 10000, burnin = NULL, alpha = 2.6, rho = if (model == "categorical") 0.5 else NULL, threshold = 0.001, gap = NULL, seed = NULL, verbose = TRUE, log_performance = FALSE, store_mcmc = FALSE, ... ) snp_slice_categorical(data, e1 = 0.05, e2 = 0.05, ...) snp_slice_poisson(data, ...) snp_slice_binomial(data, ...) snp_slice_negative_binomial(data, ...)snp_slice( data, model = "negative_binomial", n_mcmc = 10000, burnin = NULL, alpha = 2.6, rho = if (model == "categorical") 0.5 else NULL, threshold = 0.001, gap = NULL, seed = NULL, verbose = TRUE, log_performance = FALSE, store_mcmc = FALSE, ... ) snp_slice_categorical(data, e1 = 0.05, e2 = 0.05, ...) snp_slice_poisson(data, ...) snp_slice_binomial(data, ...) snp_slice_negative_binomial(data, ...)
data |
Input data. Can be a matrix, data.frame, or file path. For read count data,
should be a list with elements |
model |
Observation model to use. Options: "categorical", "poisson", "binomial", "negative_binomial" (default). |
n_mcmc |
Number of MCMC iterations (default: 10000). |
burnin |
Burn-in period. If NULL, defaults to n_mcmc/2. |
alpha |
IBP concentration parameter (default: 2.6). |
rho |
Dictionary sparsity parameter (default: 0.5 for categorical model and NULL otherwise, which means use the global minor allele frequency) |
threshold |
Threshold for identifying single infections (default: 0.001). |
gap |
Early stopping threshold. If NULL, runs for full n_mcmc iterations. |
seed |
Random seed for reproducibility. |
verbose |
Whether to print progress information (default: TRUE). |
log_performance |
Whether to log performance metrics (default: FALSE). |
store_mcmc |
Whether to store full MCMC samples (default: FALSE). |
... |
Additional model-specific parameters. |
e1 |
Error parameter for categorical model (default: 0.05) |
e2 |
Error parameter for categorical model (default: 0.05) |
An object of class snp_slice_results containing:
allocation_matrix: Binary allocation matrix (A)
dictionary_matrix: Binary dictionary matrix (D)
mcmc_samples: MCMC samples (if store_mcmc = TRUE)
diagnostics: Convergence diagnostics
parameters: Model parameters used
model_info: Model specification
## Not run: # Example with read count data data <- list( read1 = matrix(c(10, 5, 15, 8), nrow = 2), read0 = matrix(c(90, 95, 85, 92), nrow = 2) ) result <- snp_slice(data, model = "negative_binomial", n_mcmc = 1000) # Extract results strains <- extract_strains(result) allocations <- extract_allocations(result) ## End(Not run)## Not run: # Example with read count data data <- list( read1 = matrix(c(10, 5, 15, 8), nrow = 2), read0 = matrix(c(90, 95, 85, 92), nrow = 2) ) result <- snp_slice(data, model = "negative_binomial", n_mcmc = 1000) # Extract results strains <- extract_strains(result) allocations <- extract_allocations(result) ## End(Not run)
Print summary of SNP-Slice results
## S3 method for class 'snp_slice_results' summary(object, ...)## S3 method for class 'snp_slice_results' summary(object, ...)
object |
SNP-Slice results object |
... |
Additional arguments |
Summary information