Title: | Structured Wright Fisher Simulator for Malaria Genetics |
---|---|
Description: | Unique features of the malaria life-cycle and transmission dynamics requires extensions of typical population genetic simulators. Using a spatial discrete-loci, discrete-time structured Wright Fisher framework, we simulate malaria population genetics forwards in time. Users are then able to capture the full Ancestral Recombination Graph. |
Authors: | Nick Brazeau [aut, cre], Bob Verity [aut] |
Maintainer: | Nick Brazeau <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2024-10-31 18:36:56 UTC |
Source: | https://github.com/nickbrazeau/polySimIBD |
A list of bv_tree
for each discrete-loci, which constitutes the ARG
The bv_tree
class is a lightweight representation of a marginal tree
c
vector; the node connection for each haplotype (each haplotype is an element in a vector)
t
vector; the timing of the node connection (time to MRCA)
z
vector; the order of coalescence for each set of haplotypes
bvtree
to Newick Tree FormatRecursively converts bvtree
to Newick tree format for compatibility with other downstream packages. The output format named leafs with distances that correspond to coalescent times. Additionally, clades have the total tree length explicitly (versus the acceptable Newick shorthand without total length).
bvtreeToNewick(bvtree, tlim = 10)
bvtreeToNewick(bvtree, tlim = 10)
bvtree |
S3 class; internal class for the |
tlim |
numeric; the maximum number of generations to consider before exiting gracefully if all samples have not coalesced |
Note, the overall time to the most recent common ancestor (TMRCA) is set to the tlim
, which in Newick format looks inappropriately rooted.
Note, the function does not "know" the original tlim
that was specified in the forward-simulation, and thus must be re-stated by the user.
Newick String
Given an object swf
, which is the result of forward
simulation using the function sim_swf()
, walk backwards through the
ancestry and calculate the coalescent tree at every locus for the
specified hosts and/or haplotypes.
get_arg(swf, host_index = NULL, haplo_index = NULL)
get_arg(swf, host_index = NULL, haplo_index = NULL)
swf |
result of forwards simulation using the function |
host_index |
a vector of target hosts. Defaults to all hosts |
haplo_index |
a list of target haplotypes within the hosts specified by
|
Given an object swf
, which is the result of forward
simulation using the function sim_swf()
, walks backwards through the
ancestry and calculate the between host identity by descent. The IBD
calculation is based on Verity et. al 2020, Nat Comms, PMC7192906 and
assumes relatedness if there are any IBD among the haplotypes between hosts
at a given loci (i.e. a loci is considered to be in IBD if there is any between
host IBD among the strains, regardless of COI. This means that as COI increases,
IBD may be overestimated, which has been shown to be a conservative estimand).
get_bvibd(swf, host_index = NULL, haplo_index = NULL)
get_bvibd(swf, host_index = NULL, haplo_index = NULL)
swf |
result of forwards simulation using the function |
host_index |
a vector of target hosts. Defaults to all hosts |
haplo_index |
a list of target haplotypes within the hosts specified by
|
From a single host in a SWF Simulation, extract the effective COI
for loci within the ARG. Effective COI is defined as the number of non-coalesced genomes
at the end of tlim
. Note, this framework is an independent process for each recombination event
and thus will vary along the simulated genome (but not necessarily by locus).
get_effective_coi(swf, host_index = NULL)
get_effective_coi(swf, host_index = NULL)
swf |
result of forwards simulation using the function |
host_index |
a vector of target hosts. Defaults to all hosts |
Function limited to a single host per "realization"
vector of effective COI by loci
The within-host IBD is calculated as the number of strains that have
coalesced within the tlim
at each loci divided by the original (i.e. not effective)
COI. As an example, consider that there are three strains (i.e. parasites) within a host and
that the parasite genome has ten equidistant loci with a single recombination breakpoint at loci 5 (i.e.).
Within this framework, we consider at loci 1:5 if 2/3 strains have coalesced, the
within-host IBD for this section is 2/3. Next, for loci 6:10 if no strains have coalesced
the within-host IBD is 0. Combining these results with-weighting for respective length/portion of the
genome (weights here are equal and therefore negligible) the overall within-host IBD is:
, where one is subtracted from the Host-COI for self-comparison, which gives (3-1) + (3-1) (for each loci). Note, because we consider self comparisons, the denominator is always less than the true COI.
get_within_ibd(swf, host_index = NULL)
get_within_ibd(swf, host_index = NULL)
swf |
result of forwards simulation using the function |
host_index |
a vector of target hosts. Defaults to all hosts |
Function limited to a single host per "realization"
double of within-host IBD
Unique features of the malaria life-cycle and transmission dynamics requires extensions of typical population genetic simulators. Using a spatial discrete-loci, discrete-time structured Wright Fisher model, we simulate malaria population genetics forwards in time. Users are then able to capture the full Ancestral Recombination Graph.
Simulate a population forwards with recombination that approximates the Structured Wright Fisher Process and tracks haplotype identity by descent where individuals represent demes, such that within a deme individual-level COI is considered. The model is also extended to consider spatial demes that individual hosts can move between.
sim_swf(pos, N, m, rho, mean_coi, tlim, migr_mat = 1, verbose = FALSE)
sim_swf(pos, N, m, rho, mean_coi, tlim, migr_mat = 1, verbose = FALSE)
pos |
vector; the genomic coordinates for chromosome and position of the sites |
N |
integer vector; The number of individuals to consider in each deme |
m |
numeric numeric; Probability of internal migration where m represents the probability of moving from host_origin to host_new by m*(1-1/N) of each deme |
rho |
numeric; expected recombination rate |
mean_coi |
numeric vector; The lambda of a right-shifted Poisson process, 1 + Pos(lambda) representing the average COI of each deme |
tlim |
numeric; the maximum number of generations to consider before exiting gracefully if all samples have not coalesced |
migr_mat |
numeric matrix; Migrations rates or probabilities between destination and origin. Note, because this is a Wright-Fisher model, we are drawing parents and therefore migration matrix is parameterized towards "where one came from" versus "where one is headed": origin specified as columns and destination in rows. Default value of 1 indicates non-spatial model. Note, if using a probability matrix, rows must sum to 1 (valid marginal probability); otherwise, values will be assumed to be rates and converted to probabilities. |
verbose |
boolean |
Demes are assumed to be ordered throughout (i.e. the order needs to be consistent between N, m, mean_coi, and the rows and columns of the migration matrix).
The migration matrix is assumed to be a distance matrix that is either a rate or a probability. The program will coerce the matrix into a probability distribution between origin and destination based on the row-sums.
This function is intended to be fed into the get_arg function to summarize the simulation results,
Returns a list of length six that contains
pos: The simulated genetic coordinates
coi: The COI of each individual
recomb: A recombination list of length of tlim where each element contains the recombination block – as a boolean – of the two parental haplotypes.
parent_host1: the parental host assignments for the "paternal" haplotype
parent_host1: the parental host assignments for the "maternal" haplotype
parent_haplo1 "paternal" haplotype assigment (as above)
parent_haplo2 "maternal" haplotype assigment (as above)
Given a bvtree and a vector of indices s
, creates a new
tree which is a subset of the original tree focusing only on the elements
s
.
subset_bvtree(bvtree, s)
subset_bvtree(bvtree, s)
bvtree |
an object of class "bvtree" |
s |
a vector specifying which elements in the bvtree to focus on |
The realization of the Discrete-Time Discrete-Loci Spatial Wright Fisher Malaria Model contains all of the information to create the ARG: each generation's parents, the resulting recombination events between parents, and the offspring haplotypes
pos
vector; the genomic coordinates for chromosome and position of the sites
coi
vector; the COI of each host
recomb
list; recombination blocks for each ancestral host for each generation
parent_host1
list; parent for host 1 for each generation
parent_host2
list; parent for host 2 for each generation
parent_haplo1
list; haplotypes for parent 1 for each generation
parent_haplo2
list; haplotypes for parent 2 for each generation