Title: | Genetic Relatedness Between Polyclonal Infections |
---|---|
Description: | An implementation of Dcifer (Distance for complex infections: fast estimation of relatedness), an identity by descent (IBD) based method to calculate genetic relatedness between polyclonal infections from biallelic and multiallelic data. The package includes functions that format and preprocess the data, implement the method, and visualize the results. Gerlovina et al. (2022) <doi:10.1093/genetics/iyac126>. |
Authors: | Inna Gerlovina [aut, cre] |
Maintainer: | Inna Gerlovina <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-11-01 11:24:27 UTC |
Source: | https://github.com/eppicenter/dcifer |
Calculates population allele frequencies from data, adjusting for COI.
calcAfreq(dsmp, coi, tol = 1e-04, qstart = 0.5)
calcAfreq(dsmp, coi, tol = 1e-04, qstart = 0.5)
dsmp |
a list with each element corresponding to one sample. |
coi |
a vector containing complexity of infection for each sample. |
tol |
convergence tolerance for frequency estimates. |
qstart |
a starting value for frequencies. |
A list of allele frequencies, where each element is a numeric vector containing frequencies for a single locus.
coi <- getCOI(dsmp, lrank = 2) # estimate COI first afreq <- calcAfreq(dsmp, coi, tol = 1e-5)
coi <- getCOI(dsmp, lrank = 2) # estimate COI first afreq <- calcAfreq(dsmp, coi, tol = 1e-5)
Results of relatedness estimation.
dres
dres
A three-dimensional array with 52 columns, 52 rows, and 4 matrices.
Dimension names correspond to sample ID's (rows and columns) and types of
results c("estimate", "p_value", "CI_lower", "CI_upper")
(matrices).
Microhaplotype data from two clinics in Mozambique. Samples are sorted by location.
dsmp
dsmp
A list of 52 elements (samples), each element is a list of 87 elements (loci), which are integer vectors (alleles).
readDat
and readAfreq
read data and population allele
frequencies from csv
files and reformat them for further processing.
formatDat
and formatAfreq
reformat corresponding data frames.
Original data are assumed to be in a long format, with one row per allele.
readDat(sfile, svar, lvar, avar, ...) formatDat(dlong, svar, lvar, avar) readAfreq(afile, lvar, avar, fvar, ...) formatAfreq(aflong, lvar, avar, fvar)
readDat(sfile, svar, lvar, avar, ...) formatDat(dlong, svar, lvar, avar) readAfreq(afile, lvar, avar, fvar, ...) formatAfreq(aflong, lvar, avar, fvar)
sfile |
the name of the file containing sample data. |
svar |
the name of the variable for sample ID. |
lvar |
the name of the variable for locus/marker. |
avar |
the name of the variable for allele/haplotype. |
... |
additional arguments for |
dlong |
a data frame containing sample data. |
afile |
the name of the file containing population allele frequencies. |
fvar |
the name of the variable for population allele frequiency. |
aflong |
a data frame containing population allele frequencies. |
For readDat
and formatDat
, a list with elements
corresponding to samples. Each element of the list is itself a list of
binary vectors, one vector for each locus. For readAfreq
and
formatAfreq
, a list with elements corresponding to loci. The
frequencies at each locus are normalized and sum to 1. Samples, loci, and
alleles are ordered by their IDs/names.
matchAfreq
for making sure that the list containing
sample data is conformable to provided population allele frequencies.
sfile <- system.file("extdata", "MozParagon.csv", package = "dcifer") dsmp <- readDat(sfile, svar = "sampleID", lvar = "locus", avar = "allele") # OR, if the dataset is provided as an R data frame, e.g. dlong <- read.csv(sfile) # reformat only: dsmp <- formatDat(dlong, svar = "sampleID", lvar = "locus", avar = "allele") afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer") afreq2 <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq") # OR, if allele frequencies are provided as an R data frame, e.g. aflong <- read.csv(afile) # reformat only: afreq2 <- formatAfreq(aflong, lvar = "locus", avar = "allele", fvar = "freq") dsmp2 <- matchAfreq(dsmp, afreq2)
sfile <- system.file("extdata", "MozParagon.csv", package = "dcifer") dsmp <- readDat(sfile, svar = "sampleID", lvar = "locus", avar = "allele") # OR, if the dataset is provided as an R data frame, e.g. dlong <- read.csv(sfile) # reformat only: dsmp <- formatDat(dlong, svar = "sampleID", lvar = "locus", avar = "allele") afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer") afreq2 <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq") # OR, if allele frequencies are provided as an R data frame, e.g. aflong <- read.csv(afile) # reformat only: afreq2 <- formatAfreq(aflong, lvar = "locus", avar = "allele", fvar = "freq") dsmp2 <- matchAfreq(dsmp, afreq2)
Generates a grid of parameter values to evaluate over.
generateReval(M, rval = NULL, nr = NULL)
generateReval(M, rval = NULL, nr = NULL)
M |
an integer. |
rval |
a numeric vector with parameter values for the grid. If not
provided, it will be generated by dividing [0,
1] into |
nr |
an integer. Ignored if |
A a matrix with M
rows and nr + 1
or
length(rval)
columns.
reval <- generateReval(M = 2, nr = 1e2) dim(reval)
reval <- generateReval(M = 2, nr = 1e2) dim(reval)
Calculates complexity of infection for a list of samples, using the number of detected alleles.
getCOI(dsmp, lrank = 2)
getCOI(dsmp, lrank = 2)
dsmp |
a list with each element corresponding to one sample. |
lrank |
the rank of the locus that will determine a sample's COI (loci are ranked by the number of detected alleles). |
a vector with estimated COI for each sample.
coi <- getCOI(dsmp, lrank = 2)
coi <- getCOI(dsmp, lrank = 2)
Provides pairwise relatedness estimates within a dataset or between two datasets along with optional p-values and confidence intervals (CI).
ibdDat( dsmp, coi, afreq, dsmp2 = NULL, coi2 = NULL, pval = TRUE, confint = FALSE, rnull = 0, alpha = 0.05, nr = 1000, reval = NULL )
ibdDat( dsmp, coi, afreq, dsmp2 = NULL, coi2 = NULL, pval = TRUE, confint = FALSE, rnull = 0, alpha = 0.05, nr = 1000, reval = NULL )
dsmp |
a list with each element corresponding to one sample. |
coi |
a vector containing complexity of infection for each sample. |
afreq |
a list of allele frequencies. Each element of the list corresponds to a locus. |
dsmp2 |
a list representing a second dataset. |
coi2 |
a vector with complexities of infection for a second dataset. |
pval |
a logical value specifying if p-values should be returned. |
confint |
a logical value specifying if confidence intervals should be returned. |
rnull |
a null value of relatedness parameter for hypothesis testing
(needed if |
alpha |
significance level for a 1 - α confidence region. |
nr |
an integer specifying precision of the estimate: resolution of
a grid of parameter values ([0, 1]
divided into |
reval |
a vector or a single-row matrix. A grid of parameter values, over which the likelihood will be calculated. |
For this function, M is set to 1. If
confint = FALSE
, Newton's method is used to find the estimates,
otherwise the likelihood is calculated for a grid of parameter values.
A matrix if pval
and confint
are FALSE
and
3-dimensional arrays otherwise. The matrices are lower triangular if
distances are calculated within a dataset. For a 3-dimensional array,
stacked matrices contain relatedness estimates, p-values, and endpoints of
confidence intervals (if requested).
ibdPair
for genetic relatedness between two samples
and ibdEstM
for estimating the number of related pairs of
strains.
coi <- getCOI(dsmp, lrank = 2) # estimate COI afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # estimate allele frequencies # subset of samples for faster processing i1 <- 1:15 # from Maputo i2 <- 31:40 # from Inhambane isub <- c(i1, i2) # matrix is returned dres1 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = FALSE) dim(dres1) # test a null hypothesis H0: r = 0, change precision dres2 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, rnull = 0, nr = 1e2) dim(dres2) # test H0: r = 0.2, include 99% confidence intervals dres3 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, confint = TRUE, rnull = 0.2, alpha = 0.01) dres3[2, 1, ] # pairwise relatedness between two datasets, H0: r = 0 drbetween <- ibdDat(dsmp[i1], coi[i1], afreq, dsmp2 = dsmp[i2], coi2 = coi[i2]) dim(drbetween) drbetween[1, 2, ] sum(is.na(drbetween[, , 1]))
coi <- getCOI(dsmp, lrank = 2) # estimate COI afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # estimate allele frequencies # subset of samples for faster processing i1 <- 1:15 # from Maputo i2 <- 31:40 # from Inhambane isub <- c(i1, i2) # matrix is returned dres1 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = FALSE) dim(dres1) # test a null hypothesis H0: r = 0, change precision dres2 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, rnull = 0, nr = 1e2) dim(dres2) # test H0: r = 0.2, include 99% confidence intervals dres3 <- ibdDat(dsmp[isub], coi[isub], afreq, pval = TRUE, confint = TRUE, rnull = 0.2, alpha = 0.01) dres3[2, 1, ] # pairwise relatedness between two datasets, H0: r = 0 drbetween <- ibdDat(dsmp[i1], coi[i1], afreq, dsmp2 = dsmp[i2], coi2 = coi[i2]) dim(drbetween) drbetween[1, 2, ] sum(is.na(drbetween[, , 1]))
Estimates the number of related pairs of strains between two infections along with corresponding relatedness estimates and optional inference.
ibdEstM( pair, coi, afreq, Mmax = 6, pval = FALSE, confreg = FALSE, llik = FALSE, rnull = 0, alpha = 0.05, equalr = FALSE, freqlog = FALSE, nrs = c(1000, 100, 32, 16, 12, 10), revals = NULL, tol0 = 1e-09, logrs = NULL, nevals = NULL, nloc = NULL )
ibdEstM( pair, coi, afreq, Mmax = 6, pval = FALSE, confreg = FALSE, llik = FALSE, rnull = 0, alpha = 0.05, equalr = FALSE, freqlog = FALSE, nrs = c(1000, 100, 32, 16, 12, 10), revals = NULL, tol0 = 1e-09, logrs = NULL, nevals = NULL, nloc = NULL )
pair |
a list of length two containing data for a pair of samples. |
coi |
a vector containing complexity of infection for each sample. |
afreq |
a list of allele frequencies. Each element of the list corresponds to a locus. |
Mmax |
a maximum number of related pairs of strains to evaluate over.
If greater than |
pval , confreg , llik
|
logical values specifying if p-value, confidence
region, and log-likelihood for a range of |
rnull |
a null value of relatedness parameter for hypothesis testing
(needed if |
alpha |
significance level for a 1 - α confidence region. |
equalr |
a logical value. If |
freqlog |
a logical value indicating if |
nrs |
an integer vector where |
revals |
a list where |
tol0 |
a tolerance value for an estimate to be considered zero. |
logrs |
a list where |
nevals |
a vector where |
nloc |
the number of loci. |
A named list if multiple output logical values are TRUE
- or a
vector if only rhat = TRUE
. The output includes:
a relatedness estimate (numeric vector of length corresponding to the estimated number of related pairs);
a p-value if pval = TRUE
;
parameter values from the grid in revals
that are within the
confidence region if confreg = TRUE
;
log-likelihood values for the parameter grid in revals
if
llik = TRUE
.
ibdPair
for estimates of relatedness between two
samples and ibdDat
for pairwise relatedness estimates within
a dataset or between two datasets.
coi <- getCOI(dsmp, lrank = 2) # estimate COI afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # estimate allele frequencies # two samples ipair <- c(21, 17) # for higher COI: c(33, 5): COI = 5-6; c(37, 20): 4-3, c(41, 50): 5-4 Mmax <- min(coi[ipair]) # choose resolution of the grid for different M nrs <- c(1e3, 1e2, 32, 16, 12, 10)[1:Mmax] revals <- mapply(generateReval, 1:Mmax, nr = nrs) (res1 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = FALSE, reval = revals)) (res2 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = TRUE)) # number of related pairs of strains (M') sum(res1 > 0) sum(res2 > 0) # can be 0's
coi <- getCOI(dsmp, lrank = 2) # estimate COI afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # estimate allele frequencies # two samples ipair <- c(21, 17) # for higher COI: c(33, 5): COI = 5-6; c(37, 20): 4-3, c(41, 50): 5-4 Mmax <- min(coi[ipair]) # choose resolution of the grid for different M nrs <- c(1e3, 1e2, 32, 16, 12, 10)[1:Mmax] revals <- mapply(generateReval, 1:Mmax, nr = nrs) (res1 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = FALSE, reval = revals)) (res2 <- ibdEstM(dsmp[ipair], coi[ipair], afreq, Mmax = Mmax, equalr = TRUE)) # number of related pairs of strains (M') sum(res1 > 0) sum(res2 > 0) # can be 0's
Provides estimates of relatedness between a pair of samples along with an optional support curve and inference.
ibdPair( pair, coi, afreq, M, rhat = TRUE, pval = FALSE, confreg = FALSE, llik = FALSE, maxllik = FALSE, rnull = 0, alpha = 0.05, equalr = FALSE, mnewton = NULL, freqlog = FALSE, nr = 1000, reval = NULL, tol = NULL, logr = NULL, neval = NULL, inull = NULL, nloc = NULL )
ibdPair( pair, coi, afreq, M, rhat = TRUE, pval = FALSE, confreg = FALSE, llik = FALSE, maxllik = FALSE, rnull = 0, alpha = 0.05, equalr = FALSE, mnewton = NULL, freqlog = FALSE, nr = 1000, reval = NULL, tol = NULL, logr = NULL, neval = NULL, inull = NULL, nloc = NULL )
pair |
a list of length two containing data for a pair of samples. |
coi |
a vector containing complexity of infection for each sample. |
afreq |
a list of allele frequencies. Each element of the list corresponds to a locus. |
M |
the number of related pairs of strains. |
rhat , pval , confreg , llik , maxllik
|
logical values specifying if
relatedness estimate, p-value, confidence region, log-likelihood for a
range of |
rnull |
a null value of relatedness parameter for hypothesis testing
(needed if |
alpha |
significance level for a 1 - α confidence region. |
equalr |
a logical value. If |
mnewton |
a logical value. If |
freqlog |
a logical value indicating if |
nr |
an integer specifying precision of the estimate: resolution of
a grid of parameter values ([0, 1]
divided into |
reval |
a matrix representing a grid of (r1, ..., rM) combinations, over which the likelihood will be calculated. Each column is a single combination. |
tol |
tolerance for calculating an estimate if |
logr |
a list as returned by |
neval |
the number of relatedness values/combinations to evaluate over. |
inull |
an index of the value/column of |
nloc |
the number of loci. |
Handling of irregular cases:
Allele with population frequency of 0 is present: locus is skipped (does not contribute any information).
Number of unique alleles at a locus is greater than COI: COI will be increased for that locus only.
A named list if multiple output logical values are TRUE
- or a
vector if only rhat = TRUE
or llik = TRUE
. Depending on
these logical values, the following quantities are included:
If rhat = TRUE
, a relatedness estimate (a vector of length 1 if
equalr = TRUE
or of length M if equalr = FALSE
);
If pval = TRUE
, a p-value;
If confreg = TRUE
, relatedness parameter values from the grid
reval
that are within 1 - α confidence region;
If llik = TRUE
, log-likelihood values for relatedness parameter
grid (provided in reval
or determined by nr
);
If maxllik = TRUE
, maximum log-likelihood.
ibdEstM
for estimating the number of related pairs of
strains and ibdDat
for processing multi-sample data.
coi <- getCOI(dsmp, lrank = 2) afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # two samples ipair <- c(21, 17) pair <- dsmp[ipair] coip <- coi[ipair] M <- 2 res1 <- ibdPair(pair, coip, afreq, M = M, confreg = TRUE, alpha = 0.05, equalr = FALSE, reval = revals[[M]]) res2 <- ibdPair(pair, coip, afreq, M = M, llik = TRUE, equalr = TRUE, reval = revals[[1]]) res1$rhat rep(res2$rhat, M) # plot confidence region creg <- cbind(res1$confreg, res1$confreg[2:1, ]) plot(creg[1, ], creg[2, ], xlim = c(0, 1), ylim = c(0, 1), pch = 15, cex = 0.6, col = "cadetblue3", xlab = expression(hat(r)[1]), ylab = expression(hat(r)[2])) points(res1$rhat, rev(res1$rhat), pch = 16) # plot log-likelihood plot(revals[[1]], res2$llik, type = "l", xlab = "r", ylab = "log-likelihood") ipair <- c(41, 50) pair <- dsmp[ipair] coip <- coi[ipair] # rtotal at different values of M with and without equality constraint Mmax <- min(coip) for (M in 1:Mmax) { print(paste0("M = ", M)) print(c(sum(ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = FALSE, reval = revals[[M]])), ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = TRUE)*M)) cat("\n") } # M = 1 # log-likelihood for specific r values ibdPair(pair, coip, afreq, M = 1, rhat = FALSE, pval = FALSE, llik = TRUE, reval = c(0, 0.15, 0.38, 1)) # grid vs Newton's method system.time( ibdPair(pair, coip, afreq, M = 1, mnewton = TRUE, tol = 1e-5)) system.time( ibdPair(pair, coip, afreq, M = 1, mnewton = FALSE, nr = 1e5))
coi <- getCOI(dsmp, lrank = 2) afreq <- calcAfreq(dsmp, coi, tol = 1e-5) # two samples ipair <- c(21, 17) pair <- dsmp[ipair] coip <- coi[ipair] M <- 2 res1 <- ibdPair(pair, coip, afreq, M = M, confreg = TRUE, alpha = 0.05, equalr = FALSE, reval = revals[[M]]) res2 <- ibdPair(pair, coip, afreq, M = M, llik = TRUE, equalr = TRUE, reval = revals[[1]]) res1$rhat rep(res2$rhat, M) # plot confidence region creg <- cbind(res1$confreg, res1$confreg[2:1, ]) plot(creg[1, ], creg[2, ], xlim = c(0, 1), ylim = c(0, 1), pch = 15, cex = 0.6, col = "cadetblue3", xlab = expression(hat(r)[1]), ylab = expression(hat(r)[2])) points(res1$rhat, rev(res1$rhat), pch = 16) # plot log-likelihood plot(revals[[1]], res2$llik, type = "l", xlab = "r", ylab = "log-likelihood") ipair <- c(41, 50) pair <- dsmp[ipair] coip <- coi[ipair] # rtotal at different values of M with and without equality constraint Mmax <- min(coip) for (M in 1:Mmax) { print(paste0("M = ", M)) print(c(sum(ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = FALSE, reval = revals[[M]])), ibdPair(pair, coip, afreq, M = M, pval = FALSE, equalr = TRUE)*M)) cat("\n") } # M = 1 # log-likelihood for specific r values ibdPair(pair, coip, afreq, M = 1, rhat = FALSE, pval = FALSE, llik = TRUE, reval = c(0, 0.15, 0.38, 1)) # grid vs Newton's method system.time( ibdPair(pair, coip, afreq, M = 1, mnewton = TRUE, tol = 1e-5)) system.time( ibdPair(pair, coip, afreq, M = 1, mnewton = FALSE, nr = 1e5))
reval
Calculates logarithms of reval
and 1 - reval
, as well as other
associated quantities.
logReval(reval, M = NULL, neval = NULL, equalr = FALSE)
logReval(reval, M = NULL, neval = NULL, equalr = FALSE)
reval |
a matrix representing a grid of (r1, ..., rM) combinations, over which the likelihood will be calculated. Each column is a single combination. |
M |
the number of related pairs of strains. |
neval |
the number of relatedness values/combinations to evaluate over. |
equalr |
a logical value. If |
For equalr = TRUE
relatedness estimation, reval
should
be a 1 x neval
matrix.
A list of length 5 that contains log(reval)
, log(1 -
reval)
, the number of reval = 1
for each column, the number of
0 < reval < 1
for each column, and sum(log(1 - reval[reval <
1]))
for each column.
reval <- generateReval(M = 2, nr = 1e2) logr <- logReval(reval, M = 2, equalr = FALSE) reval <- generateReval(M = 1, nr = 1e3) logr3 <- logReval(reval, M = 3, equalr = TRUE) logr1 <- logReval(reval, M = 1) all(logr3$sum1r == logr1$sum1r*3)
reval <- generateReval(M = 2, nr = 1e2) logr <- logReval(reval, M = 2, equalr = FALSE) reval <- generateReval(M = 1, nr = 1e3) logr3 <- logReval(reval, M = 3, equalr = TRUE) logr1 <- logReval(reval, M = 1) all(logr3$sum1r == logr1$sum1r*3)
Checks if the list containing sample data is conformable to provided population allele frequencies and reformats it if needed.
matchAfreq(dsmp, afreq)
matchAfreq(dsmp, afreq)
dsmp |
a list with each element corresponding to one sample. |
afreq |
a list of allele frequencies. Each element of the list corresponds to a locus. |
The function reorders loci and alleles in dsmp
to match those
in afreq
and inserts alleles into dsmp
if they are present in
afreq
and not in dsmp
; doesn't handle cases when alleles are
present in dsmp
but not in afreq
. Allele names are required
for this procedure.
A list of the same length as dsmp
, with each element matching
the lengths and the names of afreq
and its elements.
readDat
and readAfreq
for reading in and
reformating data.
afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer") afreq2 <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq") dsmp2 <- matchAfreq(dsmp, afreq2)
afile <- system.file("extdata", "MozAfreq.csv", package = "dcifer") afreq2 <- readAfreq(afile, lvar = "locus", avar = "allele", fvar = "freq") dsmp2 <- matchAfreq(dsmp, afreq2)
Creates a colorbar for a plot.
plotColorbar( rlim = c(0, 1), by = 0.1, at = NULL, horiz = FALSE, col = grDevices::hcl.colors(301, "YlGnBu", rev = TRUE), ... )
plotColorbar( rlim = c(0, 1), by = 0.1, at = NULL, horiz = FALSE, col = grDevices::hcl.colors(301, "YlGnBu", rev = TRUE), ... )
rlim |
the range of values to be represented by colors. |
by |
increment size for tickmark locations. Ignored if |
at |
a vector of tickmark locations. |
horiz |
a logical value specifying if the colorbar should be drawn horizontally. |
col |
the colors for the colorbar. |
... |
other graphical parameters. |
The colorbar will fill the whole plotting region, which needs to be
specified outside of this function to control proportions and location of
the colorbar (see examples). To match the colors in the main plot,
rlim
values should be the same for plotRel
and
plotColorbar
; if rlim = NULL
or rlim = NA
in
plotRel
, provide the actual range of relatedness estimates for
plotColorbar
(see examples).
NULL
; called for plotting.
plotRel
for plotting relatedness estimates.
parstart <- par(no.readonly = TRUE) # save starting graphical parameters # colorbar on the side of the main plot layout(matrix(1:2, 1), width = c(7, 1)) par(mar = c(2, 0, 2, 0) + 0.1) # make symmetric matrix dmat <- dres[, , "estimate"] dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))] isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE) plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1])) abline(v = 26, h = 26, col = "gray45", lty = 5) par(mar = c(2, 1, 2, 2) + 0.1) plotColorbar() # shorter colorbar, tick mark locations provided par(mar = c(2, 0, 2, 0) + 0.1) plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1])) par(mar = c(5, 0.5, 5, 2.5) + 0.1) plotColorbar(at = c(0.0625, 0.125, 0.25, 0.5, 0.78)) par(parstart) # triangular matrix, inset horizontal colorbar par(mar = c(1, 1, 1, 1)) plotRel(dres, rlim = NULL, draw_diag = TRUE, border_diag = 1, alpha = 0.05) par(fig = c(0.3, 1.0, 0.73, 0.83), new = TRUE) rlim <- range(dres[, , 1], na.rm = TRUE) plotColorbar(rlim = rlim, at = c(0.2, 0.4, 0.6, 0.8), horiz = TRUE) par(parstart)
parstart <- par(no.readonly = TRUE) # save starting graphical parameters # colorbar on the side of the main plot layout(matrix(1:2, 1), width = c(7, 1)) par(mar = c(2, 0, 2, 0) + 0.1) # make symmetric matrix dmat <- dres[, , "estimate"] dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))] isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE) plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1])) abline(v = 26, h = 26, col = "gray45", lty = 5) par(mar = c(2, 1, 2, 2) + 0.1) plotColorbar() # shorter colorbar, tick mark locations provided par(mar = c(2, 0, 2, 0) + 0.1) plotRel(dmat, draw_diag = TRUE, isig = rbind(isig, isig[, 2:1])) par(mar = c(5, 0.5, 5, 2.5) + 0.1) plotColorbar(at = c(0.0625, 0.125, 0.25, 0.5, 0.78)) par(parstart) # triangular matrix, inset horizontal colorbar par(mar = c(1, 1, 1, 1)) plotRel(dres, rlim = NULL, draw_diag = TRUE, border_diag = 1, alpha = 0.05) par(fig = c(0.3, 1.0, 0.73, 0.83), new = TRUE) rlim <- range(dres[, , 1], na.rm = TRUE) plotColorbar(rlim = rlim, at = c(0.2, 0.4, 0.6, 0.8), horiz = TRUE) par(parstart)
Represents a matrix of pairwise relatedness estimates with colors corresponding to the levels of relatedness. Optionally, also outlines results of a hypothesis testing. The plot follows a matrix layout.
plotRel( r, rlim = c(0, 1), isig = NULL, alpha = NULL, col = grDevices::hcl.colors(101, "YlGnBu", rev = TRUE), draw_diag = FALSE, col_diag = "gray", border_diag = NA, lwd_diag = 0.5, border_sig = "orangered2", lwd_sig = 1.5, xlab = "", ylab = "", add = FALSE, idlab = FALSE, side_id = c(1, 2), col_id = 1, cex_id = 0.5, srt_id = NULL, ... )
plotRel( r, rlim = c(0, 1), isig = NULL, alpha = NULL, col = grDevices::hcl.colors(101, "YlGnBu", rev = TRUE), draw_diag = FALSE, col_diag = "gray", border_diag = NA, lwd_diag = 0.5, border_sig = "orangered2", lwd_sig = 1.5, xlab = "", ylab = "", add = FALSE, idlab = FALSE, side_id = c(1, 2), col_id = 1, cex_id = 0.5, srt_id = NULL, ... )
r |
a matrix or a 3-dimensional array as returned by
|
rlim |
the range of values for colors. If |
isig |
a matrix with two columns providing indices of relatedness matrix
entries to be outlined ("significant" sample pairs). Takes precedence over
|
alpha |
significance level for hypothesis testing; determines
relatedness matrix entries to be outlined. Ignored if |
col |
the colors for the range of relatedness values. |
draw_diag |
a logical value specifying if diagonal cells should be distinguished from others by a separate color. |
col_diag , border_diag , lwd_diag
|
the color for the fill, the color for
the border, and the line width for the border of diagonal entries. Ignored
if |
border_sig , lwd_sig
|
the color and the line width for outlining entries
specified by |
xlab , ylab
|
axis labels. |
add |
a logical value specifying if the graphics should be added to the existing plot (useful for triangular matrices). |
idlab |
a logical value specifying if sample ID's should be displayed. |
side_id |
an integer vector specifying plot sides for sample ID labels. |
col_id , cex_id
|
numeric vectors for the color and the size of sample ID labels. |
srt_id |
a vector of the same length as |
... |
other graphical parameters. |
NULL
; called for plotting.
plotColorbar
for a colorbar.
parstart <- par(no.readonly = TRUE) # save starting graphical parameters par(mar = c(0.5, 0.5, 0.5, 0.5)) plotRel(dres, alpha = 0.05, draw_diag = TRUE) # draw log of p-values in the upper triangle pmat <- matrix(NA, nrow(dres), ncol(dres)) pmat[upper.tri(pmat)] <- t(log(dres[, , "p_value"]))[upper.tri(pmat)] pmat[pmat == -Inf] <- min(pmat[is.finite(pmat)]) plotRel(pmat, rlim = NULL, draw_diag = TRUE, col = hcl.colors(101, "PuRd"), add = TRUE, col_diag = "slategray2", border_diag = 1) # symmetric matrix, outline significant in upper triangle, display sample ID par(mar = c(3, 3, 0.5, 0.5)) dmat <- dres[, , "estimate"] dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))] isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE) col_id <- rep(c("plum4", "lightblue4"), each = 26) plotRel(dmat, isig = isig[, 2:1], draw_diag = TRUE, idlab = TRUE, col_id = col_id) abline(v = 26, h = 26, col = "gray45", lty = 5) # rotated sample ID labels on all sides par(mar = c(3, 3, 3, 3)) plotRel(dmat, isig = rbind(isig, isig[, 2:1]), border_sig = "magenta2", draw_diag = TRUE, idlab = TRUE, side_id = 1:4, col_id = col_id, srt_id = c(-55, 25, 65, -35)) par(parstart)
parstart <- par(no.readonly = TRUE) # save starting graphical parameters par(mar = c(0.5, 0.5, 0.5, 0.5)) plotRel(dres, alpha = 0.05, draw_diag = TRUE) # draw log of p-values in the upper triangle pmat <- matrix(NA, nrow(dres), ncol(dres)) pmat[upper.tri(pmat)] <- t(log(dres[, , "p_value"]))[upper.tri(pmat)] pmat[pmat == -Inf] <- min(pmat[is.finite(pmat)]) plotRel(pmat, rlim = NULL, draw_diag = TRUE, col = hcl.colors(101, "PuRd"), add = TRUE, col_diag = "slategray2", border_diag = 1) # symmetric matrix, outline significant in upper triangle, display sample ID par(mar = c(3, 3, 0.5, 0.5)) dmat <- dres[, , "estimate"] dmat[upper.tri(dmat)] <- t(dmat)[upper.tri(t(dmat))] isig <- which(dres[, , "p_value"] <= 0.05, arr.ind = TRUE) col_id <- rep(c("plum4", "lightblue4"), each = 26) plotRel(dmat, isig = isig[, 2:1], draw_diag = TRUE, idlab = TRUE, col_id = col_id) abline(v = 26, h = 26, col = "gray45", lty = 5) # rotated sample ID labels on all sides par(mar = c(3, 3, 3, 3)) plotRel(dmat, isig = rbind(isig, isig[, 2:1]), border_sig = "magenta2", draw_diag = TRUE, idlab = TRUE, side_id = 1:4, col_id = col_id, srt_id = c(-55, 25, 65, -35)) par(parstart)
Calculates log-likelihood for a pair of samples at a single locus.
probUxUy( Ux, Uy, nx, ny, probs, M, logj, factj, equalr = FALSE, mnewton = TRUE, reval = NULL, logr = NULL, neval = NULL )
probUxUy( Ux, Uy, nx, ny, probs, M, logj, factj, equalr = FALSE, mnewton = TRUE, reval = NULL, logr = NULL, neval = NULL )
Ux , Uy
|
sets of unique alleles for two samples at a given locus. Vectors
of indices corresponding to ordered probabilities in |
nx , ny
|
complexity of infection for two samples. Vectors of length 1. |
probs |
a vector of population allele frequencies (on a log scale) at a given locus. It is not checked if frequencies on a regular scale sum to 1. |
M |
the number of related pairs of strains. |
logj , factj
|
numeric vectors containing precalculated logarithms and factorials. |
equalr |
a logical value. If |
mnewton |
a logical value. If |
reval |
a matrix representing a grid of (r1, ..., rM) combinations, over which the likelihood will be calculated. Each column is a single combination. |
logr |
a list of length 5 as returned by |
neval |
the number of relatedness values/combinations to evaluate over. |
If mnewton = TRUE
, a vector of length 2 containing
coefficients for fast likelihood calculation;
If mnewton = FALSE
, a vector of length neval
containing
log-likelihoods for a range of parameter values.
Ux <- c(1, 3, 7) # detected alleles at locus t Uy <- c(2, 7) coi <- c(5, 6) aft <- runif(7) # allele frequencies for locus t aft <- log(aft/sum(aft)) logj <- log(1:max(coi)) factj <- lgamma(0:max(coi) + 1) # M = 2, equalr = FALSE M <- 2 reval <- generateReval(M, nr = 1e2) logr <- logReval(reval, M = M) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, equalr = FALSE, logr = logr, neval = ncol(reval)) length(llikt) # M = 2, equalr = TRUE reval <- matrix(seq(0, 1, 0.001), 1) logr <- logReval(reval, M = M, equalr = TRUE) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, equalr = TRUE, logr = logr, neval = ncol(reval)) # M = 1, mnewton = FALSE M <- 1 reval <- matrix(seq(0, 1, 0.001), 1) logr <- logReval(reval, M = M) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = FALSE, reval = reval, logr = logr, neval = ncol(reval)) # M = 1, mnewton = TRUE probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = TRUE)
Ux <- c(1, 3, 7) # detected alleles at locus t Uy <- c(2, 7) coi <- c(5, 6) aft <- runif(7) # allele frequencies for locus t aft <- log(aft/sum(aft)) logj <- log(1:max(coi)) factj <- lgamma(0:max(coi) + 1) # M = 2, equalr = FALSE M <- 2 reval <- generateReval(M, nr = 1e2) logr <- logReval(reval, M = M) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, equalr = FALSE, logr = logr, neval = ncol(reval)) length(llikt) # M = 2, equalr = TRUE reval <- matrix(seq(0, 1, 0.001), 1) logr <- logReval(reval, M = M, equalr = TRUE) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, equalr = TRUE, logr = logr, neval = ncol(reval)) # M = 1, mnewton = FALSE M <- 1 reval <- matrix(seq(0, 1, 0.001), 1) logr <- logReval(reval, M = M) llikt <- probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = FALSE, reval = reval, logr = logr, neval = ncol(reval)) # M = 1, mnewton = TRUE probUxUy(Ux, Uy, coi[1], coi[2], aft, M, logj, factj, mnewton = TRUE)
Precalculated parameter grids for a range of values of M (from 1 to 5).
revals
revals
A list of length 5, where each element corresponds to a single value of M and is a matrix with M rows. Each column of a matrix is a r1 = ... = rM) combination.