Title: | Identity-by-Descent Inference of Haploid Recombining Organisms |
---|---|
Description: | Pairwise identity by descent inference of haploid species using single nucleotide polymorphism data. isoRelate can detect IBD in the presence of multi-clonal infections and also provides a function for identifying loci under recent positive selection. |
Authors: | Lyndal Henden [aut, cre] |
Maintainer: | Lyndal Henden <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-10-25 04:32:41 UTC |
Source: | https://github.com/bahlolab/isoRelate |
The matrix A in the equation Ax=b for 2 diploid chromosomes
AmatrixDD(pop_allele_freqs, genotypes)
AmatrixDD(pop_allele_freqs, genotypes)
pop_allele_freqs |
A numeric vector of population allele frequencies for each SNP |
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
The matrix A in the equation Ax=b for 1 haploid and 1 diploid chromosome
AmatrixHD(pop_allele_freqs, genotypes)
AmatrixHD(pop_allele_freqs, genotypes)
pop_allele_freqs |
A numeric vector of population allele frequencies for each SNP |
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
The matrix A in the equation Ax=b for 2 haploid chromosomes
AmatrixHH(pop_allele_freqs, genotypes)
AmatrixHH(pop_allele_freqs, genotypes)
pop_allele_freqs |
A numeric vector of population allele frequencies for each SNP |
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
Gene annotations for the reference genome 3D7 were downloaded from http://www.plasmodb.org/common/downloads/Current_Release/Pfalciparum3D7/gff/data/, release PlasmoDB-28, last modified 23/03/2016.
annotation_genes
annotation_genes
A data frame with 6 columns of information
Chromosomes
Base-pair positions of start of genes
Base-pair positions of end of genes
The positive or negaitve gene strand
Gene name, commonly abbreviated
Gene id
areColors()
checks if colour names are valid
areColors(x)
areColors(x)
x |
vector of length 1 or higher containing numeric or character values for colours |
The vector b in the equation Ax=b for 2 diploid chromosomes
bVectorDD(genotypes)
bVectorDD(genotypes)
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
The vector b in the equation Ax=b for 1 haploid chromosome and 1 diploid chromosome
bVectorHD(genotypes)
bVectorHD(genotypes)
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
The vector b in the equation Ax=b for 2 haploid chromosomes
bVectorHH(genotypes)
bVectorHH(genotypes)
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
Calculate alpha
calculateAlpha(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
calculateAlpha(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
Calculate beta
calculateBeta(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, scale, error, gender_1, gender_2)
calculateBeta(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, scale, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
scale |
A numeric vector containing the scaling values used to scale alpha |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
Calculate gamma
calculateGamma(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
calculateGamma(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
Calculate the log-likelihood of the data
calculateLogLikelihood(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
calculateLogLikelihood(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
calculateMeiosis() estimates the number of meiosis separating a pair of isolates given the global IBD pop_allele_freqs estimates. This method is described in Purcell et al (2007).
calculateMeiosis(omega.0, omega.1, omega.2)
calculateMeiosis(omega.0, omega.1, omega.2)
omega.0 |
A numeric value between 0 and 1 representing the pop_allele_freqs of sharing 0 alleles IBD. The sum of omega.0, omega.1 and omega.2 should equal 1. |
omega.1 |
A numeric value between 0 and 1 representing the pop_allele_freqs of sharing 1 alleles IBD. |
omega.2 |
A numeric value between 0 and 1 representing the pop_allele_freqs of sharing 2 alleles IBD. |
The number of meiosis separating the pair of isoaltes.
Calculates the proportion of missing data for each SNPs or each isolate where missing values are denoted by -1.
Missing values are calculated for each column of genotypes
(where columns are isolates and rows are SNPs),
however genotypes
can be transposed to calculate missingness proportions for SNPs.
calculateMissingness(genotypes)
calculateMissingness(genotypes)
genotypes |
An integer matrix of genotype data of the form -1, 0, 1 and 2 representing missing genotypes, homozygous reference, heterozygous and homozygous alternative respectively. |
A vector of length n
where n
is the number of columns in genotypes
.
calculatePopAlleleFreq
calculates reference allele frequencies for each SNP given genotype data.
calculatePopAlleleFreq(genotypes, moi)
calculatePopAlleleFreq(genotypes, moi)
genotypes |
An integer matrix of genotype data of the form -1, 0, 1 and 2 representing missing genotypes, homozygous reference,
heterozygous and homozygous alternative respectively. Each column of |
moi |
An integer vector of multiplicity of infection (MOI) estimates for each isoalte. Isolate MOI estimates should be ordered such that value |
Calculate scale
calculateScale(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
calculateScale(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
Calculate the Viterbi sequence
calculateViterbi(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
calculateViterbi(number_states, initial_prob, meiosis, number_snps, genotypes, pop_allele_freqs, positions_cM, error, gender_1, gender_2)
number_states |
Integer. The number of IBD states in the model |
initial_prob |
A numeric vector containing the initial state probabilities |
meiosis |
Integer. The number of meiosis separating the two isolates |
number_snps |
Integer. The number of SNPs |
genotypes |
A integer martix containing the genotype calls for a pair of isolates |
pop_allele_freqs |
A numeric vector of population allele frequencies |
positions_cM |
A numeric vector of SNP genetic map positions in cM |
error |
Numeric. The genotype error rate |
gender_1 |
Integer. The MOI estimate of isolate 1 |
gender_2 |
Integer. The MOI estimate of isolate 2 |
A function that prints summary information of clusters
clusterSummary(cluster.list)
clusterSummary(cluster.list)
cluster.list |
a list of isolate groups where unique elements correspond to isolates in unique clusters |
The emission probabilities for 2 diploid chromosomes
emissionProbDD(pop_allele_freq, genotype_1, genotype_2, ibd)
emissionProbDD(pop_allele_freq, genotype_1, genotype_2, ibd)
pop_allele_freq |
The population allele frequency for SNP i. This corresponds to the reference allele |
genotype_1 |
The genotype for isolate 1 from the pair for SNP i |
genotype_2 |
The genotype for isolate 2 from the pair for SNP i |
ibd |
The IBD state |
The emission probabilities for 1 haploid chromosome and 1 diploid chromosome
emissionProbHD(pop_allele_freq, genotype_1, genotype_2, ibd, male_column, female_column)
emissionProbHD(pop_allele_freq, genotype_1, genotype_2, ibd, male_column, female_column)
pop_allele_freq |
The population allele frequency for SNP i. This corresponds to the reference allele |
genotype_1 |
The genotype for isolate 1 from the pair for SNP i |
genotype_2 |
The genotype for isolate 2 from the pair for SNP i |
ibd |
The IBD state |
male_column |
The haploid isolate from the pair. Either 1 or 2 |
female_column |
The diploid isolate from the pair. Either 1 or 2 |
The emission probabilities for 2 haploid chromosomes
emissionProbHH(pop_allele_freq, genotype_1, genotype_2, ibd)
emissionProbHH(pop_allele_freq, genotype_1, genotype_2, ibd)
pop_allele_freq |
The population allele frequency for SNP i. This corresponds to the reference allele |
genotype_1 |
The genotype for isolate 1 from the pair for SNP i |
genotype_2 |
The genotype for isolate 2 from the pair for SNP i |
ibd |
The IBD state |
Calculating the emission probability sumation when missing genotype calls present
emissionProbMissingGeno(pop_allele_freq, genotype_1, genotype_2, error, gender_1, gender_2, ibd_j)
emissionProbMissingGeno(pop_allele_freq, genotype_1, genotype_2, error, gender_1, gender_2, ibd_j)
pop_allele_freq |
The population allele frequency of SNP t |
genotype_1 |
The genotype of isolate 1 at SNP t |
genotype_2 |
The genotype of isolate 2 at SNP t |
error |
The genotype error rate |
gender_1 |
The MOI estimate of isolate 1 |
gender_2 |
The MOI estimate of isolate 2 |
ibd_j |
The IBD state |
The genotyping error probability for 1 diploid chromosome
genotypeErrorD(truth, observed, error)
genotypeErrorD(truth, observed, error)
truth |
The true genotype |
observed |
The observed genotype |
error |
The genotype error rate |
The genotyping error probability for 1 haploid chromosome
genotypeErrorH(truth, observed, error)
genotypeErrorH(truth, observed, error)
truth |
The true genotype |
observed |
The observed genotype |
error |
The genotype error rate |
getColourPaletteMajor()
generates a spectrum colour palette with a specified number of colours.
getColourPaletteMajor(number.colours)
getColourPaletteMajor(number.colours)
number.colours |
numeric. The number of colours to return. |
A character vector of length=number.colours
containing a colour specturm.
getColourPaletteMinor()
generates a specified number of shades of a given colour.getColourPaletteMinor()
generates a specified number of shades of a given colour.
getColourPaletteMinor(major.colour, number.colours)
getColourPaletteMinor(major.colour, number.colours)
major.colour |
character. The colour name or code. |
number.colours |
numeric. The number of colours to return. |
A character vector of length=number.colours
containing colour shades, excluding white.
getGenotypes()
performs pre-analysis data processing of PLINK formatted unphased genotype data,
including removal of SNPs and isolates with high proportions of missing data and SNPs with low minor
allele frequencies. It also calculates SNP allele frequencies from either the
input dataset or a specified reference dataset.
getGenotypes(ped.map, reference.ped.map = NULL, maf = 0.01, isolate.max.missing = 0.1, snp.max.missing = 0.1, chromosomes = NULL, input.map.distance = "cM", reference.map.distance = "cM")
getGenotypes(ped.map, reference.ped.map = NULL, maf = 0.01, isolate.max.missing = 0.1, snp.max.missing = 0.1, chromosomes = NULL, input.map.distance = "cM", reference.map.distance = "cM")
ped.map |
A list with 2 objects:
|
reference.ped.map |
An optional list containing reference data used to calculate SNP allele
frequencies. The list has 2 objects in the same format as |
maf |
A numeric value denoting the smallest minor allele frequency allowed in the analysis. The default value is 0.01. |
isolate.max.missing |
A numeric value denoting the maximum proportion of missing data allowed for each isolate. The default value is 0.1. |
snp.max.missing |
A numeric value denoting the maximum proportion of missing data allowed for each SNP. The default value is 0.1. |
chromosomes |
A vector containing a subset of chromosomes to perform formatting on. The
default value is |
input.map.distance |
A character string of either "M" or "cM" denoting whether the genetic map distances in the input MAP data frame are in Morgans (M) or centi-Morgans (cM). The default is cM. |
reference.map.distance |
A character string of either "M" or "cM" denoting whether the genetic map distances in the reference MAP data frame are in Morgans (M) or centi-Morgans (cM). The default is cM. |
A list of two objects named pedigree
and genotypes
:
A pedigree containing the isolates that remain after filtering.
The pedigree is the first six columns of the PED file and these columns are headed fid, iid, pid, mid, moi
and aff
respectively.
A data frame with the first five columns:
Chromosome (type "character"
, "numeric"
or "integer"
)
SNP identifiers (type "character"
)
Genetic map distance (Morgans, M) (type "numeric"
)
Base-pair position (type "integer"
)
Population allele frequency (type "integer"
)
where each row describes a single SNP. These columns are headed chr, snp_id, pos_M, pos_bp
and freq
respectively.
Columns 6 onwards contain the genotype data for each isolate, where a single column corresponds to a single isolate. These columns are
labeled with merged family IDs and isolate IDs separated by a slash symbol (/).
getIBDparameters
and getIBDsegments
.
# take a look at the data str(png_pedmap) # reformat and filter to call genotypes my_genotypes <- getGenotypes(ped.map = png_pedmap, reference.ped.map = NULL, maf = 0.01, isolate.max.missing = 0.1, snp.max.missing = 0.1, chromosomes = NULL, input.map.distance = "cM", reference.map.distance = "cM")
# take a look at the data str(png_pedmap) # reformat and filter to call genotypes my_genotypes <- getGenotypes(ped.map = png_pedmap, reference.ped.map = NULL, maf = 0.01, isolate.max.missing = 0.1, snp.max.missing = 0.1, chromosomes = NULL, input.map.distance = "cM", reference.map.distance = "cM")
getIBDiclusters()
produces a network of clusters of isolates that have been inferred IBD over a specified interval.
Isolates that are not IBD over the interval are not included in the network or output. The networks are created
using the R package igraph
.
getIBDiclusters(ped.genotypes, ibd.segments, interval = NULL, prop = 0, hi.clust = FALSE)
getIBDiclusters(ped.genotypes, ibd.segments, interval = NULL, prop = 0, hi.clust = FALSE)
ped.genotypes |
A list containing 2 objects. See the |
ibd.segments |
A data frame containing the IBD segments detected by isoRelate.
See the |
interval |
A vector of length 3 containing the region to identify clusters over. This vector should contain the chromosome ID, the start of the interval in base-pairs and the end of the interval in base-pairs; in this order respectively. |
prop |
Numeric value between 0 and 1 (inclusive).
The minimum proportion of an interval (in base-pairs) shared IBD between a pair of isolates in order for the pair to be included in the network.
For example, if |
hi.clust |
Logical. Whether to perform hierarchical clustering using the |
A list of three objects named clusters
, i.network
and hi.clust
:
A list where each object contains the names of isolates that form a disjoint cluster in the network. If hierarchical clustering has been performed then the clusters may not be disjoint.
An igraph
network used in the construction of network plots. See http://igraph.org/r/ doe more details.
Logical. Whether or not hierarchical clustering has been performed.
getGenotypes
, getIBDsegments
and getIBDpclusters
.
# generate the isolates who are IBD over the Plasmodium falciparum CRT gene my_i_clusters <- getIBDiclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3", 403222, 406317), prop=0, hi.clust = FALSE) str(my_i_clusters)
# generate the isolates who are IBD over the Plasmodium falciparum CRT gene my_i_clusters <- getIBDiclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3", 403222, 406317), prop=0, hi.clust = FALSE) str(my_i_clusters)
getIBDiR()
calculates a summary statistic for each SNP that can be used to assess the significance of excess IBD sharing at genomic loci,
thus identifying regions under positive selection.
First relatedness between isolates and SNP allele frequencies are accounted for, then normalization procedures are applied where we assume
our transformed summary statistic follows a chi-squared distribution with 1 degree of freedom.
This allows the calculation of -log10 (P-values) which we denote as the iR statistic.
SNPs with iR values greater than some threshold (i.e. -log10 (P-values) > -log10 (0.05)) provide evidence of positive selection.
getIBDiR
can return NA
iR statistics for a number of reasons, including trying to generate iR statistics when there are no IBD pairs
or when all pairs are IBD, or when only several isolates are analyzed.
getIBDiR(ped.genotypes, ibd.matrix, groups = NULL)
getIBDiR(ped.genotypes, ibd.matrix, groups = NULL)
ped.genotypes |
A list containing 2 objects. See the |
ibd.matrix |
A data frame containing the binary IBD information for each SNP and each pair.
See the returned |
groups |
A data frame with 3 columns of information:
where IBD proportions are calculated for
Group ID, for example, can be the geographic regions where the isolates were collected.
The default is |
A data frame the following 7 columns:
Chromosome (type "character"
, "numeric"
or "integer"
)
SNP identifiers (type "character"
)
Genetic map distance (centi morgans, cM) (type "numeric"
)
Base-pair position (type "integer"
)
Population (type "character"
or "numeric"
)
Subpopulation (type "character"
or "numeric"
)
iR statistic (type "numeric"
)
-log10 p vlaue (type "numeric"
)
where each row describes a unique SNP.
The column Population
is filled with 1s by default, while Subpopulation
contains the group IDs from groups
,
where the proportion of pairs IBD has been calculated for all pairs of isolates belonging to the same group as well as all pairs of
isolates where each isolate belongs to a different group.
If groups=NULL
then Subpopulation
will be filled with 0s also.
The population columns have been included for plotting purposes.
The data frame is headed chr, snp_id, pos_M, pos_bp, pop, subpop, iR
and log10_pvalue
respectively.
getGenotypes
, getIBDmatrix
and getIBDproportion
.
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the significance of IBD sharing my_iR <- getIBDiR(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL)
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the significance of IBD sharing my_iR <- getIBDiR(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL)
getIBDmatrix()
produces a binary matrix of IBD (1) and non-IBD (0) results for each SNP and isolate pair combination.
Each row identifies a unique SNP while each column identifies a unique isolate pair.
getIBDmatrix(ped.genotypes, ibd.segments)
getIBDmatrix(ped.genotypes, ibd.segments)
ped.genotypes |
A list containing 2 objects. See the |
ibd.segments |
A data frame containing the IBD segments detected by isoRelate.
See the |
A data frame with the first four columns:
Chromosome (type "character"
, "numeric"
or "integer"
)
SNP identifiers (type "character"
)
Genetic map distance (centi morgans, cM) (type "numeric"
)
Base-pair position (type "integer"
)
where each row describes a unique SNP. Columns 1-4 are headed chr, snp_id, pos_M
and pos_bp
respectively.
Columns 5 onwards contain the binary IBD information for each isolate pair, where a single column corresponds to a single pair.
These columns are labeled with merged family IDs and isolate IDs separated by a slash symbol (/). For example fid1/iid1/fid2/iid2.
getGenotypes
, getIBDsegments
, getIBDproportion
, getIBDiR
.
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd)
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd)
getIBDparameters()
estimates the number of meiosis and the probabilities of sharing 0, 1 and 2 alleles IBD between all pairwise
combinations of isolates.
getIBDparameters(ped.genotypes, number.cores = 1)
getIBDparameters(ped.genotypes, number.cores = 1)
ped.genotypes |
A list containing 2 objects. See the |
number.cores |
Positive integer. The number of cores used for parallel execution. |
A data frame with the following eight columns:
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
The number of meiosis separating the pair
Probability of sharing 0 alleles IBD
Probability of sharing 1 allele IBD
Probability of sharing 2 alleles IBD
where each row describes a unique pair of isolates. The data frame is headed fid1, iid1, fid2, iid2, m, ibd0, ibd1
and ibd2
respectively.
getGenotypes
and getIBDsegments
.
# following processing and filtering of genotype data, # we estimate the proportion of genome shared IBD my_parameters <- getIBDparameters(ped.genotypes = png_genotypes, number.cores = 1) head(my_parameters)
# following processing and filtering of genotype data, # we estimate the proportion of genome shared IBD my_parameters <- getIBDparameters(ped.genotypes = png_genotypes, number.cores = 1) head(my_parameters)
getIBDpclusters()
produces a network of clusters of isolates that share a minimum proportion of genome IBD.
Isolates that do not share a minimum proportion IBD are not included in the network or output.
The networks are created using R package igraph
.
getIBDpclusters(ped.genotypes, ibd.segments, prop = 1, hi.clust = FALSE)
getIBDpclusters(ped.genotypes, ibd.segments, prop = 1, hi.clust = FALSE)
ped.genotypes |
A list containing 2 objects. See the |
ibd.segments |
A data frame containing the IBD segments detected by isoRelate.
See the |
prop |
Numeric value between (0,1].
The minimum proportion of genome shared IBD between a pair of isolates in order for the pair to be included in the network.
For example, if |
hi.clust |
Logical. Whether to perform hierarchical clustering using the |
A list of three objects named clusters
, i.network
and hi.clust
:
A list where each object contains the names of isolates that form a disjoint cluster in the network. If hierarchical clustering has been performed then the clusters may not be disjoint.
An igraph
network used in the construction of network plots. See http://igraph.org/r/ doe more details.
Logical. Whether or not hierarchical clustering has been performed.
getGenotypes
, getIBDsegments
and getIBDiclusters
.
getIBDposterior()
calculates the posterior probabilities of IBD sharing between pairs of isolates.
getIBDposterior(ped.genotypes, parameters, number.cores = 1, error = 0.001)
getIBDposterior(ped.genotypes, parameters, number.cores = 1, error = 0.001)
ped.genotypes |
A list containing 2 objects.
See the |
parameters |
A data frame containing meioses and IBD probability estimates for all pairwise combinations of isolates.
See the |
number.cores |
Positive integer. The number of cores used for parallel execution. |
error |
The genotyping error rate. The default value is 0.001. |
A data frame with the first four columns:
Chromosome
SNP identifiers
Genetic map distance
Base-pair position
where each row describes a single SNP. These columns are headed chr, snp_id, pos_M
and pos_bp
respectively.
Columns 5 onwards contain the posterior probabilities for each pair of isolates, where a single column corresponds to one pair of isolates.
These columns are labeled with merged family IDs and isolate IDs separated by a slash symbol (/).
## Not run: # calculate the posterior probability of IBD sharing # note: this can take a while to run if there are many pairs my_posterior <- getIBDposterior(ped.genotypes = png_genotypes, parameters = png_parameters, number.cores = 1, error = 0.001) head(my_posterior[,1:10]) ## End(Not run)
## Not run: # calculate the posterior probability of IBD sharing # note: this can take a while to run if there are many pairs my_posterior <- getIBDposterior(ped.genotypes = png_genotypes, parameters = png_parameters, number.cores = 1, error = 0.001) head(my_posterior[,1:10]) ## End(Not run)
getIBDproportion()
calculates the proportion of pairs inferred IBD at each SNP.
getIBDproportion(ped.genotypes, ibd.matrix, groups = NULL)
getIBDproportion(ped.genotypes, ibd.matrix, groups = NULL)
ped.genotypes |
A list containing 2 objects. See the |
ibd.matrix |
A data frame containing the binary IBD information for each SNP and each pair.
See the returned |
groups |
A data frame with 3 columns of information:
where, if specified, IBD proportions are calculated for
Group ID, for example, can be the geographic regions where the isolates were collected.
The default is |
A data frame the following 7 columns:
Chromosome (type "character"
, "numeric"
or "integer"
)
SNP identifiers (type "character"
)
Genetic map distance (centi morgans, cM) (type "numeric"
)
Base-pair position (type "integer"
)
Population (type "character"
or "numeric"
)
Subpopulation (type "character"
or "numeric"
)
Proportion of pairs IBD (type "integer"
)
where each row describes a unique SNP.
The column Population
is filled with 1s by default, while Subpopulation
contains the group IDs from groups
,
where the proportion of pairs IBD has been calculated for all isolates belonging to the same group as well as all isolates from different groups.
If groups=NULL
then Subpopulation
will be filled with 1s also.
The population columns have been included for plotting purposes.
The data frame is headed chr, snp_id, pos_M, pos_bp, pop, subpop
and prop_ibd
respectively.
getGenotypes
, getIBDmatrix
and getIBDiR
.
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the proportion of pairs IBD at each SNP my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = my_groups) head(my_proportion)
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the proportion of pairs IBD at each SNP my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = my_groups) head(my_proportion)
getIBDsegments()
detects genomic regions shared IBD between all pairwise combinations of isolates.
getIBDsegments(ped.genotypes, parameters, number.cores = 1, minimum.snps = 20, minimum.length.bp = 50000, error = 0.001)
getIBDsegments(ped.genotypes, parameters, number.cores = 1, minimum.snps = 20, minimum.length.bp = 50000, error = 0.001)
ped.genotypes |
A list containing 2 objects.
See the |
parameters |
A data frame containing meioses and IBD probability estimates for all pairwise combinations of isolates.
See the |
number.cores |
Positive integer. The number of cores used for parallel execution. |
minimum.snps |
An integer value denoting the minimum number of SNPs in an IBD segment for it to be reported. The default value is 20 SNPs. |
minimum.length.bp |
The minimum length of a reported IBD segment. The default value is 50,000 bp. |
error |
The genotyping error rate. The default value is 0.001. |
A data frame with the following columns
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
Chromosome
Start SNP
End SNP
Start position bp
End position bp
Start position M
End position M
Number of SNPs
Length bp
Length M
IBD status (1 = one allele shared IBD, 2 = two alleles shared IBD)
where each row describes a unique IBD segment. The data frame is headed fid1, iid1, fid2, iid2, chr, start_snp, end_snp,
start_position_bp, end_position_bp, start_position_M, end_position_M, number_snps, length_bp, length_M
and ibd_status
respectively.
getGenotypes
and getIBDparameters
.
## Not run: # prior to IBD detection, parameter estimates must be estimated. # Assuming this has been done, IBD inference is performed my_ibd <- getIBDsegments(ped.genotypes = png_genotypes, parameters = png_parameters, number.cores = 1, minimum.snps = 20, minimum.length.bp = 50000, error = 0.001) head(my_ibd) ## End(Not run)
## Not run: # prior to IBD detection, parameter estimates must be estimated. # Assuming this has been done, IBD inference is performed my_ibd <- getIBDsegments(ped.genotypes = png_genotypes, parameters = png_parameters, number.cores = 1, minimum.snps = 20, minimum.length.bp = 50000, error = 0.001) head(my_ibd) ## End(Not run)
getIBDsummary()
prints a brief summary of the detected IBD segments to the console.
getIBDsummary(ped.genotypes, ibd.segments)
getIBDsummary(ped.genotypes, ibd.segments)
ped.genotypes |
A list containing 2 objects.
See the |
ibd.segments |
A data frame containing the IBD segments detected by isoRelate.
See the |
function to find IBD that overlap interval
getOverlap(region.1, region.2)
getOverlap(region.1, region.2)
region.1 |
IBD segment boundary |
region.2 |
interest interval |
Creates a data frame containing family IDs and isolate IDs for each pair to be analysed. Each row corresponds to a unique pair.
groupPairs(group)
groupPairs(group)
group |
A character vector of all family IDs |
haplotypeToGenotype
transforms PLINK haplotype data into genotype data of the form -1, 0, 1 and 2 representing missing genotypes,
homozygous reference, heterozygous and homozygous alternative respectively. Haploid isolates are coded as diploid although will not have
heterozygous genotypes.
haplotypeToGenotype(haplotypes, moi)
haplotypeToGenotype(haplotypes, moi)
haplotypes |
An integer matrix of haplotype data in PLINK format. A allele is denoted 1, B allele is denoted 2 and missing data is denoted 0 |
moi |
An integer vector of multiplicity of infection (MOI) estimates for each isoalte. Isolate MOI estimates should be ordered such that value |
A matrix with genotype calls where columns correspond to isolates and rows correspond to SNPs
Gene annotations for 31 commonly studied Plasmodium falciparum genes, including antimalarial drug resistance genes, vaccine candidates and var genes. Gene annotations are for the reference genome 3D7 were downloaded from http://www.plasmodb.org/common/downloads/Current_Release/Pfalciparum3D7/gff/data/, release PlasmoDB-28, last modified 23/03/2016.
highlight_genes
highlight_genes
A data frame with 5 columns of information
Chromosomes
Base-pair positions of start of genes
Base-pair positions of end of genes
Gene name, commonly abbreviated
Gene id
IBDLabel
is a function used to label unique IBD segments in a pair of isolates for a particular chromosome by determining
breakpoints in IBD vs non-IBD regions. IBD segments are labelled in sequential order genome wide.
IBDLabel(snp_id, number_snps)
IBDLabel(snp_id, number_snps)
snp_id |
A numeric vector of SNP identifiers for IBD segments on the chromosome of interest. |
number_snps |
integer. The number of IBD SNPs for the chromosome of interest. |
Creates a binary matrix of IBD (1) and non-IBD (0) with each row representing a single SNP and each column representing a unique pair. The number of rows is equal to the total number of SNPs and the number of columns is equal to the number of pairs.
IBDMatrix(chromosomes, positions_bp, number_pairs, ibd_pairs_colnumbers, ibd_chromosomes, ibd_start_bp, ibd_stop_bp)
IBDMatrix(chromosomes, positions_bp, number_pairs, ibd_pairs_colnumbers, ibd_chromosomes, ibd_start_bp, ibd_stop_bp)
chromosomes |
A character vector containing the corresponding chromosome for each SNP |
positions_bp |
A numeric vector containing the corresponding bp position for each SNP |
number_pairs |
Numeric. The total number of pairs analysed |
ibd_pairs_colnumbers |
A numeric vector corresponding to column numbers in the output matrix where each unique number refers to a unique pair with IBD inferred |
ibd_chromosomes |
A character vector containing the chromosome for each detected IBD segment |
ibd_start_bp |
A numeric vector containing the base-pair position for the start of each detected IBD segment |
ibd_stop_bp |
A numeric vector containing the base-pair position for the end of each detected IBD segment |
IBDparameters() calculates IBD probabilities then estimates the number of meiosis for an isolate pair.
IBDparameters(genotypes, pop_allele_freqs, gender_1, gender_2)
IBDparameters(genotypes, pop_allele_freqs, gender_1, gender_2)
genotypes |
An integer matrix of genotype calls for a pair of isolates. Each coloumn represents and isolate and each row represents a SNP. |
pop_allele_freqs |
A numeric vector of population allele frequencies for each SNP. |
gender_1 |
An integer denoting the MOI value of isoalte 1. |
gender_2 |
An integer denoting the MOI value of isoalte 2. |
A vector of 4 values representing the number of meiosis and the probabilities of sharing 0, 1 and 2 alleles IBD respectively.
IBDTable()
produces summaries of detected IBD segments for a single pair of isolates. These summaries include the genetic
map start and end
of IBD segments in bp, cM and SNP identifiers; and lengths of IBD segments in bp, cM and SNPs.
IBDTable(ibd.results)
IBDTable(ibd.results)
ibd.results |
A data frame containing family ID and isolate ID for isolate 1, family ID and isolate ID for isolate 2, numeric SNP identifiers, chromosome identifiers, genetic map positions of SNPs in Morgans (M) and base-pairs (bp) and the Viterbi results respectively, for all SNPs. |
A data frame containing a summary of all IBD segments inferred for this pair of isolates. The data frame contains the following columns:
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
Chromosome
SNP identifier
Start SNP
End SNP
Start position bp
End position bp
Start position M
End position M
Number of SNPs
Length bp
Length M
IBD status (1 = 1 allele shared IBD, 2 = 2 alleles shared IBD)
iRfunction()
calculates the iR statistic used to assess the significance of excess IBD sharing at loci
in the genome. The final statistic, -log10 (P-values), is returned for each SNP.
iRfunction(locus.matrix, frequency)
iRfunction(locus.matrix, frequency)
locus.matrix |
A data frame containing binary IBD values. See |
frequency |
A vector of population allele frequencies for each SNP. |
Creates a data frame containing family IDs and isolate IDs for each pair to be analysed. Each row corresponds to a unique pair.
isolatePairs(fid, iid)
isolatePairs(fid, iid)
fid |
A character vector of all family IDs |
iid |
A character vector of all individual ID |
merge_lists()
is a function used to merge summary IBD results for multiple pairs when running the IBD analysis on
multiple cores
merge_lists(A, B)
merge_lists(A, B)
A |
List with 2 objects for one pair. Object 1 is the IBD summaries for a pair and object 2 is the IBD posterior probabilities. |
B |
List with 2 objects for another pair. Object 1 is the IBD summaries for a pair and object 2 is the IBD posterior probabilities. |
A list with 2 objects containing merged lists from A
and B
above.
The IBD segments inferred using isoRelate with the parameter settings as in the Vignette.
my_ibd
my_ibd
A data frame with
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
Chromosome
IBD segment start SNP ID
IBD segment end SNP ID
IBD segment start SNP base-pair position
IBD segment end SNP base-pair position
IBD segment start SNP morgan position
IBD segment end SNP morgan position
Number of SNPs in IBD segment
Length of IBD segment in base-pairs
Length of IBD segment in morgans
The number of alleles shared IBD (either 1 or 2)
plotIBDclusters()
Produces a figure of an isoRelate cluster network, where unique isolates are represented by vertices and a line is
drawn between two vertices if the isolates have been inferred IBD via the criteria specified in either getIBDiclusters
or
getIBDpclusters
.The networks are created using the R package igraph
.
plotIBDclusters(ped.genotypes, clusters, groups = NULL, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE)
plotIBDclusters(ped.genotypes, clusters, groups = NULL, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE)
ped.genotypes |
A list containing 2 objects. See the |
clusters |
A named list of three objects containing network information.
See the |
groups |
A data frame with 3 columns of information:
Group ID, for example, can be the geographic regions where the isolates were collected.
If |
vertex.color |
A vector of characters or numeric values of the vertex colors in the network.
If |
vertex.frame.color |
Character string or numeric value. A single color that will be used as the vertex border. Default is |
vertex.size |
Numeric value indicating the size of the vertices in the network. Default is |
vertex.name |
Logical. Whether to add isolate names to the vertices. Default is |
edge.color |
Character string or numeric value. A single color to be used for all edges. Default is |
edge.width |
Numeric. A single value indicating the width of the edges. Default is |
mark.border |
Character string or numeric value. A single color to be used for all borders in hierarchical clustering groups. Default is |
mark.col |
Character string or numeric value. A single color to be used to fill hierarchical clustering groupings. Default is |
add.legend |
Logical. Whether to include a legend in the plot. Default is |
legend.x |
Numerical. A single value indicating the x-coordinate of the legend, with default |
legend.y |
Numerical. A single value indicating the y-coordinate of the legend, with default |
layout |
A matrix containing the x and y coordinates of the vertices, generated using the Fruchterman-Reingold force-directed layout. |
return.layout |
Logical. Whether or not to return the layout matrix (vertex positions) in the network.
This layout can be used as the input for the parameter |
getGenotypes
, getIBDpclusters
and getIBDiclusters
.
# generate the isolates who are IBD over the Plasmodium falciparum CRT gene my_i_clusters <- getIBDiclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3", 403222, 406317), prop=0, hi.clust = FALSE) str(my_i_clusters) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" # plot the network of clusters plotIBDclusters(ped.genotypes = png_genotypes, clusters = my_i_clusters, groups = my_groups, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE) # generate the isolates who share at least than 90% of their genome IBD my_p_clusters <- getIBDpclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, prop=0.9, hi.clust = FALSE) # plot the network of clusters plotIBDclusters(ped.genotypes = png_genotypes, clusters = my_p_clusters, groups = my_groups, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE)
# generate the isolates who are IBD over the Plasmodium falciparum CRT gene my_i_clusters <- getIBDiclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3", 403222, 406317), prop=0, hi.clust = FALSE) str(my_i_clusters) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" # plot the network of clusters plotIBDclusters(ped.genotypes = png_genotypes, clusters = my_i_clusters, groups = my_groups, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE) # generate the isolates who share at least than 90% of their genome IBD my_p_clusters <- getIBDpclusters(ped.genotypes = png_genotypes, ibd.segments = png_ibd, prop=0.9, hi.clust = FALSE) # plot the network of clusters plotIBDclusters(ped.genotypes = png_genotypes, clusters = my_p_clusters, groups = my_groups, vertex.color = NULL, vertex.frame.color = "white", vertex.size = 4, vertex.name = FALSE, edge.color = "gray60", edge.width = 0.8, mark.border = "white", mark.col = "gray94", add.legend = TRUE, legend.x = -1.5, legend.y = -0.25, layout = NULL, return.layout = FALSE)
plotIBDiR()
plots the -log10 (p-values) used to assess the significance of excess IBD sharing.
plotIBDiR(ibd.iR, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, point.size = 1, point.color = NULL, add.rug = FALSE, plot.title = NULL, add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed")
plotIBDiR(ibd.iR, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, point.size = 1, point.color = NULL, add.rug = FALSE, plot.title = NULL, add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed")
ibd.iR |
A data frame containing the iR summary statistics for each SNP.
See the returned |
interval |
A vector of length 3 containing the genomic locations of a specific region to plot. This vector should contain the
chromosome ID, the start of the interval in base-pairs and the end of the interval in base-pairs; in this order respectively.
The default is |
annotation.genes |
A data frame containing information on annotation genes to be included in the figure. This data frame must have at least 5 columns of information:
|
annotation.genes.color |
A vector of characters or numeric values containing the two colors according to gene stand (positive (+) or negative (-)) |
highlight.genes |
A data frame containing information of genes or regions to highlight. The data frame must have at least 4 columns of information:
|
highlight.genes.labels |
Logical. Whether to include gene names as labels in the figure. The default is |
highlight.genes.color |
Character string or numeric value. A single color that will be used to highlight a region/gene. The default is |
highlight.genes.alpha |
Numeric. A single value between 0 and 1 indicating the gene color transparency. The default is |
point.size |
Numeric. The size of the points in the figures. The default is |
point.color |
A vector of characters or numeric values denoting the color of points to be plotted.
If there are multiple subpopulations then the number of colors specified should equal the number of subpopulations.
The default is |
add.rug |
Logical. Whether to include SNP positions as a rug in the figure. The default is |
plot.title |
A character string of a title to be added to the figure The default is |
add.legend |
Logical. Whether a legend containing subpopulation information should be plotted. The default is |
facet.label |
Logical. Whether to include facet labels if multiple subpopulations (column name "subpop") are specified. |
facet.scales |
A character string of either |
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the significance of IBD sharing my_iR <- getIBDiR(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # plot the iR statistics plotIBDiR(ibd.iR = my_iR, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, point.size = 1, point.color = NULL, add.rug = FALSE, plot.title = "Significance of IBD sharing", add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed")
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the significance of IBD sharing my_iR <- getIBDiR(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # plot the iR statistics plotIBDiR(ibd.iR = my_iR, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, point.size = 1, point.color = NULL, add.rug = FALSE, plot.title = "Significance of IBD sharing", add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed")
plotIBDproportions()
plots the proportion of pairs IBD for each SNP across the genome.
plotIBDproportions(ibd.proportions, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, line.color = NULL, add.rug = TRUE, plot.title = NULL, add.legend = TRUE, facet.label = TRUE, facet.scales = "fixed", subpop.facet = FALSE)
plotIBDproportions(ibd.proportions, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, line.color = NULL, add.rug = TRUE, plot.title = NULL, add.legend = TRUE, facet.label = TRUE, facet.scales = "fixed", subpop.facet = FALSE)
ibd.proportions |
A data frame containing the proportion of pairs IBD at each SNP.
See the returned |
interval |
A vector of length 3 containing the genomic locations of a specific region to plot. This vector should contain the
chromosome ID, the start of the interval in base-pairs and the end of the interval in base-pairs; in this order respectively.
The default is |
annotation.genes |
A data frame containing information on annotation genes to be included in the figure. This data frame must have at least 5 columns of information:
|
annotation.genes.color |
A vector of characters or numeric values containing the two colors representing gene stand (positive (+) or negative (-)) |
highlight.genes |
A data frame containing information of genes or regions to highlight. The data frame must have at least 4 columns of information:
|
highlight.genes.labels |
Logical. Whether to include gene names as labels in the figure. The default is |
highlight.genes.color |
Character string or numeric value. A single color that will be used to highlight a region/gene. The default is |
highlight.genes.alpha |
Numeric. A single value between 0 and 1 indicating the gene color transparency. The default is |
line.color |
A vector of characters or numeric values denoting the color of lines to be plotted.
If there are multiple populations/subpopulations then the number of colors specified should equal the number of
unique populations/subpopulations combinations.
The default is |
add.rug |
Logical. Whether to include SNP positions as a rug in the figure. The default is |
plot.title |
A character string of a title to be added to the figure The default is |
add.legend |
Logical. Whether a legend containing subpopulation information should be plotted. The default is |
facet.label |
Logical. Whether to include facet labels if multiple populations/subpopulations (column names "pop" and "subpop") are specified. |
facet.scales |
A character string of either |
subpop.facet |
Logical. Whether to plot subpopulations in separate facets. The default is |
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the proportion of pairs IBD at each SNP my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # plot the proportion of pairs IBD plotIBDproportions(ibd.proportions = my_proportion, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, add.rug = FALSE, plot.title = "Proportion of pairs IBD in PNG", add.legend = FALSE, line.color = NULL, facet.label = TRUE, facet.scales = "fixed", subpop.facet = FALSE) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = my_groups) # plot the proportion of pairs IBD plotIBDproportions(ibd.proportions = my_proportion, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, line.color = NULL, add.rug = FALSE, plot.title = "Proportion of pairs IBD in PNG - with stratification", add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed", subpop.facet = TRUE)
# generate a binary IBD matrix my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes, ibd.segments = png_ibd) # calculate the proportion of pairs IBD at each SNP my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = NULL) # plot the proportion of pairs IBD plotIBDproportions(ibd.proportions = my_proportion, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, add.rug = FALSE, plot.title = "Proportion of pairs IBD in PNG", add.legend = FALSE, line.color = NULL, facet.label = TRUE, facet.scales = "fixed", subpop.facet = FALSE) # creating a stratification dataset my_groups <- png_genotypes[[1]][,1:3] my_groups[1:10,"pid"] <- "a" my_groups[11:25,"pid"] <- "b" my_groups[26:38,"pid"] <- "c" my_proportion <- getIBDproportion(ped.genotypes = png_genotypes, ibd.matrix = my_matrix, groups = my_groups) # plot the proportion of pairs IBD plotIBDproportions(ibd.proportions = my_proportion, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, line.color = NULL, add.rug = FALSE, plot.title = "Proportion of pairs IBD in PNG - with stratification", add.legend = FALSE, facet.label = TRUE, facet.scales = "fixed", subpop.facet = TRUE)
plotIBDsegments()
plots IBD segments for pairs across the genome. IBD segments are depicted by colored blocks.
plotIBDsegments(ped.genotypes, ibd.segments, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.5, segment.color = NULL, number.per.page = NULL, fid.label = TRUE, iid.label = TRUE, ylabel.size = 9, add.rug = FALSE, plot.title = NULL, add.legend = TRUE)
plotIBDsegments(ped.genotypes, ibd.segments, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = TRUE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.5, segment.color = NULL, number.per.page = NULL, fid.label = TRUE, iid.label = TRUE, ylabel.size = 9, add.rug = FALSE, plot.title = NULL, add.legend = TRUE)
ped.genotypes |
A list containing 3 objects. See the |
ibd.segments |
A data frame containing the IBD segments inferred from pairs of isolates
See the returned |
interval |
A vector of length 3 containing the genomic locations of a specific region to plot. This vector should contain the
chromosome ID, the start of the interval in base-pairs and the end of the interval in base-pairs; in this order respectively.
The default is |
annotation.genes |
A data frame containing information on annotation genes to be included in the figure. This data frame must have at least 5 columns of information:
|
annotation.genes.color |
A vector of characters or numeric values containing the two colors representing gene stand (positive (+) or negative (-)) |
highlight.genes |
A data frame containing information of genes or regions to highlight. The data frame must have at least 4 columns of information:
|
highlight.genes.labels |
Logical. Whether to include gene names as labels in the figure. The default is |
highlight.genes.color |
Character string or numeric value. A single color that will be used to highlight a region/gene. The default is |
highlight.genes.alpha |
Numeric. A single value between 0 and 1 indicating the gene color transparency. The default is |
segment.height |
A numeric value giving the hight of IBD segment blocks, such that 0 < segment.height <= 1. The default is |
segment.color |
A vector of characters or numeric values denoting the color of the segments to be plotted. Two colors must be specified, one for segments with 1 allele IBD and one for segments with 2 alleles IBD. |
number.per.page |
A numeric value indicating the maximum number of IBD pairs to plot in a single graphics window. The default is
|
fid.label |
Logical. If |
iid.label |
Logical. If |
ylabel.size |
A numeric value indicating the size of the y-axis labels if drawn. The default is |
add.rug |
Logical. Whether to include SNP positions as a rug in the figure. The default is |
plot.title |
A character string of a title to be added to the figure The default is |
add.legend |
Logical. If |
getGenotypes
and getIBDsegments
.
# plot IBD segments plotIBDsegments(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.6, number.per.page = NULL, fid.label = FALSE, iid.label = FALSE, ylabel.size = 9, add.rug = FALSE, plot.title = "Distribution of IBD segments in PNG", add.legend = TRUE, segment.color = NULL) # plot IBD segments over an interval: chromosome 7: 350000 - 550000 plotIBDsegments(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3",350000,550000), annotation.genes = annotation_genes, annotation.genes.color = NULL, highlight.genes = highlight_genes, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.8, number.per.page = NULL, fid.label = FALSE, iid.label = FALSE, ylabel.size = 9, add.rug = TRUE, plot.title = "Distribution of IBD segments in PNG", add.legend = TRUE, segment.color = c("purple","green"))
# plot IBD segments plotIBDsegments(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = NULL, annotation.genes = NULL, annotation.genes.color = NULL, highlight.genes = NULL, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.6, number.per.page = NULL, fid.label = FALSE, iid.label = FALSE, ylabel.size = 9, add.rug = FALSE, plot.title = "Distribution of IBD segments in PNG", add.legend = TRUE, segment.color = NULL) # plot IBD segments over an interval: chromosome 7: 350000 - 550000 plotIBDsegments(ped.genotypes = png_genotypes, ibd.segments = png_ibd, interval = c("Pf3D7_07_v3",350000,550000), annotation.genes = annotation_genes, annotation.genes.color = NULL, highlight.genes = highlight_genes, highlight.genes.labels = FALSE, highlight.genes.color = NULL, highlight.genes.alpha = 0.1, segment.height = 0.8, number.per.page = NULL, fid.label = FALSE, iid.label = FALSE, ylabel.size = 9, add.rug = TRUE, plot.title = "Distribution of IBD segments in PNG", add.legend = TRUE, segment.color = c("purple","green"))
Processed raw genotype data with the parameter settings as in the Vignette.
png_genotypes
png_genotypes
A list of two objects named pedigree
and genotypes
:
pedigree
is a data frame with the following information:
Family ID
Isolate ID
Paternal ID. This is not used by isoRelate and is set to zero.
Maternal ID. This is not used by isoRelate and is set to zero.
Multiplicity of infection - 1 for single infection and 2 for multiple infection
Affection status of isolate. This is set to 2 and is ignored by isoRelate.
genotypes
is a data frame with the first 5 columns:
Chromosome
SNP identifier
Genetic map distance
Base-pair position
Population allele frequency
where each row describes a single SNP. Columns 6 onwards contain the genotype data for each isolate, where a single column corresponds to a single isolate. These columns are labeled with merged family IDs and isolate IDs separated by a slash symbol (/).
The IBD segments inferred using isoRelate with the parameter settings as in the Vignette.
png_ibd
png_ibd
A data frame with
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
Chromosome
IBD segment start SNP ID
IBD segment end SNP ID
IBD segment start SNP base-pair position
IBD segment end SNP base-pair position
IBD segment start SNP morgan position
IBD segment end SNP morgan position
Number of SNPs in IBD segment
Length of IBD segment in base-pairs
Length of IBD segment in morgans
The number of alleles shared IBD (either 1 or 2)
The parameter estimates inferred using isoRelate with the parameter settings as in the Vignette.
png_parameters
png_parameters
A data frame with
Family 1 ID
Isolate 1 ID
Family 2 ID
Isolate 2 ID
The estimated number of meiosis separating each pair of isolates
The estimated proportion of genome with 0 alleles IBD
The estimated proportion of genome with 1 allele IBD
The estimated proportion of genome with 2 alleles IBD
A PED/MAP file containing gneotype information for 38 isolates from Madang, Papua New Guinea. This dataset was released as part of the MalariaGEN Pf3k consortium in VCF file format, which underwent extensive data processing to result in the final SNP set in the PED/MAP file. This data, and more, is available from https://www.malariagen.net/projects/pf3k.
png_pedmap
png_pedmap
List with two objects
PLINK PED data frame. See getGenotypes
for details.
PLINK MAP data frame. See getGenotypes
for details.
See getGenotypes
for details.
Round digits to specified decimal places
roundDecimal(number, digits)
roundDecimal(number, digits)
number |
A number to round |
digits |
The number of digits to round to |
The transition probabilities for 2 diploid chromosomes
transitionProbDD(omega_0, omega_1, omega_2, meiosis, dist_cM, ibd_current, ibd_previous)
transitionProbDD(omega_0, omega_1, omega_2, meiosis, dist_cM, ibd_current, ibd_previous)
omega_0 |
The probability of sharing 0 alleles IBD |
omega_1 |
The probability of sharing 1 allele IBD |
omega_2 |
The probability of sharing 2 alleles IBD |
meiosis |
The number of meiosis separating the two isoaltes |
dist_cM |
The genetic map distance (cM) between SNP i and SNP j |
ibd_current |
The IBD state of SNP j |
ibd_previous |
The IBD state of SNP i |
The transition probabilities for 1 haploid and 1 diploid chromosome
transitionProbHD(omega_0, meiosis, dist_cM, ibd_current, ibd_previous)
transitionProbHD(omega_0, meiosis, dist_cM, ibd_current, ibd_previous)
omega_0 |
The probability of sharing 0 alleles IBD |
meiosis |
The number of meiosis separating the two isoaltes |
dist_cM |
The genetic map distance (cM) between SNP i and SNP j |
ibd_current |
The IBD state of SNP j |
ibd_previous |
The IBD state of SNP i |
The transition probabilities for 2 haploid chromosomes
transitionProbHH(omega_0, meiosis, dist_cM, ibd_current, ibd_previous)
transitionProbHH(omega_0, meiosis, dist_cM, ibd_current, ibd_previous)
omega_0 |
The probability of sharing 0 alleles IBD |
meiosis |
The number of meiosis separating the two isoaltes |
dist_cM |
The genetic map distance (cM) between SNP i and SNP j |
ibd_current |
The IBD state of SNP j |
ibd_previous |
The IBD state of SNP i |
Matrices of all possible genotype combinations between pairs, given MOI
trueGenotypes(gender_1, gender_2)
trueGenotypes(gender_1, gender_2)
gender_1 |
The MOI estimate of isolate 1 |
gender_2 |
The MOI estimate of isolate 2 |