Title: | EM Estimation of Malaria Haplotype Probabilities from Multiply Infected Human Blood Samples |
---|---|
Description: | Using EM algorithm to estimate malaria haplotype probabilities and the number of infections from multiply infected human blood samples. Estimated haplotype probabilities and their standard error are reported. |
Authors: | Xiaohong Li |
Maintainer: | Xiaohong Li <[email protected]> |
License: | GPL-2 |
Version: | 2.0 |
Built: | 2025-02-05 19:26:03 UTC |
Source: | https://github.com/PlasmoGenEpi/malaria.em |
Compute the maximum likelihood estimates for malaria haplotype probabilities and number of infections based on the malaria genotype data using Expectation-Maximization approach.
malaria.em(geno, sizes=c(2), locus.label=NA)
malaria.em(geno, sizes=c(2), locus.label=NA)
geno |
matrix of genotypes. Each column represents the alleles observed in each of the locus. The observed alleles in each locus are separated by a space, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = K. Rows represent the genotypes for each subject. |
sizes |
An integer or a vector of possible number of parasite strains within individuals in observed data. For example, sizes=c(1:6) means the possible number of parasite strains is ranging from 1 to 6. If the length of sizes is greater than 1, the estimation will assume zero truncated Poisson distribution on the number of parasite strains. |
locus.label |
vector of labels for loci. |
This program is an extension of haplo.em() function in haplo.stats package, which computes the maximum likelihood estimates (MLEs) of haplotype probabilities in diploid population. Since in malaria setting, the number of the malaria parasite strains is unknown due to the possible multiple infections, we impose a probability distribution on this number. Parameter sizes determines the underlying probability distribution. Length of sizes more than one will lead to the estimation assuming a truncated Poisson distribution. In this case, the estimation of the Poisson rate and number of infections will be reported. Otherwise, if sizes is defined as an integer- C for example, the estimation will assume the number of strains is fixed at C. Although this assumption is not applicable for malaria setting, it is very useful for any genetic data, in which the number of chromosomes is same across all samples. By default, sizes=2, which reduces to haplo.em() and give the MLEs of the haplotype probabilities in diploid population.
haplo.prob.tab |
matrix of unique haplotypes, MLEs of estimated haplotype probabilities, and their standard errors. |
haplotype |
matrix of unique haplotypes. Each row represents a unique haplotype, and the number of columns is the number of loci. |
haplo.prob |
vector of MLEs of haplotype probabilities. The ith element of hap.prob corresponds to the ith row of haplotype. |
haplo.prob.std |
standard error of the estimated haplotype frequencies. |
lambda |
estimated Poisson parameter. |
NumofInfection |
estimated number of infections. |
haplo.sets |
List of all possible haplotype combinations and their posterior probability per subject. The first column named ids is a vector for row index of subjects after expanding to all possible haplotype combinations for each subject. If ids=i, then i is the ith row of geno. If the ith subject has n possible haplotype combinations that correspond to their marker genotype, then i is repeated n times. The value in the second column is the row numbers of the unique haplotypes in the returned haplotype matrix. |
n.haplo.set |
vector of maximum number of haplotype combinations per subject that are consistent with their marker data in the matrix geno. The length of n.haplo.set = nrow(geno). |
pred.haplo.set |
Predicted haplotype combination that is consistent with their marker data for each subject. The values in pred.haplo.set are the row numbers of the unique haplotypes in the returned haplotype matrix. |
Xiaohong Li
Li, X., Foulkes, A.S., Yucel, R. and Rich, S.M. (2007) An expectation maximization approach to estimate malaria haplotype frequencies in multiply infected children, Statistical Applications in Genetics and Molecular Biology, Vol. 6 : Iss. 1, Article 33.
data(geno) sizes<-c(2) ret1<-malaria.em(geno,sizes,locus.label=c("DQB","DRB") ) sizes<-c(1:3) ret2<-malaria.em(geno,sizes,locus.label=c("DQB","DRB") )
data(geno) sizes<-c(2) ret1<-malaria.em(geno,sizes,locus.label=c("DQB","DRB") ) sizes<-c(1:3) ret2<-malaria.em(geno,sizes,locus.label=c("DQB","DRB") )