This vignette documents the latest developments in
hierfstat
. Refer to the hierfstat
article and a step by step tutorial
for an introduction to the package
Data can be imported in hierfstat
many different ways
(fstat format, tabular format, dosage data, even VCF format), as
described in the import vignette. hierfstat
can now also
read genind
objects (from package adegenet
).
Note however that only some genetic data types will be properly
converted and used. The alleles need to be encoded either as integer (up
to three digits per allele), or as nucleotides
(c("a","c","g","t","A","C","G","T")
).
## [1] TRUE
The function you are the most likely to want using is
basic.stats
(it calculates HO, HS, FIS,
FST
etc…).
## $perloc
## Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest
## fca8 0.6670 0.7790 0.8619 0.0829 0.8671 0.0881 0.0962 0.1016 0.1438 0.3987
## fca23 0.6838 0.7439 0.7994 0.0555 0.8029 0.0589 0.0694 0.0734 0.0809 0.2302
## fca43 0.6814 0.7442 0.7937 0.0495 0.7968 0.0526 0.0623 0.0660 0.0844 0.2054
## fca45 0.7100 0.7085 0.7642 0.0557 0.7679 0.0594 0.0729 0.0774 -0.0021 0.2039
## fca77 0.6295 0.7828 0.8659 0.0831 0.8711 0.0883 0.0960 0.1014 0.1958 0.4067
## fca78 0.5773 0.6339 0.6773 0.0434 0.6801 0.0462 0.0641 0.0679 0.0893 0.1261
## fca90 0.6454 0.7408 0.8144 0.0736 0.8190 0.0782 0.0904 0.0955 0.1287 0.3017
## fca96 0.6259 0.6747 0.7657 0.0910 0.7714 0.0967 0.1189 0.1254 0.0723 0.2973
## fca37 0.4485 0.5671 0.6027 0.0356 0.6049 0.0379 0.0591 0.0626 0.2091 0.0874
##
## $overall
## Ho Hs Ht Dst Htp Dstp Fst Fstp Fis Dest
## 0.6299 0.7083 0.7717 0.0634 0.7757 0.0674 0.0821 0.0869 0.1108 0.2310
You can also get e.g. allele.count
and
allelic.richness
, a rarefied measure of the number of
alleles at each locus and in each population. For instance, below is a
boxplot of the allelic richness for the 5 first loci in the nancycats
dataset
Population statistics are obtained through basic.stats
or wc
(varcomp.glob
can also be used and will
give the same result as wc
for a one level hierarchy). For
instance, FIS and
FST in
the Galba truncatula dataset provided with
hierfstat
are obtained as:
## $FST
## [1] 0.540894
##
## $FIS
## [1] 0.8154694
## gtrunchier...1. Ind
## Total 0.540894 0.9152809
## gtrunchier...1. 0.000000 0.8154694
Confidence intervals on these statistics can be obtained via
boot.vc
:
## H-Total F-Pop/Total F-Ind/Total H-Pop F-Ind/Pop Hobs
## 2.5% 0.6506 0.4816 0.9046 0.2652 0.7664 0.0558
## 50% 0.7470 0.5394 0.9150 0.3414 0.8155 0.0630
## 97.5% 0.8163 0.6188 0.9263 0.4121 0.8499 0.0699
boot.ppfis
and boot.ppfst
provide bootstrap
confidence intervals (bootstrapping over loci) for population specific
FIS and
pairwise FST
respectively.
genet.dist
estimates one of 10 different genetic
distances between populations as described mostly in Takezaki & Nei
(1996)
## 1 2 3 4 5
## 2 0.4272210
## 3 1.1402899 1.8235430
## 4 0.8387367 0.9540338 1.6638485
## 5 0.6967425 0.6205417 2.5798363 0.8767008
## 6 0.9411656 0.9742812 1.1553423 0.5243353 1.1911894
Principal coordinate analysis can be carried out on this genetic distances:
Function betas
will give estimates of population
specific FSTi
for data in the fstat
format. For instance, using the
nancycats
dataset:
Functions fs.dosage, fst.dosage
and
fis.dosage
give estimates of population specific FSTi
and FISi
for dosage data, fs.dosage
also outputs the matrix of FSTXY,
as defined in Weir
and Goudet (2017) and Weir
and Hill (2002). We use the gtrunchier
dataset to
illustrate the output of fs.dosage
. We first need to
convert gtrunchier
to a dosage format via
fstat2dos
gt.dos<-fstat2dos(gtrunchier[,-c(1:2)]) #converts fstat format to dosage
fs.gt<-fs.dosage(gt.dos,pop=gtrunchier[,2])
image(1:29,1:29,fs.gt$FsM,main=expression(F[ST]^{XY}))
We clearly see the block structure of patches within locality, with the second block along the diagonal showing higher similarity than the others. Blocks off the main diagonal are lighter, showing less genetic similarity between localities than within.
Functions pi.dosage
, theta.Watt.dosage
and
TajimaD.dosage
calculate nucleotide diversity, Watterson’s
θW and
Tajima’s D respectively, from
dosage data.
indpca
carries out Principal component analysis on
individual genotypes. To illustrate, we use the gtrunchier
datasets, with individuals colored according to their locality of
origin:
Functions betas
and beta.dosage
give
estimates of individual inbreeding coefficients and kinships between
individuals, the former for genotypes in the fstat
format,
the latter for dosage data:
This example is just for illustrating purposes, I do not advise using these individual statistics unless you have a large number of polymorphic markers ( ≥ 1000 at least, better if ≥ 10000, see Goudet, Kay & Weir (2018)).
It is now possible to simulate genetic data from a continent islands
model, either at equilibrium via sim.genot
or for a given
number of generations via sim.genot.t
. These two functions
have several arguments, allowing to look at the effect of population
sizes, inbreeding, migration and mutation. the number of independent
loci and number of alleles per loci can also be specified. It is also
possible to simulate data from a finite island model using
sim.genot.t
, by specifying that argument IIM
is FALSE
. Last,sim.genot.metapop.t
generates
genetic data with any migration matrix, population size and inbreeding
level, while still assuming independence of the genetic markers.
(refer to the import vignette to import data from other programs)
Other than the fstat
format, hierfstat
can
now export to files in format suitable for Bayescan, plink and structure.
The functions to export to these programs are
write.bayescan
, write.ped
and
write.struct
respectively.
A function to detect sex-biased dispersal, sexbias.test
based on Goudet
et al. (2002) has been implemented. To illustrate its use, load the
Crocidura russula data set. It consists of the genotypes and
sexes of 140 shrews studied by Favre
et al. (1997). In this species, mark -recapture showed an excess of
dispersal from females, an unusual pattern in mammals. This is confirmed
using genetic data:
## F M
## -1.1602396 0.9770438
## $call
## sexbias.test(dat = crocrussula$genot, sex = crocrussula$sex)
##
## $statistic
## t
## -4.117605
##
## $p.value
## [1] 8.097862e-05