Diversity statistics
In the module stats, a number of tools are provided to
compute diversity statistics from Site or Align
instances. Some statistics are applicable to individual sites, some
to sets of sites, and some to phased sequence alignments. Note that
the objects may equally contain nucleotide sequences, protein
sequences, microsatellite alleles (encoded as allele lengths or not), or
any arbitrary representation of allelic diversity.
The alphabets define the lists of alleles and their representation, but are not used to decide which statistics can be computed. It is important to note that EggLib will compute any statistic you request from your data, even if it is meaningless. Pay special attention to statistics requiring phase, since unphased data can easily be loaded into objects that can be used to compute those statistics.
In many cases, statistics that cannot be computed are returned as None,
but this only happens when they are technically not computable (due to
missing data or the unavailability of a specific feature such as outgroup
sequences or subpopulations).
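Since non-computable statistics come back as None, downstream code usually has to separate them from real values. The following is a minimal sketch of one way to do this, assuming the results are held in a plain dictionary mapping statistic names to values. The helper `split_computed` and the example dictionary are hypothetical and written for illustration only; they are not part of EggLib.

```python
def split_computed(stats):
    """Separate computed statistics from those returned as None.

    `stats` is assumed to be a plain dict of statistic name -> value
    (a hypothetical stand-in for a dictionary of results).
    """
    # Test with "is not None" rather than truthiness: a value of 0
    # (for example, no polymorphic site) is a valid result.
    computed = {name: value for name, value in stats.items() if value is not None}
    missing = sorted(name for name, value in stats.items() if value is None)
    return computed, missing


# Made-up values for illustration only, not real results.
example = {"S": 0, "thetaW": 0.0, "FstWC": None}
computed, missing = split_computed(example)
print(computed)
print(missing)
```

Testing explicitly against None matters here because a statistic equal to zero is a legitimate value and should not be discarded along with the non-computable ones.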
In the sections of this chapter, we will present the statistics available in
the stats module. Statistics will be grouped by families
(a family of statistics being a group of statistics that require the
same type of data and the same kind of information). Most of the statistics
are computed by stats.ComputeStats (see this tutorial section
for an introduction), or by other functions available in the same module.
Outgroup
Some of the statistics require an outgroup to be computed. The outgroup
should be included in the analysed dataset (Site or Align
instance) but identified by means of a Structure instance.
There might be more than one outgroup sample. The outgroup information will
be used to identify the ancestral variant (that is, the one which is
shared with the outgroup) if the outgroup has one of the
alleles present in the main sample (the ingroup) and, if there are several outgroup samples,
all of them have the same allele. If the outgroup has an allele not found
in the ingroup, or if the outgroup contains several alleles, then the
site will be considered not orientable and won’t be used for statistics
requiring an outgroup. Statistics not requiring an outgroup will be
computed normally, though.
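The orientation rule described above can be illustrated with a short standalone sketch. This is not EggLib's implementation: the function `ancestral_allele` is a hypothetical helper, alleles are represented as plain characters, and missing data is ignored for simplicity.

```python
def ancestral_allele(ingroup, outgroup):
    """Return the ancestral allele of a site, or None if not orientable.

    `ingroup` and `outgroup` are iterables of alleles (hypothetical
    representation; missing data is not handled in this sketch).
    """
    outgroup_alleles = set(outgroup)
    # No outgroup sample, or outgroup samples carrying different
    # alleles: the site cannot be oriented.
    if len(outgroup_alleles) != 1:
        return None
    (allele,) = outgroup_alleles
    # The single outgroup allele must also be present in the ingroup.
    if allele not in set(ingroup):
        return None
    return allele


print(ancestral_allele("AAGGA", "A"))   # orientable, ancestral allele is A
print(ancestral_allele("AAGGA", "AG"))  # outgroup has two alleles
print(ancestral_allele("AAGGA", "T"))   # outgroup allele absent from ingroup
```

Only the first call returns an allele; the other two return None, matching the two non-orientable cases listed in the text.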
Population structure
Many statistics require that several populations are present, some
require that an individual structure is defined, and a single statistic in
stats.ComputeStats requires clusters of populations. Like the
outgroup, the structure of samples is described by Structure
instances (see here for an introduction). If the
appropriate level of structure is not defined in the Structure
provided to the class or function computing statistics (or if no
Structure is provided), the affected statistics will be None.
Here is the list of families of statistics that are described in the following sections: