# Diversity statistics

The stats module provides a number of tools to compute diversity
statistics from `Site` or `Align` instances. Some statistics apply to
individual sites, some to sets of sites, and some to phased sequence
alignments. Note that the objects may equally contain nucleotide sequences,
protein sequences, microsatellite alleles encoded (or not) as allele length,
or any arbitrary representation of allelic diversity.

The alphabets define lists of alleles and their representation, but they are not used to decide which statistics can be computed. It is important to note that EggLib will compute any statistic you request from your data, even if it is meaningless. Pay special attention to statistics that require phased data, since unphased data can easily be loaded into objects that can be used to compute those statistics.

In many cases, statistics that cannot be computed are returned as `None`,
but only when they are technically not computable (due to missing data or
the unavailability of a specific feature such as outgroup sequences or
subpopulations).
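Because `None` marks a statistic that could not be computed, it is worth separating it from genuine values before further processing. The following is a minimal sketch in plain Python, assuming that results are available as a dictionary mapping statistic names to values. The helper `split_computable`, the keys and the values are hypothetical and only serve as an illustration; no EggLib call is involved.

```python
def split_computable(results):
    """Separate computed statistics from those returned as None."""
    # Compare with None explicitly: a value of 0 or 0.0 is a valid result.
    computed = {name: value for name, value in results.items()
                if value is not None}
    missing = sorted(name for name, value in results.items()
                     if value is None)
    return computed, missing


# Made-up results: the third statistic could not be computed.
results = {"S": 12, "Pi": 3.4, "D": None}
computed, missing = split_computable(results)
```

Testing with `is not None` rather than relying on truthiness matters here, since a statistic equal to zero would otherwise be discarded as if it were missing.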

The sections of this chapter present the statistics available in the stats
module. Statistics are grouped by families (a family being a group of
statistics that require the same type of data and the same kind of
information). Most of the statistics are computed by `stats.ComputeStats`
(see this tutorial section for an introduction), or by other functions
available in the same module.

## Outgroup

Some statistics require an outgroup. The outgroup should be included in the
analysed dataset (`Site` or `Align` instance) but identified by means of a
`Structure` instance. There may be more than one outgroup sample. The
outgroup information is used to identify the ancestral variant (that is, the
one shared with the outgroup), provided that the outgroup carries one of the
alleles present in the main sample (the ingroup) and, if there are several
outgroup samples, that all of them carry the same allele. If the outgroup has
an allele not found in the ingroup, or if the outgroup contains several
alleles, the site is considered not orientable and won’t be used for
statistics requiring an outgroup. Statistics not requiring an outgroup are
still computed normally.
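The orientation rule described above can be sketched in plain Python. This is an illustration of the logic only, not EggLib's implementation: the function `orient_site` is hypothetical, alleles are represented as plain values, and missing data is not handled.

```python
def orient_site(ingroup, outgroup):
    """Return the ancestral allele, or None if the site is not orientable.

    ingroup and outgroup are iterables of alleles (hypothetical encoding).
    """
    outgroup_alleles = set(outgroup)

    # No outgroup sample, or outgroup samples carrying different alleles:
    # the site cannot be oriented.
    if len(outgroup_alleles) != 1:
        return None

    (allele,) = outgroup_alleles

    # The outgroup allele must also be present in the ingroup.
    if allele not in set(ingroup):
        return None

    return allele


ancestral = orient_site(["A", "A", "G", "A"], ["A", "A"])   # orientable
mixed = orient_site(["A", "A", "G", "A"], ["A", "G"])       # several outgroup alleles
absent = orient_site(["A", "A", "G", "A"], ["T"])           # allele not in ingroup
```

In this sketch, a return value of `None` plays the same role as a non-orientable site: statistics requiring an outgroup would skip it, while other statistics would still use the ingroup alleles.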

## Population structure

Many statistics require several populations to be present, some require an
individual structure to be defined, and a single statistic in
`stats.ComputeStats` requires clusters of populations. Like the outgroup, the
structure of samples is described by `Structure` instances (see here for an
introduction). If the appropriate level of structure is not defined in the
`Structure` provided to the class or function computing statistics (or if no
`Structure` is provided), the statistics concerned will be `None`.

Here is the list of families of statistics that are described in the following sections: