Egglib 2.1.11
C++ library reference manual
NucleotideDiversity Class Reference

Performs analyses of population genetics.

#include <NucleotideDiversity.hpp>

Inheritance diagram for NucleotideDiversity: derives from BaseDiversity.

Public Member Functions

 NucleotideDiversity ()
 Builds an object.
 
virtual ~NucleotideDiversity ()
 Destroys an object.
 
virtual void load (CharMatrix &data, bool allowMultipleMutations=false, double minimumExploitableData=1., unsigned int ignoreFrequency=0, std::string characterMapping=dnaMapping, bool useZeroAsAncestral=false)
 Identifies polymorphic sites and computes basic statistics.
 
unsigned int S () const
 Number of polymorphic sites.
 
unsigned int So () const
 Number of polymorphic orientable sites.
 
unsigned int eta () const
 Minimum number of mutations.
 
double nseff () const
 Average per-site number of sequences effectively used.
 
unsigned int lseff () const
 Number of sites effectively used.
 
double nseffo () const
 Average number of sequences effectively used at orientable sites.
 
unsigned int lseffo () const
 Number of orientable sites.
 
unsigned int npop () const
 Number of detected populations.
 
unsigned int popLabel (unsigned int popIndex) const
 Label of the population with given index (unsecure)
 
double Pi ()
 Nucleotide diversity.
 
double thetaW ()
 Watterson estimator of theta.
 
double average_Pi ()
 Average of Pi over populations.
 
double pop_Pi (unsigned int popIndex)
 Pi of a given population (unsecure)
 
double D ()
 Tajima's D.
 
double thetaH ()
 Fay and Wu estimator of theta.
 
double thetaL ()
 Zeng et al. estimator of theta.
 
double H ()
 Fay and Wu's H.
 
double Z ()
 Standardized H.
 
double E ()
 Zeng et al.'s E.
 
unsigned int FixedDifferences ()
 Number of sites with at least one fixed difference.
 
unsigned int CommonAlleles ()
 Number of sites with at least one allele shared among at least two populations.
 
unsigned int SharedAlleles ()
 Number of sites with at least one non-fixed allele shared among at least two populations.
 
unsigned int SpecificAlleles ()
 Number of sites with at least one allele specific to one population.
 
unsigned int SpecificDerivedAlleles ()
 Number of sites with at least one derived allele specific to one population.
 
unsigned int Polymorphisms (unsigned int pop)
 Number of polymorphisms in a given population (unsecure)
 
unsigned int SpecificAlleles (unsigned int pop)
 Number of specific alleles for a given population (unsecure)
 
unsigned int SpecificDerivedAlleles (unsigned int pop)
 Number of specific derived alleles for a given population (unsecure)
 
unsigned int FixedDifferences (unsigned int pop1, unsigned int pop2)
 Number of fixed differences between a given pair of populations (unsecure; pop2 must be larger than pop1)
 
unsigned int CommonAlleles (unsigned int pop1, unsigned int pop2)
 Number of common alleles between a given pair of populations (unsecure; pop2 must be larger than pop1)
 
unsigned int SharedAlleles (unsigned int pop1, unsigned int pop2)
 Number of shared non-fixed alleles between a given pair of populations (unsecure; pop2 must be larger than pop1)
 
unsigned int triConfiguration (unsigned int index)
 Number of sites falling into one of the possible site configurations.
 
std::vector< unsigned int > polymorphic_positions () const
 Builds and returns the vector of positions of all polymorphic sites.
 
std::vector< unsigned int > singleton_positions () const
 Builds and returns the vector of positions of all singleton sites.
 
- Public Member Functions inherited from BaseDiversity
 BaseDiversity ()
 Constructor.
 
virtual ~BaseDiversity ()
 Destructor.
 
virtual void reserve (unsigned int numberOfSites)
 Reserve sufficient memory for a given number of polymorphic sites.
 
const SitePolymorphism * get_site (unsigned int index) const
 Gets a site.
 
unsigned int get_position (unsigned int index) const
 Gets a site position.
 
virtual void reset ()
 Clears and re-initializes object.
 

Protected Member Functions

 NucleotideDiversity (const NucleotideDiversity &source)
 This class cannot be copied.
 
NucleotideDiversity & operator= (const NucleotideDiversity &source)
 This class cannot be copied.
 
void init ()
 
void clear ()
 
void diversity ()
 
void outgroupDiversity ()
 
void differentiation ()
 
void triConfigurations ()
 
- Protected Member Functions inherited from BaseDiversity
void importSites (CharMatrix &data, bool allowMultipleMutations, double minimumExploitableData, unsigned int ignoreFrequency, std::string characterMapping, bool useZeroAsAncestral, bool ignoreOutgroup)
 
void analyzeSite (CharMatrix &data, unsigned int index, double maxMissingData, bool ignoreOutgroup)
 
unsigned int getPopIndex (unsigned int label) const
 

Protected Attributes

bool b_analysisSites
 
bool b_diversity
 
double v_Pi
 
double v_thetaW
 
double v_average_Pi
 
double * v_pop_Pi
 
double v_D
 
bool b_outgroupDiversity
 
double v_thetaH
 
double v_thetaL
 
double v_H
 
double v_Z
 
double v_E
 
bool b_differentiation
 
unsigned int * v_pairwiseFixedDifferences
 
unsigned int * v_pairwiseCommonAlleles
 
unsigned int * v_pairwiseSharedAlleles
 
unsigned int * v_popPolymorphic
 
unsigned int * v_popSpecific
 
unsigned int * v_popSpecificDerived
 
unsigned int v_countFixedDifferences
 
unsigned int v_countCommonAlleles
 
unsigned int v_countSharedAlleles
 
unsigned int v_countSpecificAlleles
 
unsigned int v_countSpecificDerivedAlleles
 
bool b_triConfigurations
 
unsigned int * v_triConfigurations
 
- Protected Attributes inherited from BaseDiversity
SitePolymorphism ** v_sites
 
bool * v_orientables
 
unsigned int * v_sitePositions
 
unsigned int v_reserved
 
unsigned int v_ns
 
unsigned int v_S
 
unsigned int v_So
 
unsigned int v_eta
 
double v_nseff
 
unsigned int v_lseff
 
double v_nseffo
 
unsigned int v_lseffo
 
unsigned int v_npop
 
unsigned int * v_popLabel
 
bool p_allowMultipleMutations
 
double p_minimumExploitableData
 
std::string p_characterMapping
 
unsigned int p_pos_sep_mapping
 
bool p_useZeroAsAncestral
 
unsigned int p_ignoreFrequency
 

Additional Inherited Members

- Static Public Attributes inherited from BaseDiversity
static const std::string dnaMapping = "ACGT MRWSYKBDHVN?-"
 Predefined mapping string for DNA data.
 
static const std::string rnaMapping = "ACGU MRWSYKBDHVN?-"
 Predefined mapping string for RNA data.
 
static const std::string aaMapping = "GALMFWKQESPVICYHRNDT X-"
 Predefined mapping string for amino acid data.
 

Detailed Description

Performs analyses of population genetics.

This class computes several summary statistics based on nucleotide analysis. The same object can be used to analyze different data sets: calling the load() method erases all previously computed data (if any). Calling load() is required before any statistic can be computed. Some statistics are not computed by default, but are computed when the corresponding accessor is called (only load() needs to have been called beforehand).
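
The protected flags (such as b_diversity) and cached values (such as v_Pi) listed above suggest that statistics are computed on first access and cached until the next load(). The sketch below illustrates that compute-on-demand pattern with a hypothetical LazyStats class; it is not Egglib code, and the doubling in expensiveComputation() is only a placeholder for a real statistic.

```cpp
#include <cassert>

// Hypothetical illustration (not Egglib code): load() invalidates the cache,
// and the first accessor call triggers the computation, which is then reused.
class LazyStats {
public:
    void load(double rawValue) {
        raw_ = rawValue;
        loaded_ = true;
        computed_ = false;  // erase previously computed results
    }

    double stat() {
        assert(loaded_ && "load() must be called before any statistic");
        if (!computed_) {   // compute only once per load()
            value_ = expensiveComputation();
            computed_ = true;
        }
        return value_;
    }

    unsigned int computations() const { return computations_; }

private:
    double expensiveComputation() {
        ++computations_;
        return 2.0 * raw_;  // placeholder for a real summary statistic
    }

    double raw_ = 0.0;
    double value_ = 0.0;
    bool loaded_ = false;
    bool computed_ = false;
    unsigned int computations_ = 0;
};
```

Repeated calls to stat() after one load() reuse the cached value; a new load() forces a recomputation on the next call.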

Note that "unsecure" accessors don't perform out-of-bounds checks.
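
Callers who want a bounds check can add one themselves. A minimal sketch, assuming only the npop() and pop_Pi() members listed above; checkedPopPi is a hypothetical helper, not part of Egglib, and std::out_of_range is an arbitrary choice of error.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Egglib): performs the bounds check that
// the unsecure accessor pop_Pi() skips. Works with any type exposing
// npop() and pop_Pi(unsigned int), such as NucleotideDiversity after load().
template <class Diversity>
double checkedPopPi(Diversity& d, unsigned int popIndex) {
    if (popIndex >= d.npop()) {
        throw std::out_of_range("population index " + std::to_string(popIndex) +
                                " >= npop() = " + std::to_string(d.npop()));
    }
    return d.pop_Pi(popIndex);
}
```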

S is the number of polymorphic (varying) sites, counted only among sites that were not rejected.

eta is the minimum number of mutations, that is, the sum over all polymorphic sites of the number of alleles minus 1. eta = S if every polymorphic site has exactly two alleles. eta is computed independently of the allowMultipleMutations option and is NOT computed over the lseff sites.
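
The relation between S and eta can be sketched as follows, assuming each site is summarised by the number of distinct alleles observed at it (outgroup excluded). This is an illustration of the definitions, not Egglib's implementation; countPolymorphism is a hypothetical name.

```cpp
#include <vector>

// Hypothetical sketch (not Egglib code) of the definitions of S and eta.
struct PolymorphismCounts {
    unsigned int S;    // number of polymorphic sites
    unsigned int eta;  // minimum number of mutations
};

PolymorphismCounts countPolymorphism(const std::vector<unsigned int>& allelesPerSite) {
    PolymorphismCounts c{0, 0};
    for (unsigned int k : allelesPerSite) {
        if (k > 1) {          // the site varies
            ++c.S;
            c.eta += k - 1;   // at least k - 1 mutations explain k alleles
        }
    }
    return c;
}
```

For sites with {1, 2, 3, 2, 1} alleles, S is 3 and eta is 4; with only biallelic polymorphic sites the two are equal.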

Pi is the average number of pairwise differences between sequences (expressed here per site) or, equivalently (as computed here), the mean per-site unbiased heterozygosity. Pi is zero if there are no polymorphic sites.
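
The per-site unbiased heterozygosity is n/(n-1) * (1 - sum of squared allele frequencies), which equals the fraction of sequence pairs that differ at that site. A minimal sketch, assuming each column holds one allele per sequence with missing data and outgroups already removed; the function names are hypothetical and this is not Egglib's code.

```cpp
#include <map>
#include <string>
#include <vector>

// Unbiased heterozygosity of one site: n/(n-1) * (1 - sum_i p_i^2).
double siteHeterozygosity(const std::string& column) {
    const double n = static_cast<double>(column.size());
    if (n < 2) return 0.0;
    std::map<char, unsigned int> counts;  // allele -> count
    for (char c : column) ++counts[c];
    double sumSquares = 0.0;
    for (const auto& kv : counts) {
        const double p = kv.second / n;
        sumSquares += p * p;
    }
    return n / (n - 1.0) * (1.0 - sumSquares);
}

// Summing over sites gives the average number of pairwise differences
// between sequences; dividing by the number of analysed sites would give
// a per-site value.
double sumHeterozygosity(const std::vector<std::string>& columns) {
    double total = 0.0;
    for (const auto& col : columns) total += siteHeterozygosity(col);
    return total;
}
```

For the column AAGG, 4 of the 6 sequence pairs differ, and the formula gives 4/3 * 0.5 = 2/3 accordingly.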

D is Tajima's test of neutrality (Tajima F.: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989, 123:585-595). It is arbitrarily set to 0 if there are no polymorphic sites.
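
For reference, the textbook form of Tajima's D can be sketched as below. The sketch takes pi as the average number of pairwise differences summed over sites (not divided by the number of sites); how Egglib converts its own Pi before this step is not stated here. Returning 0 for n < 4, where the variance of D is zero, is a choice made for this sketch only.

```cpp
#include <cmath>

// Textbook Tajima (1989) D, written as a sketch (not Egglib's code).
// n  : sample size (the text says Egglib uses rounded nseff)
// S  : number of polymorphic sites
// pi : average number of pairwise differences, summed over sites
double tajimaD(unsigned int n, unsigned int S, double pi) {
    if (S == 0 || n < 4) return 0.0;  // no polymorphism, or zero variance
    double a1 = 0.0, a2 = 0.0;
    for (unsigned int i = 1; i < n; ++i) {
        a1 += 1.0 / i;
        a2 += 1.0 / (static_cast<double>(i) * i);
    }
    const double nd = static_cast<double>(n);
    const double b1 = (nd + 1.0) / (3.0 * (nd - 1.0));
    const double b2 = 2.0 * (nd * nd + nd + 3.0) / (9.0 * nd * (nd - 1.0));
    const double c1 = b1 - 1.0 / a1;
    const double c2 = b2 - (nd + 2.0) / (a1 * nd) + a2 / (a1 * a1);
    const double e1 = c1 / a1;
    const double e2 = c2 / (a1 * a1 + a2);
    const double variance = e1 * S + e2 * S * (S - 1.0);
    return (pi - S / a1) / std::sqrt(variance);  // (pi - thetaW) / sd
}
```

D is zero when pi equals S/a1 (Watterson's estimate), positive when pairwise diversity exceeds it and negative otherwise.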

tW (thetaW): estimator of theta based on the number of polymorphic sites (see e.g. Watterson 1975, Theor. Pop. Biol.). Both D and thetaW are computed assuming that the sample size is nseff rounded to an integer; in particular, the variance of D is computed using rounded nseff instead of ns.
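
Watterson's estimator is S / a1 with a1 = 1 + 1/2 + ... + 1/(n-1). A minimal sketch using the rounded nseff as n; rounding to the nearest integer with std::lround is an assumption, since the exact rounding rule is not stated above.

```cpp
#include <cmath>

// Sketch (not Egglib's code): thetaW = S / a1, with n = nseff rounded to the
// nearest integer (assumed rounding rule).
double thetaW(unsigned int S, double nseff) {
    const long n = std::lround(nseff);
    if (n < 2) return 0.0;  // a1 would be zero
    double a1 = 0.0;
    for (long i = 1; i < n; ++i) a1 += 1.0 / static_cast<double>(i);
    return S / a1;
}
```

With S = 3 and nseff = 3.8, n is taken as 4, a1 = 11/6 and thetaW = 18/11.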

H is Fay and Wu's test of neutrality; Z is its standardized version and E is a similar test. References: Fay J. C., Wu C.-I.: Hitchhiking under positive Darwinian selection. Genetics 2000, 155:1405-1413; and Zeng K., Fu Y. X., Shi S., Wu C.-I.: Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 2006, 174:1431-1439. These statistics are arbitrarily set to 0 if there are no polymorphic or no orientable sites.

tH and tL (thetaH and thetaL): estimators of theta based on the frequencies of derived alleles at polymorphic sites (see Fay and Wu, and Zeng et al.). The variances of H and Z are computed assuming that rounded nseff samples have been sampled.
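
The frequency-based estimators can be sketched from the derived-allele count i at each orientable polymorphic site, using the textbook definitions from the two cited papers. This is an illustration, not Egglib's implementation (which also deals with missing data through nseff); the H computed here is the unnormalized Fay and Wu statistic thetaPi - thetaH.

```cpp
#include <vector>

// Sketch (not Egglib's code) of textbook frequency-based estimators.
// derivedCounts: derived-allele count at each orientable polymorphic site;
// n: sample size.
struct DerivedThetas {
    double thetaPi;  // sum over sites of 2 i (n - i) / (n (n - 1))
    double thetaH;   // sum over sites of 2 i^2 / (n (n - 1))
    double thetaL;   // sum over sites of i / (n - 1)
    double H;        // unnormalized Fay and Wu's H = thetaPi - thetaH
};

DerivedThetas derivedThetas(const std::vector<unsigned int>& derivedCounts,
                            unsigned int n) {
    DerivedThetas t{0.0, 0.0, 0.0, 0.0};
    if (n < 2) return t;
    const double nd = static_cast<double>(n);
    for (unsigned int count : derivedCounts) {
        const double i = static_cast<double>(count);
        t.thetaPi += 2.0 * i * (nd - i) / (nd * (nd - 1.0));
        t.thetaH += 2.0 * i * i / (nd * (nd - 1.0));
        t.thetaL += i / (nd - 1.0);
    }
    t.H = t.thetaPi - t.thetaH;
    return t;
}
```

A high-frequency derived allele (3 copies out of 4) inflates thetaH and makes H negative, which is the signal these tests target.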

Constructor & Destructor Documentation

NucleotideDiversity ( )

Builds an object.

~NucleotideDiversity ( )
virtual

Destroys an object.

NucleotideDiversity ( const NucleotideDiversity & source)
inline, protected

This class cannot be copied.

Member Function Documentation

void load ( CharMatrix & data,
bool  allowMultipleMutations = false,
double  minimumExploitableData = 1.,
unsigned int  ignoreFrequency = 0,
std::string  characterMapping = dnaMapping,
bool  useZeroAsAncestral = false 
)
virtual

Identifies polymorphic sites and computes basic statistics.

Parameters
data: an alignment object (subclass of CharMatrix). The presence of an outgroup or of different populations is detected from the populationLabel members of the passed object. The populationLabel 999 is interpreted as outgroup. If several outgroups are passed, sites where the outgroups are not consistent are treated as "non-orientable".
allowMultipleMutations: if true, sites with more than two alleles are not ignored. At orientable sites, the sum of the frequencies of all alleles not matching the outgroup is treated as the derived allele frequency.
minimumExploitableData: sites where the non-missing data (as defined by characterMapping) are at a frequency smaller than this value are removed from the analysis. Use 1. to take only 'complete' sites into account and 0. to use all sites. (The outgroup is not considered in this computation.)
ignoreFrequency: removes sites that are polymorphic only because of alleles at an absolute frequency smaller than or equal to this value. If ignoreFrequency=0, no sites are removed; if ignoreFrequency=1, singleton sites are ignored. Such sites are completely removed from the analysis (not counted in lseff). Note that if more than one mutation is allowed, a site is removed only if all the alleles but one are at a frequency smaller than or equal to this value. For example, the alignment column AAAAAAGAAT is ignored with an ignoreFrequency of 1, but AAAAAAGGAT is kept (including the third allele T, which is a singleton).
characterMapping: a string giving the list of characters that should be considered as valid data. If a space is present in the string, the characters left of the space are treated as valid data and the characters right of the space are treated as missing data, that is, tolerated but ignored. Any character not in the string causes an EggInvalidCharacterError to be raised.
useZeroAsAncestral: if true, all outgroups (if present) are ignored and the character "0" is considered ancestral at all sites, whatever the character mapping.
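
The site filters of load() can be sketched for a single column as follows, with valid and missing playing the roles of the two halves of characterMapping (for DNA, "ACGT" and "MRWSYKBDHVN?-", as in dnaMapping). keepSite is a hypothetical helper, not Egglib code: it assumes the outgroup is already excluded, keeps monomorphic sites, does not model allowMultipleMutations, and throws std::invalid_argument where Egglib raises EggInvalidCharacterError.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the minimumExploitableData and ignoreFrequency
// filters applied to one alignment column (outgroup excluded).
bool keepSite(const std::string& column, double minimumExploitableData,
              unsigned int ignoreFrequency, const std::string& valid,
              const std::string& missing) {
    std::map<char, unsigned int> counts;  // allele -> absolute frequency
    unsigned int nonMissing = 0;
    for (char c : column) {
        if (valid.find(c) != std::string::npos) {
            ++counts[c];
            ++nonMissing;
        } else if (missing.find(c) == std::string::npos) {
            // Egglib raises EggInvalidCharacterError; a standard exception stands in.
            throw std::invalid_argument(std::string("invalid character: ") + c);
        }
    }
    // Drop the site if the fraction of non-missing data is too small.
    if (column.empty() ||
        static_cast<double>(nonMissing) / column.size() < minimumExploitableData) {
        return false;
    }
    if (counts.size() < 2) return true;  // monomorphic sites are kept
    // Drop a polymorphic site if all alleles but (at most) one are rare.
    unsigned int frequentAlleles = 0;
    for (const auto& kv : counts) {
        if (kv.second > ignoreFrequency) ++frequentAlleles;
    }
    return frequentAlleles >= 2;
}
```

Applied to the example columns, AAAAAAGAAT is dropped with ignoreFrequency = 1 (only A is frequent), while AAAAAAGGAT is kept because both A and G occur more than once.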

NucleotideDiversity& operator= ( const NucleotideDiversity & source)
inline, protected

This class cannot be copied.

std::vector< unsigned int > singleton_positions ( ) const

Builds and returns the vector of positions of all singleton sites.

A site is a singleton when it is polymorphic according to the parameters of the diversity analysis, has exactly two alleles, and one of them is at absolute frequency 1 (one copy), disregarding the outgroup.
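
The singleton definition can be sketched as a filter over retained polymorphic sites. This assumes each column already excludes the outgroup and missing data, and that positions[k] is the alignment position of columns[k]; singletonPositions is a hypothetical name, not Egglib's implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: positions of sites with exactly two alleles, one of
// which occurs in a single copy.
std::vector<unsigned int> singletonPositions(
        const std::vector<std::string>& columns,
        const std::vector<unsigned int>& positions) {
    std::vector<unsigned int> result;
    for (std::size_t k = 0; k < columns.size() && k < positions.size(); ++k) {
        std::map<char, unsigned int> counts;  // allele -> absolute frequency
        for (char c : columns[k]) ++counts[c];
        if (counts.size() != 2) continue;     // exactly two alleles required
        for (const auto& kv : counts) {
            if (kv.second == 1) {             // one allele present in one copy
                result.push_back(positions[k]);
                break;
            }
        }
    }
    return result;
}
```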

unsigned int triConfiguration ( unsigned int  index)

Number falling into one of the possible site configurations.

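The statistics are limited to three populations. Assuming an unrooted A/G polymorphism (A and G can be interchanged), the site configurations are listed below as the alleles present in populations 1, 2 and 3 (A&G: both alleles present), followed by the resulting category: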

  • 0: A&G A A specific 1
  • 1: A&G A G specific 1 + fixed 2-3
  • 2: A A&G A specific 2
  • 3: A A&G G specific 2 + fixed 1-3
  • 4: A A A&G specific 3
  • 5: A G A&G specific 3 + fixed 1-2
  • 6: A&G A&G A shared 1-2
  • 7: A&G A A&G shared 1-3
  • 8: A A&G A&G shared 2-3
  • 9: A&G A&G A&G shared 1-2-3
  • 10: A G G fixed 1
  • 11: A G A fixed 2
  • 12: A A G fixed 3
Parameters
index: must be an index from 0 to 12.
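
The configuration list above can be read as a classification rule for a biallelic site observed in three populations. A minimal sketch of that reading, with a hypothetical PopState encoding and triConfigurationIndex function (not Egglib's code); it returns -1 for a site that is monomorphic across the three populations.

```cpp
#include <array>

// Hypothetical encoding of the alleles observed in one population at a
// biallelic site (the two alleles are called A and G, as in the list above).
enum class PopState { OnlyA, OnlyG, Both };

// Returns the configuration index (0..12), or -1 if the site does not vary.
int triConfigurationIndex(const std::array<PopState, 3>& s) {
    int nBoth = 0;
    for (PopState p : s) if (p == PopState::Both) ++nBoth;

    if (nBoth == 3) return 9;                     // shared 1-2-3
    if (nBoth == 2) {
        if (s[2] != PopState::Both) return 6;     // shared 1-2
        if (s[1] != PopState::Both) return 7;     // shared 1-3
        return 8;                                 // shared 2-3
    }
    if (nBoth == 1) {
        // k is the only polymorphic population; o1 and o2 are the others.
        const int k = (s[0] == PopState::Both) ? 0 : (s[1] == PopState::Both) ? 1 : 2;
        const int o1 = (k + 1) % 3, o2 = (k + 2) % 3;
        // 2k: specific to k; 2k + 1: also a fixed difference between o1 and o2.
        return 2 * k + (s[o1] != s[o2] ? 1 : 0);
    }
    // No population is polymorphic: find the population that differs.
    if (s[0] == s[1] && s[1] == s[2]) return -1;  // monomorphic overall
    if (s[1] == s[2]) return 10;                  // fixed 1
    if (s[0] == s[2]) return 11;                  // fixed 2
    return 12;                                    // fixed 3
}
```

Because A and G are interchangeable, only equality between population states matters, so for example G A A maps to configuration 10 just like A G G.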
