Egglib 2.1.11
C++ library reference manual
HaplotypeDiversity Class Reference

Computes diversity based on haplotype analysis.

#include <HaplotypeDiversity.hpp>

Inherits BaseDiversity.

Public Member Functions

 HaplotypeDiversity ()
 Constructor.
 
virtual ~HaplotypeDiversity ()
 Destructor.
 
void load (CharMatrix &data, bool allowMultipleMutations=false, unsigned int ignoreFrequency=0, std::string characterMapping=dnaMapping)
 Identifies polymorphic sites and computes basic statistics.
 
unsigned int K () const
 Number of distinct haplotypes.
 
double He () const
 Haplotype diversity (unbiased)
 
unsigned int haplotypeIndex (unsigned int) const
 Returns the allele number of a given sequence.
 
double Kst () const
 Population differentiation, based on nucleotides (Hudson 1992a)
 
double Fst () const
 Population differentiation, based on nucleotides (Hudson 1992b)
 
double Gst () const
 Population differentiation, based on haplotypes (Nei version)
 
double Hst () const
 Population differentiation, based on haplotypes (Hudson et al. version)
 
double Snn () const
 Hudson's Snn (nearest-neighbor statistic)
 
- Public Member Functions inherited from BaseDiversity
 BaseDiversity ()
 Constructor.
 
virtual ~BaseDiversity ()
 Destructor.
 
virtual void reserve (unsigned int numberOfSites)
 Reserves sufficient memory for a given number of polymorphic sites.
 
const SitePolymorphism * get_site (unsigned int index) const
 Gets a site.
 
unsigned int get_position (unsigned int index) const
 Gets a site position.
 
virtual void reset ()
 Clears and re-initializes object.
 

Protected Member Functions

void init ()
 
void clear ()
 
unsigned int diff (CharMatrix &data, unsigned int ind1, unsigned int ind2) const
 
- Protected Member Functions inherited from BaseDiversity
void importSites (CharMatrix &data, bool allowMultipleMutations, double minimumExploitableData, unsigned int ignoreFrequency, std::string characterMapping, bool useZeroAsAncestral, bool ignoreOutgroup)
 
void analyzeSite (CharMatrix &data, unsigned int index, double maxMissingData, bool ignoreOutgroup)
 
unsigned int getPopIndex (unsigned int label) const
 

Protected Attributes

bool m_loaded
 
unsigned int m_K
 
double m_He
 
double m_Kst
 
double m_Fst
 
double m_Gst
 
double m_Hst
 
double m_Snn
 
unsigned int * m_haplotypeIndex
 
- Protected Attributes inherited from BaseDiversity
SitePolymorphism ** v_sites
 
bool * v_orientables
 
unsigned int * v_sitePositions
 
unsigned int v_reserved
 
unsigned int v_ns
 
unsigned int v_S
 
unsigned int v_So
 
unsigned int v_eta
 
double v_nseff
 
unsigned int v_lseff
 
double v_nseffo
 
unsigned int v_lseffo
 
unsigned int v_npop
 
unsigned int * v_popLabel
 
bool p_allowMultipleMutations
 
double p_minimumExploitableData
 
std::string p_characterMapping
 
unsigned int p_pos_sep_mapping
 
bool p_useZeroAsAncestral
 
unsigned int p_ignoreFrequency
 

Additional Inherited Members

- Static Public Attributes inherited from BaseDiversity
static const std::string dnaMapping = "ACGT MRWSYKBDHVN?-"
 Predefined mapping string for DNA data.
 
static const std::string rnaMapping = "ACGU MRWSYKBDHVN?-"
 Predefined mapping string for RNA data.
 
static const std::string aaMapping = "GALMFWKQESPVICYHRNDT X-"
 Predefined mapping string for amino acid data.
 

Detailed Description

Computes diversity based on haplotype analysis.

This class relies on detection of polymorphic sites, as does NucleotideDiversity, except that sites with missing data cannot be processed (minimumExploitableData is forced to 1.0).

Like NucleotideDiversity, the same object can be used to analyze different data sets. Only the call to load() is required before accessing the data.

Hst, Gst and Kst are indices of between-population differentiation. They are defined in equations 2, 5-6 and 9, respectively, of Hudson et al. 1992a (Molecular Biology and Evolution 9:138-151). Fst is defined in equation 3 of Hudson et al. 1992b (Genetics 132:583-589). Snn is from Hudson 2000 (Genetics). It is computed as the average of Xi over all sequences, where Xi is the ratio of the number of nearest neighbours of sequence i that belong to the same group to its total number of nearest neighbours. The nearest neighbours of a sequence are all the sequences with the lowest number of differences to that sequence.

NOTE: Gst and Hst are quite similar: they are two ways to estimate the between-population fraction of haplotypic diversity. Fst and Kst differ more from each other, and Snn is a different kind of statistic.

Constructor & Destructor Documentation

HaplotypeDiversity ( )

Constructor.

~HaplotypeDiversity ( )
virtual

Destructor.

Member Function Documentation

unsigned int haplotypeIndex ( unsigned int  index) const

Returns the allele number of a given sequence.

The index counts ingroup sequences only: any outgroup sequences must be ignored when computing it.

void load ( CharMatrix & data,
bool  allowMultipleMutations = false,
unsigned int  ignoreFrequency = 0,
std::string  characterMapping = dnaMapping 
)

Identifies polymorphic sites and computes basic statistics.

Parameters
data
 An alignment object (subclass of CharMatrix). The presence of outgroups or of different populations is detected from the populationLabel members of the passed object. The populationLabel 999 is interpreted as outgroup. If several outgroups are passed, sites where the outgroups are not consistent are treated as "non-orientable".
allowMultipleMutations
 If true, sites with more than two alleles are not ignored. The sum of the frequencies of all alleles not matching the outgroup is treated as the derived allele frequency (for orientable sites).
ignoreFrequency
 Removes sites that are polymorphic only because of an allele at an absolute frequency smaller than or equal to this value. If ignoreFrequency=0, no sites are removed; if ignoreFrequency=1, singleton sites are ignored. Such sites are completely removed from the analysis (not counted in lseff). Note that if more than one mutation is allowed, the site is removed only if the frequencies of all the alleles but one are smaller than or equal to this value. For example, an alignment column AAAAAAGAAT is ignored with an ignoreFrequency of 1, but AAAAAAGGAT is kept (including the third allele T, which is a singleton).
characterMapping
 A string giving the list of characters that should be considered as valid data. If a space is present in the string, the characters left of the space are treated as valid data and the characters right of the space are treated as missing data, which is tolerated but ignored. Any character not in the string causes an EggInvalidCharacterError to be raised.
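
The ignoreFrequency and characterMapping rules described above can be sketched as two small functions. This is a hedged illustration of the documented behaviour, not the library's code: classify, ignoredByFrequency and the CharClass enum are hypothetical names, the sketch returns a status instead of raising EggInvalidCharacterError, and treating the space character itself as invalid is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

enum CharClass { Valid, Missing, Invalid };

// Hypothetical helper: classifies a character against a mapping string such
// as "ACGT MRWSYKBDHVN?-". Characters left of the space are valid data,
// characters right of it are missing data, anything else is invalid.
CharClass classify(char c, const std::string& mapping)
{
    const std::string::size_type sep = mapping.find(' ');
    const std::string::size_type pos = mapping.find(c);
    // Assumption: the separator itself is never a legal data character.
    if (c == ' ' || pos == std::string::npos) return Invalid;
    if (sep == std::string::npos || pos < sep) return Valid;
    return Missing;
}

// Hypothetical helper: returns true if an alignment column is dropped under
// the ignoreFrequency rule, that is, if it is polymorphic but at most one
// allele has an absolute count above ignoreFrequency.
bool ignoredByFrequency(const std::string& column, unsigned int ignoreFrequency)
{
    std::map<char, unsigned int> counts;
    for (std::size_t i = 0; i < column.size(); ++i) ++counts[column[i]];
    if (counts.size() < 2) return false;  // not polymorphic: rule does not apply

    unsigned int above = 0;  // alleles with a count above the threshold
    for (std::map<char, unsigned int>::const_iterator it = counts.begin();
         it != counts.end(); ++it) {
        if (it->second > ignoreFrequency) ++above;
    }
    return above <= 1;
}
```

Applied to the example columns, ignoredByFrequency("AAAAAAGAAT", 1) is true because only A exceeds a count of 1, whereas ignoredByFrequency("AAAAAAGGAT", 1) is false because both A and G do. With a threshold of 0 no polymorphic column is dropped.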

