This module wraps the C++-implemented coalescent simulator. To perform coalescent simulations, the user must create a CoalesceParamSet instance with the desired parameter values (and call the appropriate methods to set optional demographic changes), and an instance of one of the subclasses of CoalesceMutator, and then pass these two objects to the simulation function. The function will return a list of Align or SSR instances.
Note
Only polymorphic sites are returned. This can be significant when computed polymorphism statistics be expressed per site. Such statistics should then be multiplied by the length of the corresponding alignments (that is, the number of segregating sites) to obtain a gene-wise value.
Generates simulated datasets, using the incorporated coalescent simulator.
paramSet must be a CoalesceParamSet instance; it holds parameters controlling the demographic process (and therefore the shape of generated genealogical trees).
mutator must be an instance of a subclass of CoalesceMutator (but not CoalesceMutator itself); it holds parameters controlling the generation of genetic data (sequence or microsatellites).
repets gives the number of repetitions to perform.
random controls the pseudo-random number generator. This argument is polymorph: if None, the generator is automatically set (using the system clock to determine seed values); if random is a sequence of two numbers, they will be used a seeds of the random number generator (therefore allowing to replicate results with a given set of parameters); alternatively, seeds can be a Random instance.
maxNumberOfIterations sets the maximum number of steps in the coalescent algorithm (provided as a safeguard).
convert if True, converts the generated DataMatrix objects to high-level types (Align or SSR). Otherwise, returns them as is.
forceSSR if True and if convert is True, return SSR instances whatever options were passed regarding the mutation model (that is, even if a finite-allele model with 4 or less alleles was used). The value of this option is ignored if convert is True.
Returns a list of simulated data sets. Data sets are instances of DataMatrix if convert is False. Otherwise, they are automatically converted to SSR instances or to Align instances. Conversion to Align is only performed if the mutator is of type CoalesceFiniteAlleleMutator, the number of alleles is smaller than or equal to 4 and if forceSSR is False.
All returned instances have non-default members providing simulation statistics:
- tMRCA (the time to the most recent common ancestor),
- totLength (sum of tree branch lengths),
- nMutations (number of mutations that occurred),
- nRecomb (number of recombination events).
New in version 2.0.1.
Changed in version 2.1.0: forceSSR is added, and Align instances are no longer returned if the number of alleles is above 4.
Holder for coalescent parameters related to the reconstruction of genealogical trees. The value of different demographic parameters are passed to the constructor, and the user can subsequently add demographic changes. The order in which demographic changes are entered is not important (they are sorted automatically by date).
New in version 2.0.1.
Constructor arguments:
singleSamples and doubleSamples specify the number of sampled genes. Since the underlying model is diploid, it is possible to sample either one or both chromosomes of a given individual. singleSamples specifies the number of individuals of which one (random) chromosome was sampled. doubleSamples specifies the number of individuals of which both chromosomes were sampled. For each of these options, an int, a sequence can be passed, as well as None, which means that no samples of this type were collected. The number of populations in the model is implied by these two options. The number of populations is given by the length of the passed list (one population if integers are passed). The number of populations implied by these two options must be consistent, except if one of them is None (in what a list of 0 is assumed). An example is singleSamples=[10, 20], doubleSamples=[5, 0] which sums up to 20 genes in both populations.
s gives the selfing rate. s=0. means total panmixia and s=1. total autogamy.
rho is the recombination rate, expressed as 4Nc where N is the population size (number of diploid individuals) and c the per-gene instantaneous recombination rate. If rho is >0, it is required to provide a value of nsites >1.
nsites is the number of recombining sites (ignored if rho is zero).
alpha is the exponential growth/decline rate. A value of alpha greater than 0 means that the population size was smaller in the past. If a float is passed, the same value will be applied to all populations. If a sequence is passed, is specifies individual, population-specific, rates and the length of the sequence must match the length of singleSamples and/or doubleSamples.
M is the migration rate, expressed as 4Nm where N is the population size (number of diploid individuals) and m the instantaneous migration rate (the probability that a given individual changes deme). If M is a float, all non- diagonal values of the migration matrix will be set to M/(k-1) where k is the number of populations in the system (the proportion of migrants will always be M). Otherwise M must be a sequence of sequences of dimensions kxk where k is the number of populations. It is illegal to set the diagonal of the matrix, and therefore it is enforced that all values of the diagonal are None. Here is an example for 2 populations with asymetric migration rates: M=[[None, 0.1], [0.2, None]], and an example for 3 populations with a “stepping stone” model of migration: M=[[None, 0.1, 0.0], [0.1, None, 0.1], [0., 0.1, None]].
N is the relative size of all populations. It is possible to pass a float, what sets all populations to the same size and corresponds to rescaling the time scale. Otherwise a sequence of float must be passed, and the length of this sequence must match the length of singleSamples and/or doubleSamples.
At time date, apply a bottleneck of strength strength to all populations. The bottleneck strength corresponds to an amount of time where the time counted is blocked and only coalescences are allowed (resulting in a given - and random - number of instantaneous coalescence with branches of length 0).
At time date, change all growth rates to value.
At time date, change all pairwise migration rates to value
At time date, change all populations sizes to value
At time date, change all the migration rate from population source to the population dest to the value migr. It is illegal to change a value of the diagonal.
At time date, change all selfing rate to value.
At time date, change growth rate of population population to value.
At time date, change size of population population to value
At time date, all the lineages from the population daughter are moved to the population mother and all mutation rates to the population daughter are cancelled.
A the time given by date, the population population is split in two. An additional population (incremented from the current total number of populations) is created and lineages are randomly picked from population population and moved to the new population, with probability probability. If probability is 0., the simulator creates an empty population (thinking forward in time, this corresponds to a population extinction). Note: the properties of the newly introduced population (size, migration rates, growth rate etc.) are taken from defaults and not inherited from the source population (the migration rate is arbitrarily fixed to the migration rate from the first pair of populations). The source population is not modified.
Returns a list of (d,s) tuples (one per population), with d the number of diploid samples and s the number of haploid samples from a given population
At time date, apply a bottleneck of strength strength to population population. The bottleneck strength corresponds to an amount of time where the time counted is blocked and only coalescences are allowed (resulting in a given - and random - number of instantaneous coalescence with branches of length 0).
Base class for mutator objects.
New in version 2.0.1.
Fix the final number of mutations to number. It is not guaranteed that it equals to the number of polymorphic sites unless the number of mutable sites is infinite. It is required that the object was created with theta = 0 to use this method.
Returns the number of mutations yielded by the last simulation (returns 0 by default).
sites gives the number of mutable sites. If it equals to 0 or False or has length 0, an infinite number of sites will be assumed and all mutations will hit a different site. If sites is an integer, it specifies the number of mutable sites. They all will be placed at equal distance along a virtual chromosome bound by 0 and 1 and have equal mutation weight. Otherwise, sites must be a sequence. The length of the sequence gives the number of sites. Each site must be represented by a sequence of two items: the site position (between 0. and 1.) and its weight. The weight can be any positive number and gives the relative probability that a given mutation hits that particular site.
Bases: egglib.simul.CoalesceMutator
Represents a mutation model with fixed number of alleles. At each mutation, alleles are drawn from a finite set of possible values. It is possible to set no equiprobable transitions to the different transition using the method transitionWeights(). This model sets by default an infinite number of mutable sites.
Constructors arguments: theta is the mutation rate, expressed as 4Nu where N is the number of diploid individuals of a population and u is the per-gene mutation rate. alleles is the number of possible alleles. If randomAncestralState, the ancestral state is drawn randomly from possible states (otherwise, it will be zero).
Fix the final number of mutations to number. It is not guaranteed that it equals to the number of polymorphic sites unless the number of mutable sites is infinite. It is required that the object was created with theta = 0 to use this method.
Returns the number of mutations yielded by the last simulation (returns 0 by default).
sites gives the number of mutable sites. If it equals to 0 or False or has length 0, an infinite number of sites will be assumed and all mutations will hit a different site. If sites is an integer, it specifies the number of mutable sites. They all will be placed at equal distance along a virtual chromosome bound by 0 and 1 and have equal mutation weight. Otherwise, sites must be a sequence. The length of the sequence gives the number of sites. Each site must be represented by a sequence of two items: the site position (between 0. and 1.) and its weight. The weight can be any positive number and gives the relative probability that a given mutation hits that particular site.
Sets the weights to apply to each possible transition. Here, a transition is taken as any mutation from one character to an other. If the number of alleles is k, matrix must be a sequence of k sequences of k weights (strictly positive values). The higher the weight, the more likely the transition. The values on the diagonal are required to be None.
Bases: egglib.simul.CoalesceMutator
Represents a mutation model with an infinite number of alleles. Each mutation creates a different new allele. This model sets by default one possible mutable site.
theta is the mutation rate, expressed as 4Nu where N is the number of diploid individuals of a population and u is the per-gene mutation rate.
Fix the final number of mutations to number. It is not guaranteed that it equals to the number of polymorphic sites unless the number of mutable sites is infinite. It is required that the object was created with theta = 0 to use this method.
Returns the number of mutations yielded by the last simulation (returns 0 by default).
sites gives the number of mutable sites. If it equals to 0 or False or has length 0, an infinite number of sites will be assumed and all mutations will hit a different site. If sites is an integer, it specifies the number of mutable sites. They all will be placed at equal distance along a virtual chromosome bound by 0 and 1 and have equal mutation weight. Otherwise, sites must be a sequence. The length of the sequence gives the number of sites. Each site must be represented by a sequence of two items: the site position (between 0. and 1.) and its weight. The weight can be any positive number and gives the relative probability that a given mutation hits that particular site.
Bases: egglib.simul.CoalesceMutator
Represents the stepwise mutation model. Each mutation randomly increments or decrements the current allele value with a step of one unit. This model sets by default one possible mutable site.
theta is the mutation rate, expressed as 4Nu where N is the number of diploid individuals of a population and u is the per-gene mutation rate.
Fix the final number of mutations to number. It is not guaranteed that it equals to the number of polymorphic sites unless the number of mutable sites is infinite. It is required that the object was created with theta = 0 to use this method.
Returns the number of mutations yielded by the last simulation (returns 0 by default).
sites gives the number of mutable sites. If it equals to 0 or False or has length 0, an infinite number of sites will be assumed and all mutations will hit a different site. If sites is an integer, it specifies the number of mutable sites. They all will be placed at equal distance along a virtual chromosome bound by 0 and 1 and have equal mutation weight. Otherwise, sites must be a sequence. The length of the sequence gives the number of sites. Each site must be represented by a sequence of two items: the site position (between 0. and 1.) and its weight. The weight can be any positive number and gives the relative probability that a given mutation hits that particular site.
Bases: egglib.simul.CoalesceMutator
Represents the two-phase mutation model. Each mutation randomly increments or decrements the current allele value with a variable step. With a given probability, the step is 1; otherwise it is drawn from a gamma distribution. This model sets by default one possible mutable site.
theta is the mutation rate, expressed as 4Nu where N is the number of diploid individuals of a population and u is the per-gene mutation rate. proba gives the probability of drawing the mutation step from the gamma distribution of parameter param. Both proba and param must be in range [0,1].
Fix the final number of mutations to number. It is not guaranteed that it equals to the number of polymorphic sites unless the number of mutable sites is infinite. It is required that the object was created with theta = 0 to use this method.
Returns the number of mutations yielded by the last simulation (returns 0 by default).
sites gives the number of mutable sites. If it equals to 0 or False or has length 0, an infinite number of sites will be assumed and all mutations will hit a different site. If sites is an integer, it specifies the number of mutable sites. They all will be placed at equal distance along a virtual chromosome bound by 0 and 1 and have equal mutation weight. Otherwise, sites must be a sequence. The length of the sequence gives the number of sites. Each site must be represented by a sequence of two items: the site position (between 0. and 1.) and its weight. The weight can be any positive number and gives the relative probability that a given mutation hits that particular site.