Paralog diversity

Following Innan (Genetics 2003 163:803-810), these statistics measure the nucleotide diversity within each paralog of a multigene family (\(\pi_w\)) and the between-paralog divergence for every pair of paralogs (\(\pi_b\)). They are available from the function stats.paralog_pi().

For a given paralog, we have:

\[\pi_w = \sum_{i=1}^{L} \frac{2}{n_i (n_i-1)} k_i\]

with \(L\) the number of sites, \(n_i\) the number of exploitable samples for this paralog at site \(i\) and \(k_i\) the number of pairwise differences at this site.
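
As an illustration of the \(\pi_w\) formula, here is a minimal pure-Python sketch. It does not use stats.paralog_pi() and is not the library's implementation. The function name `paralog_pi_within`, the string-based input and the set of characters treated as missing data are all assumptions made for this example, as is the choice to skip sites with fewer than two exploitable samples.

```python
from itertools import combinations

# Assumption: these characters mark a sample as non-exploitable at a site.
MISSING = frozenset("N?-")


def paralog_pi_within(seqs):
    """Sum over sites of 2 * k_i / (n_i * (n_i - 1)) for one paralog.

    `seqs` is a list of aligned sequences (equal-length strings),
    one per sample, all for the same paralog (hypothetical input format).
    """
    total = 0.0
    for column in zip(*seqs):
        # Keep only exploitable samples at this site.
        alleles = [a for a in column if a not in MISSING]
        n = len(alleles)
        if n < 2:
            # Assumption: a site with fewer than two samples contributes nothing.
            continue
        # k_i: number of sample pairs that differ at this site.
        k = sum(1 for a, b in combinations(alleles, 2) if a != b)
        total += 2.0 * k / (n * (n - 1))
    return total


sample = ["ACGT", "ACGA", "ATGN"]
print(paralog_pi_within(sample))
```

In this toy alignment only the second and fourth sites are variable. The second site has \(n_i = 3\) and \(k_i = 2\), and the fourth has \(n_i = 2\) (one sample is missing) and \(k_i = 1\), so the sum is \(2/3 + 1 = 5/3\).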

And for a given pair of paralogs:

\[\pi_b = \sum_{i=1}^{L} \frac{d_i}{n_{ai} n_{bi}}\]

with \(d_i\) the number of differences between the two paralogs at site \(i\), and \(n_{ai}\) and \(n_{bi}\) the respective numbers of exploitable samples for the two paralogs at that site.
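
The \(\pi_b\) formula can be sketched in the same way. Again, this is only an illustration under stated assumptions, not the code behind stats.paralog_pi(): the name `paralog_pi_between`, the string-based input and the missing-data characters are hypothetical, and \(d_i\) is taken here to be the count of differing pairs over all \(n_{ai} n_{bi}\) cross-paralog comparisons at the site.

```python
# Assumption: these characters mark a sample as non-exploitable at a site.
MISSING = frozenset("N?-")


def paralog_pi_between(seqs_a, seqs_b):
    """Sum over sites of d_i / (n_ai * n_bi) for a pair of paralogs.

    `seqs_a` and `seqs_b` are lists of aligned, equal-length strings,
    one list per paralog (hypothetical input format).
    """
    total = 0.0
    for col_a, col_b in zip(zip(*seqs_a), zip(*seqs_b)):
        # Keep only exploitable samples for each paralog at this site.
        a = [x for x in col_a if x not in MISSING]
        b = [y for y in col_b if y not in MISSING]
        if not a or not b:
            # Assumption: a site with no data for one paralog is skipped.
            continue
        # d_i: number of (sample of a, sample of b) pairs that differ.
        d = sum(1 for x in a for y in b if x != y)
        total += d / (len(a) * len(b))
    return total


paralog_a = ["ACGT", "ACGA"]
paralog_b = ["ACTT", "NCTT"]
print(paralog_pi_between(paralog_a, paralog_b))
```

Here the third site is a fixed difference between the paralogs (\(d_i = 4\) out of 4 comparisons) and the fourth site differs in 2 of 4 comparisons, so the sum is \(1 + 0.5 = 1.5\).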