Next: , Previous: , Up: Descriptive   [Contents]

2.3.1.9 Site frequency spectrum

The SiteFrequencySpectrum computes the site frequency spectrum for each block. Only positions in the alignment with only two states are considered. The proportions of bi-allelic sites are then computed by bins. Lets consider the following example with 7 sequences:

ACGT
ACTT
AGTT
AGTT
TCTT
TCTT
TCTT

It contains one site with 4/7 (eq 3/7), one site with 2/7 (eq 5/7), one site with 1/7 (eq 6/7) and one site 0/7 (eq 7/7). With seven sequences, there are actually 4 possibles frequencies: 1/7, 2/7, 3/7 and 4/7 (plus the 0/7 frequency, corresponding to a site with no-mutation). It is possible to compute all these frequencies individually by settingthe bounds parameter to

SiteFrequencySpectrum(bounds=(-0.5,0.5,1.5,2.5,3.5,4.5), ...)

which will output the number of site with 0, 1, 2, 3 or 4 minor states. Sites with more than 2 states are always counted separately, as well as sites containing unresolved characters. If one want to count only constant sites for instance, one can simply type

SiteFrequencySpectrum(bounds=(-0.5,0.5), ...)

The remaining sites will be pulled in a column called “Ignored”.

Synopsis:

maf.filter=                                 \
    [...],                                   
    SequenceStatistics(                     \
        statistics=(\                       \
            [...],                                                    
            SiteFrequencySpectrum(          \
                bounds=(-0.5, 0.5, 1.5),    \
                ingroup=(pop1, pop2, pop3), \
                outgroup=species2,          \
            [...]),                         \
        ref_species=pop1,                   \
        file=data.statistics.csv),          \
    [...]

Arguments:

bounds={list of double}

The bounds delimiting the bins of the spectrum to be computed.

ingroup={list}

A list of species forming the ingroup and on which the statistics should be calculated.

outgroup=species

A species name saying which sequence should be used to determine the ancestral state. If non-empty, the unfolded frequency spectrum will be computed. If empty, the folded frequency spectrum will be returned (1/5 and 4/5 frequences will be pulled, so that the maximum frequency will be 1/2).