
2.3.1.10 Polymorphism statistics

Compute various statistics describing sequence polymorphism. Two aligned sets of “species” are compared, and the numbers of fixed and polymorphic sites are computed:

F

Number of sites fixed in the two sets

FF

Number of sites fixed in the two sets, yet with a distinct state

P

Number of sites that are polymorphic in the two sets

FP

Number of sites that are fixed in set 1 but polymorphic in set 2

PF

Number of sites that are fixed in set 2 but polymorphic in set 1

Positions containing a gap or an unresolved character in one set are considered ambiguous. Such positions are counted separately in the following quantities:

X

Number of sites unresolved in the two sets

FX

Number of sites fixed in set 1 and unresolved in set 2

XF

Number of sites fixed in set 2 and unresolved in set 1

PX

Number of sites polymorphic in set 1 and unresolved in set 2

XP

Number of sites polymorphic in set 2 and unresolved in set 1
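
The categories above can be illustrated with a minimal Python sketch. It is not the program's implementation: the function names (set_state, polymorphism_counts) are hypothetical, only A, C, G and T are treated as resolved, and a set is taken to be unresolved at a site as soon as one of its sequences has a gap or an unresolved character there. It also assumes that F counts sites fixed for the same state in both sets, so that F and FF do not overlap; the definitions above do not settle this point.

```python
from collections import Counter

# Assumption: only these characters count as resolved nucleotides.
RESOLVED = set("ACGT")


def set_state(column):
    """Summarise one alignment column restricted to one set of sequences.

    Returns ("X", None) if any sequence has a gap or unresolved character,
    ("F", state) if all sequences share the same state,
    and ("P", None) otherwise.
    """
    chars = [c.upper() for c in column]
    if any(c not in RESOLVED for c in chars):
        return "X", None
    if len(set(chars)) == 1:
        return "F", chars[0]
    return "P", None


def polymorphism_counts(set1, set2):
    """Count site categories for two lists of equal-length aligned sequences."""
    counts = Counter()
    for col1, col2 in zip(zip(*set1), zip(*set2)):
        kind1, state1 = set_state(col1)
        kind2, state2 = set_state(col2)
        if kind1 == kind2:
            # Same category in both sets: F, P or X.
            # Assumption: fixed sites with distinct states go to FF only.
            if kind1 == "F" and state1 != state2:
                counts["FF"] += 1
            else:
                counts[kind1] += 1
        else:
            # Mixed categories: FP, PF, FX, XF, PX or XP (set 1 first).
            counts[kind1 + kind2] += 1
    return counts


# Toy alignment: each of the ten columns falls into a different category.
set1 = ["ACGTANA-AN",
        "ACTTCAACGN"]
set2 = ["AGGTA-ACNA",
        "AGTAAA-CNG"]
counts = polymorphism_counts(set1, set2)
for label in ("F", "FF", "P", "FP", "PF", "X", "FX", "XF", "PX", "XP"):
    print(label, counts[label])
```

In the two-letter labels the first letter always refers to set 1 and the second to set 2, which is why FP and PF (and likewise FX and XF, PX and XP) are distinct counts.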

Synopsis:

maf.filter=                                   \
    [...],                                    \
    SequenceStatistics(                       \
        statistics=(                          \
            [...],                            \
            PolymorphismStatistics(           \
                species1=(pop1, pop2, pop3),  \
                species2=(pop4, pop5, pop6)), \
            [...]),                           \
        ref_species=pop1,                     \
        file=data.statistics.csv),            \
    [...]

Arguments:

species1={list}

A list of species for set 1.

species2={list}

A list of species for set 2.

Note that the “species” terminology relates to multispecies alignments, as originally implemented in the MultiZ aligner. These statistics will, however, be most relevant when the aligned sequences come from individuals of the same population or species. The term “species” is therefore to be understood here in a technical sense (a sequence id in the alignment), not a biological one.