
2.2.5 Alignment filtering

Split alignment blocks by removing regions with ambiguous alignments. Local alignment uncertainty is assessed with a sliding window: for each window, the number of gap characters and the total entropy are computed. Any window in which both the entropy and the number of gaps exceed the given thresholds is removed from the alignment, and the corresponding block is split accordingly.
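The procedure above can be sketched in Python as follows. This is a minimal illustration, not MafFilter's own implementation. It assumes a block is a list of equal-length aligned strings using `-` for gaps, that the window entropy is the sum of per-column Shannon entropies (in bits, gaps ignored), that "exceed" means strictly greater, and that every column covered by a flagged window is removed. The function names are hypothetical.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the non-gap characters in one column.

    Assumption: gaps are excluded from the frequency count.
    """
    counts = Counter(c.upper() for c in column if c != GAP)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filtered_intervals(seqs, window_size, window_step, max_gap, max_ent):
    """Return the [start, end) column ranges kept after removing flagged windows."""
    length = len(seqs[0])
    removed = [False] * length
    for start in range(0, length - window_size + 1, window_step):
        cols = range(start, start + window_size)
        # Gap characters counted over all sequences and all columns of the window.
        gaps = sum(s[i] == GAP for s in seqs for i in cols)
        # Assumption: "total entropy" = sum of the per-column entropies.
        ent = sum(column_entropy("".join(s[i] for s in seqs)) for i in cols)
        if gaps > max_gap and ent > max_ent:  # both thresholds must be exceeded
            for i in cols:
                removed[i] = True
    # Each contiguous run of kept columns becomes one new (split) block.
    intervals, start = [], None
    for i, flag in enumerate(removed + [True]):  # sentinel closes the last run
        if not flag and start is None:
            start = i
        elif flag and start is not None:
            intervals.append((start, i))
            start = None
    return intervals
```

With the three sequences `AAAC-GAAA`, `AAA-T-AAA` and `AAAG-CAAA`, `window_size=3`, `window_step=1`, `max_gap=2` and `max_ent=0.5`, the gappy centre is removed and the block is split into the column ranges `[(0, 2), (7, 9)]`.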

Synopsis:

maf.filter=                                 \
    [...],                                  \
    AlnFilter(                              \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.gap=9,                          \
        max.ent=0.2,                        \
        missing_as_gap=yes,                 \
        relative=no,                        \
        file=data.trash_aln.maf.gz,         \
        compression=gzip),                  \
    [...]

Arguments:

species=(species1, species2, etc)

A comma-separated list of species, enclosed in parentheses. All calculations are performed on the sub-alignment restricted to these species.

window.size={int>0}

The width, in bp, of the sliding window.

window.step={int>0}

The step by which the window is moved, in bp.
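Together, window.size and window.step determine which windows are evaluated along a block. The following sketch lists the 0-based start positions of the windows. It assumes that only complete windows are tested, so a trailing stretch shorter than window.size gets no window of its own; how MafFilter treats block ends is not stated here. The function name is hypothetical.

```python
def window_starts(block_length: int, window_size: int, window_step: int) -> list[int]:
    """0-based start positions of full-width windows along a block.

    Assumption: partial windows at the end of the block are not evaluated.
    """
    if window_size <= 0 or window_step <= 0:
        raise ValueError("window.size and window.step must be > 0")
    return list(range(0, block_length - window_size + 1, window_step))
```

For example, a 25 bp block with `window_size=10` and `window_step=5` yields windows starting at 0, 5, 10 and 15. With `window_step=1`, consecutive windows overlap by window.size - 1 bp.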

relative={boolean}

Whether max.gap is relative, that is, expressed as a proportion of the total number of characters in each window.

max.gap={int>0|1>double>0}

The maximum number of gaps allowed in each window (if relative is set to no), or the maximum proportion of gaps (if relative is set to yes).
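The gap test under the two settings of relative can be sketched as below. The sketch assumes that the "total amount of characters" in a window is the number of sequences times window.size, and that "exceeds" means strictly greater than max.gap. The function name is hypothetical.

```python
def too_many_gaps(n_gaps: int, n_seqs: int, window_size: int,
                  max_gap: float, relative: bool) -> bool:
    """True if a window's gap content exceeds max.gap.

    Assumption: relative mode divides by all characters in the window,
    i.e. n_seqs * window_size.
    """
    if relative:
        return n_gaps / (n_seqs * window_size) > max_gap
    return n_gaps > max_gap
```

With four sequences and a 10 bp window, `max_gap=0.2` in relative mode tolerates up to 8 gap characters (8 / 40 = 0.2), whereas 10 gaps (0.25) exceed it.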

max.ent={float}

The maximum entropy allowed in each window.
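The entropy measure is not defined precisely above. The sketch below is one plausible reading and should be checked against MafFilter's behaviour before relying on it. It computes the Shannon entropy (base 2) of the character frequencies in each column, ignoring gaps, and sums it over the window to give a "total" entropy. The log base, the treatment of gaps and ambiguous characters, and the function names are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of one alignment column, gaps ignored (assumption)."""
    counts = Counter(c.upper() for c in column if c != gap)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def window_entropy(seqs: list[str], start: int, size: int) -> float:
    """Total entropy of a window, taken here as the sum over its columns (assumption)."""
    return sum(column_entropy("".join(s[i] for s in seqs))
               for i in range(start, start + size))
```

A fully conserved column has entropy 0, and a column where A, C, G and T each appear once has entropy 2 bits. Under this reading, a small max.ent such as 0.2 flags windows containing almost any substitution.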

missing_as_gap={yes/no}

Whether sequences missing from a block (species listed in species but absent from that block) should be counted as gaps.
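The interaction between species and missing_as_gap can be sketched as follows. The sketch uses a hypothetical representation of a block as a dictionary mapping species names to aligned sequences, with illustrative species names. It restricts the block to the requested species and, when missing_as_gap is enabled, adds an all-gap row for each requested species that is absent. The function name is hypothetical.

```python
def rows_for_species(block: dict[str, str], species: list[str],
                     length: int, missing_as_gap: bool) -> list[str]:
    """Rows of the sub-alignment used for the filter computations.

    Species not listed are dropped; listed species absent from the block
    become all-gap rows only if missing_as_gap is True (assumption).
    """
    rows = []
    for sp in species:
        if sp in block:
            rows.append(block[sp])
        elif missing_as_gap:
            rows.append("-" * length)
    return rows
```

Enabling missing_as_gap therefore raises the gap count of windows in blocks where some of the selected species are absent, which makes such blocks more likely to be trimmed.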

file={none|{path}}

An optional file where the removed alignment parts are written, in MAF format. This can help with visual inspection and fine-tuning of the filter parameters.

compression={none|gzip|zip|bzip2}

Compression format for the output file (if file != none).
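When compression=gzip, the removed parts can be inspected with standard tools. As an illustration, the following Python sketch counts the alignment blocks in such a file using the standard-library `gzip` module. It relies only on the MAF convention that each alignment block starts with a line beginning with `a`. The function name is hypothetical.

```python
import gzip


def count_maf_blocks(path: str) -> int:
    """Count alignment blocks ('a' lines) in a gzip-compressed MAF file."""
    n = 0
    with gzip.open(path, "rt") as handle:  # text mode, decompressed on the fly
        for line in handle:
            if line.startswith("a"):
                n += 1
    return n
```

For example, `count_maf_blocks("data.trash_aln.maf.gz")` reports how many alignment fragments the filter discarded. This is a quick way to compare parameter settings before looking at individual blocks.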