
2.2.6 Alignment filtering 2

This is another algorithm for cleaning alignment blocks (see AlnFilter), using sliding windows. Within each window, the number of gaps in each alignment column is counted, and the column is masked if it contains more gaps than a given threshold. Consecutive columns with the same gap pattern are counted only once. In the following 10 nt window:

AATCGGGCGT
AA---GCGGA
AA---CGGGT
CA---CGGGA

positions 3, 4 and 5 will be masked if the maximum number of gaps allowed is 2 or less. The three columns will however count as only one indel event. The window is then discarded if it contains more than a given number of indel events.
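
The counting rule described above can be sketched in Python as follows. This is only an illustration of the text, not MafFilter's actual implementation: the function name count_indel_events is hypothetical, and it assumes that adjacent masked columns with an identical gap pattern form a single indel event.

```python
def count_indel_events(window, max_gap):
    """Return (masked_positions, n_events) for a list of equal-length rows.

    Hypothetical helper: a column is masked when it holds more than
    `max_gap` gaps, and adjacent masked columns sharing the same gap
    pattern are assumed to count as one indel event.
    """
    masked = []
    n_events = 0
    previous_pattern = None
    for pos in range(len(window[0])):
        # Gap pattern of the column: which rows carry a gap here.
        pattern = tuple(row[pos] == "-" for row in window)
        if sum(pattern) > max_gap:
            masked.append(pos + 1)  # report 1-based positions, as in the text
            if pattern != previous_pattern:
                n_events += 1
            previous_pattern = pattern
        else:
            previous_pattern = None
    return masked, n_events


window = ["AATCGGGCGT", "AA---GCGGA", "AA---CGGGT", "CA---CGGGA"]
masked, n_events = count_indel_events(window, max_gap=2)
print(masked, n_events)  # [3, 4, 5] 1
```

Under these assumptions the window would then be discarded when n_events is greater than max.pos.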

Synopsis:

maf.filter=                                 \
    [...],
    AlnFilter2(                             \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.gap=1,                          \
        max.pos=1,                          \
        relative=no,                        \
        missing_as_gap=yes,                 \
        file=data.trash_aln.maf.gz,         \
        compression=gzip),                  \
    [...]

Arguments:

species=(species1, species2, etc)

A comma-separated list of species, within parentheses. All calculations will be performed on the sub-alignment corresponding to these species only.

window.size={int>0}

The width, in bp, of the sliding window.

window.step={int>0}

The step by which the window is moved, in bp.

relative={boolean}

Whether the maximum amount of gaps is relative (that is, expressed as a proportion of the total number of characters in each site).

max.gap={int>0|1>double>0}

The maximum number of gaps allowed in each site (if relative is set to no), or the maximum proportion of gaps (if relative is set to yes).

max.pos={int>0}

The maximum number of positions with gaps (“indel events”).

missing_as_gap={yes/no}

Whether missing sequences should be considered as gaps.

file={none|{path}}

An optional file where removed alignment parts will be stored, in the MAF format. This can be helpful for visual inspection and fine-tuning of the filter parameters.

compression={none|gzip|zip|bzip2}

Compression format for output file (if file != none).
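
As a rough illustration of how window.size, window.step, relative and max.gap interact, the Python sketch below lists the window start offsets and applies the per-site threshold. Both function names are hypothetical, and the handling of trailing columns that do not fill a complete window is an assumption (they are simply ignored here); the program itself may treat them differently.

```python
def window_starts(block_length, window_size, window_step):
    """0-based start offsets of every full window that fits in the block.

    Assumption: columns past the last full window are ignored.
    """
    return list(range(0, block_length - window_size + 1, window_step))


def exceeds_max_gap(column, max_gap, relative=False):
    """True if a site holds more gaps than max_gap.

    With relative=True, max_gap is read as a proportion of the
    characters in the site; otherwise as an absolute count.
    """
    n_gaps = column.count("-")
    amount = n_gaps / len(column) if relative else n_gaps
    return amount > max_gap


print(window_starts(25, 10, 5))                    # [0, 5, 10, 15]
print(exceeds_max_gap("T---", 2))                  # True: 3 gaps > 2
print(exceeds_max_gap("T---", 0.8, relative=True)) # False: 0.75 <= 0.8
```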