2.2.8 Masked sequences filtering

MaskFilter splits alignment blocks by removing regions that contain too many masked positions (typically repeats), using a sliding window. A masked position is a lower-case letter in the original sequence. Any window containing more than the given number of lower-case characters is discarded, and the block is split accordingly.

Synopsis:

maf.filter=                                 \
    [...],                                  \
    MaskFilter(                             \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.masked=2,                       \
        file=data.trash_msk.maf.gz,         \
        compression=gzip),                  \
    [...]

Arguments:

species=(species1, species2, etc)

A comma-separated list of species, enclosed in parentheses. All calculations will be performed on the sub-alignment corresponding to these species only.

window.size={int>0}

The width, in bp, of the sliding window.

window.step={int>0}

The step by which the window is moved, in bp.

max.masked={int>0}

The maximum number of lower-case characters allowed in each window.

file={none|{path}}

An optional file where the removed alignment parts will be stored, in MAF format. This can be helpful for visual inspection and for fine-tuning the filter parameters.

compression={none|gzip|zip|bzip2}

Compression format for output file (if file != none).
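
To make the roles of window.size, window.step and max.masked more concrete, the sliding-window procedure described above can be sketched in a few lines of Python. This is only an illustration, not the MafFilter implementation: the function names (masked_windows, merge_ranges, split_block) are hypothetical, and the sketch assumes that a window is discarded as soon as any one of the selected sequences holds more than max.masked lower-case characters in it, and that overlapping discarded windows are merged before the block is split.

```python
def masked_windows(sequences, window_size=10, window_step=1, max_masked=2):
    """Return (start, end) column ranges (end exclusive) of windows to discard.

    Assumption: a window is discarded if ANY sequence has more than
    max_masked lower-case characters inside it.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    flagged = []
    for start in range(0, length - window_size + 1, window_step):
        end = start + window_size
        for seq in sequences:
            n_masked = sum(1 for c in seq[start:end] if c.islower())
            if n_masked > max_masked:
                flagged.append((start, end))
                break
    return flagged


def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_block(sequences, window_size=10, window_step=1, max_masked=2):
    """Return (kept, removed) column ranges for one alignment block."""
    length = len(sequences[0]) if sequences else 0
    removed = merge_ranges(
        masked_windows(sequences, window_size, window_step, max_masked)
    )
    kept = []
    pos = 0
    for start, end in removed:
        if start > pos:
            kept.append((pos, start))
        pos = end
    if pos < length:
        kept.append((pos, length))
    return kept, removed
```

For a single 25-column sequence made of 10 upper-case, 5 lower-case and 10 upper-case characters, split_block(["A" * 10 + "a" * 5 + "A" * 10], window_size=5, window_step=1, max_masked=2) returns ([(0, 8), (17, 25)], [(8, 17)]) under these assumptions. The removed range extends two columns beyond the masked stretch on each side, because every window holding three or more masked positions is dropped in full; a larger window.size therefore trims more unmasked flanking sequence.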