
2.2.8 Masked sequences filtering

The MaskFilter splits alignment blocks by removing regions with too many masked positions (typically indicating repeat content), using sliding windows. A masked position is identified by a lowercase letter in the original sequence. Windows containing more than a given number of lowercase characters are discarded, and the corresponding block is split.


maf.filter=                                 \
    MaskFilter(                             \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.masked=2,                       \
        file=data.trash_msk.maf.gz,         \
        compression=gzip),                  \


species=(species1, species2, etc)

A comma-separated list of species, enclosed in parentheses. All calculations will be performed on the sub-alignment corresponding to these species only.


window.size=int

The width, in bp, of the sliding window.


window.step=int

The step by which the window is moved, in bp.


max.masked=int

The maximum number of lowercase characters allowed in each window.


file=path

An optional file where removed alignment parts will be stored, in the MAF format. This can be helpful for visual inspection and fine-tuning of the filter parameters.


compression=format

Compression format for the output file (only used if file is not none).
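
The sliding-window criterion described in this section can be illustrated with a short Python sketch. This is not the MafFilter implementation: it handles a single sequence rather than a multi-species sub-alignment, and the function names (masked_windows, split_on_masked) as well as the choice to merge overlapping flagged windows before cutting are assumptions made for illustration only.

```python
def masked_windows(seq, window_size=10, window_step=1, max_masked=2):
    """Return half-open (start, end) intervals of windows that contain
    more than max_masked lowercase (masked) characters."""
    flagged = []
    for start in range(0, len(seq) - window_size + 1, window_step):
        window = seq[start:start + window_size]
        n_masked = sum(1 for c in window if c.islower())
        if n_masked > max_masked:
            flagged.append((start, start + window_size))
    return flagged


def split_on_masked(seq, window_size=10, window_step=1, max_masked=2):
    """Drop every flagged window and return the remaining pieces.

    Assumption: overlapping or adjacent flagged windows are merged into
    one removed region; the real filter may handle boundaries differently.
    """
    merged = []
    for start, end in masked_windows(seq, window_size, window_step, max_masked):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    kept = []
    pos = 0
    for start, end in merged:
        if start > pos:
            kept.append(seq[pos:start])
        pos = end
    if pos < len(seq):
        kept.append(seq[pos:])
    return kept
```

For example, with a window of 4 bp, a step of 1 and at most 2 masked positions, split_on_masked("ACGTACGTacgtACGTACGT", 4, 1, 2) removes positions 7 to 12 and returns ['ACGTACG', 'CGTACGT']. Note that a few uppercase positions flanking the masked run are removed as well, because every position of a flagged window is discarded in this sketch.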