
2.2.7 Entropy filtering

This filter removes highly variable regions and splits the blocks accordingly. It uses a sliding window and computes the entropy of each site in the window. The window is discarded if it contains more than 'p' sites (set by max.pos) with an entropy higher than a user-specified threshold (set by max.ent). The alignment block is then split into separate blocks around the discarded regions.
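The window test described above can be sketched in Python as follows. This is an illustration only, not the program's actual code: the function name `kept_intervals` is hypothetical, per-site entropies are assumed to be already computed, and the way overlapping discarded windows are merged (every position covered by at least one failing window is removed) is an assumption.

```python
def kept_intervals(entropies, window_size, window_step, max_ent, max_pos):
    """Return the (start, end) half-open intervals of a block that survive.

    entropies: one entropy value per alignment column.
    A window fails when more than max_pos of its sites exceed max_ent.
    """
    n = len(entropies)
    removed = [False] * n

    # Slide the window and flag every position of a failing window.
    for start in range(0, n - window_size + 1, window_step):
        window = entropies[start:start + window_size]
        n_high = sum(1 for h in window if h > max_ent)
        if n_high > max_pos:
            for i in range(start, start + window_size):
                removed[i] = True

    # Collect the contiguous runs of positions that were not flagged.
    intervals = []
    begin = None
    for i, gone in enumerate(removed):
        if not gone and begin is None:
            begin = i
        elif gone and begin is not None:
            intervals.append((begin, i))
            begin = None
    if begin is not None:
        intervals.append((begin, n))
    return intervals
```

For instance, a 14-column block whose columns 5 to 8 are highly variable would, under these assumptions, be split into two sub-blocks when called with window_size=4, window_step=1, max_ent=0.2 and max_pos=3.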


maf.filter=                                 \
    EntropyFilter(                          \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.ent=0.2,                        \
        max.pos=3,                          \
        missing_as_gap=yes,                 \
        ignore_gaps=yes,                    \
        file=data.trash_ent.maf.gz,         \
        compression=gzip),                  \


species=(species1, species2, etc)

A comma-separated list of species, within parentheses. All calculations will be performed on the sub-alignment corresponding to these species only.


window.size=int

The width, in bp, of the sliding window.


window.step=int

The step by which the window is moved, in bp.


max.ent=float

The maximum entropy allowed at each site.


max.pos=int

The maximum number of positions in a window allowed to have an entropy higher than the max.ent threshold.


missing_as_gap=yes|no

Whether unresolved characters should be counted as gaps.


ignore_gaps=yes|no

Whether gaps should be excluded from the entropy calculation. If no, gaps are counted as a "fifth" state.
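The effect of the two gap options on a single alignment column can be sketched as below. This is a hedged illustration, not the program's implementation: the function name `site_entropy` is hypothetical, Shannon entropy in base 2 is an assumption (the logarithm base is not stated here), only 'N' is treated as an unresolved character, and unresolved characters are assumed to be skipped when they are not converted to gaps.

```python
import math
from collections import Counter

def site_entropy(column, missing_as_gap=True, ignore_gaps=True):
    """Shannon entropy (base 2, an assumption) of one alignment column.

    column: string with one character per species, e.g. "AACC".
    """
    states = []
    for ch in column.upper():
        if ch == "N":                 # unresolved character (assumption: only 'N')
            if missing_as_gap:
                ch = "-"              # treat it as a gap
            else:
                continue              # assumption: skip it otherwise
        if ch == "-" and ignore_gaps:
            continue                  # gaps do not enter the calculation
        states.append(ch)             # otherwise a gap is a "fifth" state

    if not states:
        return 0.0
    total = len(states)
    counts = Counter(states)
    # Sum of p * log2(1/p) over the observed states.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

With these assumptions, the column "AA--" has entropy 0 when gaps are ignored and entropy 1 when they are counted as a fifth state.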


file=path

An optional file where removed alignment parts will be stored, in the MAF format. This can be helpful for visual inspection and fine-tuning of the filter parameters.


compression=format

Compression format for the output file (if file != none), for example gzip.