
2.2.7 Entropy filtering

This filter removes highly variable regions and splits the blocks accordingly. It uses a sliding window and computes the entropy of each site in the window. A window is discarded if it contains more than max.pos sites with an entropy higher than a user-specified threshold (max.ent). The alignment block is then split into separate blocks around the discarded regions.
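The windowing logic described above can be sketched in Python. This is only an illustration under stated assumptions, not MafFilter's actual implementation: the function name split_by_entropy is hypothetical, per-site entropies are assumed to be already computed, only full-length windows are examined, and overlapping discarded windows are assumed to be merged before the block is split.

```python
def split_by_entropy(entropies, window_size=10, window_step=1,
                     max_ent=0.2, max_pos=3):
    """Return half-open (start, end) intervals of the sites that are kept.

    Hypothetical helper: `entropies` holds one entropy value per alignment
    column, computed beforehand.
    """
    n = len(entropies)
    discard = [False] * n

    # Slide the window and flag every site of a window that contains
    # more than `max_pos` sites with entropy above `max_ent`.
    for start in range(0, n - window_size + 1, window_step):
        window = entropies[start:start + window_size]
        n_high = sum(1 for h in window if h > max_ent)
        if n_high > max_pos:
            for i in range(start, start + window_size):
                discard[i] = True

    # Collect the runs of non-discarded sites: these are the new blocks.
    kept = []
    block_start = None
    for i, flagged in enumerate(discard):
        if not flagged and block_start is None:
            block_start = i
        elif flagged and block_start is not None:
            kept.append((block_start, i))
            block_start = None
    if block_start is not None:
        kept.append((block_start, n))
    return kept
```

For example, a 24-column block with four highly variable columns in the middle, scanned with window_size=5 and max_pos=3, would be split into two blocks on either side of the variable region.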

Synopsis:

maf.filter=                                 \
    [...],                                  \
    EntropyFilter(                          \
        species=(species1,species2,etc),    \
        window.size=10,                     \
        window.step=1,                      \
        max.ent=0.2,                        \
        max.pos=3,                          \
        missing_as_gap=yes,                 \
        ignore_gaps=yes,                    \
        file=data.trash_ent.maf.gz,         \
        compression=gzip),                  \
    [...]

Arguments:

species=(species1, species2, etc)

A comma-separated list of species, enclosed in parentheses. All calculations will be performed on the sub-alignment corresponding to these species only.

window.size={int>0}

The width, in bp, of the sliding window.

window.step={int>0}

The step by which the window is moved, in bp.

max.ent={float}

The maximum entropy allowed at each site.

max.pos={int>0}

The maximum number of positions in a window with an entropy higher than max.ent. Windows exceeding this number are discarded.

missing_as_gap={yes/no}

Whether unresolved characters should be counted as gaps.

ignore_gaps={yes/no}

Whether gaps should be ignored in the entropy calculation. If no, gaps are counted as a “fifth” state.
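The effect of these two options on a single alignment column can be illustrated with a short Python sketch. It assumes Shannon entropy with the natural logarithm, uses a hypothetical function name (site_entropy), treats "N" as the only unresolved character, and counts "N" as a state of its own when missing_as_gap is off. None of these choices is confirmed by this manual, so the numbers it produces may not match MafFilter's.

```python
import math
from collections import Counter


def site_entropy(column, ignore_gaps=True, missing_as_gap=True):
    """Shannon entropy (natural log, an assumption) of one alignment column."""
    states = []
    for ch in column.upper():
        if ch == "N" and missing_as_gap:
            ch = "-"          # unresolved character counted as a gap
        if ch == "-" and ignore_gaps:
            continue          # gaps are left out of the calculation
        states.append(ch)     # otherwise the gap acts as a "fifth" state

    if not states:
        return 0.0

    total = len(states)
    counts = Counter(states)
    # The trailing "+ 0.0" turns a possible -0.0 into 0.0.
    return -sum((c / total) * math.log(c / total) for c in counts.values()) + 0.0
```

With this sketch, the column "AA--" has entropy 0 when ignore_gaps is yes, and log(2) (about 0.69) when ignore_gaps is no, since the gap then counts as a second observed state.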

file={none|{path}}

An optional file where removed alignment parts will be stored, in the MAF format. This can be helpful for visual inspection and fine-tuning of the filter parameters.

compression={none|gzip|zip|bzip2}

Compression format for the output file (if file != none).