MafFilter Manual 1.3.1: DistanceEstimation

2.3.2.1 Distance matrix estimation

Estimates a pairwise distance matrix.

Synopsis:

maf.filter=                                 \
    [...],
    DistanceEstimation(                     \
        method=count,                       \
        gap_option=no_double_gap,           \
        unresolved_as_gap=no,               \
        extended_names=yes),                \
    [...]

maf.filter=                                 \
    [...],
    DistanceEstimation(                     \
        method=ml,                          \
        model=K80(kappa=2),                 \
        rate=Gamma(n=4, alpha=0.5),         \
        parameter_estimation=initial,       \
        max_freq_gaps=0.33,                 \
        gaps_as_unresolved=yes,             \
        profiler=none,                      \
        message_handler=none,               \
        extended_names=yes),                \
    [...]

Arguments:

method={count|ml}: Method used to estimate distance, either observed count or maximum likelihood estimate.

Further arguments for the observed counts:

gap_option={string}

Specifies how to deal with gaps:

all: All positions are used. Gaps are considered as a fifth character.
no_full_gap: Positions made only of gaps in the alignment block are ignored. In all other positions, a gap in both sequences is considered a match (gaps are treated as a “fifth” character).
no_double_gap: For each pairwise comparison, positions where a gap is found in both sequences are ignored.
no_gap: For each pairwise comparison, any gap-containing position is ignored. This is the recommended option for building phylogenies.
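The four gap options can be compared with a small Python sketch. It is an illustration of the descriptions above, not MafFilter's implementation: the function name count_distance, the toy alignment block, and the choice to report the distance as the proportion of differing positions among the positions retained are all assumptions made for this example.

```python
GAP = "-"

def count_distance(block, i, j, gap_option="no_double_gap"):
    """Proportion of differing positions between rows i and j of a block.

    block is a list of equal-length aligned sequences (hypothetical input
    format for this sketch). Which positions are compared depends on
    gap_option, following the descriptions in the manual text.
    """
    a, b = block[i], block[j]
    compared = 0
    differing = 0
    for pos, (x, y) in enumerate(zip(a, b)):
        if gap_option == "all":
            pass  # keep every position; a gap is a fifth character
        elif gap_option == "no_full_gap":
            # drop columns that are gaps in every sequence of the block
            if all(seq[pos] == GAP for seq in block):
                continue
        elif gap_option == "no_double_gap":
            # drop positions where both compared sequences have a gap
            if x == GAP and y == GAP:
                continue
        elif gap_option == "no_gap":
            # drop positions where either compared sequence has a gap
            if x == GAP or y == GAP:
                continue
        else:
            raise ValueError("unknown gap_option: " + gap_option)
        compared += 1
        if x != y:
            differing += 1
    # no comparable position: the distance is undefined in this sketch
    return differing / compared if compared else float("nan")

block = [
    "AC-GT-",
    "A--GA-",
    "ACTGT-",
]
for option in ("all", "no_full_gap", "no_double_gap", "no_gap"):
    print(option, count_distance(block, 0, 1, option))
```

On this toy block the same pair of sequences yields a different value under each option, because each option retains a different set of positions.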

unresolved_as_gap={yes|no}

Tells whether unresolved characters should be treated as gaps (usually in order to be ignored).

extended_names={boolean}

Tells whether sequence coordinates should be included in the sequence names stored in the output matrix.

Further arguments for the ML method:

model={substitution model description}

See the Bio++ Program Suite manual for a description of the available substitution models. Only nucleotide models can be used.

model=JC
model=K80(kappa=2)
model=T92(kappa=2, theta=0.5)
model=GTR
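To give a feel for what a maximum likelihood distance is, the following Python sketch estimates the distance for a single pair under the simplest case, JC with a constant rate. In that case the estimate is known to coincide with the analytic formula d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, so the numerical optimum can be checked against it. The function names and the golden-section optimizer are assumptions of this sketch; it does not reproduce the optimization performed by MafFilter or Bio++.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a sequence pair at distance d under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific change
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def ml_distance(n_sites, n_diff, lo=1e-9, hi=10.0, tol=1e-8):
    """Maximize the log-likelihood over d by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if (jc69_log_likelihood(c, n_sites, n_diff)
                > jc69_log_likelihood(d, n_sites, n_diff)):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return (a + b) / 2.0

def jc69_closed_form(n_sites, n_diff):
    """Analytic JC69 distance, defined for p < 3/4."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical pair: 100 compared sites, 20 of them different.
print(ml_distance(100, 20), jc69_closed_form(100, 20))
```

Richer models such as K80 or GTR, or a Gamma rate distribution, have additional parameters (for instance kappa and alpha) that enter the likelihood in the same way.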

rate={rate distribution description}

See the Bio++ Program Suite manual for a description of available options.

rate=Constant
rate=Gamma(n=4, alpha=0.5)

profiler={none|std|{path}}

Where to print optimization steps (nowhere, standard output or to a given file).

message_handler={none|std|{path}}

Where to log optimization (nowhere, standard output or to a given file).

parameter_estimation={initial|pairwise}

How to estimate substitution process parameters (for instance kappa and alpha). The available options are either to leave them at their initial values, or to estimate them for each pair of sequences.

max_freq_gaps={float}

The maximum proportion of gaps for a site to be included in the analysis.

gaps_as_unresolved={yes|no}

Tells whether remaining gaps should be converted to ’N’ before likelihood computation. This should be ’yes’ unless you specify a substitution model which explicitly allows for gaps.
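The combined effect of max_freq_gaps and gaps_as_unresolved can be pictured with a short Python sketch: sites with too many gaps are dropped first, and the gaps that remain are then recoded as N. The function names, the toy block, and the treatment of the threshold as inclusive (a site is kept when its gap proportion is less than or equal to max_freq_gaps) are assumptions of this example, not a statement about MafFilter's internals.

```python
GAP = "-"

def filter_gappy_sites(block, max_freq_gaps):
    """Keep only the sites whose proportion of gaps is at most max_freq_gaps."""
    n_seq = len(block)
    keep = [
        pos for pos in range(len(block[0]))
        if sum(seq[pos] == GAP for seq in block) / n_seq <= max_freq_gaps
    ]
    return ["".join(seq[pos] for pos in keep) for seq in block]

def gaps_to_unresolved(block):
    """Recode every remaining gap as the unresolved character N."""
    return [seq.replace(GAP, "N") for seq in block]

block = [
    "ACGT-A",
    "A-GT-A",
    "ACG--A",
    "ACGTTA",
]
filtered = filter_gappy_sites(block, max_freq_gaps=0.33)
recoded = gaps_to_unresolved(filtered)
print(filtered)
print(recoded)
```

In this block the fifth site has gaps in three of four sequences and is removed, while the two sites with a single gap are kept and their gaps become N.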

Extra-data:

CountDistance or MLDistance: The estimated pairwise distance matrix.