augur.frequency_estimators module

class augur.frequency_estimators.AlignmentKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: augur.frequency_estimators.KdeFrequencies

KDE frequency estimator for multiple sequence alignments

Estimates frequencies for samples provided as sequences in a multiple sequence alignment and corresponding observation dates for each sequence.

estimate(alignment, observations)

Estimate frequencies of bases/residues at each site in a given multiple sequence alignment based on the observation dates associated with each sequence in the alignment.

class augur.frequency_estimators.KdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: object

Methods to estimate clade frequencies for phylogenetic trees by creating normal distributions from timestamped tips in the tree and building a kernel density estimate across discrete time points from these tip observations for each clade in the tree.

classmethod estimate_frequencies(tip_dates, pivots, normalize_to=1.0, max_date=None, **kwargs)

Estimate frequencies of the given observations across the given pivots.

classmethod from_json(json_dict)

Returns an instance populated with parameters and data from the given JSON dictionary.

classmethod get_densities_for_observations(observations, pivots, max_date=None, **kwargs)

Create a matrix of densities for one or more observations across the given pivots with one row per observation and one column per pivot.

Observations can be optionally filtered by a maximum date such that all densities are estimated to be zero after that date.

classmethod get_density_for_observation(mu, pivots, sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, **kwargs)

Build a distribution centered at the given floating point date, mu, that mixes a narrow normal distribution (standard deviation sigma_narrow) with a wide one (standard deviation sigma_wide), giving the wide component weight proportion_wide, and return the probability mass at each pivot. These mass values per pivot form the input for a kernel density estimate across multiple observations.
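A minimal NumPy sketch of the per-observation kernel, assuming the three sigma parameters describe a two-component mixture: a narrow normal with standard deviation sigma_narrow and a wide normal with standard deviation sigma_wide, weighted by proportion_wide. Dates are floating point years, so the default sigma_narrow of 0.0833 is 1/12 of a year (about one month) and sigma_wide is three months. The function name is hypothetical and this is not augur's own code.

```python
import numpy as np


def density_for_observation_sketch(mu, pivots, sigma_narrow=1 / 12.0,
                                   sigma_wide=0.25, proportion_wide=0.2):
    """Hypothetical helper: density of a narrow/wide normal mixture centered
    at the observation date mu, evaluated at each pivot."""
    pivots = np.asarray(pivots, dtype=float)

    def normal_pdf(sigma):
        # Normal probability density centered at mu with standard deviation sigma.
        return np.exp(-0.5 * ((pivots - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    return ((1 - proportion_wide) * normal_pdf(sigma_narrow)
            + proportion_wide * normal_pdf(sigma_wide))


# Monthly pivots across 2020 and a single sample collected in mid-2020.
pivots = 2020 + np.arange(13) / 12.0
density = density_for_observation_sketch(2020.5, pivots)
```

The density peaks at the pivot nearest mu (index 6 here), while the wide component keeps some mass several months away, which smooths estimates when samples are sparse.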

get_params()

Returns the parameters used to define the current instance.

Returns: parameters that define the current instance and that can be used to create a new instance
Return type: dict

classmethod normalize_to_frequencies(density_matrix, normalize_to=1.0)

Normalize the values of a given density matrix so that every column (time point) with a non-zero sum sums to normalize_to (1.0 by default). This converts kernel PDF mass into a frequency estimate.
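The column-wise normalization can be sketched with NumPy as follows. Rows are observations and columns are pivots, as in get_densities_for_observations. The function name is hypothetical, and leaving all-zero columns untouched is an assumption based on the description above.

```python
import numpy as np


def normalize_to_frequencies_sketch(density_matrix, normalize_to=1.0):
    """Hypothetical helper: rescale every column (pivot) with a non-zero sum
    so that it sums to normalize_to; all-zero columns stay zero."""
    density_matrix = np.asarray(density_matrix, dtype=float)
    frequencies = density_matrix.copy()
    column_sums = density_matrix.sum(axis=0)
    nonzero = column_sums > 0
    # Broadcasting divides each selected column by its own sum.
    frequencies[:, nonzero] = density_matrix[:, nonzero] / column_sums[nonzero] * normalize_to
    return frequencies


# Two observations (rows) across three pivots (columns); the middle pivot has no mass.
density = np.array([[1.0, 0.0, 0.0],
                    [3.0, 0.0, 2.0]])
frequencies = normalize_to_frequencies_sketch(density)
# → [[0.25, 0.0, 0.0], [0.75, 0.0, 1.0]]
```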

to_json()

Returns a dictionary for the current instance that can be serialized in a JSON file.

class augur.frequency_estimators.TreeKdeFrequencies(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)

Bases: augur.frequency_estimators.KdeFrequencies

KDE frequency estimator for phylogenetic trees

Estimates frequencies for samples provided as annotated tips on a given tree and optionally reports clade-level frequencies based on the sum of each clade’s respective tips.

estimate(tree)

Estimate frequencies for a given tree using the parameters defined for this instance.

If weights are defined, frequencies are calculated as a weighted mean across the values of the attribute named by self.weights_attribute.

Parameters: tree (Bio.Phylo) – annotated tree whose nodes all have an attr attribute with at least a “num_date” key
Returns: frequencies – node frequencies by clade
Return type: dict

estimate_tip_frequencies_to_proportion(tips, proportion)

Estimate frequencies for a given set of tips up to a given proportion of total frequencies.

Parameters

tips (list) – a list of Bio.Phylo terminals annotated with attributes in tip.attr
proportion (float) – the proportion of the total frequency that the given tips should represent

Returns

frequencies of given tips indexed by tip name

Return type

dict

tip_passes_filters(tip)

Returns a boolean indicating whether a given tip passes the node filters defined for the current instance.

If no filters are defined, returns True.

Parameters: tip (Bio.Phylo) – tip from a Bio.Phylo tree annotated with attributes in tip.attr
Returns: whether the given tip passes the defined filters or not
Return type: bool
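The filtering logic can be sketched as below. The structure of node_filters is an assumption here (a dict mapping an attribute name in tip.attr to the collection of accepted values), and a types.SimpleNamespace stands in for a Bio.Phylo tip. The function name is hypothetical.

```python
from types import SimpleNamespace


def tip_passes_filters_sketch(tip, node_filters=None):
    """Hypothetical helper: True if the tip's attributes match every filter."""
    # With no filters defined, every tip passes.
    if not node_filters:
        return True
    # Assumed filter format: {attribute_name: collection_of_accepted_values}.
    return all(tip.attr.get(key) in accepted for key, accepted in node_filters.items())


# A stand-in for an annotated Bio.Phylo terminal.
tip = SimpleNamespace(name="sample_1", attr={"num_date": 2019.5, "region": "europe"})
print(tip_passes_filters_sketch(tip))                          # True
print(tip_passes_filters_sketch(tip, {"region": ["europe"]}))  # True
print(tip_passes_filters_sketch(tip, {"region": ["asia"]}))    # False
```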

exception augur.frequency_estimators.TreeKdeFrequenciesError

Bases: Exception

Represents an error estimating KDE frequencies for a tree.

class augur.frequency_estimators.alignment_frequencies(aln, tps, pivots, **kwargs)

Bases: object

Calculates frequencies of mutations in an alignment. Uses nested frequencies for mutations at a particular site so that the frequencies of all mutations at that site are forced to sum to one.

calc_confidence()

Calculate a crude binomial sampling confidence interval for the frequency estimate. This ignores autocorrelation of the trajectory and returns one standard deviation (68%).

Returns: dictionary of standard deviations for each estimated trajectory
Return type: dict
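A crude binomial standard deviation of this kind can be computed per pivot as sqrt(f(1 - f) / n). This sketch assumes the number of observations n informing each pivot is already known; how augur obtains those counts is not documented here, and the function name is hypothetical.

```python
import numpy as np


def binomial_std_sketch(frequencies, counts):
    """Hypothetical helper: one binomial standard deviation (about 68%)
    for a frequency estimate at each pivot."""
    frequencies = np.asarray(frequencies, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Pivots without observations are treated as having one observation,
    # which avoids division by zero and gives the widest spread for that frequency.
    return np.sqrt(frequencies * (1 - frequencies) / np.maximum(counts, 1))


std = binomial_std_sketch([0.5, 0.1, 0.0], [100, 25, 0])
# → approximately [0.05, 0.06, 0.0]
```

Like the description above, this treats each pivot independently and ignores autocorrelation along the trajectory.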

estimate_genotype_frequency(gt)

Slice an alignment at one or more positions and calculate the frequency trajectory of the resulting multi-locus genotype.

Parameters: gt (list) – a list of (position, state) tuples specifying the genotype whose frequency is to be estimated
Returns: frequency trajectory
Return type: np.array
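The slicing step can be sketched by turning each sequence into a boolean observation: True when it carries every (position, state) pair in gt. This assumes 0-based positions and an alignment given as equal-length strings. Paired with sampling dates, such 0/1 observations are the kind of input that frequency_estimator expects. The function name is hypothetical.

```python
import numpy as np


def genotype_observations_sketch(sequences, gt):
    """Hypothetical helper: True for each sequence that carries every
    (position, state) pair in gt; positions are 0-based."""
    aln = np.array([list(seq) for seq in sequences])
    matches = np.ones(len(sequences), dtype=bool)
    for position, state in gt:
        # Slice one alignment column and compare it with the required state.
        matches &= aln[:, position] == state
    return matches


obs = genotype_observations_sketch(["ACGT", "ACGA", "TCGT"], [(0, "A"), (3, "T")])
# → [True, False, False]
```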

mutation_frequencies(min_freq=0.01, include_set=None, ignore_char='')

Estimate frequencies of single-site mutations for each alignment column. This method populates the frequencies dictionary of the instance with the frequency trajectories.

Parameters

min_freq (float, optional) – minimal all-time frequency for an alignment column to be considered
include_set (list/set, optional) – set of alignment columns that will be used regardless of variation
ignore_char (str, optional) – ignore this character in an alignment column (missing data)

augur.frequency_estimators.count_observations(pivots, tps)

augur.frequency_estimators.fix_freq(freq, pc)

Restricts frequencies to the interval [pc, 1-pc] and removes np.nan values, which avoids taking logarithms of 0 or dividing by 0.

Parameters

freq (np.array) – frequency trajectory to be thresholded
pc (float) – threshold value

Returns

thresholded frequency trajectory

Return type

np.array
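A minimal NumPy sketch of this thresholding follows. Replacing NaN values with pc is an assumption (the description only says they are removed), and the function name is hypothetical.

```python
import numpy as np


def fix_freq_sketch(freq, pc):
    """Hypothetical helper: clip a frequency trajectory to [pc, 1 - pc]."""
    freq = np.array(freq, dtype=float)  # copy so the caller's array is not modified
    # Assumption: missing values become the lower bound pc.
    freq[np.isnan(freq)] = pc
    return np.clip(freq, pc, 1 - pc)


fixed = fix_freq_sketch([0.0, 0.5, 1.0, np.nan], 1e-4)
# Logarithms of the result and of its complement are now finite.
print(np.log(fixed), np.log(1 - fixed))
```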

augur.frequency_estimators.float_to_datestring(time)

Convert a floating point date to a date string

>>> float_to_datestring(2010.75)
'2010-10-01'
>>> float_to_datestring(2011.25)
'2011-04-01'
>>> float_to_datestring(2011.0)
'2011-01-01'
>>> float_to_datestring(2011.0 + 11.0 / 12)
'2011-12-01'

In some cases, the given float value can be truncated, leading to unexpected results when converting between floating point and integer values. This function accounts for these errors by rounding months to the nearest integer.

>>> float_to_datestring(2011.9166666666665)
'2011-12-01'
>>> float_to_datestring(2016.9609856262834)
'2016-12-01'
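The conversion can be sketched as the inverse of year + (month - 1) / 12, rounding to the nearest month and clamping at December so that values just below the next year do not produce a thirteenth month. The clamp is an assumption needed to reproduce the 2016.9609856262834 example above, and the function name is hypothetical.

```python
def float_to_datestring_sketch(time):
    """Hypothetical helper: floating point year -> 'YYYY-MM-01' string."""
    year = int(time)
    # Invert year + (month - 1) / 12, rounding to the nearest whole month.
    # Clamp at 12: dates just before the next year would otherwise round to 13.
    month = min(int(round((time - year) * 12 + 1)), 12)
    return "%d-%02d-01" % (year, month)


for value in (2010.75, 2011.25, 2011.0, 2011.9166666666665, 2016.9609856262834):
    print(value, float_to_datestring_sketch(value))
```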

class augur.frequency_estimators.freq_est_clipped(tps, obs, pivots, dtps=None, **kwargs)

Bases: object

Simple wrapper for the frequency estimator that attempts to estimate a frequency trajectory on a sensible range of pivots, to avoid optimizing frequencies at points without data.

Attributes

dtps
fe
good_pivots
good_tps
obs
pivot_freq
pivot_lower_cutoff
pivot_upper_cutoff
pivots
tps
valid (bool)

learn()

class augur.frequency_estimators.frequency_estimator(tps, obs, pivots, stiffness=20.0, inertia=0.0, tol=0.001, pc=0.0001, ws=100, method='powell', **kwargs)

Bases: object

Estimates a smooth frequency trajectory given a series of timestamped 0/1 observations. The most likely set of frequencies at the specified pivot values is determined by numerical minimization. The likelihood consists of a Bernoulli sampling term and a term penalizing rapid frequency shifts; this penalty is motivated by genetic drift, i.e., sampling variation.
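The structure of this objective can be illustrated with a sketch of a penalized negative log-likelihood. It is one plausible form, not augur's exact implementation: frequencies at the observation times are linearly interpolated between pivots, and the penalty divides squared frequency changes by p(1 - p) times the time step, the scale of drift variance. The inertia, tol, and ws parameters of the real class are not modeled, and the function name is hypothetical.

```python
import numpy as np


def penalized_neg_log_likelihood(pivot_freqs, pivots, tps, obs, stiffness=20.0, pc=1e-4):
    """Hypothetical objective: Bernoulli sampling term plus a smoothness
    penalty on frequency changes between pivots (lower is better)."""
    pivot_freqs = np.clip(np.asarray(pivot_freqs, dtype=float), pc, 1 - pc)
    obs = np.asarray(obs, dtype=float)
    # Frequency at each observation time, interpolated between pivots.
    f = np.interp(tps, pivots, pivot_freqs)
    # Bernoulli term: each observation is 1 with probability f(t).
    sampling = -np.sum(obs * np.log(f) + (1 - obs) * np.log(1 - f))
    # Drift-motivated penalty: changes cost less over longer intervals and
    # near p = 0.5, where the variance p(1 - p) is largest.
    dfreq = np.diff(pivot_freqs)
    dt = np.diff(pivots)
    p = pivot_freqs[:-1]
    penalty = stiffness * np.sum(dfreq ** 2 / (dt * p * (1 - p)))
    return sampling + penalty


# Observations that switch from 0 to 1 halfway through the sampling period.
tps = np.linspace(0.0, 4.0, 41)
obs = (tps > 2.0).astype(float)
pivots = np.arange(5.0)
rising = np.array([0.05, 0.1, 0.5, 0.9, 0.95])
flat = np.full(5, 0.5)
```

A trajectory could then be fitted by minimizing this function over the pivot frequencies, for example with scipy.optimize.minimize and method='Powell', in line with the method='powell' default in the class signature. Raising stiffness trades fit to the data for smoother trajectories.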

initial_guess(pc=0.01)

learn(initial_guess=None)

stiffLH()

augur.frequency_estimators.get_pivots(observations, pivot_interval, start_date=None, end_date=None)

Calculate pivots for a given list of floating point observation dates and interval between pivots.

Start and end pivots will be based on the range of given observed dates, unless a start or end date is provided to override these defaults.

Parameters

observations (list) – a list of observed floating point dates per sample
pivot_interval (int) – number of months between pivots
start_date (float) – optional start of the pivots interval
end_date (float) – optional end of the pivots interval

Returns

pivots – floating point pivots spanning the given dates

Return type

ndarray
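A sketch of the pivot calculation, assuming pivots fall on month boundaries expressed as year + (month - 1) / 12 and that the last pivot may land on or after the end date so the grid spans all observations. augur's exact alignment and rounding rules may differ, and the function name is hypothetical.

```python
import numpy as np


def get_pivots_sketch(observations, pivot_interval, start_date=None, end_date=None):
    """Hypothetical helper: month-aligned pivots, pivot_interval months apart."""
    start = np.min(observations) if start_date is None else start_date
    end = np.max(observations) if end_date is None else end_date
    # Express dates as whole months since year 0; the small epsilon guards
    # against float error for dates that sit exactly on a month boundary.
    start_month = int(np.floor(start * 12 + 1e-9))
    end_month = int(np.ceil(end * 12 - 1e-9))
    # Step past end_month so the last pivot falls on or after the end date.
    months = np.arange(start_month, end_month + pivot_interval, pivot_interval)
    return months / 12.0


monthly = get_pivots_sketch([2015.0, 2015.2, 2015.5], 1)
quarterly = get_pivots_sketch([2015.2, 2015.5], 3, start_date=2015.0)
# quarterly → [2015.0, 2015.25, 2015.5]
```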

augur.frequency_estimators.logit_inv(logit_freq, pc)

augur.frequency_estimators.logit_transform(freq, pc)

augur.frequency_estimators.make_pivots(pivots, tps)

If pivots is a scalar, make a grid of that many pivot points covering the entire range of observation time points.

Parameters

pivots (scalar or iterable) – either number of pivots (a scalar) or the actual pivots (will be cast to array and returned)
tps (np.array) – observation time points; generated pivots span their minimum and maximum

Returns

pivots – array of pivot values

Return type

np.array
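A sketch of this behavior with np.linspace, assuming the grid runs exactly from the earliest to the latest time point (the real function may pad the range slightly). The function name is hypothetical.

```python
import numpy as np


def make_pivots_sketch(pivots, tps):
    """Hypothetical helper mirroring the described make_pivots behavior."""
    if np.isscalar(pivots):
        # A scalar is the number of pivots: spread them evenly over the
        # observed time range (assumed here to be exactly min..max).
        return np.linspace(np.min(tps), np.max(tps), int(pivots))
    # Otherwise the given pivots are used as-is, cast to an array.
    return np.array(pivots, dtype=float)


grid = make_pivots_sketch(5, np.array([2000.3, 2000.0, 2001.0]))
# → [2000.0, 2000.25, 2000.5, 2000.75, 2001.0]
```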

class augur.frequency_estimators.nested_frequencies(tps, obs, pivots, **kwargs)

Bases: object

Estimates frequencies of mutually exclusive events, such as mutations at a particular genomic position or subclades in a tree.

calc_freqs()

augur.frequency_estimators.pq(p)

augur.frequency_estimators.running_average(obs, ws)

Calculates a running average of boolean observations over a window of ws consecutive points.

Parameters

obs (list/np.array(bool)) – observations
ws (int) – window size as measured in number of consecutive points

Returns

running average of the boolean observations

Return type

np.array(float)
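A centered running average can be sketched with np.convolve. Averaging only over the points inside a window truncated at the edges is an assumption about the intended edge behavior, as is requiring len(obs) >= ws (np.convolve with mode='same' returns max(len(obs), ws) values). The function name is hypothetical.

```python
import numpy as np


def running_average_sketch(obs, ws):
    """Hypothetical helper: centered moving average of 0/1 observations.
    Windows are truncated at the edges and averaged over the points they
    actually contain. Assumes len(obs) >= ws."""
    obs = np.asarray(obs, dtype=float)
    kernel = np.ones(int(ws))
    # Sum of observations inside each centered window.
    sums = np.convolve(obs, kernel, mode="same")
    # Number of points inside each (possibly truncated) window.
    counts = np.convolve(np.ones_like(obs), kernel, mode="same")
    return sums / counts


smoothed = running_average_sketch([1, 1, 0, 0, 0], 3)
# → approximately [1.0, 0.667, 0.333, 0.0, 0.0]
```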

augur.frequency_estimators.test_nested_estimator()

augur.frequency_estimators.test_simple_estimator()

augur.frequency_estimators.timestamp_to_float(time)

Convert a pandas timestamp (or a datetime.date, as in the examples below) to a floating point date.

>>> import datetime
>>> time = datetime.date(2010, 10, 1)
>>> timestamp_to_float(time)
2010.75
>>> time = datetime.date(2011, 4, 1)
>>> timestamp_to_float(time)
2011.25
>>> timestamp_to_float(datetime.date(2011, 1, 1))
2011.0
>>> timestamp_to_float(datetime.date(2011, 12, 1)) == (2011.0 + 11.0 / 12)
True
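The examples above are consistent with the conversion year + (month - 1) / 12, which this sketch implements for any object exposing year and month attributes, such as datetime.date or pandas.Timestamp. Ignoring the day of the month is an assumption inferred from the examples, which all fall on the first. The function name is hypothetical.

```python
import datetime


def timestamp_to_float_sketch(time):
    """Hypothetical helper: date-like object -> floating point year."""
    # Each month maps to the fraction (month - 1) / 12; the day is ignored.
    return time.year + (time.month - 1) / 12.0


print(timestamp_to_float_sketch(datetime.date(2010, 10, 1)))  # 2010.75
```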

class augur.frequency_estimators.tree_frequencies(tree, pivots, node_filter=None, min_clades=10, verbose=0, pc=0.0001, **kwargs)

Bases: object

Estimates frequencies for nodes in the tree. Each internal node is assumed to be named with an attribute clade; if the root does not have such an attribute, clades are numbered in preorder. Each node is assumed to have an attribute attr with a key “num_date”.

calc_confidence()

For each frequency trajectory, calculate the Bernoulli sampling error. Strictly, the inverse Hessian should be used instead, but this is a useful approximation in most cases.

Returns: dictionary with estimated confidence intervals
Return type: dict

estimate_clade_frequencies()

prepare()