augur.frequency_estimators module¶
-
class
augur.frequency_estimators.
AlignmentKdeFrequencies
(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)¶ Bases:
augur.frequency_estimators.KdeFrequencies
KDE frequency estimator for multiple sequence alignments
Estimates frequencies for samples provided as sequences in a multiple sequence alignment and corresponding observation dates for each sequence.
-
estimate
(alignment, observations)¶ Estimate frequencies of bases/residues at each site in a given multiple sequence alignment based on the observation dates associated with each sequence in the alignment.
-
-
class
augur.frequency_estimators.
KdeFrequencies
(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)¶ Bases:
object
Methods to estimate clade frequencies for phylogenetic trees by creating normal distributions from timestamped tips in the tree and building a kernel density estimate across discrete time points from these tip observations for each clade in the tree.
-
classmethod
estimate_frequencies
(tip_dates, pivots, normalize_to=1.0, max_date=None, **kwargs)¶ Estimate frequencies of the given observations across the given pivots.
-
classmethod
from_json
(json_dict)¶ Returns an instance populated with parameters and data from the given JSON dictionary.
-
classmethod
get_densities_for_observations
(observations, pivots, max_date=None, **kwargs)¶ Create a matrix of densities for one or more observations across the given pivots with one row per observation and one column per pivot.
Observations can be optionally filtered by a maximum date such that all densities are estimated to be zero after that date.
-
classmethod
get_density_for_observation
(mu, pivots, sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, **kwargs)¶ Build a normal distribution centered on the given floating point date, mu, with a standard deviation based on the given sigma values, and return the probability mass at each pivot. These per-pivot mass values form the input for a kernel density estimate across multiple observations.
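The signature carries two widths (sigma_narrow, sigma_wide) and a mixing weight (proportion_wide). A minimal sketch of one way to read it is shown here, assuming the density is a weighted mixture of a narrow and a wide normal evaluated at each pivot. The function names are hypothetical and the mixture form is an assumption, not something this page confirms.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def density_for_observation_sketch(mu, pivots, sigma_narrow=1 / 12,
                                   sigma_wide=0.25, proportion_wide=0.2):
    """Assumed form: mixture of a narrow and a wide normal centered on mu."""
    return [
        (1 - proportion_wide) * normal_pdf(p, mu, sigma_narrow)
        + proportion_wide * normal_pdf(p, mu, sigma_wide)
        for p in pivots
    ]


pivots = [2015.0, 2015.5, 2016.0]
density = density_for_observation_sketch(2015.5, pivots)
```

Under this assumption the density peaks at the pivot closest to mu and falls off symmetrically on either side.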
-
get_params
()¶ Returns the parameters used to define the current instance.
- Returns
parameters that define the current instance and that can be used to create a new instance
- Return type
dict
-
classmethod
normalize_to_frequencies
(density_matrix, normalize_to=1.0)¶ Normalize the values of a given density matrix so that each column (time point) with a non-zero sum adds up to normalize_to (1.0 by default). This converts kernel PDF mass into a frequency estimate.
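The column-wise normalization described above can be sketched in plain Python as follows. The helper name is hypothetical, and leaving all-zero columns at zero is an assumption about how columns without any density are handled.

```python
def normalize_columns_sketch(density_matrix, normalize_to=1.0):
    """Scale each column with a non-zero sum so that it adds up to normalize_to."""
    n_cols = len(density_matrix[0])
    col_sums = [sum(row[j] for row in density_matrix) for j in range(n_cols)]
    return [
        [
            normalize_to * row[j] / col_sums[j] if col_sums[j] > 0 else 0.0
            for j in range(n_cols)
        ]
        for row in density_matrix
    ]


# Two observations (rows) across three pivots (columns); the last pivot has no density.
densities = [[1.0, 3.0, 0.0],
             [1.0, 1.0, 0.0]]
frequencies = normalize_columns_sketch(densities)
```

Each row of the result is the frequency trajectory of one observation, and every column with data sums to normalize_to.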
-
to_json
()¶ Returns a dictionary for the current instance that can be serialized in a JSON file.
-
-
class
augur.frequency_estimators.
TreeKdeFrequencies
(sigma_narrow=0.08333333333333333, sigma_wide=0.25, proportion_wide=0.2, pivot_frequency=1, start_date=None, end_date=None, weights=None, weights_attribute=None, node_filters=None, max_date=None, include_internal_nodes=False, censored=False)¶ Bases:
augur.frequency_estimators.KdeFrequencies
KDE frequency estimator for phylogenetic trees
Estimates frequencies for samples provided as annotated tips on a given tree and optionally reports clade-level frequencies based on the sum of each clade’s respective tips.
-
estimate
(tree)¶ Estimate frequencies for a given tree using the parameters defined for this instance.
If weights are defined, frequencies represent a weighted mean across the values in the attribute defined by self.weights_attribute.
- Parameters
tree (Bio.Phylo) – annotated tree whose nodes all have an attr attribute with at least “num_date” key
- Returns
node frequencies by clade
- Return type
frequencies (dict)
-
estimate_tip_frequencies_to_proportion
(tips, proportion)¶ Estimate frequencies for a given set of tips up to a given proportion of total frequencies.
- Parameters
tips (list) – a list of Bio.Phylo terminals annotated with attributes in tip.attr
proportion (float) – the proportion of the total frequency that the given tips should represent
- Returns
frequencies of given tips indexed by tip name
- Return type
dict
-
tip_passes_filters
(tip)¶ Returns a boolean indicating whether a given tip passes the node filters defined for the current instance.
If no filters are defined, returns True.
- Parameters
tip (Bio.Phylo) – tip from a Bio.Phylo tree annotated with attributes in tip.attr
- Returns
whether the given tip passes the defined filters or not
- Return type
bool
-
-
exception
augur.frequency_estimators.
TreeKdeFrequenciesError
¶ Bases:
Exception
Represents an error estimating KDE frequencies for a tree.
-
class
augur.frequency_estimators.
alignment_frequencies
(aln, tps, pivots, **kwargs)¶ Bases:
object
Calculates frequencies of mutations in an alignment. Uses nested frequencies for mutations at a particular site such that their frequencies are forced to add up to one.
-
calc_confidence
()¶ Calculate a crude binomial sampling confidence interval of the frequency estimate. This ignores autocorrelation of the trajectory and returns one standard deviation (68%).
- Returns
dictionary of standard deviations for each estimated trajectory
- Return type
dict
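As a rough illustration of a binomial sampling error, the standard deviation of a frequency p estimated from n independent observations is sqrt(p * (1 - p) / n). The sketch below applies that textbook formula. The function name and the per-pivot counts are hypothetical, and how the class itself counts observations per pivot is not stated on this page.

```python
import math


def binomial_std_sketch(freq, n_obs):
    """One standard deviation of a frequency estimated from n_obs binomial samples."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return math.sqrt(freq * (1 - freq) / n_obs)


# Hypothetical trajectory and observation counts, one entry per pivot.
trajectory = [0.5, 0.2, 0.0]
counts = [100, 50, 10]
std_devs = [binomial_std_sketch(p, n) for p, n in zip(trajectory, counts)]
```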
-
estimate_genotype_frequency
(gt)¶ Slice an alignment at one or more positions and calculate the frequency trajectory of this multi-locus genotype.
- Parameters
gt (list) – a list of (position, state) tuples specifying the genotype whose frequency is to be estimated
- Returns
frequency trajectory
- Return type
np.array
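The slicing step can be pictured as turning each sequence into a 0/1 observation: a sequence counts as a match only if it carries the requested state at every listed position. This is a minimal sketch with a hypothetical helper name. It assumes zero-based positions and plain strings as sequences, neither of which is specified on this page.

```python
def genotype_observations_sketch(sequences, gt):
    """Return True for each sequence that matches every (position, state) pair in gt."""
    return [all(seq[pos] == state for pos, state in gt) for seq in sequences]


alignment = ["ACGT", "ACGA", "TCGT"]
genotype = [(0, "A"), (3, "T")]
observations = genotype_observations_sketch(alignment, genotype)
```

The resulting booleans, paired with the observation date of each sequence, are the kind of time-stamped 0/1 input that a frequency estimator can smooth into a trajectory.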
-
mutation_frequencies
(min_freq=0.01, include_set=None, ignore_char='')¶ Estimate frequencies of single site mutations for each alignment column. This function populates a dictionary class.frequencies with the frequency trajectories.
- Parameters
min_freq (float, optional) – minimal all-time frequency for an alignment column to be considered
include_set (list/set, optional) – set of alignment columns that will be used regardless of variation
ignore_char (str, optional) – ignore this character in an alignment column (missing data)
-
-
augur.frequency_estimators.
count_observations
(pivots, tps)¶
-
augur.frequency_estimators.
fix_freq
(freq, pc)¶ Restricts frequencies to the interval [pc, 1-pc] and removes np.nan values, which avoids taking logarithms of 0 or dividing by 0.
- Parameters
freq (np.array) – frequency trajectory to be thresholded
pc (float) – threshold value
- Returns
thresholded frequency trajectory
- Return type
np.array
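A minimal pure-Python sketch of this kind of thresholding is shown below. The helper name is hypothetical, and replacing NaN values with pc is an assumption, since the page only says that NaN values are removed.

```python
import math


def clip_frequencies_sketch(freq, pc):
    """Replace NaN with pc (assumed), then clip every value to [pc, 1 - pc]."""
    cleaned = [pc if math.isnan(f) else f for f in freq]
    return [min(1 - pc, max(pc, f)) for f in cleaned]


clipped = clip_frequencies_sketch([0.0, 0.5, 1.0, float("nan")], 0.01)

# After clipping, both log(f) and log(1 - f) are finite for every entry.
log_odds = [math.log(f / (1 - f)) for f in clipped]
```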
-
augur.frequency_estimators.
float_to_datestring
(time)¶ Convert a floating point date to a date string
>>> float_to_datestring(2010.75)
'2010-10-01'
>>> float_to_datestring(2011.25)
'2011-04-01'
>>> float_to_datestring(2011.0)
'2011-01-01'
>>> float_to_datestring(2011.0 + 11.0 / 12)
'2011-12-01'
In some cases, the given float value can be truncated leading to unexpected conversion between floating point and integer values. This function should account for these errors by rounding months to the nearest integer.
>>> float_to_datestring(2011.9166666666665)
'2011-12-01'
>>> float_to_datestring(2016.9609856262834)
'2016-12-01'
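One way to reproduce the doctest results above is to round the fractional year to the nearest month. The sketch below is an illustration with a hypothetical name, not the module's code. The cap at month 12 is an assumption added so that values close to the end of a year, such as 2016.9609856262834, do not round into a thirteenth month.

```python
def float_to_datestring_sketch(time):
    """Convert a floating point date to 'YYYY-MM-01' by rounding to the nearest month."""
    year = int(time)
    # The fractional year times 12 gives zero-based months; round to absorb float error.
    month = min(round((time - year) * 12) + 1, 12)
    return "%d-%02d-01" % (year, month)


examples = [2010.75, 2011.25, 2011.0, 2011.9166666666665, 2016.9609856262834]
converted = [float_to_datestring_sketch(t) for t in examples]
```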
-
class
augur.frequency_estimators.
freq_est_clipped
(tps, obs, pivots, dtps=None, **kwargs)¶ Bases:
object
simple wrapper for the frequency estimator that attempts to estimate a frequency trajectory on a sensible range of pivots to avoid frequency optimization at points without data
-
dtps
¶ Description
- Type
TYPE
-
fe
¶ Description
- Type
TYPE
-
good_pivots
¶ Description
- Type
TYPE
-
good_tps
¶ Description
- Type
TYPE
-
obs
¶ Description
- Type
TYPE
-
pivot_freq
¶ Description
- Type
TYPE
-
pivot_lower_cutoff
¶ Description
- Type
TYPE
-
pivot_upper_cutoff
¶ Description
- Type
TYPE
-
pivots
¶ Description
- Type
TYPE
-
tps
¶ Description
- Type
TYPE
-
valid
¶ Description
- Type
bool
-
learn
()¶
-
-
class
augur.frequency_estimators.
frequency_estimator
(tps, obs, pivots, stiffness=20.0, inertia=0.0, tol=0.001, pc=0.0001, ws=100, method='powell', **kwargs)¶ Bases:
object
Estimates a smooth frequency trajectory given a series of time-stamped 0/1 observations. The most likely set of frequencies at specified pivot values is determined by numerical minimization. The likelihood consists of a Bernoulli sampling term as well as a term penalizing rapid frequency shifts. The penalty term is motivated by genetic drift, i.e., sampling variation.
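To make the two-part objective concrete, here is a minimal sketch of a penalized negative log-likelihood. It is an illustration only: the linear interpolation between pivots, the squared-difference penalty, and all function names are assumptions, and the class may use a different penalty form.

```python
import math


def interpolate_sketch(pivots, freqs, t):
    """Linearly interpolate pivot frequencies at time t, clamped at both ends."""
    if t <= pivots[0]:
        return freqs[0]
    if t >= pivots[-1]:
        return freqs[-1]
    for i in range(1, len(pivots)):
        if t <= pivots[i]:
            w = (t - pivots[i - 1]) / (pivots[i] - pivots[i - 1])
            return (1 - w) * freqs[i - 1] + w * freqs[i]


def penalized_nll_sketch(pivots, freqs, tps, obs, stiffness=20.0, pc=1e-4):
    """Bernoulli negative log-likelihood plus an assumed quadratic stiffness penalty."""
    nll = 0.0
    for t, o in zip(tps, obs):
        # Keep p inside [pc, 1 - pc] so the logarithm is always finite.
        p = min(1 - pc, max(pc, interpolate_sketch(pivots, freqs, t)))
        nll -= math.log(p if o else 1 - p)
    # Penalize changes between neighboring pivots (assumed form of the penalty).
    penalty = sum((b - a) ** 2 for a, b in zip(freqs, freqs[1:]))
    return nll + stiffness * penalty


flat = penalized_nll_sketch([0.0, 1.0], [0.5, 0.5], [0.2, 0.8], [True, False])
steep_penalty_only = penalized_nll_sketch([0.0, 1.0], [0.1, 0.9], [], [])
```

A numerical minimizer would search over freqs to make this value as small as possible; a larger stiffness favors flatter trajectories.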
-
initial_guess
(pc=0.01)¶
-
learn
(initial_guess=None)¶
-
stiffLH
()¶
-
-
augur.frequency_estimators.
get_pivots
(observations, pivot_interval, start_date=None, end_date=None)¶ Calculate pivots for a given list of floating point observation dates and interval between pivots.
Start and end pivots will be based on the range of the given observed dates, unless a start or end date is provided to override these defaults.
- Parameters
observations (list) – a list of observed floating point dates per sample
pivot_interval (int) – number of months between pivots
start_date (float) – optional start of the pivots interval
end_date (float) – optional end of the pivots interval
- Returns
pivots – floating point pivots spanning the given dates
- Return type
ndarray
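The behavior described above can be sketched with whole-month arithmetic. This is a simplified illustration with a hypothetical name, not the module's implementation. It assumes pivots sit on month boundaries and returns a plain list rather than an ndarray.

```python
import math


def pivots_sketch(observations, pivot_interval, start_date=None, end_date=None):
    """Floating point pivots every pivot_interval months across the observed range."""
    start = min(observations) if start_date is None else start_date
    end = max(observations) if end_date is None else end_date
    # Work in integer months to avoid accumulating floating point drift.
    first_month = math.floor(start * 12)
    last_month = math.ceil(end * 12)
    return [m / 12 for m in range(first_month, last_month + 1, pivot_interval)]


observed = [2015.0, 2015.3, 2016.0]
pivots = pivots_sketch(observed, 6)
late_start = pivots_sketch(observed, 6, start_date=2015.5)
```

In this sketch the last pivot can fall short of the end date when the interval does not divide the range evenly.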
-
augur.frequency_estimators.
logit_inv
(logit_freq, pc)¶
-
augur.frequency_estimators.
logit_transform
(freq, pc)¶
-
augur.frequency_estimators.
make_pivots
(pivots, tps)¶ if pivots is a scalar, make a grid of pivot points covering the entire range
- Parameters
pivots (scalar or iterable) – either number of pivots (a scalar) or the actual pivots (will be cast to array and returned)
tps (np.array) – observation time points. Will generate pivots spanning min/max
- Returns
pivots – array of pivot values
- Return type
np.array
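The scalar-or-iterable behavior can be sketched as follows. The name is hypothetical, the result is a plain list rather than an np.array, and spacing the grid evenly between the exact minimum and maximum time points is an assumption.

```python
def make_pivots_sketch(pivots, tps):
    """Return an evenly spaced grid if pivots is a number, else the pivots themselves."""
    if isinstance(pivots, (int, float)):
        n = int(pivots)
        start, end = min(tps), max(tps)
        if n < 2:
            return [start]
        step = (end - start) / (n - 1)
        return [start + i * step for i in range(n)]
    return list(pivots)


grid = make_pivots_sketch(3, [2010.0, 2011.3, 2012.0])
explicit = make_pivots_sketch((2010.0, 2010.5), [2010.0, 2012.0])
```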
-
class
augur.frequency_estimators.
nested_frequencies
(tps, obs, pivots, **kwargs)¶ Bases:
object
estimates frequencies of mutually exclusive events such as mutations at a particular genomic position or subclades in a tree
-
calc_freqs
()¶
-
-
augur.frequency_estimators.
pq
(p)¶
-
augur.frequency_estimators.
running_average
(obs, ws)¶ Calculates a running average of the observations over a sliding window.
- Parameters
obs (list/np.array(bool)) – observations
ws (int) – window size as measured in number of consecutive points
- Returns
running average of the boolean observations
- Return type
np.array(float)
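A minimal sketch of a running average over boolean observations is given below. The name is hypothetical, and the edge handling (shrinking the window where it runs past either end of the data) is an assumption, since the page does not describe how boundaries are treated.

```python
def running_average_sketch(obs, ws):
    """Average each point with its neighbors in a window of up to ws consecutive points."""
    half = ws // 2
    averaged = []
    for i in range(len(obs)):
        start = i - half
        # Slicing truncates the window automatically at the end of the list.
        window = obs[max(0, start): start + ws]
        averaged.append(sum(window) / len(window))
    return averaged


smoothed = running_average_sketch([True, False, True, True], 3)
```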
-
augur.frequency_estimators.
test_nested_estimator
()¶
-
augur.frequency_estimators.
test_simple_estimator
()¶
-
augur.frequency_estimators.
timestamp_to_float
(time)¶ Convert a pandas timestamp to a floating point date.
>>> import datetime
>>> time = datetime.date(2010, 10, 1)
>>> timestamp_to_float(time)
2010.75
>>> time = datetime.date(2011, 4, 1)
>>> timestamp_to_float(time)
2011.25
>>> timestamp_to_float(datetime.date(2011, 1, 1))
2011.0
>>> timestamp_to_float(datetime.date(2011, 12, 1)) == (2011.0 + 11.0 / 12)
True
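The doctest values above are consistent with a conversion that adds the zero-based month as a twelfth of a year. The sketch below follows that reading under a hypothetical name. Ignoring the day of the month is an assumption, since every example uses the first day of a month.

```python
import datetime


def timestamp_to_float_sketch(time):
    """Convert a date-like object with .year and .month to a floating point year."""
    return time.year + (time.month - 1) / 12.0


october = timestamp_to_float_sketch(datetime.date(2010, 10, 1))
december = timestamp_to_float_sketch(datetime.date(2011, 12, 1))
```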
-
class
augur.frequency_estimators.
tree_frequencies
(tree, pivots, node_filter=None, min_clades=10, verbose=0, pc=0.0001, **kwargs)¶ Bases:
object
Class that estimates frequencies for nodes in the tree. Each internal node is assumed to be named with an attribute clade; if the root doesn’t have such an attribute, clades will be numbered in preorder. Each node is assumed to have an attribute attr with a key “num_date”.
-
calc_confidence
()¶ For each frequency trajectory, calculate the Bernoulli sampling error. In reality we should look at the inverse Hessian, but this is a useful approximation in most cases.
- Returns
dictionary with estimated confidence intervals
- Return type
dict
-
estimate_clade_frequencies
()¶
-
prepare
()¶
-