augur.titer_model module¶
-
exception
augur.titer_model.
InsufficientDataException
¶ Bases:
Exception
-
class
augur.titer_model.
SubstitutionModel
(alignments, titers, *args, **kwargs)¶ Bases:
augur.titer_model.TiterModel
SubstitutionModel extends TiterModel and describes titer differences as sums of contributions from the substitutions separating the test and reference viruses. Sequences are assumed to be attached to each terminal node of the tree as node.translations.
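As a rough illustration of the additive idea described above, the following sketch sums per-substitution effects for the substitutions separating two viruses. The function name, the dictionary layout, and the effect values are hypothetical and made up for illustration; any other terms the actual model may include are ignored here.

```python
def predict_titer_drop(mutations, substitution_effect):
    """Sum the effects of the substitutions separating test and reference virus.

    Substitutions without a fitted effect contribute zero (an assumption of
    this sketch, not a statement about augur's behaviour).
    """
    return sum(substitution_effect.get(m, 0.0) for m in mutations)


# Hypothetical fitted effects, keyed by (gene, mutation) tuples.
substitution_effect = {("HA1", "F159S"): 1.5, ("HA1", "K160T"): 0.75}
mutations = [("HA1", "F159S"), ("HA1", "K160T"), ("HA1", "N225D")]

drop = predict_titer_drop(mutations, substitution_effect)
print(drop)  # 2.25
```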
-
annotate_tree
(tree)¶ Annotates antigenic advance attributes to nodes of a given tree built from the same sequences used to train the model.
- Parameters
tree (Bio.Phylo) – tree built from the same sequences used to train the model
- Returns
input tree instance with nodes annotated by per-branch and cumulative antigenic advance attributes dTiterSub and cTiterSub
- Return type
Bio.Phylo
-
collapse_colinear_mutations
(colin_thres)¶ Find colinear columns of the design matrix and collapse them into clusters.
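One way to picture collapsing colinear columns is a greedy clustering pass over the columns of a 0/1 design matrix. The sketch below assumes that two columns count as colinear when they differ in at most colin_thres rows; that reading of the threshold, the function name, and the data are assumptions for illustration, not the library's confirmed behaviour.

```python
def collapse_colinear_columns(columns, names, colin_thres):
    """Greedily group columns that differ in at most colin_thres rows.

    columns: list of equal-length lists of 0/1 entries, one per mutation.
    names: mutation labels in the same order as columns.
    Returns a list of (representative_column, [names]) clusters.
    """
    clusters = []
    for col, name in zip(columns, names):
        for representative, members in clusters:
            n_diff = sum(a != b for a, b in zip(col, representative))
            if n_diff <= colin_thres:
                members.append(name)  # close enough: join the existing cluster
                break
        else:
            clusters.append((list(col), [name]))  # start a new cluster
    return clusters


columns = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
names = ["HA1:F159S", "HA1:K160T", "HA2:D160N"]
clusters = collapse_colinear_columns(columns, names, colin_thres=0)
print([members for _, members in clusters])
```

With a threshold of zero, only identical columns are merged, so the first two mutations end up in one cluster and the third stays on its own.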
-
compile_substitution_effects
(cutoff=0.0001)¶ Compile a flat JSON of substitution effects for visualization, pruning mutations without effect.
- Parameters
cutoff (float, optional)
-
determine_relevant_mutations
(min_count=10)¶
-
get_mutations
(strain1, strain2)¶ Return the amino acid mutations between two viruses, specified by strain name, as tuples such as (HA1, F159S).
- Parameters
strain1 – name of the first virus strain
strain2 – name of the second virus strain
- Returns
amino acid mutations separating the two strains, as tuples such as (HA1, F159S)
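The tuple format described above can be illustrated with a small standalone comparison of aligned translations. This is a sketch under assumptions: the function name, the gene-to-sequence dictionary layout, and the 1-based position numbering are chosen for illustration and are not taken from augur.

```python
def mutations_between(translations1, translations2):
    """Return (gene, mutation) tuples such as ("HA1", "F3S").

    Both arguments map gene names to aligned amino acid sequences of equal
    length. Positions are reported 1-based.
    """
    muts = []
    for gene in sorted(translations1):
        seq2 = translations2.get(gene)
        if seq2 is None:
            continue  # gene missing in the second virus: nothing to compare
        for pos, (aa1, aa2) in enumerate(zip(translations1[gene], seq2), start=1):
            if aa1 != aa2:
                muts.append((gene, f"{aa1}{pos}{aa2}"))
    return muts


virus_a = {"HA1": "QKFP"}
virus_b = {"HA1": "QKSP"}
muts = mutations_between(virus_a, virus_b)
print(muts)  # [('HA1', 'F3S')]
```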
-
make_seqgraph
(colin_thres=5)¶ Encode amino acid differences between sequences into a matrix. The matrix has dimensions #measurements x #observed mutations.
- Parameters
colin_thres (int, optional)
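A matrix of shape #measurements x #observed mutations can be sketched as a binary indicator matrix with one row per measurement and one column per mutation. The helper below is hypothetical and only illustrates that layout; how augur orders columns or which additional columns it may add is not covered.

```python
def build_design_matrix(measurement_mutations):
    """Return (matrix, mutation_order) for a list of per-measurement mutation sets.

    matrix[i][j] is 1 if mutation j separates the two viruses of measurement i.
    """
    mutation_order = sorted({m for muts in measurement_mutations for m in muts})
    column = {m: j for j, m in enumerate(mutation_order)}
    matrix = []
    for muts in measurement_mutations:
        row = [0] * len(mutation_order)
        for m in muts:
            row[column[m]] = 1
        matrix.append(row)
    return matrix, mutation_order


measurement_mutations = [
    {("HA1", "F159S")},
    {("HA1", "F159S"), ("HA1", "K160T")},
    set(),  # test and reference virus are identical at the amino acid level
]
matrix, mutation_order = build_design_matrix(measurement_mutations)
print(matrix)  # [[1, 0], [1, 1], [0, 0]]
```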
-
predict_titer
(virus, serum, cutoff=0.0)¶
-
prepare
(**kwargs)¶
-
train
(**kwargs)¶ Determine the model parameters. The result is stored in self.substitution_effect.
-
-
class
augur.titer_model.
TiterCollection
(titers, **kwargs)¶ Bases:
object
Container for raw titer values and methods for analyzing these values.
-
static
count_strains
(titers)¶ Count test and reference virus strains in the given titers.
- Parameters
titers (defaultdict) – titer measurements indexed by test, reference, and serum
- Returns
dict – number of measurements per strain
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titer_counts = TiterCollection.count_strains(measurements)
>>> titer_counts["A/Acores/11/2013"]
6
>>> titer_counts["A/Acores/SU43/2012"]
3
>>> titer_counts["A/Cairo/63/2012"]
2
-
determine_autologous_titers
()¶ Scan the titer measurements for autologous (self) titers and store them in a dictionary on self for later lookup. If no autologous titer is found, use the maximum titer. This follows the rationale that test titers are generally lower than autologous titers, so the highest test titer is often a reasonable approximation of the autologous titer.
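The fallback rule described above can be sketched as follows. The key layout (test strain, (reference strain, serum id)) mirrors the doctest keys shown for load_from_file; the function name, the strain names, and the use of an arithmetic mean over repeated values are assumptions made for illustration.

```python
def autologous_titers(titers):
    """Map each (reference, serum) pair to an autologous titer estimate.

    titers: dict keyed by (test_strain, (ref_strain, serum_id)) with lists
    of measured values.
    """
    by_serum = {}
    for (test, serum), values in titers.items():
        by_serum.setdefault(serum, {})[test] = sum(values) / len(values)

    result = {}
    for serum, per_test in by_serum.items():
        ref_strain = serum[0]
        if ref_strain in per_test:
            result[serum] = per_test[ref_strain]    # measured autologous titer
        else:
            result[serum] = max(per_test.values())  # fallback: highest test titer
    return result


# Hypothetical measurements; serum "s2" has no autologous titer.
titers = {
    ("A/X", ("A/X", "s1")): [640.0],
    ("A/Y", ("A/X", "s1")): [160.0],
    ("A/Y", ("A/Z", "s2")): [320.0],
    ("A/W", ("A/Z", "s2")): [80.0],
}
auto = autologous_titers(titers)
print(auto)
```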
-
static
filter_strains
(titers, strains)¶ Filter the given titers to only include values from the given strains (test or reference).
- Parameters
titers (dict) – titer values indexed by test and reference strain and serum
strains (list) – names of strains to keep titers for
- Returns
dict – reduced dictionary of titer measurements containing only those where both the test and reference virus are part of the strain list
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> len(measurements)
11
Test the case when a test strain exists in the subset but none of
its corresponding reference strains do.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013"]))
0
Test when both the test and reference strains exist in the subset.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Alabama/5/2010", "A/Athens/112/2012"]))
2
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Acores/SU43/2012", "A/Alabama/5/2010", "A/Athens/112/2012"]))
3
>>> len(TiterCollection.filter_strains(measurements, []))
0
-
static
load_from_file
(filenames, excluded_sources=None)¶ Load titers from a tab-delimited file.
- Parameters
filenames (str) – tab-delimited file containing titer strains, serum, and values
excluded_sources (list of str) – sources in the titers file to exclude
- Returns
tuple (dict, list, list) – tuple of a dict of titer measurements, list of strains, list of sources
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> type(measurements)
<class 'dict'>
>>> len(measurements)
11
>>> measurements[("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10"))]
[80.0]
>>> len(strains)
13
>>> len(sources)
5
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv", excluded_sources=["NIMR_Sep2013_7-11.csv"])
>>> len(measurements)
5
>>> measurements.get(("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10")))
>>>
>>> output = TiterCollection.load_from_file("tests/data/titer_model/missing.tsv")
Traceback (most recent call last):
  File "<ipython-input-2-0ea96a90d45d>", line 1, in <module>
    open("tests/data/titer_model/missing.tsv", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/titer_model/missing.tsv'
-
normalize
(ref, val)¶ take the log2 difference of test titers and autologous titers
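A log2 difference between autologous and test titers can be sketched as below. The sign convention (autologous minus test, so that a lower test titer gives a positive value), the averaging of log2 values, and the argument names are assumptions of this sketch; the documented method's own arguments may carry different meanings.

```python
import math


def mean_log2(values):
    """Average of the log2-transformed titer values."""
    return sum(math.log2(v) for v in values) / len(values)


def normalized_titer(autologous_values, test_values):
    """log2 autologous titer minus log2 test titer (assumed sign convention)."""
    return mean_log2(autologous_values) - mean_log2(test_values)


# A four-fold drop from 640 to 160 corresponds to about two log2 units.
drop = normalized_titer([640.0], [160.0])
print(round(drop, 6))
```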
-
normalize_titers
()¶ Convert the titer measurements into the log2 difference between the average titer measured between test virus and reference serum and the average homologous titer. All measurements relative to sera without a homologous titer are excluded.
-
read_titers
(fname)¶
-
strain_census
(titers)¶ make lists of reference viruses, test viruses and sera (there are often multiple sera per reference virus)
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titers = TiterCollection(measurements)
>>> sera, ref_strains, test_strains = titers.strain_census(measurements)
>>> len(sera)
9
>>> len(ref_strains)
9
>>> len(test_strains)
13
-
-
class
augur.titer_model.
TiterModel
(serum_Kc=0, **kwargs)¶ Bases:
object
This class fits a linear model to titer measurements, using different models that describe titer differences in a parsimonious way. Two additive models are currently implemented: the tree model and the substitution model. The tree model describes titer drops as a sum of terms associated with branches in the tree, while the substitution model attributes titer drops to amino acid mutations. More details on the methods can be found in Neher et al., PNAS, 2016.
-
assign_titers
(titers, strains)¶
-
compile_potencies
()¶ Compile a JSON structure containing potencies for visualization. We need rapid access to all sera for a given reference virus, hence the structure is organized by [ref][serum].
-
compile_titers
()¶ Compile titer measurements into a JSON file organized by reference virus. During visualization, we need the average distance of a test virus from a reference virus across sera, hence the hierarchy [ref][test][serum]. NOTE: this uses node.name instead of node.clade.
-
compile_virus_effects
()¶ compile a json structure containing virus_effects for visualization
-
fit_func
()¶
-
fit_l1reg
()¶ regularize genetic parameters with an l1 norm regardless of sign
-
fit_nnl1reg
()¶ l1 regularization of titer drops with non-negativity constraints
-
fit_nnl2reg
()¶
-
fit_nnls
()¶
-
make_training_set
(training_fraction=1.0, subset_strains=False, **kwargs)¶
-
reference_virus_statistic
()¶ count measurements for every reference virus and serum
-
titer_stats
()¶
-
validate
(plot=False, cutoff=0.0, validation_set=None, fname=None)¶ Predict titers of the validation set (the separate set of test_titers set aside previously) and compare them against known values. If requested with plot=True, a figure comparing predicted and measured titers is produced.
Compute basic error metrics for actual vs. predicted titer values. Return a dictionary of {'metric': computed_metric, 'values': [(actual, predicted), ...]} and save a copy in self.validation.
- Parameters
plot (bool, optional) – if True, produce a figure comparing predicted and measured titers
cutoff (float, optional)
validation_set (None, optional)
fname (None, optional)
- Returns
dictionary of error metrics and (actual, predicted) value pairs
- Return type
dict
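The dictionary of error metrics described above might look like the following sketch. The key names abs_error and rms_error follow the ones mentioned for cross_validate; defining them as mean absolute error and root-mean-square error is an assumption, and the function name and input values are hypothetical.

```python
import math


def error_metrics(pairs):
    """Compute simple error metrics for (actual, predicted) titer pairs."""
    errors = [actual - predicted for actual, predicted in pairs]
    return {
        "abs_error": sum(abs(e) for e in errors) / len(errors),
        "rms_error": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "values": list(pairs),
    }


metrics = error_metrics([(2.0, 1.0), (0.0, 1.0), (3.0, 3.0), (1.0, 3.0)])
print(metrics["abs_error"])  # 1.0
```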
-
-
class
augur.titer_model.
TreeModel
(tree, titers, *args, **kwargs)¶ Bases:
augur.titer_model.TiterModel
TreeModel extends TiterModel and fits antigenic differences in terms of contributions on the branches of the phylogenetic tree. Nodes in the tree are decorated with the attribute 'dTiter', which contains the estimated titer drop across the branch.
-
cross_validate
(n, **kwargs)¶ For each of n iterations, randomly re-allocate titers to training and test sets. Fit the model using the training titers and assess performance using the test titers (see TiterModel.validate). Append a dictionary of {'abs_error': , 'rms_error': , 'values': [(actual, predicted), ...], etc.} for each iteration to the model_performance list. Return model_performance and save a copy in self.cross_validation.
- Parameters
n (int) – number of iterations
**kwargs
- Returns
model_performance, with one dictionary of error metrics per iteration
- Return type
list
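The repeated random re-allocation described above can be sketched generically. Everything here is hypothetical scaffolding: the function takes fit and evaluate callables instead of a model object, and the training_fraction and seed arguments are assumptions of this sketch rather than parameters of the documented method.

```python
import random


def cross_validate_sketch(measurements, n, fit, evaluate,
                          training_fraction=0.8, seed=0):
    """Run n random train/test splits and collect one metrics dict per split."""
    rng = random.Random(seed)
    model_performance = []
    for _ in range(n):
        shuffled = list(measurements)
        rng.shuffle(shuffled)
        n_train = int(training_fraction * len(shuffled))
        training, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(training)
        model_performance.append(evaluate(model, test))
    return model_performance


def fit_mean(training):
    """Toy model: predict the mean of the training values."""
    return sum(training) / len(training)


def mean_abs_error(model, test):
    """Toy metric: mean absolute deviation of test values from the prediction."""
    return {"abs_error": sum(abs(x - model) for x in test) / len(test)}


performance = cross_validate_sketch([1.0, 2.0, 3.0, 4.0, 5.0], n=3,
                                    fit=fit_mean, evaluate=mean_abs_error)
print(len(performance))  # 3
```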
-
find_titer_splits
(criterium=None)¶ Walk through the tree and mark all branches that are to be included as model variables:
- no terminals
- criterium: callable that can be used to exclude branches, e.g. if amino acid mutations map to this branch.
- Parameters
criterium (None, optional) – callable used to exclude branches
-
get_path_no_terminals
(v1, v2)¶ Return the path between two tips in the tree, excluding the terminal branches.
- Parameters
v1 – first tip
v2 – second tip
- Returns
path between the two tips, excluding the terminal branches
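To make the idea of a tip-to-tip path without terminal branches concrete, the sketch below works on a plain child-to-parent dictionary instead of a Bio.Phylo tree. Naming each branch by its child node, the function name, and the toy tree are assumptions for illustration only.

```python
def path_no_terminals(parent, v1, v2):
    """Branches (named by their child node) on the path between tips v1 and v2,
    excluding the two terminal branches.

    parent maps each node to its parent; the root maps to None.
    """
    def ancestors(node):
        chain = []
        node = parent[node]  # skip the tip itself, i.e. its terminal branch
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    up1, up2 = ancestors(v1), ancestors(v2)
    shared = set(up1) & set(up2)  # the common ancestor and everything above it
    return [n for n in up1 if n not in shared] + [n for n in up2 if n not in shared]


# Toy tree: root -> (A, B); A -> (t1, t2); B -> (C, t3); C -> (t4, t5)
parent = {
    "root": None, "A": "root", "B": "root",
    "t1": "A", "t2": "A", "C": "B", "t3": "B", "t4": "C", "t5": "C",
}
path = path_no_terminals(parent, "t1", "t4")
print(path)  # ['A', 'C', 'B']
```

Two tips that share the same parent, such as t1 and t2, are connected only by terminal branches, so their path is empty.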
-
make_treegraph
()¶ Encode the path between serum and test virus of each HI measurement into a matrix. The matrix has dimensions #measurements x #tree branches with HI info. If the path between test and serum goes through a branch, the corresponding matrix element is 1, and 0 otherwise.
-
predict_titer
(virus, serum, cutoff=0.0)¶
-
prepare
(**kwargs)¶
-
prepare_tree
(tree)¶
-
train
(**kwargs)¶
-