augur.titer_model module¶

exception augur.titer_model.InsufficientDataException¶

Bases: Exception

class augur.titer_model.SubstitutionModel(alignments, titers, *args, **kwargs)¶

Bases: augur.titer_model.TiterModel

SubstitutionModel extends TiterModel and implements a model that describes titer differences as sums of contributions from the substitutions separating the test and reference viruses. Sequences are assumed to be attached to each terminal node in the tree as node.translations.

annotate_tree(tree)¶

Annotates the nodes of a given tree with antigenic advance attributes. The tree must be built from the same sequences used to train the model.

Parameters: tree (Bio.Phylo) –
Returns: input tree instance with nodes annotated by per-branch and cumulative antigenic advance attributes dTiterSub and cTiterSub
Return type: Bio.Phylo

collapse_colinear_mutations(colin_thres)¶

Find colinear columns of the design matrix and collapse them into clusters.

Parameters: colin_thres (TYPE) – Description
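
The clustering idea can be sketched in plain Python. This is an illustration only, not the module's implementation: the function name cluster_columns, the column layout, and the reading of colin_thres as "the maximum number of rows in which two columns may differ and still be merged" are all assumptions.

```python
def cluster_columns(columns, colin_thres):
    """Greedily group columns that differ in at most colin_thres rows.

    columns: dict mapping a mutation label to its 0/1 column (hypothetical layout).
    """
    clusters = []  # each entry: (representative column, [labels])
    for label, col in columns.items():
        for rep, labels in clusters:
            n_diff = sum(a != b for a, b in zip(rep, col))
            if n_diff <= colin_thres:
                labels.append(label)
                break
        else:
            # no existing cluster is close enough: start a new one
            clusters.append((col, [label]))
    return [labels for _, labels in clusters]


columns = {
    "HA1:F159S": [1, 1, 0, 0],
    "HA1:K160T": [1, 1, 0, 0],  # identical to F159S, so the two cannot be told apart
    "HA1:N145S": [0, 0, 1, 1],
}
clusters = cluster_columns(columns, colin_thres=0)
print(clusters)
```

Mutations that always occur together in the measurements produce identical columns, so a linear model can only estimate their combined effect. Grouping them avoids splitting that effect arbitrarily.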

compile_substitution_effects(cutoff=0.0001)¶

Compile a flat JSON of substitution effects for visualization, pruning mutations without effect.

Parameters: cutoff (float, optional) – Description
Returns: Description
Return type: TYPE
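
A minimal sketch of the pruning step, assuming effects are stored in a dict keyed by (gene, mutation) tuples and that "without effect" means "not larger than cutoff". The helper name prune_effects and the "gene:mutation" key format are hypothetical.

```python
def prune_effects(substitution_effect, cutoff=1e-4):
    """Flatten (gene, mutation) keys and drop effects at or below cutoff."""
    return {
        f"{gene}:{mutation}": value
        for (gene, mutation), value in substitution_effect.items()
        if value > cutoff
    }


effects = {
    ("HA1", "F159S"): 1.5,
    ("HA1", "K160T"): 0.00001,  # below the cutoff, pruned
    ("HA1", "N145S"): 0.0,      # no effect, pruned
}
pruned = prune_effects(effects)
print(pruned)
```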

determine_relevant_mutations(min_count=10)¶

get_mutations(strain1, strain2)¶

Return amino acid mutations between the viruses specified by strain names, as tuples such as (HA1, F159S).

Parameters

strain1 (str) – name of the first strain
strain2 (str) – name of the second strain

Returns

Description

Return type

TYPE
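
To illustrate the tuple format, here is a standalone sketch that compares two sets of translations directly. It does not use the model's strain lookup. The function name aa_differences, the dict-of-translations input, and the 1-based position numbering are assumptions.

```python
def aa_differences(translations1, translations2):
    """Return (gene, mutation) tuples for positions where two translations differ.

    Each argument maps a gene name to an aligned amino acid sequence.
    """
    mutations = []
    for gene in sorted(translations1):
        if gene not in translations2:
            continue
        pairs = zip(translations1[gene], translations2[gene])
        for pos, (aa1, aa2) in enumerate(pairs, start=1):
            if aa1 != aa2:
                mutations.append((gene, f"{aa1}{pos}{aa2}"))
    return mutations


strain1 = {"HA1": "QKLPGNDNST"}
strain2 = {"HA1": "QKLPGSDNSA"}
print(aa_differences(strain1, strain2))
# [('HA1', 'N6S'), ('HA1', 'T10A')]
```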

make_seqgraph(colin_thres=5)¶

Code amino acid differences between sequences into a matrix. The matrix has dimensions #measurements x #observed mutations.

Parameters: colin_thres (int, optional) – Description
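
The shape of such a matrix can be sketched as follows. The measurement tuples and mutation labels are made-up illustration data, and the real design matrix may carry additional columns that are not shown here.

```python
# Each measurement: (test virus, reference virus, mutations separating them).
measurements = [
    ("testA", "refX", [("HA1", "F159S")]),
    ("testB", "refX", [("HA1", "F159S"), ("HA1", "K160T")]),
    ("testC", "refY", []),
]

# One column per observed mutation, in a fixed (sorted) order.
columns = sorted({mut for _, _, muts in measurements for mut in muts})
col_index = {mut: i for i, mut in enumerate(columns)}

# One row per measurement; an entry is 1 if the mutation separates the pair.
design = [[0] * len(columns) for _ in measurements]
for row, (_, _, muts) in enumerate(measurements):
    for mut in muts:
        design[row][col_index[mut]] = 1

print(design)
# [[1, 0], [1, 1], [0, 0]]
```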

predict_titer(virus, serum, cutoff=0.0)¶

prepare(**kwargs)¶

train(**kwargs)¶

Determine the model parameters. The result is stored in self.substitution_effect.

Parameters: **kwargs – Description

class augur.titer_model.TiterCollection(titers, **kwargs)¶

Bases: object

Container for raw titer values and methods for analyzing these values.

static count_strains(titers)¶

Count test and reference virus strains in the given titers.

Parameters

titers (defaultdict) – titer measurements indexed by test, reference, and serum

Returns

dict – number of measurements per strain
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titer_counts = TiterCollection.count_strains(measurements)
>>> titer_counts["A/Acores/11/2013"]
6
>>> titer_counts["A/Acores/SU43/2012"]
3
>>> titer_counts["A/Cairo/63/2012"]
2

determine_autologous_titers()¶

Scan the titer measurements for autologous (self) titers and store them in a dictionary on self for later lookup. If no autologous titer is found, use the maximum titer. This follows the rationale that test titers are generally lower than autologous titers, so the highest test titer is often a reasonable approximation of the autologous titer.
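
The fallback rule can be sketched as a standalone function. The titer dictionary uses the (test, (reference, serum)) key layout shown in the load_from_file examples. The function name, the use of an arithmetic mean over repeated measurements, and the sample values are assumptions.

```python
from collections import defaultdict
from statistics import mean


def autologous_titers(titers):
    """Map each (reference, serum) pair to its autologous titer.

    Falls back to the highest titer measured against a serum when no
    autologous (test == reference) measurement exists.
    """
    by_serum = defaultdict(list)
    autologous = {}
    for (test, (ref, serum_id)), values in titers.items():
        by_serum[(ref, serum_id)].append(mean(values))
        if test == ref:
            autologous[(ref, serum_id)] = mean(values)
    for serum, values in by_serum.items():
        autologous.setdefault(serum, max(values))  # fallback: maximum titer
    return autologous


titers = {
    ("refX", ("refX", "serum1")): [1280.0],  # autologous measurement
    ("testA", ("refX", "serum1")): [320.0],
    ("testA", ("refY", "serum2")): [160.0],  # serum2 has no autologous titer
    ("testB", ("refY", "serum2")): [640.0],
}
autologous = autologous_titers(titers)
print(autologous)
```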

static filter_strains(titers, strains)¶

Filter the given titers to only include values from the given strains (test or reference).

Parameters

titers (dict) – titer values indexed by test and reference strain and serum
strains (list) – names of strains to keep titers for

Returns

dict – reduced dictionary of titer measurements containing only those where both the test and reference virus are part of the strain list
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> len(measurements)
11
Test the case when a test strain exists in the subset but none of
its corresponding reference strains do.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013"]))
0
Test when both the test and reference strains exist in the subset.
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Alabama/5/2010", "A/Athens/112/2012"]))
2
>>> len(TiterCollection.filter_strains(measurements, ["A/Acores/11/2013", "A/Acores/SU43/2012", "A/Alabama/5/2010", "A/Athens/112/2012"]))
3
>>> len(TiterCollection.filter_strains(measurements, []))
0

static load_from_file(filenames, excluded_sources=None)¶

Load titers from a tab-delimited file.

Parameters

filenames (str) – tab-delimited file containing titer strains, serum, and values
excluded_sources (list of str) – sources in the titers file to exclude

Returns

tuple (dict, list, list) – tuple of a dict of titer measurements, list of strains, list of sources
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> type(measurements)
<class 'dict'>
>>> len(measurements)
11
>>> measurements[("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10"))]
[80.0]
>>> len(strains)
13
>>> len(sources)
5
>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv", excluded_sources=["NIMR_Sep2013_7-11.csv"])
>>> len(measurements)
5
>>> measurements.get(("A/Acores/11/2013", ("A/Alabama/5/2010", "F27/10")))
>>>
>>> output = TiterCollection.load_from_file("tests/data/titer_model/missing.tsv")
Traceback (most recent call last):
  File "<ipython-input-2-0ea96a90d45d>", line 1, in <module>
    open("tests/data/titer_model/missing.tsv", "r")
FileNotFoundError: [Errno 2] No such file or directory: 'tests/data/titer_model/missing.tsv'

normalize(ref, val)¶

Take the log2 difference of test titers and autologous titers.

Parameters

ref (TYPE) – Description
val (TYPE) – Description

Returns

Description

Return type

TYPE

normalize_titers()¶

Convert the titer measurements into the log2 difference between the average titer measured between test virus and reference serum and the average homologous titer. All measurements relative to sera without a homologous titer are excluded.
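
A rough standalone sketch of this normalization, using the same (test, (reference, serum)) key layout as the load_from_file examples. The function name, the use of an arithmetic mean for "average", and the sign convention (positive values mean the test titer is lower than the homologous titer) are assumptions.

```python
import math
from statistics import mean


def normalized_log2_titers(titers):
    """Return log2(homologous titer) - log2(test titer) per measurement key.

    Measurements against sera without a homologous titer are dropped.
    """
    homologous = {
        serum: mean(values)
        for (test, serum), values in titers.items()
        if test == serum[0]  # serum is a (reference, serum id) pair
    }
    normalized = {}
    for (test, serum), values in titers.items():
        if serum not in homologous:
            continue  # no homologous titer: exclude
        normalized[(test, serum)] = math.log2(homologous[serum]) - math.log2(mean(values))
    return normalized


titers = {
    ("refX", ("refX", "serum1")): [1280.0],
    ("testA", ("refX", "serum1")): [320.0, 320.0],  # four-fold lower than homologous
    ("testA", ("refY", "serum2")): [160.0],         # excluded: no homologous titer
}
normalized = normalized_log2_titers(titers)
print(normalized)
```

Working on a log2 scale means each unit corresponds to one two-fold dilution step, so a four-fold drop appears as a value of about 2.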

read_titers(fname)¶

strain_census(titers)¶

Make lists of reference viruses, test viruses, and sera (there are often multiple sera per reference virus).

>>> measurements, strains, sources = TiterCollection.load_from_file("tests/data/titer_model/h3n2_titers_subset.tsv")
>>> titers = TiterCollection(measurements)
>>> sera, ref_strains, test_strains = titers.strain_census(measurements)
>>> len(sera)
9
>>> len(ref_strains)
9
>>> len(test_strains)
13

Parameters: titers (TYPE) – Description
Returns: Description
Return type: TYPE

class augur.titer_model.TiterModel(serum_Kc=0, **kwargs)¶

Bases: object

This class fits a linear model to titer measurements using different models that describe titer differences in a parsimonious way. Two additive models are currently implemented: the tree model and the substitution model. The tree model describes titer drops as a sum of terms associated with branches in the tree, while the substitution model attributes titer drops to amino acid mutations. More details on the methods can be found in Neher et al., PNAS, 2016.
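
The additive structure can be sketched as below. This is an illustration under assumptions, not the class's prediction code: the function name, the argument layout, and the idea that a per-serum potency and a per-virus effect are added to the genetic terms are assumed here (the class does expose compile_potencies and compile_virus_effects, but the exact formula is not stated on this page).

```python
def predict_titer_drop(genetic_terms, effects, serum_potency=0.0, virus_effect=0.0):
    """Sum the effects of the genetic terms separating test and reference virus.

    genetic_terms: branches (tree model) or substitutions (substitution model)
    effects: fitted contribution of each term; unknown terms count as 0
    """
    genetic = sum(effects.get(term, 0.0) for term in genetic_terms)
    return serum_potency + virus_effect + genetic


# Hypothetical fitted substitution effects, in log2 titer units.
effects = {("HA1", "F159S"): 1.5, ("HA1", "K160T"): 0.5}
drop = predict_titer_drop(
    [("HA1", "F159S"), ("HA1", "K160T")],
    effects,
    serum_potency=0.25,
    virus_effect=0.25,
)
print(drop)
# 2.5
```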

assign_titers(titers, strains)¶

compile_potencies()¶

Compile a JSON structure containing potencies for visualization. We need rapid access to all sera for a given reference virus, hence the structure is organized by [ref][serum].

Returns: Description
Return type: TYPE

compile_titers()¶

Compile titer measurements into a JSON file organized by reference virus. During visualization, we need the average distance of a test virus from a reference virus across sera, hence the hierarchy [ref][test][serum]. NOTE: this uses node.name instead of node.clade.

Returns: Description
Return type: TYPE

compile_virus_effects()¶

Compile a JSON structure containing virus_effects for visualization.

Returns: Description
Return type: TYPE

fit_func()¶

fit_l1reg()¶

Regularize genetic parameters with an L1 norm regardless of sign.

Returns: Description
Return type: TYPE

fit_nnl1reg()¶

L1 regularization of titer drops with non-negativity constraints.

Returns: Description
Return type: TYPE

fit_nnl2reg()¶

fit_nnls()¶

make_training_set(training_fraction=1.0, subset_strains=False, **kwargs)¶

reference_virus_statistic()¶

Count measurements for every reference virus and serum.

titer_stats()¶

validate(plot=False, cutoff=0.0, validation_set=None, fname=None)¶

Predict titers of the validation set (a separate set of test_titers set aside previously) and compare them against known values. If plot=True, a figure comparing predicted and measured titers is produced.

Compute basic error metrics for actual vs. predicted titer values. Return a dictionary of {'metric': computed_metric, 'values': [(actual, predicted), ...]} and save a copy in self.validation.

Parameters

plot (bool, optional) – if True, produce a figure comparing predicted and measured titers
cutoff (float, optional) – Description
validation_set (None, optional) – Description
fname (None, optional) – Description

Returns

dictionary of error metrics and (actual, predicted) value pairs

Return type

dict
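
The kind of dictionary described above can be sketched as follows. The metric names abs_error and rms_error are taken from the cross_validate description. The function name and the exact definitions (mean absolute error and root mean squared error) are assumptions, and the real method may report further metrics.

```python
import math


def error_metrics(actual, predicted):
    """Compute simple error metrics for paired actual and predicted titers."""
    values = list(zip(actual, predicted))
    n = len(values)
    abs_error = sum(abs(a - p) for a, p in values) / n
    rms_error = math.sqrt(sum((a - p) ** 2 for a, p in values) / n)
    return {"abs_error": abs_error, "rms_error": rms_error, "values": values}


metrics = error_metrics(actual=[1.0, 2.0, 3.0], predicted=[1.5, 2.0, 2.0])
print(metrics["abs_error"], metrics["rms_error"])
```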

class augur.titer_model.TreeModel(tree, titers, *args, **kwargs)¶

Bases: augur.titer_model.TiterModel

TreeModel extends TiterModel and fits the antigenic differences in terms of contributions on the branches of the phylogenetic tree. Nodes in the tree are decorated with 'dTiter' attributes that contain the estimated titer drops across each branch.

cross_validate(n, **kwargs)¶

For each of n iterations, randomly re-allocate titers to training and test sets. Fit the model using the training titers and assess performance using the test titers (see TiterModel.validate). Append a dictionary of {'abs_error': , 'rms_error': , 'values': [(actual, predicted), ...], etc.} for each iteration to the model_performance list. Return model_performance and save a copy in self.cross_validation.

Parameters

n (int) – number of cross-validation iterations
**kwargs – Description

Returns

model_performance, with one dictionary of error metrics and values per iteration

Return type

list
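
The loop structure can be sketched without the model itself. Everything here is a stand-in: the function name, the seed argument, the toy measurements, and the "model" (which simply predicts the mean of the training values) are hypothetical. Only training_fraction echoes a parameter name from make_training_set.

```python
import random
from statistics import mean


def cross_validate_sketch(measurements, n, training_fraction=0.8, seed=0):
    """Repeat a random train/test split n times and record the test error."""
    rng = random.Random(seed)
    keys = sorted(measurements)
    model_performance = []
    for _ in range(n):
        rng.shuffle(keys)  # randomly re-allocate measurements
        n_train = int(training_fraction * len(keys))
        train, test = keys[:n_train], keys[n_train:]
        # Stand-in for fitting: predict the training mean for every test titer.
        prediction = mean(measurements[k] for k in train)
        values = [(measurements[k], prediction) for k in test]
        abs_error = mean(abs(actual - pred) for actual, pred in values)
        model_performance.append({"abs_error": abs_error, "values": values})
    return model_performance


# Toy normalized titers keyed by (test, (reference, serum)).
measurements = {
    (f"test{i}", ("refX", "serum1")): float(i % 4) for i in range(10)
}
performance = cross_validate_sketch(measurements, n=3)
print(len(performance), [len(p["values"]) for p in performance])
# 3 [2, 2, 2]
```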

find_titer_splits(criterium=None)¶

Walk through the tree and mark all branches that are to be included as model variables. Terminal branches are not included. criterium is a callable that can be used to exclude branches, e.g. if amino acid mutations map to this branch.

Parameters: criterium (None, optional) – Description

get_path_no_terminals(v1, v2)¶

Returns the path between two tips in the tree, excluding the terminal branches.

Parameters

v1 (TYPE) – Description
v2 (TYPE) – Description

Returns

Description

Return type

TYPE
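
The idea can be illustrated on a toy tree stored as a child-to-parent map. This sketch does not use Bio.Phylo. The function names, the map layout, and the convention of naming each branch after its child node are assumptions.

```python
def ancestors(node, parent):
    """Return the chain of nodes from node up to the root, inclusive."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def path_no_terminals(v1, v2, parent):
    """Branches (named by child node) on the path between two tips,
    leaving out the two terminal branches themselves."""
    up1, up2 = ancestors(v1, parent), ancestors(v2, parent)
    shared = set(up1) & set(up2)  # common ancestors, including the root
    path = [n for n in up1 if n not in shared] + [n for n in up2 if n not in shared]
    return [n for n in path if n not in (v1, v2)]


# Toy tree: root has children n1 and n2; n1 has tips A, B; n2 has tips C, D.
parent = {
    "root": None,
    "n1": "root", "n2": "root",
    "A": "n1", "B": "n1",
    "C": "n2", "D": "n2",
}
print(path_no_terminals("A", "C", parent))
# ['n1', 'n2']
print(path_no_terminals("A", "B", parent))
# []
```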

make_treegraph()¶

Code the path between serum and test virus of each HI measurement into a matrix. The matrix has dimensions #measurements x #tree branches with HI info. If the path between test and serum goes through a branch, the corresponding matrix element is 1, and 0 otherwise.

predict_titer(virus, serum, cutoff=0.0)¶

prepare(**kwargs)¶

prepare_tree(tree)¶

train(**kwargs)¶