augur.ancestral

Infer ancestral sequences based on a tree.

The ancestral sequences are inferred using TreeTime. Each internal node gets assigned a nucleotide sequence that maximizes a likelihood on the tree given its descendants and its parent node. Each node then gets assigned a list of nucleotide mutations for any position that has a mismatch between its own sequence and its parent’s sequence. The node sequences and mutations are output to a node-data JSON file.
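The mutation-assignment step described above can be sketched in plain Python. This is a minimal illustration, not augur's implementation: the helper name `mutations_between` and the `<parent state><position><child state>` string layout are assumptions made for the example.

```python
def mutations_between(parent_seq, child_seq):
    """List mismatches between two aligned sequences.

    Each mismatch is reported as '<parent state><position><child state>',
    with positions counted from one (assumed layout, for illustration).
    """
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{p}{i}{c}"
        for i, (p, c) in enumerate(zip(parent_seq, child_seq), start=1)
        if p != c
    ]


parent = "ACGTACGT"
child = "ACGAACGC"
print(mutations_between(parent, child))  # ['T4A', 'T8C']
```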

Note

The mutation positions in the node-data JSON are one-based.
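Because Python indexes strings from zero, one-based positions need to be shifted before they are used to index a sequence. The sketch below assumes a node-data layout with a top-level `nodes` key and a `muts` list of strings such as `A5G` per node; the JSON content and the helper `parse_mutation` are hypothetical and only illustrate the conversion.

```python
import json
import re

# Hypothetical node-data content; a real file would be read from disk.
node_data_text = '{"nodes": {"NODE_0000001": {"muts": ["A5G", "C12T"]}}}'

# Assumed mutation layout: one state character, a position, one state character.
MUTATION = re.compile(r"^(?P<parent>.)(?P<pos>\d+)(?P<child>.)$")


def parse_mutation(mut):
    """Split a mutation string into (parent state, zero-based index, child state)."""
    m = MUTATION.match(mut)
    if m is None:
        raise ValueError(f"unrecognized mutation string: {mut!r}")
    # Subtract one to turn the one-based position into a Python index.
    return m["parent"], int(m["pos"]) - 1, m["child"]


node_data = json.loads(node_data_text)
parsed = {
    name: [parse_mutation(mut) for mut in node.get("muts", [])]
    for name, node in node_data["nodes"].items()
}
print(parsed)  # {'NODE_0000001': [('A', 4, 'G'), ('C', 11, 'T')]}
```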

augur.ancestral.ancestral_sequence_inference(tree=None, aln=None, ref=None, infer_gtr=True, marginal=False, fill_overhangs=True, infer_tips=False, alphabet='nuc')

Infer ancestral sequences using TreeTime.

Parameters
  • tree (Bio.Phylo.BaseTree.Tree or str) – tree or filename of tree

  • aln (Bio.Align.MultipleSeqAlignment or str) – alignment or filename of alignment

  • ref (str, optional) – reference sequence to pass to TreeTime’s TreeAnc class

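  • infer_gtr (bool, optional) – if True, TreeTime infers a GTR substitution model from the data

  • marginal (bool, optional) – if True, use marginal ancestral reconstruction; otherwise use joint reconstruction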

  • fill_overhangs (bool) – In some cases, the missing data on both ends of the alignment is filled with the gap character (‘-’). If set to True, these end gaps are converted to “ambiguous” characters (‘N’ for nucleotides, ‘X’ for amino acids). Otherwise, the alignment is treated as-is.

  • infer_tips (bool) – Since v0.7, TreeTime does not reconstruct tip states by default. This is only relevant when tip states are not exactly specified, e.g. via characters that signify ambiguous states. To replace those with the most likely state, set infer_tips=True.

  • alphabet (str) – alphabet to use for ancestral sequence inference. The default, ‘nuc’, is the nucleotide alphabet that includes a gap character. The alternative is ‘aa’ for amino acids.

Returns

treetime.TreeAnc instance

Return type

treetime.TreeAnc
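The effect of fill_overhangs on a single nucleotide sequence can be illustrated with a short sketch. The helper `fill_end_gaps` is hypothetical and is not the code TreeTime runs; it only shows that leading and trailing gaps become ambiguous characters while internal gaps are left alone.

```python
def fill_end_gaps(seq, ambiguous="N", gap="-"):
    """Replace leading and trailing gap characters with an ambiguous character."""
    without_left = seq.lstrip(gap)
    n_left = len(seq) - len(without_left)    # number of leading gaps
    core = without_left.rstrip(gap)
    n_right = len(without_left) - len(core)  # number of trailing gaps
    return ambiguous * n_left + core + ambiguous * n_right


# The internal gap at position 6 is kept; only the end gaps are converted.
print(fill_end_gaps("--ACG-TA---"))  # NNACG-TANNN
```

For an amino acid alignment the same sketch would be called with `ambiguous="X"`.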

augur.ancestral.collect_mutations_and_sequences(tt, infer_tips=False, full_sequences=False, character_map=None, is_vcf=False)

Iterates over the tree and produces dictionaries with mutations and sequences for each node.

Parameters
  • tt (treetime.TreeTime) – instance of treetime with valid ancestral reconstruction

  • infer_tips (bool, optional) – if true, request the reconstructed tip sequences from treetime, otherwise retain input ambiguities

  • full_sequences (bool, optional) – if true, add the full sequences

  • character_map (None, optional) – optional dictionary to map characters to a custom set.

Returns

dictionary of mutations and sequences

Return type

dict
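The role of character_map and full_sequences can be sketched on a toy tree. Everything here is an assumption made for illustration: the tree is a plain dict of node name to parent name rather than a TreeTime object, the helper `collect` is hypothetical, and the per-node keys `muts` and `sequence` are an assumed layout rather than a documented contract.

```python
def collect(tree, sequences, character_map=None, full_sequences=False):
    """Build a per-node dict of mutations (and optionally sequences).

    tree: dict mapping node name -> parent name (None for the root).
    sequences: dict mapping node name -> aligned sequence string.
    """
    # Characters missing from the map pass through unchanged.
    if character_map is None:
        cm = lambda x: x
    else:
        cm = lambda x: character_map.get(x, x)

    data = {}
    for name, parent in tree.items():
        seq = sequences[name]
        entry = {"muts": []}
        if parent is not None:
            entry["muts"] = [
                f"{cm(p)}{i}{cm(c)}"
                for i, (p, c) in enumerate(zip(sequences[parent], seq), start=1)
                if p != c
            ]
        if full_sequences:
            entry["sequence"] = "".join(cm(x) for x in seq)
        data[name] = entry
    return data


tree = {"root": None, "NODE_1": "root", "tipA": "NODE_1"}
sequences = {"root": "ACGT", "NODE_1": "ACGA", "tipA": "AC-A"}
result = collect(tree, sequences, character_map={"-": "N"}, full_sequences=True)
print(result["tipA"])  # {'muts': ['G3N'], 'sequence': 'ACNA'}
```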

augur.ancestral.register_parser(parent_subparsers)
augur.ancestral.run(args)
augur.ancestral.run_ancestral(T, aln, root_sequence=None, is_vcf=False, full_sequences=False, fill_overhangs=False, infer_ambiguous=False, marginal=False, alphabet='nuc')