augur.ancestral

Infer ancestral sequences based on a tree.

The ancestral sequences are inferred using TreeTime. Each internal node gets assigned a nucleotide sequence that maximizes a likelihood on the tree given its descendants and its parent node. Each node then gets assigned a list of nucleotide mutations for any position that has a mismatch between its own sequence and its parent’s sequence. The node sequences and mutations are output to a node-data JSON file.
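The mutation assignment described above can be illustrated with a minimal sketch. This is not augur's implementation: `find_mutations` is a hypothetical helper, and the string format (parent state, one-based position, child state) is an assumption made for illustration.

```python
def find_mutations(parent_seq, child_seq):
    """Return mutations as strings such as 'G3T'.

    Hypothetical helper: the format (parent state, one-based position,
    child state) is assumed for illustration only.
    """
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{parent}{index + 1}{child}"  # index + 1 converts to a one-based position
        for index, (parent, child) in enumerate(zip(parent_seq, child_seq))
        if parent != child
    ]


mutations = find_mutations("ACGTAC", "ACTTAG")
print(mutations)
```

Only mismatching positions are reported, so a node whose sequence is identical to its parent's gets an empty list.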

If amino acid options are provided, the ancestral amino acid sequences for each requested gene are inferred with the same method as the nucleotide sequences described above. The inferred amino acid mutations are included in the output node-data JSON file, in a format equivalent to the output of augur translate.

The nucleotide and amino acid sequences are inferred separately in this command, which can potentially result in mismatches between the nucleotide and amino acid mutations. If you want amino acid mutations based on the inferred nucleotide sequences, please use augur translate.
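As a rough sketch of how such a node-data file might be inspected, the snippet below counts nucleotide and amino acid mutations per node. The key names (`nodes`, `muts`, `aa_muts`), the node names, and the gene name `HA` are assumptions for illustration; check them against an actual output file before relying on them.

```python
import json

# Hypothetical node-data content; the key names and values are assumed.
node_data_text = """
{
  "nodes": {
    "NODE_0000001": {"muts": ["G3T", "C6G"], "aa_muts": {"HA": ["K2N"]}},
    "sample_A": {"muts": [], "aa_muts": {"HA": []}}
  }
}
"""
node_data = json.loads(node_data_text)


def summarize(node_data):
    """Map each node name to (nucleotide mutation count, amino acid mutation count)."""
    summary = {}
    for name, entry in node_data["nodes"].items():
        n_nuc = len(entry.get("muts", []))
        n_aa = sum(len(muts) for muts in entry.get("aa_muts", {}).values())
        summary[name] = (n_nuc, n_aa)
    return summary


print(summarize(node_data))
```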

Note

The mutation positions in the node-data JSON are one-based.
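Because positions are one-based, indexing into a Python string requires subtracting one. A minimal sketch, assuming mutations are written as a state, a position, and a state (for example `G3T`); `parse_mutation` is a hypothetical helper, not part of augur.

```python
import re

# Assumed format: single-character state, one-based position, single-character state.
MUTATION_PATTERN = re.compile(r"^([A-Za-z*-])(\d+)([A-Za-z*-])$")


def parse_mutation(mutation):
    """Split a mutation string into (ancestral state, zero-based index, derived state)."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    ancestral, position, derived = match.groups()
    return ancestral, int(position) - 1, derived  # one-based -> zero-based


sequence = "ACTTAG"
ancestral, index, derived = parse_mutation("G3T")
print(index, sequence[index] == derived)
```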

augur.ancestral.ancestral_sequence_inference(tree=None, aln=None, ref=None, infer_gtr=True, marginal=False, fill_overhangs=True, infer_tips=False, alphabet='nuc')

Infer ancestral sequences using TreeTime.

Parameters
  • tree (Bio.Phylo.BaseTree.Tree or str) – tree or filename of tree

  • aln (Bio.Align.MultipleSeqAlignment or str) – alignment or filename of alignment

  • ref (str, optional) – reference sequence to pass to TreeTime’s TreeAnc class

  • infer_gtr (bool, optional) – if True, infer a GTR substitution model from the data

  • marginal (bool, optional) – if True, use marginal rather than joint ancestral reconstruction

  • fill_overhangs (bool) – In some cases, the missing data on both ends of the alignment is filled with the gap character (‘-’). If set to True, these end gaps are converted to “ambiguous” characters (‘N’ for nucleotides, ‘X’ for amino acids). Otherwise, the alignment is treated as-is.

  • infer_tips (bool) – Since v0.7, TreeTime does not reconstruct tip states by default. This is only relevant when tip states are not exactly specified, e.g. via characters that signify ambiguous states. To replace those with the most likely state, set infer_tips=True.

  • alphabet (str) – alphabet to use for ancestral sequence inference. The default, ‘nuc’, is the nucleotide alphabet including a gap character. The alternative is ‘aa’ for amino acids.
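The effect of fill_overhangs on a single aligned sequence can be sketched as follows. This is an illustration of the behaviour described above, not the code augur or TreeTime runs; `fill_end_gaps` is a hypothetical helper.

```python
def fill_end_gaps(seq, gap="-", fill="N"):
    """Replace leading and trailing gap characters with an ambiguity character.

    Internal gaps are left untouched. Use fill="X" for amino acid sequences.
    """
    without_left = seq.lstrip(gap)
    n_left = len(seq) - len(without_left)
    core = without_left.rstrip(gap)
    n_right = len(without_left) - len(core)
    return fill * n_left + core + fill * n_right


print(fill_end_gaps("--AC-GT---"))
print(fill_end_gaps("-MK-L-", fill="X"))
```

Only the terminal runs of gaps change; the internal gap in each sequence is preserved because it may represent a real deletion rather than missing data.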

Returns

treetime.TreeAnc instance

Return type

treetime.TreeAnc

augur.ancestral.collect_mutations_and_sequences(tt, infer_tips=False, full_sequences=False, character_map=None, is_vcf=False)

Iterates over the tree and produces dictionaries with mutations and sequences for each node.

Parameters
  • tt (treetime.TreeTime) – instance of treetime with valid ancestral reconstruction

  • infer_tips (bool, optional) – if true, request the reconstructed tip sequences from treetime, otherwise retain input ambiguities

  • full_sequences (bool, optional) – if true, include the full sequence of each node in the output

  • character_map (None, optional) – optional dictionary to map characters to a custom set.

Returns

dictionary of mutations and sequences

Return type

dict

augur.ancestral.register_parser(parent_subparsers)
augur.ancestral.run(args)
augur.ancestral.run_ancestral(T, aln, root_sequence=None, is_vcf=False, full_sequences=False, fill_overhangs=False, infer_ambiguous=False, marginal=False, alphabet='nuc')