Augur command

All of Augur’s commands are accessed through the augur program. For example, to infer ancestral sequences from a tree, you’d run augur ancestral. Each command is documented below. You can also run any command with the --help option, for example augur tree --help, for more information on the command line.

usage: augur [-h]



Parse delimited fields from FASTA sequence names into a TSV and FASTA file.

augur parse [-h] --sequences SEQUENCES [--output-sequences OUTPUT_SEQUENCES]
            [--output-metadata OUTPUT_METADATA] [--fields FIELDS [FIELDS ...]]
            [--separator SEPARATOR] [--fix-dates {dayfirst,monthfirst}]

Named Arguments

--sequences, -s

sequences in fasta or VCF format


--output-sequences

output sequences file

--output-metadata

output metadata file

--fields

fields in fasta header

--separator

separator of fasta header

Default: “|”

--fix-dates

Possible choices: dayfirst, monthfirst

attempt to parse non-standard dates and output them in standard YYYY-MM-DD format
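As a rough illustration of what --fields, --separator, and --fix-dates describe, the following Python sketch splits one header into named fields and normalizes a slash-delimited date. It is not augur's implementation: the function names, the sample header, and the assumption that non-standard dates look like DD/MM/YYYY or MM/DD/YYYY are all hypothetical.

```python
from datetime import datetime


def parse_header(header, fields, separator="|"):
    """Split one FASTA header into a dict keyed by the given field names."""
    values = header.lstrip(">").split(separator)
    # zip() silently ignores surplus names or values; a real tool may warn.
    return dict(zip(fields, (value.strip() for value in values)))


def fix_date(raw, order="dayfirst"):
    """Rewrite a slash-delimited date as YYYY-MM-DD; return it unchanged on failure."""
    fmt = "%d/%m/%Y" if order == "dayfirst" else "%m/%d/%Y"
    try:
        return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
    except ValueError:
        return raw


# Hypothetical header with three delimited fields.
record = parse_header(
    ">A/Example/1/2015|03/04/2015|example_country",
    ["strain", "date", "country"],
)
record["date"] = fix_date(record["date"], "dayfirst")
```

With "dayfirst" the sample date is read as 3 April 2015; with "monthfirst" the same string would be read as 4 March 2015, which is why the option has to be chosen explicitly.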


Filter and subsample a sequence set.

augur filter [-h] --sequences SEQUENCES --metadata METADATA
             [--min-date MIN_DATE] [--max-date MAX_DATE]
             [--min-length MIN_LENGTH] [--non-nucleotide] [--exclude EXCLUDE]
             [--include INCLUDE] [--priority PRIORITY]
             [--sequences-per-group SEQUENCES_PER_GROUP]
             [--group-by GROUP_BY [GROUP_BY ...]]
             [--exclude-where EXCLUDE_WHERE [EXCLUDE_WHERE ...]]
             [--include-where INCLUDE_WHERE [INCLUDE_WHERE ...]] --output

Named Arguments

--sequences, -s

sequences in fasta or VCF format


--metadata

metadata associated with sequences

--min-date

minimal cutoff for numerical date

--max-date

maximal cutoff for numerical date

--min-length

minimal length of the sequences

--non-nucleotide

exclude sequences that contain illegal characters

Default: False

--exclude

file with list of strains that are to be excluded

--include

file with list of strains that are to be included regardless of priorities or subsampling

--priority

file with list of priority scores for sequences (strain priority)

--sequences-per-group

subsample to no more than this number of sequences per category

--group-by

categories with respect to which to subsample; two virtual fields, “month” and “year”, are supported if they don’t already exist as real fields but a “date” field does exist
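The interaction of --group-by, --sequences-per-group, and --priority can be sketched in Python as follows. This is only an illustration under stated assumptions: dates are full YYYY-MM-DD strings, a higher priority score means a sequence is preferred, and the function names and sample rows are hypothetical rather than taken from augur.

```python
from collections import defaultdict


def group_key(row, group_by):
    """Build a grouping key, deriving "year"/"month" from "date" when absent."""
    key = []
    for field in group_by:
        if field in row:
            key.append(row[field])
        elif field in ("year", "month") and "date" in row:
            # Assumes a complete YYYY-MM-DD date.
            year, month = row["date"].split("-")[:2]
            key.append(year if field == "year" else month)
        else:
            key.append("unknown")
    return tuple(key)


def subsample(rows, group_by, per_group, priority=None):
    """Keep at most per_group rows per group, preferring higher priority scores."""
    priority = priority or {}
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row, group_by)].append(row)
    kept = []
    for members in groups.values():
        members.sort(key=lambda r: priority.get(r["strain"], 0), reverse=True)
        kept.extend(members[:per_group])
    return kept


rows = [
    {"strain": "s1", "date": "2019-03-01", "country": "x"},
    {"strain": "s2", "date": "2019-03-15", "country": "x"},
    {"strain": "s3", "date": "2019-04-02", "country": "x"},
]
kept = subsample(rows, ["country", "year", "month"], 1, {"s2": 5.0})
kept_names = [row["strain"] for row in kept]
```

Here s1 and s2 fall into the same country/year/month group, so only the higher-priority s2 survives, while s3 is alone in its group and is kept.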


--exclude-where

Exclude samples matching these conditions. Ex: “host=rat” or “host!=rat”. Multiple values are processed as OR (matching any of those specified will be excluded), not AND.

--include-where

Include samples with these values. Ex: “host=rat”. Multiple values are processed as OR (having any of those specified will be included), not AND. This rule is applied last and ensures any sequences matching these rules will be included.
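The OR semantics of these conditions, and the fact that include rules are applied last, can be sketched in Python. The helper names and sample rows below are hypothetical, and the handling of a missing column (treated as "not equal") is an assumption, not documented augur behavior.

```python
def parse_condition(condition):
    """Split "column=value" or "column!=value" into (column, negate, value)."""
    if "!=" in condition:
        column, value = condition.split("!=", 1)
        return column, True, value
    column, value = condition.split("=", 1)
    return column, False, value


def matches_any(row, conditions):
    """True if the row satisfies at least one condition (OR, not AND)."""
    for column, negate, value in map(parse_condition, conditions):
        is_equal = row.get(column) == value
        if is_equal != negate:
            return True
    return False


def apply_where(rows, exclude_where=(), include_where=()):
    """Drop rows matching any exclude rule unless an include rule forces them back."""
    kept = []
    for row in rows:
        excluded = matches_any(row, exclude_where)
        forced = matches_any(row, include_where)
        if forced or not excluded:
            kept.append(row)
    return kept


rows = [
    {"strain": "a", "host": "rat", "country": "p"},
    {"strain": "b", "host": "human", "country": "p"},
    {"strain": "c", "host": "bat", "country": "q"},
]
only_human = [r["strain"] for r in apply_where(rows, ["host=rat", "host=bat"])]
forced_back = [r["strain"] for r in apply_where(rows, ["host=rat", "host=bat"], ["country=q"])]
```

In the second call, strain c matches an exclude rule but is retained because the include rule is evaluated last.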

--output, -o

output file


Mask specified sites from a VCF file.

augur mask [-h] --sequences SEQUENCES --mask MASK [--output OUTPUT]

Named Arguments

--sequences, -s

sequences in VCF format


--mask

locations to be masked in BED file format

--output, -o

output file


Align multiple sequences from FASTA.

augur align [-h] --sequences SEQUENCES [--output OUTPUT] [--nthreads NTHREADS]
            [--method {mafft}] [--reference-name REFERENCE_NAME]
            [--reference-sequence REFERENCE_SEQUENCE] [--remove-reference]

Named Arguments

--sequences, -s

sequences in fasta or VCF format

--output, -o

output file


--nthreads

number of threads to use; specifying the value ‘auto’ will cause the number of available CPU cores on your system, if determinable, to be used

Default: 1

--method

Possible choices: mafft

alignment program to use

Default: “mafft”

--reference-name

strip insertions relative to reference sequence; use if the reference is already in the input sequences

--reference-sequence

strip insertions relative to reference sequence; use if the reference is NOT already in the input sequences

--remove-reference

remove reference sequence from the alignment

Default: False


if gaps represent missing data rather than true indels, replace by N after aligning

Default: False


Build a tree using a variety of methods.

augur tree [-h] --alignment ALIGNMENT [--method {fasttree,raxml,iqtree}]
           [--output OUTPUT]
           [--substitution-model {HKY,GTR,HKY+G,GTR+G,GTR+R10}]
           [--nthreads NTHREADS] [--vcf-reference VCF_REFERENCE]
           [--exclude-sites EXCLUDE_SITES]
           [--tree-builder-args TREE_BUILDER_ARGS]

Named Arguments

--alignment, -a

alignment in fasta or VCF format


--method

Possible choices: fasttree, raxml, iqtree

tree builder to use

Default: “iqtree”

--output, -o

file name to write tree to

--substitution-model

Possible choices: HKY, GTR, HKY+G, GTR+G, GTR+R10

substitution model to use. Specify ‘none’ to run ModelTest. Currently, only available for IQTREE.

Default: “GTR”

--nthreads

number of threads to use; specifying the value ‘auto’ will cause the number of available CPU cores on your system, if determinable, to be used

Default: 1

--vcf-reference

fasta file of the sequence the VCF was mapped to

--exclude-sites

file name of one-based sites to exclude for raw tree building (BED format in .bed files, DRM format in tab-delimited files, or one position per line)

--tree-builder-args

extra arguments to be passed directly to the executable of the requested tree method (e.g., --tree-builder-args="-czb")

Default: “”


Refine an initial tree using sequence metadata.

augur refine [-h] [--alignment ALIGNMENT] --tree TREE [--metadata METADATA]
             [--output-tree OUTPUT_TREE] [--output-node-data OUTPUT_NODE_DATA]
             [--timetree] [--coalescent COALESCENT] [--clock-rate CLOCK_RATE]
             [--clock-std-dev CLOCK_STD_DEV] [--root ROOT [ROOT ...]]
             [--keep-root] [--covariance] [--no-covariance]
             [--keep-polytomies] [--date-format DATE_FORMAT]
             [--date-confidence] [--date-inference {joint,marginal}]
             [--branch-length-inference {auto,joint,marginal,input}]
             [--clock-filter-iqd CLOCK_FILTER_IQD]
             [--vcf-reference VCF_REFERENCE]
             [--year-bounds YEAR_BOUNDS [YEAR_BOUNDS ...]]

Named Arguments

--alignment, -a

alignment in fasta or VCF format

--tree, -t

prebuilt Newick


--metadata

tsv/csv table with meta data for sequences

--output-tree

file name to write tree to

--output-node-data

file name to write branch lengths as node data

--timetree

produce timetree using treetime

Default: False

--coalescent

coalescent time scale in units of inverse clock rate (float), optimize as scalar (‘opt’), or skyline (‘skyline’)

--clock-rate

fixed clock rate

--clock-std-dev

standard deviation of the fixed clock_rate estimate

--root

rooting mechanism (‘best’, ‘least-squares’, ‘min_dev’, ‘oldest’) OR node to root by OR two nodes indicating a monophyletic group to root by. Run treetime -h for definitions of rooting methods.

Default: “best”

--keep-root

do not reroot the tree; use it as-is. Overrides anything specified by --root.

Default: False

--covariance

Account for covariation when estimating rates and/or rerooting. Use --no-covariance to turn off.

Default: True

--no-covariance

Default: True

--keep-polytomies

Do not attempt to resolve polytomies

Default: False

--date-format

date format

Default: “%Y-%m-%d”

--date-confidence

calculate confidence intervals for node dates

Default: False

--date-inference

Possible choices: joint, marginal

assign internal nodes to their marginally most likely dates, not jointly most likely

Default: “joint”

--branch-length-inference

Possible choices: auto, joint, marginal, input

branch length mode of treetime to use

Default: “auto”

--clock-filter-iqd

clock-filter: remove tips that deviate more than n_iqd interquartile ranges from the root-to-tip vs time regression
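The idea behind the clock filter can be sketched in Python: fit a least-squares line of root-to-tip distance against sampling date, then flag tips whose residual exceeds n_iqd times the interquartile distance of all residuals. This is an assumption-laden illustration, not TreeTime's code. In particular, the use of absolute residuals, the quartile method, and the function name are choices made here for the example.

```python
from statistics import quantiles


def clock_filter(dates, distances, n_iqd):
    """Return indices of tips whose residual exceeds n_iqd interquartile distances."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    # Ordinary least-squares fit of distance against date.
    slope = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances)) / sum(
        (t - mean_t) ** 2 for t in dates
    )
    intercept = mean_d - slope * mean_t
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, distances)]
    q1, _, q3 = quantiles(residuals, n=4)
    iqd = q3 - q1
    return [i for i, r in enumerate(residuals) if abs(r) > n_iqd * iqd]


# Ten hypothetical tips on a clean clock of 0.01 per year, with one outlier.
dates = [2000.0 + i for i in range(10)]
distances = [0.01 * i for i in range(10)]
distances[5] += 0.5
flagged = clock_filter(dates, distances, 4)
```

Because the outlier also pulls the fitted line toward itself, small values of n_iqd can flag well-behaved tips too, which is one reason the cutoff is left to the user.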


--vcf-reference

fasta file of the sequence the VCF was mapped to

--year-bounds

specify min or max & min prediction bounds for samples with XX in year


Infer ancestral sequences based on a tree.

augur ancestral [-h] --tree TREE [--alignment ALIGNMENT] [--output OUTPUT]
                [--output-node-data OUTPUT_NODE_DATA]
                [--output-sequences OUTPUT_SEQUENCES]
                [--inference {joint,marginal}] [--vcf-reference VCF_REFERENCE]
                [--output-vcf OUTPUT_VCF] [--keep-ambiguous]

Named Arguments

--tree, -t

prebuilt Newick

--alignment, -a

alignment in fasta or VCF format

--output, -o

name of JSON file to save mutations and ancestral sequences to


--output-node-data

name of JSON file to save mutations and ancestral sequences to

--output-sequences

name of FASTA file to save ancestral sequences to (FASTA alignments only)

--inference

Possible choices: joint, marginal

calculate joint or marginal maximum likelihood ancestral sequence states

Default: “joint”

--vcf-reference

fasta file of the sequence the VCF was mapped to

--output-vcf

name of output VCF file which will include ancestral seqs

--keep-ambiguous

do not infer nucleotides at ambiguous (N) sites on tip sequences (leave as N). Always true for VCF input.

Default: False


do not infer nucleotides for gaps (-) on either side of the alignment

Default: False


Translate gene regions from nucleotides to amino acids.

augur translate [-h] [--tree TREE] [--ancestral-sequences ANCESTRAL_SEQUENCES]
                --reference-sequence REFERENCE_SEQUENCE
                [--genes GENES [GENES ...]] [--output OUTPUT]
                [--alignment-output ALIGNMENT_OUTPUT]
                [--vcf-reference-output VCF_REFERENCE_OUTPUT]
                [--vcf-reference VCF_REFERENCE]

Named Arguments


--tree

prebuilt Newick – no tree will be built if provided

--ancestral-sequences

JSON (fasta input) or VCF (VCF input) containing ancestral and tip sequences

--reference-sequence

GenBank or GFF file containing the annotation

--genes

genes to translate (list or file containing list)

--output

name of JSON files for aa mutations

--alignment-output

write out translated gene alignments. If a VCF-input, a .vcf or .vcf.gz will be output here (depending on file ending). If fasta-input, specify the file name like so: ‘my_alignment_%GENE.fasta’, where ‘%GENE’ will be replaced by the name of the gene
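The ‘%GENE’ placeholder behaves like a simple template. A minimal Python sketch of the substitution, with a hypothetical helper name and hypothetical gene names, might look like this:

```python
def alignment_path(template, gene):
    """Fill the %GENE placeholder in an output file name template."""
    return template.replace("%GENE", gene)


# One output path per translated gene (gene names are hypothetical).
paths = [alignment_path("my_alignment_%GENE.fasta", gene) for gene in ["HA1", "HA2"]]
```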


--vcf-reference-output

fasta file where reference sequence translations for VCF input will be written

--vcf-reference

fasta file of the sequence the VCF was mapped to


Reconstruct alignments from mutations inferred on the tree

augur reconstruct-sequences [-h] --tree TREE [--gene GENE] --mutations
                            MUTATIONS [--vcf-aa-reference VCF_AA_REFERENCE]
                            [--internal-nodes] [--output OUTPUT]

Named Arguments


--tree

tree as Newick file

--gene

gene to translate (list or file containing list)

--mutations

json file containing mutations mapped to each branch and the sequence of the root.

--vcf-aa-reference

fasta file of the reference gene translations for VCF format

--internal-nodes

include sequences of internal nodes in output

Default: False



Assign clades to nodes in a tree based on amino-acid or nucleotide signatures.

augur clades [-h] [--tree TREE] [--mutations MUTATIONS [MUTATIONS ...]]
             [--reference REFERENCE [REFERENCE ...]] [--clades CLADES]
             [--output OUTPUT]

Named Arguments


--tree

prebuilt Newick – no tree will be built if provided

--mutations

JSON(s) containing ancestral and tip nucleotide and/or amino-acid mutations

--reference

fasta files containing reference and tip nucleotide and/or amino-acid sequences

--clades

TSV file containing clade definitions by amino-acid

--output

name of JSON files for clades


Infer ancestral traits based on a tree.

augur traits [-h] --tree TREE --metadata METADATA --columns COLUMNS
             [COLUMNS ...] [--confidence]
             [--sampling-bias-correction SAMPLING_BIAS_CORRECTION]
             [--output OUTPUT]

Named Arguments

--tree, -t

tree to perform trait reconstruction on


--metadata

tsv/csv table with meta data

--columns

metadata fields to perform discrete reconstruction on

--confidence

record the distribution of subleading mugration states

Default: False

--sampling-bias-correction

a rough estimate of how many more events would have been observed if sequences represented an even sample. This should be roughly (1-sum_i p_i^2)/(1-sum_i t_i^2), where p_i are the equilibrium frequencies and t_i are the apparent ones (or rather the time spent in a particular state on the tree).
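The ratio quoted for the sampling bias correction is straightforward to evaluate. The Python sketch below computes it for hypothetical frequencies; the function name and the example numbers are illustrative only.

```python
import math


def sampling_bias_correction(equilibrium, apparent):
    """Evaluate (1 - sum_i p_i^2) / (1 - sum_i t_i^2)."""
    numerator = 1 - sum(p * p for p in equilibrium)
    denominator = 1 - sum(t * t for t in apparent)
    return numerator / denominator


# Two states that are equally common at equilibrium (p_i), but where the tree
# spends 90% of its time in one of them (t_i).
correction = sampling_bias_correction([0.5, 0.5], [0.9, 0.1])
```

For these inputs the numerator is 0.5 and the denominator is 0.18, so the correction is about 2.78: uneven sampling is estimated to have hidden roughly that factor of events.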

--output, -o

Default: “traits.json”


Annotate sequences based on amino-acid or nucleotide signatures.

augur sequence-traits [-h] [--ancestral-sequences ANCESTRAL_SEQUENCES]
                      [--translations TRANSLATIONS]
                      [--vcf-reference VCF_REFERENCE]
                      [--vcf-translate-reference VCF_TRANSLATE_REFERENCE]
                      [--features FEATURES] [--count {traits,mutations}]
                      [--label LABEL] [--output OUTPUT]

Named Arguments


--ancestral-sequences

nucleotide alignment to search for sequence traits in

--translations

AA alignment to search for sequence traits in (can include ancestral sequences)

--vcf-reference

fasta file of the sequence the nucleotide VCF was mapped to

--vcf-translate-reference

fasta file of the sequence the translated VCF was mapped to

--features

file that specifies sites defining the features in a tab-delimited format: “GENE SITE ALT DISPLAY_NAME FEATURE”. For nucleotide sites, GENE can be “nuc” (or column excluded entirely for all-nuc sites). “DISPLAY_NAME” can be blank or excluded entirely.

--count

Possible choices: traits, mutations

Whether to count traits (ex: # drugs resistant to) or mutations

Default: “traits”

--label

How to label the counts (ex: Drug_Resistance)

Default: “# Traits”

--output, -o

output json with sequence features


Calculate LBI for a given tree and one or more sets of parameters.

augur lbi [-h] --tree TREE --branch-lengths BRANCH_LENGTHS --output OUTPUT
          --attribute-names ATTRIBUTE_NAMES [ATTRIBUTE_NAMES ...] --tau TAU
          [TAU ...] --window WINDOW [WINDOW ...]

Named Arguments


--tree

Newick tree

--branch-lengths

JSON with branch lengths and internal node dates estimated by TreeTime

--output

JSON file with calculated distances stored by node name and attribute name

--attribute-names

names to store distances associated with the corresponding masks

--tau

tau value(s) defining the neighborhood of each clade

--window

time window(s) to calculate LBI across


Calculate the distance between sequences across entire genes or at a predefined subset of sites.

Distance calculations require selection of a comparison method (to determine which sequences to compare) and a distance map (to determine the weight of a mismatch between any two sequences).

Comparison methods

Comparison methods include:

  1. root: the root and all nodes in the tree (the previous default for all distances)

  2. ancestor: each tip from a current season and its immediate ancestor (optionally, from a previous season)

  3. pairwise: all tips pairwise (optionally, all tips from a current season against all tips in previous seasons)

Ancestor and pairwise comparisons can be calculated with or without information about the current season. When no dates are provided, the ancestor comparison calculates the distance between each tip and its immediate ancestor in the given tree. Similarly, the pairwise comparison calculates the distance between all pairs of tips in the tree.

When the user provides a “latest date”, all tips sampled after that date belong to the current season and all tips sampled on that date or prior belong to previous seasons. When this information is available, the ancestor comparison calculates the distance between each tip in the current season and its last ancestor from a previous season. The pairwise comparison only calculates the distances between tips in the current season and those from previous seasons.

When the user also provides an “earliest date”, pairwise comparisons exclude tips sampled from previous seasons prior to the given date. These two date parameters allow users to specify a fixed time interval for pairwise calculations, limiting the computational complexity of the comparisons.
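The season rules above can be summarized in a short Python sketch. It assumes sampling dates are ISO YYYY-MM-DD strings (so string comparison orders them correctly); the function names and tip names are hypothetical and this is not augur's implementation.

```python
def split_seasons(tip_dates, latest_date, earliest_date=None):
    """Partition tips into (current season, previous seasons)."""
    # Tips sampled after the latest date belong to the current season.
    current = sorted(name for name, date in tip_dates.items() if date > latest_date)
    # Tips on or before the latest date are previous, optionally bounded below.
    previous = sorted(
        name
        for name, date in tip_dates.items()
        if date <= latest_date and (earliest_date is None or date >= earliest_date)
    )
    return current, previous


def pairwise_comparisons(tip_dates, latest_date, earliest_date=None):
    """List the (current tip, previous tip) pairs that would be compared."""
    current, previous = split_seasons(tip_dates, latest_date, earliest_date)
    return [(c, p) for c in current for p in previous]


tip_dates = {
    "t1": "2008-05-01",
    "t2": "2009-09-30",
    "t3": "2009-10-01",
    "t4": "2009-12-15",
    "t5": "2010-02-01",
}
current, previous = split_seasons(tip_dates, "2009-10-01")
bounded_pairs = pairwise_comparisons(tip_dates, "2009-10-01", "2009-01-01")
```

Adding the earliest date drops t1 from the previous seasons, reducing the number of pairwise comparisons from six to four in this small example.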

Distance maps

Distance maps are defined in JSON format with two required top-level keys. The default key specifies the numeric value (integer or float) to assign to all mismatches by default. The map key specifies a dictionary of weights to use for distance calculations. These weights are indexed hierarchically by gene name and one-based gene coordinate and are assigned in either a sequence-independent or sequence-dependent manner. The simplest possible distance map calculates Hamming distance between sequences without any site-specific weights, as shown below:

    {
        "name": "Hamming distance",
        "default": 1,
        "map": {}
    }

Sequence-independent distances are defined by gene and position using a numeric value of the same type as the default value (integer or float). The following example is a distance map for antigenic amino acid substitutions near influenza A/H3N2 HA’s receptor binding sites. This map calculates the Hamming distance between amino acid sequences only at seven positions in the HA1 gene:

    {
        "name": "Koel epitope sites",
        "default": 0,
        "map": {
            "HA1": {
                "145": 1,
                "155": 1,
                "156": 1,
                "158": 1,
                "159": 1,
                "189": 1,
                "193": 1
            }
        }
    }

Sequence-dependent distances are defined by gene, position, and sequence pairs where the from sequence in each pair is interpreted as the ancestral state and the to sequence as the derived state. The following example is a distance map that assigns asymmetric weights to specific amino acid substitutions at a specific position in the influenza gene HA1:

    {
        "default": 0.0,
        "map": {
            "HA1": {
                "112": [
                    {
                        "from": "V",
                        "to": "I",
                        "weight": 1.192
                    },
                    {
                        "from": "I",
                        "to": "V",
                        "weight": 0.002
                    }
                ]
            }
        }
    }

The distance command produces a JSON output file in standard “node data” format that can be passed to augur export. In addition to the standard nodes field, the JSON includes a params field that describes the mapping of attribute names to requested comparisons and distance maps and any date parameters specified by the user. The following example JSON shows a sample output when the distance command is run with multiple comparisons and distance maps:

    {
        "params": {
            "attributes": ["ep", "ne", "ne_star", "ep_pairwise"],
            "compare_to": ["root", "root", "ancestor", "pairwise"],
            "map_name": [ ... ],
            "latest_date": "2009-10-01"
        },
        "nodes": {
            "A/Afghanistan/AF1171/2008": {
                "ep": 7,
                "ne": 6,
                "ne_star": 1,
                "ep_pairwise": {
                    "A/Aichi/78/2007": 1,
                    "A/Argentina/3509/2006": 2
                }
            }
        }
    }

augur distance [-h] --tree TREE --alignment ALIGNMENT [ALIGNMENT ...]
               --gene-names GENE_NAMES [GENE_NAMES ...] --attribute-name
               ATTRIBUTE_NAME [ATTRIBUTE_NAME ...] --compare-to
               {root,ancestor,pairwise} [{root,ancestor,pairwise} ...] --map
               MAP [MAP ...] [--date-annotations DATE_ANNOTATIONS]
               [--earliest-date EARLIEST_DATE] [--latest-date LATEST_DATE]
               --output OUTPUT

Named Arguments


--tree

Newick tree

--alignment

sequence(s) to be used, supplied as FASTA files

--gene-names

names of the sequences in the alignment, same order assumed

--attribute-name

name to store distances associated with the given distance map; multiple attribute names are linked to corresponding positional comparison method and distance map arguments

--compare-to

Possible choices: root, ancestor, pairwise

type of comparison between samples in the given tree including comparison of all nodes to the root (root), all tips to their last ancestor from a previous season (ancestor), or all tips from the current season to all tips in previous seasons (pairwise)

--map

JSON providing the distance map between sites and, optionally, sequences present at those sites; the distance map JSON minimally requires a ‘default’ field defining a default numeric distance and a ‘map’ field defining a dictionary of genes and one-based coordinates
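How a distance map might be applied to a pair of sequences can be sketched in Python. This is a simplified reading of the format described under “Distance maps”, not augur's implementation: the function names and the short toy sequences are hypothetical, and falling back to the default weight when a sequence-dependent site lists no matching from/to pair is an assumption.

```python
def site_weight(entry, ancestral, derived, default):
    """Resolve the weight of one mismatch from a map entry."""
    if isinstance(entry, list):
        # Sequence-dependent entry: a list of {"from", "to", "weight"} pairs.
        for pair in entry:
            if pair["from"] == ancestral and pair["to"] == derived:
                return pair["weight"]
        return default  # Assumption: unlisted substitutions use the default.
    return entry  # Sequence-independent entry: a plain number.


def distance(ancestral_seq, derived_seq, gene, distance_map):
    """Sum mismatch weights between two aligned sequences of one gene."""
    default = distance_map["default"]
    gene_map = distance_map["map"].get(gene, {})
    total = 0
    for position, (a, b) in enumerate(zip(ancestral_seq, derived_seq), start=1):
        if a == b:
            continue
        entry = gene_map.get(str(position))  # Coordinates are one-based strings.
        total += default if entry is None else site_weight(entry, a, b, default)
    return total


hamming = {"name": "Hamming distance", "default": 1, "map": {}}
site_map = {"default": 0, "map": {"HA1": {"2": 1}}}
pair_map = {
    "default": 0.0,
    "map": {"HA1": {"2": [
        {"from": "V", "to": "I", "weight": 1.192},
        {"from": "I", "to": "V", "weight": 0.002},
    ]}},
}

d_hamming = distance("ACGT", "ACCA", "HA1", hamming)
d_site = distance("AKT", "ARS", "HA1", site_map)
d_forward = distance("AVT", "AIT", "HA1", pair_map)
d_reverse = distance("AIT", "AVT", "HA1", pair_map)
```

The last two calls show the asymmetry of sequence-dependent weights: the same site contributes 1.192 for V to I but only 0.002 for I to V.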


--date-annotations

JSON of branch lengths and date annotations from augur refine for samples in the given tree; required for comparisons to earliest or latest date

--earliest-date

earliest date at which samples are considered to be from previous seasons (e.g., 2019-01-01). This date is only used in pairwise comparisons. If omitted, all samples prior to the latest date will be considered.

--latest-date

latest date at which samples are considered to be from previous seasons (e.g., 2019-01-01); samples from any date after this are considered part of the current season

--output

JSON file with calculated distances stored by node name and attribute name
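A downstream script could read the output JSON with nothing more than Python's json module. The sketch below embeds a trimmed copy of the sample output described earlier; the map names are hypothetical placeholders, and the helper name is illustrative.

```python
import json

# Trimmed sample output; "map_a" and "map_b" are hypothetical map names.
node_data = json.loads("""
{
    "params": {
        "attributes": ["ep", "ne", "ne_star", "ep_pairwise"],
        "compare_to": ["root", "root", "ancestor", "pairwise"],
        "map_name": ["map_a", "map_b", "map_b", "map_a"],
        "latest_date": "2009-10-01"
    },
    "nodes": {
        "A/Afghanistan/AF1171/2008": {
            "ep": 7,
            "ne": 6,
            "ne_star": 1,
            "ep_pairwise": {"A/Aichi/78/2007": 1, "A/Argentina/3509/2006": 2}
        }
    }
}
""")

# The params lists are positional: the i-th attribute used the i-th comparison.
params = node_data["params"]
comparisons = dict(zip(params["attributes"], params["compare_to"]))


def distances_for(data, attribute):
    """Collect one attribute's value for every node that has it."""
    return {
        name: values[attribute]
        for name, values in data["nodes"].items()
        if attribute in values
    }


ep_distances = distances_for(node_data, "ep")
```

Note that root and ancestor comparisons store a single number per node, while pairwise comparisons store a dictionary keyed by the other tip's name.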


Annotate a tree with actual and inferred titer measurements.

augur titers [-h] {tree,sub} ...



tree model

augur titers tree [-h] --titers TITERS [TITERS ...] --tree TREE
                  [--allow-empty-model] --output OUTPUT

Named Arguments

--titers

file with titer measurements

--tree, -t

tree to fit titer model to

--allow-empty-model

allow model to be empty

Default: False

--output, -o

JSON file to save titer model


substitution model

augur titers sub [-h] --titers TITERS [TITERS ...] --alignment ALIGNMENT
                 [ALIGNMENT ...] --gene-names GENE_NAMES [GENE_NAMES ...]
                 [--tree TREE] [--allow-empty-model] --output OUTPUT

Named Arguments

--titers

file with titer measurements

--alignment

sequence to be used in the substitution model, supplied as fasta files

--gene-names

names of the sequences in the alignment, same order assumed

--tree, -t

optional tree to annotate with the fitted titer model

--allow-empty-model

allow model to be empty

Default: False

--output, -o

JSON file to save titer model


Infer frequencies of mutations or clades.

augur frequencies [-h] --method {diffusion,kde} --metadata METADATA
                  [--regions REGIONS [REGIONS ...]]
                  [--pivot-interval PIVOT_INTERVAL] [--min-date MIN_DATE]
                  [--max-date MAX_DATE] [--tree TREE]
                  [--minimal-clade-size MINIMAL_CLADE_SIZE]
                  [--alignments ALIGNMENTS [ALIGNMENTS ...]]
                  [--gene-names GENE_NAMES [GENE_NAMES ...]]
                  [--ignore-char IGNORE_CHAR]
                  [--minimal-frequency MINIMAL_FREQUENCY]
                  [--narrow-bandwidth NARROW_BANDWIDTH]
                  [--wide-bandwidth WIDE_BANDWIDTH]
                  [--proportion-wide PROPORTION_WIDE] [--weights WEIGHTS]
                  [--weights-attribute WEIGHTS_ATTRIBUTE] [--censored]
                  [--stiffness STIFFNESS] [--inertia INERTIA]
                  [--output-format {auspice,nextflu}] [--output OUTPUT]

Named Arguments


--method

Possible choices: diffusion, kde

method by which frequencies should be estimated

--metadata

tab-delimited metadata including dates for given samples

--regions

region to subsample to

Default: [‘global’]

--pivot-interval

number of months between pivots

Default: 3

--min-date

minimal pivot value

--max-date

maximal pivot value

--tree, -t

tree to estimate clade frequencies for


calculate frequencies for internal nodes as well as tips

Default: False


--minimal-clade-size

minimal size of a clade to have frequencies estimated

Default: 0

--alignments

alignments to estimate mutations frequencies for

--gene-names

names of the sequences in the alignment, same order assumed

--ignore-char

character to be ignored in frequency calculations

Default: “”

--minimal-frequency

minimal all-time frequency for a trajectory to be estimated

Default: 0.05

--narrow-bandwidth

the bandwidth for the narrow KDE

Default: 0.08333333333333333

--wide-bandwidth

the bandwidth for the wide KDE

Default: 0.25

--proportion-wide

the proportion of the wide bandwidth to use in the KDE mixture model

Default: 0.2
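One way to picture the narrow/wide KDE mixture is as a weighted sum of two normal kernels centered on each tip's sampling date. The Python sketch below uses the defaults listed above and assumes dates and bandwidths are expressed in years; the function names, the sample dates, and the normalization by the total density are illustrative assumptions, not augur's implementation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_density(pivot, tip_date, narrow=1 / 12, wide=0.25, proportion_wide=0.2):
    """Mix a narrow and a wide kernel centered on one tip's date."""
    return (1 - proportion_wide) * normal_pdf(pivot, tip_date, narrow) + (
        proportion_wide * normal_pdf(pivot, tip_date, wide)
    )


def clade_frequencies(pivots, clade_dates, all_dates):
    """Clade density divided by total density at each pivot."""
    frequencies = []
    for pivot in pivots:
        clade = sum(mixture_density(pivot, date) for date in clade_dates)
        total = sum(mixture_density(pivot, date) for date in all_dates)
        frequencies.append(clade / total if total > 0 else 0.0)
    return frequencies


# Hypothetical numeric sampling dates; the clade contains the two later tips.
all_dates = [2015.0, 2015.1, 2015.9, 2016.0]
clade_dates = [2015.9, 2016.0]
frequencies = clade_frequencies([2015.0, 2015.5, 2016.0], clade_dates, all_dates)
```

The wide component keeps the estimate from collapsing to zero between sparsely sampled dates, while the narrow component tracks rapid changes near well-sampled pivots.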


--weights

a dictionary of key/value mappings in JSON format used to weight KDE tip frequencies

--weights-attribute

name of the attribute on each tip whose values map to the given weights dictionary

--censored

calculate censored frequencies at each pivot

Default: False

--stiffness

parameter penalizing curvature of the frequency trajectory

Default: 10.0

--inertia

determines how frequencies continue in absence of data (inertia=0 -> go flat, inertia=1.0 -> continue current trend)

Default: 0.0

--output-format

Possible choices: auspice, nextflu

format to export frequencies JSON depending on the viewing interface

Default: “auspice”

--output, -o

JSON file to save estimated frequencies to


Export JSON files suitable for visualization with auspice.

augur export [-h] --tree TREE --metadata METADATA [--reference REFERENCE]
             [--reference-translations REFERENCE_TRANSLATIONS] --node-data
             NODE_DATA [NODE_DATA ...] [--auspice-config AUSPICE_CONFIG]
             [--colors COLORS] [--lat-longs LAT_LONGS] [--new-schema]
             [--output-main OUTPUT_MAIN] [--output-tree OUTPUT_TREE]
             [--output-sequence OUTPUT_SEQUENCE] [--output-meta OUTPUT_META]
             [--title TITLE] [--maintainers MAINTAINERS [MAINTAINERS ...]]
             [--maintainer-urls MAINTAINER_URLS [MAINTAINER_URLS ...]]
             [--geography-traits GEOGRAPHY_TRAITS [GEOGRAPHY_TRAITS ...]]
             [--extra-traits EXTRA_TRAITS [EXTRA_TRAITS ...]]
             [--panels PANELS [PANELS ...]] [--minify-json]

Named Arguments

--tree, -t

tree to perform trait reconstruction on


--metadata

tsv file with sequence meta data

--reference

reference sequence for export to browser, only vcf

--reference-translations

reference translations for export to browser, only vcf

--node-data

JSON files with meta data for each node

--auspice-config

file with auspice configuration

--colors

file with color definitions

--lat-longs

file with latitudes and longitudes, overrides built-in mappings

--new-schema

export JSONs using nextflu schema

Default: False

--output-main

Main JSON file name that is passed on to auspice (e.g., zika.json).

--output-tree

JSON file name that is passed on to auspice (e.g., zika_tree.json). Only used with --nextflu-schema

--output-sequence

JSON file name that is passed on to auspice (e.g., zika_seq.json). Only used with --nextflu-schema

--output-meta

JSON file name that is passed on to auspice (e.g., zika_meta.json). Only used with --nextflu-schema

--title

Title to be displayed by auspice

Default: “Analysis”

--maintainers

Analysis maintained by

Default: [‘’]

--maintainer-urls

URL of maintainers

Default: [‘’]

--geography-traits

What location traits are used to plot on map

--extra-traits

Metadata columns not run through ‘traits’ to be added to tree

--panels

What panels to display in auspice. Options are: xxx

Default: [‘tree’, ‘map’, ‘entropy’]

--minify-json

export JSONs without indentation or line returns

Default: False
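The difference between indented and minified JSON output can be shown with Python's standard json module. This is a generic sketch with a hypothetical helper name and payload; whether augur also tightens separators when minifying is not stated here, so the compact separators are an assumption.

```python
import json


def dump_json(data, minify=False):
    """Serialize data either compactly or with indentation."""
    if minify:
        # No indentation or line breaks; compact separators are an extra choice.
        return json.dumps(data, separators=(",", ":"))
    return json.dumps(data, indent=2)


payload = {"title": "Analysis", "panels": ["tree", "map", "entropy"]}
pretty = dump_json(payload)
compact = dump_json(payload, minify=True)
```

Both strings decode to the same data; the minified form is simply smaller, which matters for large trees served to a browser.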


Validate a set of JSON files intended for visualization in auspice.

augur validate [-h] --json JSON [JSON ...] [--new-schema]

Named Arguments


--json

JSONs to validate

--new-schema

use nextflu JSON schema

Default: False


Print the version of augur.

augur version [-h]