Augur: A bioinformatics toolkit for phylogenetic analysis

One held to foretell events by omens. (Merriam-Webster)

Augur is a bioinformatics toolkit to track evolution from sequence and serological data. It provides a collection of commands which are designed to be composable into larger processing pipelines. Augur originated as part of Nextstrain, an open-source project to harness the scientific and public health potential of pathogen genome data.

Note

We have just released version 6 of Augur. See our upgrading guide for details.

Augur is composed of a series of modules, and different workflows use different parts of the pipeline. A selection of Augur modules and possible entry points is illustrated below.

_images/augur_analysis_sketch.png

The canonical pipeline would ingest sequences and metadata such as dates and sampling locations, filter the data, align the sequences, infer a tree, and export the results in a format that can be visualized by auspice.
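As a rough illustration, the chain of steps described above could be assembled as a list of command lines. This is a minimal sketch, not an official workflow: the file paths and the helper name `canonical_pipeline` are hypothetical, a refine step is included between tree building and export, and the flags shown are assumptions that should be checked against `augur <command> --help` for your installed version. The code only builds and prints the commands; it does not run them.

```python
from typing import List


def canonical_pipeline(sequences: str, metadata: str, outdir: str = "results") -> List[List[str]]:
    """Build (but do not run) one augur command per pipeline step.

    All paths are hypothetical placeholders; flags are assumptions to
    verify against `augur <command> --help`.
    """
    filtered = f"{outdir}/filtered.fasta"
    aligned = f"{outdir}/aligned.fasta"
    raw_tree = f"{outdir}/tree_raw.nwk"
    tree = f"{outdir}/tree.nwk"
    node_data = f"{outdir}/branch_lengths.json"
    auspice_json = f"{outdir}/auspice.json"

    return [
        # Filter the input sequences using the metadata.
        ["augur", "filter", "--sequences", sequences,
         "--metadata", metadata, "--output", filtered],
        # Align the filtered sequences.
        ["augur", "align", "--sequences", filtered, "--output", aligned],
        # Infer a tree from the alignment.
        ["augur", "tree", "--alignment", aligned, "--output", raw_tree],
        # Refine the tree so nodes can be cross-referenced with metadata.
        ["augur", "refine", "--tree", raw_tree, "--alignment", aligned,
         "--metadata", metadata, "--output-tree", tree,
         "--output-node-data", node_data],
        # Export the results for visualization.
        ["augur", "export", "v2", "--tree", tree, "--metadata", metadata,
         "--node-data", node_data, "--output", auspice_json],
    ]


commands = canonical_pipeline("data/sequences.fasta", "data/metadata.tsv")
for cmd in commands:
    print(" ".join(cmd))
```

Each step reads the file written by the step before it, which is what makes the commands composable. To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` in order.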

In some cases, you might start with a manually curated alignment and want to begin the workflow at the tree-building step. Or you may already have an inferred tree, in which case you only need to feed your tree through the refine and export steps. The refine step is necessary to ensure that cross-referencing between tree nodes and metadata works as expected.
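The idea of entry points can be sketched as a lookup from the kind of input you already have to the first step you need to run. The step order follows the text above; the names `STEPS`, `ENTRY_POINTS`, and `remaining_steps` are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Step names in the order described in the text.
STEPS: List[str] = ["filter", "align", "tree", "refine", "export"]

# Hypothetical mapping from what you start with to the first step needed.
ENTRY_POINTS: Dict[str, str] = {
    "raw sequences": "filter",
    "curated alignment": "tree",
    "inferred tree": "refine",
}


def remaining_steps(starting_input: str) -> List[str]:
    """Return the steps still to run for a given starting input."""
    first_step = ENTRY_POINTS[starting_input]
    return STEPS[STEPS.index(first_step):]


print(remaining_steps("curated alignment"))
print(remaining_steps("inferred tree"))
```

Starting from an inferred tree leaves only the refine and export steps, matching the case described above.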

The different Augur modules can be strung together by workflow managers like Snakemake and Nextflow. The Nextstrain team uses Snakemake to run and manage the different analyses that you see on nextstrain.org.