11.3.0 (19 March 2021)
io: Add new write_sequences functions that support compressed inputs and outputs #652
parse, index, filter, mask: Add support for compressed inputs/outputs #652
export v2: Add optional data_provenance field to Auspice JSON output for better provenance reporting in Auspice #705
11.2.0 (8 March 2021)
filter: Enable filtering by metadata only such that sequence inputs/outputs are optional and metadata/strain list outputs are now possible #679
filter: Enable extraction of sequences from multiple lists of strains with a new --exclude-all flag and support for multiple inputs to the
11.1.2 (16 February 2021)
11.1.1 (16 February 2021)
11.1.0 (12 February 2021)
11.0.0 (22 January 2021)
filter: Use probabilistic sampling by default when requesting a maximum number of sequences to subsample with
--no-probabilistic-sampling flag to disable this default behavior and prevent users from requesting fewer maximum sequences than there are subsampling groups. #659
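To illustrate why a probabilistic draw helps when there are more subsampling groups than requested sequences, here is a minimal sketch. It is not Augur's implementation: the function name, the per-group keep probability, and the example data are all assumptions made for illustration.

```python
import random


def probabilistic_subsample(groups, max_sequences, seed=None):
    """Hypothetical helper: keep one strain from each group with probability
    max_sequences / number_of_groups.

    A rule that takes at least one sequence per group can never return fewer
    sequences than there are groups. A random draw per group can approach the
    requested maximum on average.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    keep_probability = min(1.0, max_sequences / len(groups))
    sampled = []
    for name in sorted(groups):
        if rng.random() < keep_probability:
            sampled.append(rng.choice(groups[name]))
    return sampled


# 100 groups but only about 10 sequences requested (illustrative data).
groups = {f"group_{i}": [f"strain_{i}_a", f"strain_{i}_b"] for i in range(100)}
sampled = probabilistic_subsample(groups, max_sequences=10, seed=1)
print(len(sampled))
```

The number of sequences returned varies from run to run unless a seed is given, which is the trade-off of sampling this way.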
10.3.0 (14 January 2021)
10.2.0 (1 January 2021)
--probabilistic-sampling flag to allow subsampling with --subsample-max-sequences when the number of groups exceeds the requested number of samples #629
scripts: Add script to identify emerging clades from existing Nextstrain build JSONs #653
docs: Add instructions to update conda installations prior to installing Augur #655
10.1.1 (16 November 2020)
10.1.0 (13 November 2020)
10.0.4 (6 November 2020)
10.0.3 (23 October 2020)
10.0.2 (8 September 2020)
10.0.1 (8 September 2020)
10.0.0 (17 August 2020)
Remove Snakemake as a dependency of the augur Python package #557
raises an exception when the requested color file is missing instead of printing a warning to stdout
splits out logic to parse colors file into separate classes (util_support/color_parser_line.py) with unit tests
exits with a nonzero code when node data node names don’t match tree nodes and when the input tree cannot be loaded
refactors logic to read node data into separate classes with unit tests
ancestral: Fix docstring for
parse: Fix date parsing bug caused by a change in the API for parse_time_string in pandas 1.1.0 #601
refine: Enable divergence unit scaling without timetree e9b3eec
tree: Use IQ-TREE’s -nt AUTO mode when users request more threads than there are input sequences, avoiding an IQ-TREE error #598
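The thread-selection rule in the tree entry above can be sketched as follows. The function name and the argument-list return shape are hypothetical; only the -nt AUTO fallback comes from the release note.

```python
def iqtree_thread_args(requested_threads, sequence_count):
    """Hypothetical helper returning the thread arguments for an IQ-TREE call.

    Falls back to automatic thread selection when more threads are requested
    than there are input sequences, as described in the release note.
    """
    if requested_threads > sequence_count:
        return ["-nt", "AUTO"]
    return ["-nt", str(requested_threads)]


print(iqtree_thread_args(8, 4))
print(iqtree_thread_args(2, 4))
```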
9.0.0 (29 June 2020)
align: The API to the read_sequences function now returns a list of sequences instead of a dictionary #536
align: Prevent duplicate strains warning when using
docs: Sync and deduplicate installation documentation from README to main docs #578
export: Flexibly disambiguate multiple publications by the same author #581
frequencies: Avoid interpolation of a single data point during frequency estimation with sparse data #569
parse: Actually remove commas during prettify when this behavior is requested #573
tests: Always use the local helper script (bin/augur) to run tests instead of any globally installed augur executables #527
tree: Keep log files after trees are built #572
utils: Do not attempt to parse dates with only ambiguous months (e.g., 2020-XX-01) #532
name column of metadata as a data field instead of a pandas DataFrame attribute #564
docs: Updates description of how missing data are handled by
filter: Add support for ISO 8601 dates (YYYY-MM-DD) for
tree: Allow VCF input without an
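As an illustration of the utils entry above about dates with only ambiguous months, the sketch below flags strings shaped like the 2020-XX-01 example from the release note. The function name and the regular expression are assumptions for illustration, not Augur's code.

```python
import re

# Year and day are given, but the month is the ambiguous placeholder "XX".
AMBIGUOUS_MONTH_ONLY = re.compile(r"^\d{4}-XX-\d{2}$")


def has_only_ambiguous_month(date_string):
    """Hypothetical check: True for dates such as 2020-XX-01."""
    return AMBIGUOUS_MONTH_ONLY.match(date_string) is not None


for date in ["2020-XX-01", "2020-03-XX", "2020-XX-XX", "2020-03-01"]:
    print(date, has_only_ambiguous_month(date))
```

A caller could use a check like this to skip parsing such a date rather than guessing the month.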
8.0.0 (8 June 2020)
utils: Add a consolidated generic load_mask_sites function and specific read_bed_file functions for reading masking sites from files. Changes the Python API by moving mask-loading functionality out of augur mask and tree into utils #514 and #550
mask: Parse BED files as zero-indexed, half-open intervals #512
align: Report insertions stripped during alignment #449
Require minimum pandas version of 1.0.0 #488
parse: Reduce memory use and clarify code with standard Python idioms #496
mask: Allow masking of specific sites passed by the user with --mask-sites and masking of a fixed number of sites from the beginning or end of each sequence with
clades, import: Use defaultdict to simplify code #533
tests: Add initial functional tests of the augur command line interface using Cram #542
refine: Add a --seed argument to set the random seed for more reproducible outputs across runs #542
ancestral, refine, and traits: Print the version of TreeTime being used for these commands #552
filter: Add support for flexible pandas-style queries with new
export: Allow display defaults for transmission lines #561
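The mask entry above says that BED files are parsed as zero-indexed, half-open intervals. A minimal sketch of what that convention means, using a hypothetical helper name and made-up coordinates: the start position is included and the end position is excluded.

```python
def bed_to_sites(bed_text):
    """Hypothetical helper: expand BED intervals into zero-based site indices."""
    sites = set()
    for line in bed_text.splitlines():
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split("\t")[:3]
        # Half-open interval: start is included, end is excluded.
        sites.update(range(int(start), int(end)))
    return sorted(sites)


example = "chrom\t0\t3\nchrom\t10\t12\n"
print(bed_to_sites(example))  # [0, 1, 2, 10, 11]
```

Under this convention the interval 0 to 3 covers three sites, not four.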
7.0.2 (7 April 2020)
7.0.0 (7 April 2020)
improve testing by
align: reverse complement sequences when necessary using mafft’s autodirection flag #467
align: speed up replacement of gaps with “ambiguous” bases #474
mask: add support for FASTA input files #493
traits: bump TreeTime version to 0.7.4 and increase maximum number of unique traits allowed from 180 to 300 #495
align: enable filling gaps in input sequences even if no reference is provided instead of throwing an exception #466
align: detect duplicate sequences by comparing sequence objects instead of (often truncated) string representations of those objects #468
import_beast: use raw strings for regular expressions to avoid syntax errors in future versions of Python #469
scripts: update exception syntax to new style #484
filter: fail loudly when a given priority file is invalid and exit instead of just printing an error #487
6.4.3 (25 March 2020)
align: Remove reference sequence from alignments even when no gaps exist in input sequences relative to the reference. Thank you @danielsoneg! #456
6.4.2 (17 March 2020)
Require Snakemake less than 5.11 to avoid a breaking change. The --cores argument is now required by 5.11, which will affect many existing augur-based workflows. Reported upstream as snakemake/snakemake#283.
align: Run mafft with the --nomemsave option. This makes alignments of sequences over 10k in length run much, much faster in the general case and shouldn’t cause issues for most modern hardware. We may end up needing to add an off-switch for this mode if it causes issues for other users of augur, but the hope is that it will make things just magically run faster for most folks! There is likely more tuning that could be done with mafft, but this is a huge improvement in our testing. #458
align: Ignore blank lines in --include files. Thanks @CameronDevine! #451
align: Properly quote filenames when invoking mafft. Thanks @CameronDevine! #452
6.4.1 (4 March 2020)
6.4.0 (26 February 2020)
align: New sequences can now be added to an existing alignment. #422
align: Multiple sequence files can be provided as input. #422
align: Extra debugging files such as *.post_aligner.fasta are no longer produced by default. To request them, pass the
align: De-duplicate input sequences, with a warning. #422
export v2: Add support for the display_defaults, which was recently added to Auspice. #445
align: Exits with an error earlier if arguments are invalid instead of only printing a warning. #422
align: Performs more error checking and clarifies the help and error messages. #422
export v2: Traits which are filters but not colorings are now exported as well, instead of being left out. #442
export v2: Exits non-zero when validation fails, instead of masking errors. #441
validate: In order to improve clarity, messages now include the filenames involved and distinguish between schema validation and internal consistency checks. #441
6.3.0 (13 February 2020)
ancestral: New options to either --keep-ambiguous or --infer-ambiguous. If using --infer-ambiguous the previous behavior will be maintained in which tips with N will have their nucleotide state inferred. If using --keep-ambiguous, these tips will be left as N. With this upgrade, we are still defaulting to --infer-ambiguous, however, we plan to swap the default to --keep-ambiguous in the future. If this distinction matters to you, we would suggest that you explicitly record --infer-ambiguous in your build process. Also part of PR 431
traits: Allow input of --weights which references a .tsv file in the following format:

    division    Hubei       10.0
    division    Jiangxi     1.0
    division    Chongqing   1.0

where these weights represent equilibrium frequencies in the CTMC transition model. We imagine the primary use of user-specified weights is to correct for strong sampling biases in available data. See PR 443
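A weights file of this shape could be read as sketched below. The function names are hypothetical, the file is assumed to be tab-delimited with three columns per row, and rescaling the weights so that they sum to one is an illustrative assumption about how relative weights map to frequencies, not a statement of what Augur or TreeTime does internally.

```python
import csv
import io

# The three rows from the release note, assumed to be tab-delimited.
EXAMPLE = "division\tHubei\t10.0\ndivision\tJiangxi\t1.0\ndivision\tChongqing\t1.0\n"


def read_weights(handle):
    """Hypothetical reader: map each trait column to {value: weight}."""
    weights = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        column, value, weight = row
        weights.setdefault(column, {})[value] = float(weight)
    return weights


def normalize(weights_for_column):
    """Rescale weights so they sum to one (an assumption for illustration)."""
    total = sum(weights_for_column.values())
    return {value: weight / total for value, weight in weights_for_column.items()}


weights = read_weights(io.StringIO(EXAMPLE))
frequencies = normalize(weights["division"])
print(frequencies)
```

With these example rows, Hubei receives ten twelfths of the total weight and the other two divisions one twelfth each.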
6.2.0 (25 January 2020)
--divergence-units option to distinguish between
mutations-per-site as default behavior. See PR 435
6.1.1 (17 December 2019)
6.1.0 (13 December 2019)
6.0.0 (10 December 2019)
Version 6 is a major release of augur affecting many augur commands. The format
of the exported JSON (v2) has changed and now merges the previously separate
files containing tree and meta information. To maintain backward compatibility,
the export command was split into export v1 (old) and export v2 (new).
Detailed release notes are provided in the augur documentation on
For a migration guide, consult
Major features / changes
export: Swap from a separate _meta.json to a single “unified”
export: Include additional command line options to alleviate need for Auspice config
export: Include option for reference sequence output
export: Move to GFF-style annotations
export: Validate exported JSONs against schema
ancestral: Allow output of FASTA and JSON files
import beast command to import labeled BEAST MCC tree
--prettify-fields option to clean up metadata fields
Minor features / changes
colors.tsv: Allow whitespace, but insist on tab delimiting
lat_longs.tsv: Allow whitespace, but insist on tab delimiting
Remove code for old “non-modular” augur, old “non-modular” builds and Python tests
Improve test builds
filter: More interpretable output of how many sequences have been filtered
filter: Additional flag --subsample-seed to seed the random number generator and thereby make subsampling reproducible
sequence-traits: Numerical output as originally intended, but required an Auspice bugfix
traits: Explanation of what is considered missing data & how it is interpreted
traits: GTR models are exported in the output JSON for better accountability & reproducibility
5.4.1 (12 November 2019)
5.4.0 (7 November 2019)
v1 subcommand to allow forwards compatibility with Augur v6 builds. See PR 398
5.3.0 (9 September 2019)
export: Improve printing of error messages with missing or conflicting author data. See issue 274
filter: Improve printing of dropped strains to include reasons why strains were dropped. See PR 367
refine: Add support for command line flag --keep-polytomies to not resolve polytomies when producing a time tree. See PR 345
Small fixes in geographic coordinate file
5.2.1 (4 August 2019)
Significantly relax version requirements specified in setup.py for biopython, pandas, etc… Additionally, move lesser used packages (cvxopt, matplotlib, seaborn) into an “extras_require” field. This should reduce conflicts with other pip installed packages. See PR 323
5.2.0 (23 July 2019)
ancestral: Adds a new flag --output-sequences and logic to support saving ancestral sequences and leaves from the given tree to a FASTA file. Also adds a redundant, more specific flag --output-node-data that will replace the current --output flag in the next major version release of augur. For now, we issue a deprecation warning when the --output flag is used. Note that FASTA output is only allowed for FASTA inputs and not for VCFs. We don’t allow FASTA output for VCFs anywhere else and, if we did here, the output files would be very large. See PR 293
--method kde flag to compute frequencies via KDE kernels. This complements the existing method of --method diffusion. Generally, KDE frequencies should be more robust and faster to run, but will not project as well when forecasting frequencies into the future. See PR 271
Document environment variables respected by Augur
Remove matplotlib and seaborn from setup.py install. These are still called in a few places in augur (like titers.validate()), but this was deemed rare enough that removing them from setup.py would ease general install for most users. Additionally, the ipdb debugger has been moved to dev dependencies. See PR 291
Refactor logic to read trees from multiple formats into a function. Adds a new function
utils module that tries to safely handle reading trees in multiple input formats. See PR 310
5.1.1 (1 July 2019)
tree: Add support for the GTR+R10 substitution model.
tree: Support parentheses in node names when using IQ-TREE.
Use the center of the UK for its coordinates instead of London.
--output required, which it always was but wasn’t marked.
filter: Avoid error when no excluded strains file is provided.
export: Fix for preliminary version 2 schema support.
refine: Correct error handling when the tree file is missing or empty.
Add examples of Augur usage in the wild.
Rename and reorganize CLI and Python API pages a little bit to make “where do I start learning to use Augur?” clearer to non-devs.
Relax version requirements of pandas and seaborn. The hope is this will make installation smoother (particularly alongside other packages which require newer pandas versions) while not encountering breaking changes in newer versions ourselves.
5.1.0 (29 May 2019)
Documentation is now available online for the augur CLI and Python API via Read The Docs: https://nextstrain-augur.readthedocs.io. The latest version on RTD points to the git master branch, and the stable version to the most recent tagged release. Instructions for building the docs locally are in the README.
5.0.0 (26 May 2019)
ancestral: New option to --keep-ambiguous, which will not infer nucleotides at ambiguous (N) sites on tip sequences and instead leave as ‘N’. See PR 280.
ancestral: New option to --keep-overhangs, which will not infer nucleotides for gaps on either side of the alignment and instead leave as ‘-’. See PR 286.
clades: This module has been reconfigured to identify clade defining mutations on top of a reference rather than identifying mutations along the tree. The command line arguments are the same except for the addition of --reference, which explicitly passes in a reference sequence. If --reference is not defined, then the reference will be drawn from the root node of the phylogeny by looking for the sequence attribute attached to the root node of --tree. See PR 288.
refine: Revise rooting behavior. Previously --root took ‘best’, ‘residual’, ‘rsq’ and ‘min_dev’ as options. In this update --root takes ‘best’, ‘least-squares’, ‘min_dev’ and ‘oldest’ as rooting options. This eliminates ‘residual’ and ‘rsq’ as options. This is a backwards-incompatible change. This requires updating TreeTime to version 0.5.4 or above. See PR 263.
--keep-root option that overrides --root specification to preserve tree rooting. See PR 263.
--no-covariance options that specify TreeTime behavior. See PR 263.
titers: This command now throws an InsufficientDataException if there are not sufficient titers to infer a model. This is paired with a new --allow-empty-model flag that proceeds past the InsufficientDataException and writes out a model JSON corresponding to an ‘empty’ model. See PR 281.
By default JSONs are written with indent=1 to give a pretty-printed JSON. However, this adds significant file size to large tree JSONs. If the environment variable AUGUR_MINIFY_JSON is set then minified JSONs are printed instead. This mirrors the explicit --minify-json argument available to augur export. See PR 278.
export: Cast numeric values to strings for export. See issue 287.
export: Legend order preserves ordering passed in by user for traits that have default colorings (‘country’ and ‘region’). See PR 284.
refine: Previously, the --root argument was silently ignored when no timetree was inferred. Re-rooting with an outgroup is sensible even without a timetree. See PR 282.
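The AUGUR_MINIFY_JSON behavior described in this release can be sketched with the standard json module. The function name and signature are hypothetical, and treating any value of the variable (even an empty one) as "set" is an assumption for illustration.

```python
import json
import os


def dump_json(data, minify=None):
    """Hypothetical helper: pretty-print with an indent of 1 unless minified.

    If minify is not given explicitly, fall back to checking whether the
    AUGUR_MINIFY_JSON environment variable is set.
    """
    if minify is None:
        minify = "AUGUR_MINIFY_JSON" in os.environ
    indent = None if minify else 1
    return json.dumps(data, indent=indent)


print(dump_json({"name": "root", "children": []}, minify=False))
print(dump_json({"name": "root", "children": []}, minify=True))
```

The first call spreads the object over several indented lines, while the second prints it on a single line, which is where the file-size saving for large tree JSONs comes from.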
4.0.0 (24 April 2019)
distance: New interface for specifying distances between sequences. This is a backwards-incompatible change. Refer to augur distance --help for all the details.
export: Add a --minify-json flag to omit indentation in Auspice JSONs.
frequencies: Emit one-based coordinates (instead of zero-based) for KDE-based mutation frequencies
3.1.7 (5 February 2019)
Update to TreeTime 0.5.3
tree: Fix bug in printing causing errors in Python versions <3.6
tree: Alter site masking to not be so memory intensive
3.1.6 (29 January 2019)
filter: Allow negative matches to --exclude-where. For example, --exclude-where country!=usa would exclude all samples where metadata country does not equal usa.
--exclude-sites to work with FASTA input. Ensure that indexing of input sites is one-based.
fix loading of strains when loading titers from file, previously strains had not been filtered to match the tree appropriately
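The negative-match form of --exclude-where from the filter entry above can be illustrated as follows. The helper names, the case-insensitive comparison, and the handling of a missing column are assumptions for the sketch rather than Augur's code.

```python
def parse_exclude_where(expression):
    """Hypothetical parser: split 'column=value' or 'column!=value'
    into (column, value, negate)."""
    if "!=" in expression:
        column, value = expression.split("!=", 1)
        return column, value, True
    column, value = expression.split("=", 1)
    return column, value, False


def is_excluded(record, expression):
    """Return True when a metadata record should be dropped."""
    column, value, negate = parse_exclude_where(expression)
    # Assumption: compare case-insensitively; a missing column counts as no match.
    matches = record.get(column, "").lower() == value.lower()
    return not matches if negate else matches


print(is_excluded({"country": "Canada"}, "country!=usa"))
print(is_excluded({"country": "USA"}, "country!=usa"))
```

The check for "!=" has to come before the check for "=", since every negative expression also contains an equals sign.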
3.1.5 (13 January 2019)
titers: Allow multiple titer date files in
--non-nucleotide call to include ? as allowed character.
--method raxml to properly delimit interim RAxML output so that simultaneous builds don’t conflict.
3.1.4 (1 January 2019)
augur frequencies output JSON to support downstream plotting.
3.1.3 (29 December 2018)
--non-nucleotide option to remove sequences with non-conforming nucleotide characters.
Revise treatment of
augur parse to leave
- as is and remove white space. Also delimit
Fix bug in naming of temp IQTREE fixes to prevent conflicts from simultaneous builds.
3.1.1 (21 December 2018)
--include-where. Adds an all_seq variable needed by the logic to include records by value. This was previously working for VCF but threw an exception for sequences in FASTA format.
Update flu reference viruses and lat longs.
3.1.0 (18 December 2018)
augur reconstruct-sequences module that reconstructs alignments from mutations inferred on the tree
augur distance module that calculates the distance between amino acid sequences across entire genes or at a predefined subset of sites
augur lbi module that calculates local branching index (LBI) for a given tree and one or more sets of parameters.
--method kde as option to augur frequencies, separate from the existing --method diffusion logic. KDE frequencies are faster and better for smaller clades but don’t extrapolate as well as diffusion frequencies.
titers: Enable annotation of nodes in a tree from the substitution model
3.0.5.dev1 (26 November 2018)
translate: Nucleotide (“nuc”) annotation for non-bacterial builds starts at 0 again, not 1, fixing a regression.
Schemas: Correct coordinate system description for genome start/end annotations.
3.0.4.dev1 (26 November 2018)
validate: Fix regression for gene names containing an asterisk.
Fix Travis CI tests which were silently not running.
3.0.3.dev1 (26 November 2018)
refine: Add a
traits: Add a --sampling-bias-correction option for mugration model
validate: Gene names in tree annotations may now contain hyphens. Compatible with Auspice version 1.33.0 and later.
All JSON is now emitted with sorted keys, making it easier to diff and run other textual comparisons against output.
filter: Only consider A, T, C, and G when calculating sequence length for the
filter: Allow comments in files passed to
filter: Ignore case when matching trait values against excluded values.
Normalize custom geographic names to lower case for consistent matching.
Fix typo in geographic entry for
Schemas: Reconcile naming patterns used in gene definitions and tree annotations.
Upgrade TreeTime dependency to 0.5.x and at least 0.5.1.
environment.yml file for use with conda env create.
Stop testing under Python 2.7 on Travis CI.
3.0.1.dev1 (27 September 2018)
align and tree: The --nthreads option now accepts the special value “auto” to automatically set the number of threads to the number of CPU cores available.
tree: The --nthreads option is now respected. Previously all tree builders were ignoring the value and using either 2 threads (RAxML, IQ-TREE) or as many threads as cores (FastTree, if the OpenMP version).
translate: Check for and, if necessary pad, nucleotide sequences which aren’t a multiple of 3 earlier to avoid errors later.
export: Optionally write inferred nucleotide and amino acid sequences (or mutations) to a separate file.
export: Omit genes with no amino acid mutations.
validate: Allow underscores in gene names.
refine: Remove unused --nthreads argument.
ancestral, filter, tree, refine: Exit 1 instead of -1 on error.
Print the help message, instead of throwing an exception, when augur is run without arguments.
Briefly describe each command in its --help output and in the global
Revamp README to emphasize new, modular augur and make it suitable for inclusion on PyPi.
Reconciled conflicting license declarations; augur is AGPLv3 (not MIT) licensed like the rest of Nextstrain.
Include URLs for bug reports, the change log, and the source on PyPi.
Geographic coordinates added for the Netherlands and the Philippines.
release branch when rewinding a failed local release process.
Refactor the augur program and command architecture for improved maintainability.
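The “auto” value for --nthreads described in this release can be sketched with the standard library. The function name is hypothetical, and falling back to a single thread when the core count cannot be determined is an assumption.

```python
import os


def resolve_nthreads(value):
    """Hypothetical helper: turn an --nthreads value into an integer.

    The special value "auto" maps to the number of CPU cores available.
    """
    if value == "auto":
        # os.cpu_count() can return None, so fall back to one thread.
        return os.cpu_count() or 1
    return int(value)


print(resolve_nthreads("4"))
print(resolve_nthreads("auto"))
```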
3.0.0.dev3 (4 September 2018)
Use an allowed Topic classifier so we can upload to PyPi
Ignore distribution egg-info build files
3.0.0.dev2 (4 September 2018)
Export: Add safety checks for optional annotations and geo data
Include more lat/longs in the default geo data
Add release tooling
Document the release process and a few development practices
Travis CI: Switch to rebuilding the Docker image only for new releases
Remove ebola, lassa, tb, WNV, and zika builds now in their own repos. These builds are now available at URLs like https://github.com/nextstrain/ebola, for example.