augur.parse module

Parse delimited fields from FASTA sequence names into a TSV and FASTA file.
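
As a rough illustration of the module's overall job, the sketch below uses only the standard library. It reads FASTA text, splits each header on `|`, writes one TSV row per record, and re-emits each sequence under its strain name. The helper names (`split_fasta_headers`, `write_outputs`), the in-memory handles, and the field names in the usage example are hypothetical. The sketch also assumes that `>` never appears inside a header. It is not augur's code path; the documented `parse_sequence` operates on Biopython `SeqRecord` objects instead.

```python
import csv
import io


def split_fasta_headers(fasta_text, fields, strain_key="strain", separator="|"):
    """Hypothetical helper: return (strain, metadata, sequence) for each record."""
    records = []
    # Each chunk after a ">" is one record: a header line followed by sequence lines.
    for block in fasta_text.split(">")[1:]:
        lines = block.strip().splitlines()
        header = lines[0]
        sequence = "".join(line.strip() for line in lines[1:])
        # Index the delimited header values by the caller-supplied field names.
        values = [value.strip() for value in header.split(separator)]
        metadata = dict(zip(fields, values))
        records.append((metadata[strain_key], metadata, sequence))
    return records


def write_outputs(records, fields, tsv_handle, fasta_handle):
    """Hypothetical helper: write metadata as TSV and sequences as FASTA."""
    writer = csv.DictWriter(tsv_handle, fieldnames=fields,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for strain, metadata, sequence in records:
        writer.writerow(metadata)
        # The output FASTA keeps only the strain name as the record header.
        fasta_handle.write(">%s\n%s\n" % (strain, sequence))


fasta = ">A/X/1/2020|EPI1|2020-01-02\nACGT\nAC\n>B/Y/2/2021|EPI2|2021-03-04\nGGTT\n"
fields = ["strain", "accession", "date"]
tsv, out = io.StringIO(), io.StringIO()
write_outputs(split_fasta_headers(fasta, fields), fields, tsv, out)
```

After this runs, `tsv` holds a header row plus one tab-separated row per strain, and `out` holds the sequences with their metadata stripped from the headers.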

augur.parse.fix_dates(d, dayfirst=True)

Attempt to parse a date string using the pandas date parser. If the date is ambiguous, the argument ‘dayfirst’ determines whether the day or the month is assumed to be the first field. Incomplete dates are padded with XX. If the date cannot be parsed, the function returns the input unchanged.
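
The padding and fallback behaviour described above can be illustrated with a small sketch. It deliberately swaps the pandas parser for the standard library's `datetime.strptime` and a short, hypothetical list of accepted formats. It is therefore not augur's implementation and accepts far fewer inputs. The name `fix_dates_sketch` is also hypothetical.

```python
from datetime import datetime


def fix_dates_sketch(d, dayfirst=True):
    """Hypothetical stdlib-only stand-in for augur.parse.fix_dates."""
    # Full dates: for slash/dash forms, `dayfirst` decides which field is the day.
    full_formats = ["%Y-%m-%d"]
    full_formats += ["%d/%m/%Y", "%d-%m-%Y"] if dayfirst else ["%m/%d/%Y", "%m-%d-%Y"]
    for fmt in full_formats:
        try:
            dt = datetime.strptime(d, fmt)
            return "%d-%02d-%02d" % (dt.year, dt.month, dt.day)
        except ValueError:
            pass
    # Year and month only: pad the missing day with XX.
    for fmt in ("%Y-%m", "%m/%Y"):
        try:
            dt = datetime.strptime(d, fmt)
            return "%d-%02d-XX" % (dt.year, dt.month)
        except ValueError:
            pass
    # Year only: pad both month and day with XX.
    try:
        return "%d-XX-XX" % datetime.strptime(d, "%Y").year
    except ValueError:
        pass
    # Anything unparseable is returned as given.
    return d
```

For example, `fix_dates_sketch("07/05/2019")` reads the string as 7 May 2019. With `dayfirst=False`, the same string is read as 5 July 2019.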

augur.parse.parse_sequence(sequence, fields, strain_key='strain', separator='|', prettify_fields=None, fix_dates=None)

Parse a single sequence record into a sequence record and associated metadata.

Parameters:

  • sequence (Bio.SeqRecord.SeqRecord) – a Biopython sequence record to parse, with metadata stored in its description field.

  • fields (list or tuple) – a list of names for fields expected in the given record’s description.

  • strain_key (str) – name of the field to use as the given sequence’s unique ID.

  • separator (str) – delimiter used to split the record description.

  • prettify_fields (list or tuple) – a list of field names for which the values in those fields should be prettified.

  • fix_dates (str) – parse the “date” field into the requested canonical format (“dayfirst” or “monthfirst”).

Returns:

  • Bio.SeqRecord.SeqRecord – a Biopython sequence record with the given sequence’s name as the record ID and all other metadata stripped.

  • dict – metadata associated with the given record indexed by the given field names.
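
The field-splitting step can be sketched minimally, assuming the description is a plain string of `separator`-delimited values in the same order as `fields`. The sketch works on a string rather than a `Bio.SeqRecord`, so it runs without Biopython. It ignores `prettify_fields` and `fix_dates`. It also raises on a field-count mismatch, which is a choice made for this sketch, not documented behaviour. `parse_description_sketch` is a hypothetical name.

```python
def parse_description_sketch(description, fields, strain_key="strain", separator="|"):
    """Hypothetical helper: split a FASTA description into (id, metadata)."""
    # Split the description into its delimited values, trimming stray whitespace.
    values = [value.strip() for value in description.split(separator)]
    if len(values) != len(fields):
        raise ValueError(
            "expected %d fields but found %d in %r" % (len(fields), len(values), description)
        )
    # Index the values by the caller-supplied field names.
    metadata = dict(zip(fields, values))
    # The strain field becomes the record's new unique identifier.
    return metadata[strain_key], metadata


strain, metadata = parse_description_sketch(
    "A/Brisbane/10/2007|EPI123|2007-02-06", ["strain", "accession", "date"]
)
```

Here `strain` is `"A/Brisbane/10/2007"`, and `metadata` maps every field name, including `strain`, to its value.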

augur.parse.prettify(x, trim=0, camelCase=False, etal=None, removeComma=False)

Parse a FASTA file and turn information in the header into a TSV or CSV file.