format-dates
Format date fields to ISO 8601 dates (YYYY-MM-DD), where incomplete dates are masked with 'XX' (e.g. 2023 -> 2023-XX-XX).
usage: augur curate format-dates [-h] [--metadata METADATA]
                                 [--id-column ID_COLUMN]
                                 [--metadata-delimiters METADATA_DELIMITERS [METADATA_DELIMITERS ...]]
                                 [--fasta FASTA]
                                 [--seq-id-column SEQ_ID_COLUMN]
                                 [--seq-field SEQ_FIELD]
                                 [--unmatched-reporting {error_first,error_all,warn,silent}]
                                 [--duplicate-reporting {error_first,error_all,warn,silent}]
                                 [--output-metadata OUTPUT_METADATA]
                                 [--output-fasta OUTPUT_FASTA]
                                 [--output-id-field OUTPUT_ID_FIELD]
                                 [--output-seq-field OUTPUT_SEQ_FIELD]
                                 [--date-fields DATE_FIELDS [DATE_FIELDS ...]]
                                 [--expected-date-formats EXPECTED_DATE_FORMATS [EXPECTED_DATE_FORMATS ...]]
                                 [--failure-reporting {error_first,error_all,warn,silent}]
                                 [--no-mask-failure]
INPUTS
Input options shared by all augur curate commands. If no input options are provided, commands will try to read NDJSON records from stdin.
- --metadata
Input metadata file. Accepts '-' to read metadata from stdin.
- --id-column
Name of the metadata column that contains the record identifier for reporting duplicate records. Uses the first column of the metadata file if not provided. Ignored if also providing a FASTA file input.
- --metadata-delimiters
Delimiters to accept when reading a metadata file. Only one delimiter will be inferred.
Default: (',', '\t')
- --fasta
Plain or gzipped FASTA file. Headers can only contain the sequence id used to match a metadata record. Note that an index file will be generated for the FASTA file as <filename>.fasta.fxi.
- --seq-id-column
Name of metadata column that contains the sequence id to match sequences in the FASTA file.
- --seq-field
The name to use for the sequence field when joining sequences from a FASTA file.
- --unmatched-reporting
Possible choices: error_first, error_all, warn, silent
How unmatched records from combined metadata/FASTA input should be reported.
Default: error_first
- --duplicate-reporting
Possible choices: error_first, error_all, warn, silent
How duplicate records should be reported.
Default: error_first
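When no input options are given, the command reads NDJSON records (one JSON object per line) from stdin. The following is a minimal sketch of preparing such input and the matching command line from Python. The field names `strain` and `date` and the sample values are hypothetical, and the example only builds the input and the argument list; actually running it assumes `augur` is installed and on the PATH.

```python
import json
import shlex

# Hypothetical records; the field names are illustrative only.
records = [
    {"strain": "sample_1", "date": "2023"},
    {"strain": "sample_2", "date": "05/2023"},
]

# NDJSON: one JSON object per line, each terminated by a newline.
ndjson = "".join(json.dumps(record) + "\n" for record in records)

# Only flags documented on this page are used.
command = [
    "augur", "curate", "format-dates",
    "--date-fields", "date",
    "--expected-date-formats", "%Y", "%m/%Y",
]

print(shlex.join(command))
print(ndjson, end="")

# To run it (requires augur to be installed):
#   import subprocess
#   subprocess.run(command, input=ndjson, text=True, check=True)
```

Because no output options are passed either, the command would write NDJSON records to stdout.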
OUTPUTS
Output options shared by all augur curate commands. If no output options are provided, commands will output NDJSON records to stdout.
- --output-metadata
Output metadata TSV file. Accepts '-' to output TSV to stdout.
- --output-fasta
Output FASTA file.
- --output-id-field
The record field to use as the sequence identifier in the FASTA output.
- --output-seq-field
The record field that contains the sequence for the FASTA output. This field will be deleted from the metadata output.
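The relationship between the FASTA output options can be pictured as splitting each record in two. This is a rough sketch of the behavior described above, not augur's implementation; `split_record` and the field names `strain` and `sequence` are hypothetical.

```python
def split_record(record, id_field, seq_field):
    """Split one record into a metadata row and a FASTA entry.

    Hypothetical helper: id_field plays the role of --output-id-field and
    seq_field the role of --output-seq-field.
    """
    # The sequence field is removed from the metadata output.
    metadata = {key: value for key, value in record.items() if key != seq_field}
    # The id field becomes the FASTA header, the sequence field the body.
    fasta_entry = f">{record[id_field]}\n{record[seq_field]}\n"
    return metadata, fasta_entry


record = {"strain": "sample_1", "date": "2023-05-01", "sequence": "ACGT"}
metadata, fasta_entry = split_record(record, "strain", "sequence")
print(metadata)      # {'strain': 'sample_1', 'date': '2023-05-01'}
print(fasta_entry, end="")
```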
REQUIRED
- --date-fields
List of date field names in the record that need to be standardized.
- --expected-date-formats
Expected date formats that are currently in the provided date fields, defined by standard format codes as listed at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. If a date string matches multiple formats, it will be parsed as the first matched format in the provided order.
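The first-match rule and the masking of incomplete dates can be illustrated with the standard `datetime.strptime` format codes. This is a simplified sketch, not augur's actual code: `format_date` is a hypothetical helper, and it decides which parts to mask by looking only for a few common directives in the matched format.

```python
from datetime import datetime

# Directives that indicate a date part is present in a format string.
YEAR_CODES = ("%Y", "%y")
MONTH_CODES = ("%m", "%b", "%B")
DAY_CODES = ("%d",)


def format_date(date_string, expected_formats):
    """Return an ISO 8601 date, masking parts the matched format lacks.

    Formats are tried in the provided order and the first match wins.
    Returns None when no format matches.
    """
    for fmt in expected_formats:
        try:
            parsed = datetime.strptime(date_string.strip(), fmt)
        except ValueError:
            continue  # no match, try the next format in order
        year = f"{parsed.year:04d}" if any(c in fmt for c in YEAR_CODES) else "XXXX"
        month = f"{parsed.month:02d}" if any(c in fmt for c in MONTH_CODES) else "XX"
        day = f"{parsed.day:02d}" if any(c in fmt for c in DAY_CODES) else "XX"
        return f"{year}-{month}-{day}"
    return None


print(format_date("2023", ["%Y"]))                          # 2023-XX-XX
print(format_date("2023-05", ["%Y-%m-%d", "%Y-%m"]))        # 2023-05-XX
# An ambiguous string is parsed by whichever format is listed first.
print(format_date("01/02/2023", ["%d/%m/%Y", "%m/%d/%Y"]))  # 2023-02-01
print(format_date("01/02/2023", ["%m/%d/%Y", "%d/%m/%Y"]))  # 2023-01-02
```

The last two calls show why the order of the expected formats matters: the same input yields different dates depending on which format comes first.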
OPTIONAL
- --failure-reporting
Possible choices: error_first, error_all, warn, silent
How failed date formatting should be reported.
Default: error_first
- --no-mask-failure
Do not mask dates with 'XXXX-XX-XX'; return the original date string instead if date formatting failed.
Default: masking enabled (flag not set)
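How the two optional flags interact can be sketched as follows. The reading of the reporting modes here is an assumption based on their names (error_first stops at the first failure, error_all reports every failure at the end, warn prints a warning and continues, silent continues quietly). `format_records` is a hypothetical helper that handles a single date field and a single format, not augur's implementation.

```python
import sys
from datetime import datetime


def format_records(records, date_field, date_format,
                   failure_reporting="error_first", mask_failure=True):
    """Format one date field per record, handling failures per the chosen mode."""
    failures = []
    formatted = []
    for index, record in enumerate(records):
        record = dict(record)  # avoid mutating the caller's data
        original = record[date_field]
        try:
            parsed = datetime.strptime(original, date_format)
            record[date_field] = parsed.strftime("%Y-%m-%d")
        except ValueError:
            message = f"record {index}: unable to format {original!r} in field {date_field!r}"
            if failure_reporting == "error_first":
                raise ValueError(message) from None
            if failure_reporting == "warn":
                print(f"WARNING: {message}", file=sys.stderr)
            failures.append(message)
            # Mask the failed date unless masking was switched off.
            record[date_field] = "XXXX-XX-XX" if mask_failure else original
        formatted.append(record)
    if failure_reporting == "error_all" and failures:
        raise ValueError("\n".join(failures))
    return formatted


records = [{"date": "2023-05-01"}, {"date": "unknown"}]
print(format_records(records, "date", "%Y-%m-%d", "silent"))
# [{'date': '2023-05-01'}, {'date': 'XXXX-XX-XX'}]
print(format_records(records, "date", "%Y-%m-%d", "silent", mask_failure=False))
# [{'date': '2023-05-01'}, {'date': 'unknown'}]
```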