augur.align module
Align multiple sequences from FASTA.
-
exception augur.align.AlignmentError
Bases: Exception
-
augur.align.analyse_insertions(aln, ungapped, insertion_csv)
-
augur.align.check_arguments(args)
-
augur.align.check_duplicates(*values)
-
augur.align.ensure_reference_strain_present(ref_name, existing_alignment, seqs)
-
augur.align.generate_alignment_cmd(method, nthreads, existing_aln_fname, seqs_to_align_fname, aln_fname, log_fname)
-
augur.align.make_gaps_ambiguous(aln)
Replace all gaps with ‘N’ in all sequences in the alignment. TreeTime will treat them as fully ambiguous and replace them with the most likely state. This modifies the alignment in place.
- Parameters
aln (MultipleSeqAlign) – Biopython Alignment
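As a rough illustration of the gap-filling behaviour described above, here is a minimal sketch. It does not use augur or Biopython: the records are plain dicts with 'name' and 'seq' keys, and fill_gaps_with_n is a hypothetical helper name chosen for this example only.

```python
def fill_gaps_with_n(records):
    """Replace every '-' with 'N' in each record's sequence, in place.

    `records` is a list of dicts with 'name' and 'seq' keys. This is a
    hypothetical stand-in for an alignment object, not augur's own type.
    """
    for record in records:
        record["seq"] = record["seq"].replace("-", "N")


alignment = [
    {"name": "a", "seq": "AC-GT"},
    {"name": "b", "seq": "A--GT"},
]
fill_gaps_with_n(alignment)
print([r["seq"] for r in alignment])  # ['ACNGT', 'ANNGT']
```

Because the list is modified in place, the helper returns nothing, which mirrors the "modifies the alignment in place" note above.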
-
augur.align.postprocess(output_file, ref_name, keep_reference, fill_gaps)
Postprocessing of the combined alignment file.
- Parameters
output_file (str) – The file the new alignment was written to
ref_name (str) – If provided, the name of the reference strain used in the alignment
keep_reference (bool) – If the reference was provided, whether it should be kept in the alignment
fill_gaps (bool) – Replace all gaps in the alignment with “N” to indicate ambiguous sites.
- Returns
Nothing. The modified alignment is written directly to output_file.
- Return type
None
-
augur.align.prepare(sequences, existing_aln_fname, output, ref_name, ref_seq_fname)
Prepare the sequences, existing alignment, and reference sequence for alignment.
- This function:
Combines all given input sequences into a single file
Checks to make sure the input sequences don’t overlap with the existing alignment, if one exists.
If given a reference name, checks that the sequence exists in either the existing alignment, if given, or the input sequences.
If given a reference sequence, either adds it to the existing alignment or prepends it to the input sequences.
Writes the input sequences to a single file, and writes the alignment back out if the reference sequence was added to it.
- Parameters
sequences (list[str]) – List of paths to FASTA-formatted sequences to align.
existing_aln_fname (str) – Path of an existing alignment to use, or None
output (str) – Path the aligned sequences will be written out to.
ref_name (str) – The name of the reference sequence, if provided
ref_seq_fname (str) – The path to the reference sequence file. If this is provided, it overrides ref_name.
- Returns
The existing alignment filename, the new sequences filename, and the name of the reference sequence.
- Return type
tuple
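The reference-handling steps listed for prepare can be sketched roughly as follows. This is not augur's implementation: records are plain dicts, the helper names check_reference_present and place_reference are hypothetical, and the sketch ignores file I/O and whether the reference is already aligned.

```python
def check_reference_present(ref_name, new_seqs, existing_aln=None):
    """Raise ValueError unless `ref_name` names a record in either input."""
    names = {r["name"] for r in new_seqs}
    if existing_aln is not None:
        names |= {r["name"] for r in existing_aln}
    if ref_name not in names:
        raise ValueError(f"reference {ref_name} not found in inputs")


def place_reference(ref, new_seqs, existing_aln=None):
    """Add `ref` to the existing alignment if there is one,
    otherwise prepend it to the new sequences.

    Returns (new_seqs, existing_aln) as new lists; inputs are not mutated.
    """
    if existing_aln is not None:
        return list(new_seqs), existing_aln + [ref]
    return [ref] + list(new_seqs), None


ref = {"name": "ref", "seq": "ACGT"}
seqs = [{"name": "s1", "seq": "ACGA"}]

# No existing alignment: the reference goes to the front of the new sequences.
new_seqs, aln = place_reference(ref, seqs)
check_reference_present("ref", new_seqs, aln)  # passes silently
```

Keeping the presence check separate from the placement step makes it possible to validate a reference name even when no reference file is supplied.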
-
augur.align.prettify_alignment(aln)
Converts all bases to uppercase and removes the auto reverse-complement prefix (_R_). This modifies the alignment in place.
- Parameters
aln (MultipleSeqAlign) – Biopython Alignment
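The two clean-up steps described above (uppercasing bases and dropping the _R_ prefix) might look like this in a minimal sketch. It assumes records are plain dicts rather than Biopython objects, and the function name prettify is hypothetical.

```python
def prettify(records):
    """Uppercase each sequence and strip a leading '_R_' from each name,
    modifying the records in place.

    `records` is a list of dicts with 'name' and 'seq' keys (a hypothetical
    stand-in for alignment records).
    """
    prefix = "_R_"
    for record in records:
        record["seq"] = record["seq"].upper()
        if record["name"].startswith(prefix):
            record["name"] = record["name"][len(prefix):]


records = [
    {"name": "_R_crick_strand", "seq": "acgt-n"},
    {"name": "no_gaps", "seq": "ACGT"},
]
prettify(records)
print(records[0])  # {'name': 'crick_strand', 'seq': 'ACGT-N'}
```

Only a prefix at the very start of the name is removed, so a name that merely contains _R_ elsewhere is left untouched.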
-
augur.align.prune_seqs_matching_alignment(seqs, aln)
Return the set of seqs excluding those already in the alignment, and print a warning message for each sequence that is excluded.
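A minimal sketch of this kind of pruning is shown below. It is an illustration under stated assumptions, not augur's code: records are plain dicts, the result is a list rather than a set so that input order is preserved, and both the helper name prune_matching and the warning text are made up for this example.

```python
def prune_matching(seqs, aln):
    """Return the records in `seqs` whose names do not appear in `aln`.

    Prints a warning for every record that is excluded. The wording of the
    warning is an assumption made for this sketch.
    """
    aln_names = {record["name"] for record in aln}
    kept = []
    for record in seqs:
        if record["name"] in aln_names:
            print(f"WARNING: {record['name']} is already in the alignment; excluding it.")
        else:
            kept.append(record)
    return kept


aln = [{"name": "old_1", "seq": "ACGT"}]
seqs = [
    {"name": "old_1", "seq": "ACGT"},
    {"name": "new_1", "seq": "ACGA"},
]
kept = prune_matching(seqs, aln)
print([r["name"] for r in kept])  # ['new_1']
```

Building the set of alignment names once keeps the membership check cheap even when the existing alignment is large.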
-
augur.align.read_alignment(fname)
-
augur.align.read_reference(ref_fname)
-
augur.align.read_sequences(*fnames)
Return a list of sequences from all fnames.
-
augur.align.register_arguments(parser)
-
augur.align.remove_reference_sequence(seqs, reference_name)
-
augur.align.run(args)
- Parameters
args (namespace) – arguments passed in via the command-line from augur
- Returns
0 for success, 1 for general error
- Return type
int
-
augur.align.strip_non_reference(aln, reference, insertion_csv=None)
Return sequences that have all insertions relative to the reference removed. The alignment is returned as a list of sequences.
- Parameters
aln (MultipleSeqAlign) – Biopython Alignment
reference (str) – name of reference sequence, assumed to be part of the alignment
- Returns
list – list of trimmed sequences, effectively a multiple alignment
Tests
-----
>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "with_gaps")]
Trimmed gaps in with_gaps from the alignment
['with_gaps', 'no_gaps', 'some_other_seq', '_R_crick_strand']
>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "no_gaps")]
No gaps in alignment to trim (with respect to the reference, no_gaps)
['with_gaps', 'no_gaps', 'some_other_seq', '_R_crick_strand']
>>> [s.name for s in strip_non_reference(read_alignment("tests/data/align/test_aligned_sequences.fasta"), "missing")]
Traceback (most recent call last):
  ...
augur.align.AlignmentError: ERROR: reference missing not found in alignment
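The idea of removing insertions relative to the reference can be sketched as dropping every alignment column in which the reference has a gap. The version below is an assumption-laden illustration, not augur's implementation: records are plain dicts, the helper name strip_insertions is hypothetical, a ValueError stands in for AlignmentError, and the insertion_csv option is ignored.

```python
def strip_insertions(records, reference_name):
    """Return new records keeping only columns where the reference has no gap.

    `records` is a list of dicts with 'name' and 'seq' keys, all of the same
    sequence length (a hypothetical stand-in for an alignment).
    """
    ref = next((r for r in records if r["name"] == reference_name), None)
    if ref is None:
        raise ValueError(f"reference {reference_name} not found in alignment")

    # Column indices at which the reference carries a real base.
    keep = [i for i, base in enumerate(ref["seq"]) if base != "-"]

    return [
        {"name": r["name"], "seq": "".join(r["seq"][i] for i in keep)}
        for r in records
    ]


alignment = [
    {"name": "with_gaps", "seq": "AC--GT"},
    {"name": "no_gaps", "seq": "ACTTGT"},
    {"name": "other", "seq": "A-T-GT"},
]
trimmed = strip_insertions(alignment, "with_gaps")
print([r["seq"] for r in trimmed])  # ['ACGT', 'ACGT', 'A-GT']
```

Gaps in non-reference sequences survive at columns the reference keeps, so the result is still a valid alignment in reference coordinates.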
-
augur.align.write_seqs(seqs, fname)
A wrapper around SeqIO.write with error handling.