The scaProcessMSA script carries out the basic steps of multiple sequence alignment (MSA) pre-processing for SCA and stores the results using Python's pickle module:
1. Trim the alignment, either by truncating it to the reference sequence (enabled with the -t flag) or by removing excessively gapped positions (those with more than 40% gaps).
2. Identify or designate a reference sequence in the alignment, and map the alignment positions to position numbers for the reference sequence. The reference sequence can be specified in one of four ways:
   - By supplying a PDB file: the reference sequence is taken from the PDB structure (see the pdb kwarg).
   - By supplying a reference sequence directly, as a FASTA file (see the refseq kwarg).
   - By supplying the index of the reference sequence in the alignment (see the refseq kwarg).
   - If the user supplies no reference sequence, one is selected automatically using the scaTools function chooseRef.
   The position numbers used for the mapping can be specified in one of three ways:
   - By supplying a PDB file: the alignment positions are mapped to structure positions.
   - By supplying a list of reference positions (see the refpos kwarg).
   - If the user supplies no reference positions, sequential numbering (starting at 1) is assumed.
3. Filter sequences, removing highly gapped sequences and sequences whose identity to the reference sequence falls below a minimum or above a maximum value (see the parameters kwarg).
4. Filter positions, removing highly gapped positions (default: more than 20% gaps; the threshold can also be set with --parameters).
5. Calculate sequence weights, and write out the final alignment and other variables.
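The five steps above can be sketched in plain Python as follows. This is a minimal toy sketch, not the scaTools implementation: it assumes the alignment is a list of equal-length strings with "-" as the gap character, and every function name is hypothetical. The 40% (step 1) and 20% (step 4) position-gap cutoffs come from the description above; the 20% sequence-gap cutoff, the 0.2 to 0.8 identity window, and the 80% identity threshold used for the weights are illustrative assumptions. Gap fractions here are unweighted, and the weight of each sequence is taken as the inverse of the number of sequences (itself included) at or above 80% identity to it, a common redundancy-weighting scheme assumed here.

```python
"""Toy sketch of MSA pre-processing; hypothetical helpers, not the scaTools API."""
from typing import Dict, List, Optional, Sequence, Tuple

GAP = "-"  # assumed gap character


def gap_fractions(alg: Sequence[str]) -> List[float]:
    """Fraction of sequences carrying a gap at each alignment column (unweighted)."""
    return [sum(seq[i] == GAP for seq in alg) / len(alg) for i in range(len(alg[0]))]


def keep_positions(alg: Sequence[str], max_gap_frac: float) -> Tuple[List[str], List[int]]:
    """Drop columns whose gap fraction exceeds max_gap_frac (steps 1 and 4)."""
    keep = [i for i, f in enumerate(gap_fractions(alg)) if f <= max_gap_frac]
    return ["".join(seq[i] for i in keep) for seq in alg], keep


def number_reference(ref_seq: str, refpos: Optional[Sequence[str]] = None) -> Dict[int, str]:
    """Map each non-gap column of the reference to a position label (step 2).

    Without refpos, labels are sequential numbers starting at 1."""
    n_res = len(ref_seq) - ref_seq.count(GAP)
    labels = list(refpos) if refpos is not None else [str(k) for k in range(1, n_res + 1)]
    if len(labels) != n_res:
        raise ValueError("need exactly one reference position per reference residue")
    cols = [i for i, c in enumerate(ref_seq) if c != GAP]
    return dict(zip(cols, labels))


def identity(a: str, b: str) -> float:
    """Fraction of columns where a and b share the same non-gap residue."""
    return sum(x == y != GAP for x, y in zip(a, b)) / len(a)


def filter_sequences(alg: Sequence[str], ref_index: int, max_gap_frac: float = 0.2,
                     min_id: float = 0.2, max_id: float = 0.8) -> List[int]:
    """Indices of sequences kept in step 3; the reference itself is always kept."""
    ref = alg[ref_index]
    return [
        k for k, seq in enumerate(alg)
        if k == ref_index
        or (seq.count(GAP) / len(seq) <= max_gap_frac and min_id <= identity(seq, ref) <= max_id)
    ]


def sequence_weights(alg: Sequence[str], threshold: float = 0.8) -> List[float]:
    """Step 5: w_s = 1 / (1 + number of other sequences at >= threshold identity to s)."""
    return [
        1.0 / (1 + sum(identity(s, t) >= threshold for j, t in enumerate(alg) if j != i))
        for i, s in enumerate(alg)
    ]


def process_msa(alg0: Sequence[str], ref_index: int = 0,
                refpos: Optional[Sequence[str]] = None) -> dict:
    alg, cols1 = keep_positions(alg0, max_gap_frac=0.4)      # step 1: trim gapped columns
    numbering = number_reference(alg0[ref_index], refpos)     # step 2: on untrimmed columns
    seq_keep = filter_sequences(alg, ref_index)                # step 3: filter sequences
    alg = [alg[k] for k in seq_keep]
    alg, cols2 = keep_positions(alg, max_gap_frac=0.2)        # step 4: filter positions
    # Reference label of each surviving column (None where the reference has a gap).
    ref_positions = [numbering.get(cols1[c]) for c in cols2]
    return {"alg": alg, "seq_keep": seq_keep,
            "ref_positions": ref_positions, "weights": sequence_weights(alg)}


toy = ["ACDE-", "ACDF-", "ACGH-", "AC---", "WYVKL"]
result = process_msa(toy, ref_index=0)
print(result["seq_keep"])       # [0, 1, 2]
print(result["ref_positions"])  # ['1', '2', '3', '4']
```

In the toy run, the last column (80% gaps) is trimmed in step 1; "AC--" is then dropped for having too many gaps and "WYVK" for having too little identity to the reference. The numbering is computed on the untrimmed reference so that a user-supplied position list lines up with every reference residue, and the kept column indices are carried through both trims.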
Arguments:
    Input_MSA.fasta (the alignment to be processed; typically the headers contain taxonomic information for the sequences).

Keyword Arguments:
Example:
>>> ./scaProcessMSA.py Inputs/PF00071_full.an -s 5P21 -c A -f 'Homo sapiens'
By: Rama Ranganathan
On: 8.5.2014
Copyright (C) 2015 Olivier Rivoire, Rama Ranganathan, Kimberly Reynolds. This program is free software distributed under the BSD 3-clause license; please see the file LICENSE for details.