.. _install:

=============
Installation
=============

1. Downloading the code
=========================
The pySCA package, tutorials, and associated scripts are available for
download from github.com. 

2. Package Dependencies
=========================
Before running pySCA, you will need to download and install the following
(free) packages:

    1)  Anaconda Scientific Python - this package will install Python,
        as well as several libraries necessary for the operation of
        pySCA (NumPy, SciPy, IPython, and Matplotlib).

        https://store.continuum.io/cshop/anaconda/
 
    2)  Install Biopython. This can be done in two ways:

        a. from the command line, run::

               conda install biopython

        b. download (and install) it from here:

           http://biopython.org/wiki/Main_Page

    3)  Install a robust pairwise alignment program. Either of the programs
        below works with pySCA; in our hands, ggsearch is the fastest. This
        step is critical for the scaProcessMSA.py script.

        a.  ggsearch - part of the FASTA software package, http://fasta.bioch.virginia.edu/fasta_www2/fasta_down.shtml
        b.  needle - part of the EMBOSS software package, http://emboss.sourceforge.net/

        *The following steps are optional but highly recommended.*

    4)  Download pfamseq.txt - a file containing phylogenetic annotations
        for PFAM sequences. This is necessary if you would like to annotate
        PFAM alignments with taxonomic/phylogenetic information using the
        annotate_MSA.py script provided by pySCA. This file is quite large,
        but it is available from the PFAM FTP site in compressed (.gz)
        format.

        http://pfam.xfam.org/help#tabview=tab12

    5)  Install PyMOL (http://www.pymol.org/) - necessary if you would like
        to use pySCA's automated structure mapping scripts, and useful for
        mapping the sectors to structure in general.

    6)  Install mpld3 (http://mpld3.github.io/) - a package that allows more
        interactive plot visualization in IPython notebooks. If you choose
        not to install this (optional) package, you will need to comment out
        the line "import mpld3" at the beginning of the tutorials.
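
Once these packages are installed, a short check script can confirm that
they are visible to the Python you will use for pySCA. This is a minimal
sketch, not part of pySCA: missing_modules, available_programs and gunzip
are hypothetical helpers, "Bio" is the import name of Biopython, and the
executable name ggsearch36 is an assumption based on FASTA 3.6 naming, so
adjust it to match your installation.

.. code-block:: python

    import gzip
    import importlib.util
    import shutil

    # Import names for the Python packages in steps 1, 2 and 6.
    # ("Bio" is the import name of Biopython; mpld3 is optional.)
    REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "IPython", "Bio"]
    OPTIONAL_MODULES = ["mpld3"]

    # Aligners from step 3; only one is needed. "ggsearch36" is an assumed
    # executable name (FASTA 3.6 naming); change it if yours differs.
    ALIGNERS = ["ggsearch36", "needle"]


    def missing_modules(names):
        """Return the module names that cannot be found by this Python."""
        return [name for name in names if importlib.util.find_spec(name) is None]


    def available_programs(names):
        """Map each executable name to its full path, or None if not on PATH."""
        return {name: shutil.which(name) for name in names}


    def gunzip(src, dest):
        """Stream-decompress a .gz file (e.g. the PFAM annotation file) to dest."""
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # avoids loading the large file into memory


    if __name__ == "__main__":
        print("missing required modules:", missing_modules(REQUIRED_MODULES))
        print("missing optional modules:", missing_modules(OPTIONAL_MODULES))
        print("aligners found:", available_programs(ALIGNERS))

An empty "missing required modules" list and at least one aligner with a
non-None path suggest that steps 1 to 3 succeeded. The gunzip helper can be
used to unpack the compressed PFAM file to the location you later assign to
path2pfamseq.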

	   
3. Path Modifications
=========================
Following the successful installation of these packages, edit the
following variables in the "PATHS" section of scaTools.py to reflect the
locations of these files on your computer::

    # location of the pfamseq.txt file
    path2pfamseq = 'pfamseq.txt'

    # location of your PDB structures for analysis
    path2structures = 'Inputs/'

    # location of your PyMOL executable
    path2pymol = '/Applications/MacPyMOL.app/Contents/MacOS/MacPyMOL'

    # location of the needle executable, if you have installed EMBOSS needle
    path2needle = '/usr/local/bin'
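
After editing these variables, you can check that each one points to
something that exists. The sketch below is a standalone illustration, not
code from scaTools.py: the PATHS dictionary simply copies the example values
given here, and check_paths and needle_found are hypothetical helpers. It
assumes, as the example value suggests, that path2needle names the directory
that contains the needle executable.

.. code-block:: python

    import os

    # Example values copied from this section; replace them with your own.
    PATHS = {
        "path2pfamseq": "pfamseq.txt",
        "path2structures": "Inputs/",
        "path2pymol": "/Applications/MacPyMOL.app/Contents/MacOS/MacPyMOL",
        "path2needle": "/usr/local/bin",
    }


    def check_paths(paths):
        """Return the (sorted) names of settings whose path does not exist."""
        return sorted(name for name, path in paths.items() if not os.path.exists(path))


    def needle_found(needle_dir):
        """path2needle is a directory; check that it contains a 'needle' file."""
        return os.path.isfile(os.path.join(needle_dir, "needle"))


    if __name__ == "__main__":
        missing = check_paths(PATHS)
        if missing:
            print("Update these settings in scaTools.py:", ", ".join(missing))
        else:
            print("All configured paths exist.")
        print("needle executable present:", needle_found(PATHS["path2needle"]))

Settings you do not use (for example path2needle when you rely on ggsearch)
can safely be reported as missing.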

You may also need to modify the "shebang" line (#!) at the top of the
following scripts so that it reflects the path of your Python
installation:
     
     annotate_MSA.py

     scaProcessMSA.py

     scaCore.py

     scaSectorID.py
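
If you are unsure which interpreter to use, sys.executable reports the one
you are currently running. The sketch below (fix_shebang is a hypothetical
helper, not part of pySCA) rewrites the first line of each script to point
at that interpreter, assuming the scripts are in the current directory and
already begin with a #! line. A common alternative is the portable line
#!/usr/bin/env python, which picks up whichever python comes first on your
PATH.

.. code-block:: python

    import os
    import sys

    SCRIPTS = ["annotate_MSA.py", "scaProcessMSA.py", "scaCore.py", "scaSectorID.py"]


    def fix_shebang(script, interpreter=sys.executable):
        """Point the #! line of `script` at `interpreter`.

        Returns True if the file was rewritten; files without a #! line,
        or whose #! line is already correct, are left untouched.
        """
        with open(script) as fh:
            lines = fh.readlines()
        new_first = "#!" + interpreter + "\n"
        if not lines or not lines[0].startswith("#!") or lines[0] == new_first:
            return False
        lines[0] = new_first
        with open(script, "w") as fh:
            fh.writelines(lines)
        return True


    if __name__ == "__main__":
        for name in SCRIPTS:
            if not os.path.exists(name):
                print(name, "not found in", os.getcwd())
            else:
                print(name, "updated" if fix_shebang(name) else "unchanged")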


4. Getting started and Running the tutorials
==============================================
The `"getting started"`_ section of this documentation provides
instructions on how to run some initial calculations and the
tutorials. The basic idea behind the pySCA code is that the core
calculations are performed by a series of executable Python scripts,
and the results can then be loaded, analyzed, and visualized in an
IPython notebook (or, alternatively, in MATLAB).

All of the tutorials are provided as IPython notebooks. For more on how
IPython notebooks work, see http://ipython.org/notebook.html. Before
running the notebook tutorials, you will need to run the core
calculation scripts that generate the input for the notebooks. One way
to do this is with the shell script "runAllNBCalcs.sh"; there is more
information on this in the `"getting started"`_ section. Once the
calculations are complete, you can start a tutorial from the command
line by typing::

    ipython notebook script_dhfr.ipynb
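
The two steps described above (run the core calculations, then open a
notebook) can also be scripted. This is a minimal sketch under stated
assumptions: runAllNBCalcs.sh is in the current directory and runs under
sh, and ipython is on your PATH; tutorial_commands and run_tutorial are
hypothetical helpers. By default it only prints the commands
(dry_run=True), so nothing is launched by accident.

.. code-block:: python

    import subprocess


    def tutorial_commands(notebook, calc_script="runAllNBCalcs.sh"):
        """Return the shell commands for one tutorial, in the order to run them."""
        return [
            ["sh", calc_script],                # core calculations first (assumes sh)
            ["ipython", "notebook", notebook],  # then open the tutorial notebook
        ]


    def run_tutorial(notebook, dry_run=True):
        """Run (or, by default, only print) the commands for one tutorial."""
        commands = tutorial_commands(notebook)
        for cmd in commands:
            if dry_run:
                print("would run:", " ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop if a step fails
        return commands


    if __name__ == "__main__":
        run_tutorial("script_dhfr.ipynb")  # dry run: prints the commands only

Pass dry_run=False to actually execute the commands. On newer installations
where the notebook server has moved to Jupyter, the second command may need
to be jupyter notebook instead.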

.. _"getting started": get_started.html