The following is a paprica installation tutorial for novice users on Mac OSX (installation on Linux is quite a bit simpler). If you’re comfortable editing your PATH and installing things using bash you probably don’t need to follow this tutorial, just get the dependencies as indicated in the install instructions and linux_install.sh script. If command line operations stress you out, and you haven’t dealt with a lot of weird bioinformatics program installs, use this tutorial.
Please note that this tutorial is a work in progress. If you notice errors, inconsistencies, or omissions please leave a comment and I’ll be sure to correct them. This tutorial has been updated for the most recent version of paprica, and will not work for v0.6 or earlier.
** IMPORTANT ** It is generally considered very poor practice to install anything in the root directory. You might think, “but I’m the only user, so this makes more sense” or “but everyone in the lab wants program X, so I should install as root.” Don’t do it. Install to your home directory. It will add years to your life.
This tutorial assumes you’ve followed this advice, and that you are installing all the dependencies in your home directory.
Install Python and Python packages
paprica is 90% an elaborate set of wrapper scripts for several core programs written by other groups. The scripts that execute the pipeline are bash scripts; the scripts that do the actual work are Python. Therefore you need Python up and running on your system. If you already have a mainstream Python 3 distribution going, just make sure that the modules listed below are installed (e.g., with conda install [package] and not pip3). Note that Python 3 must be callable on your system as “python3”, which should be the default.
Install some necessary Python modules, assuming you don’t already have them:
pip3 install numpy
pip3 install biopython
pip3 install joblib
pip3 install pandas
pip3 install seqmagick
pip3 install termcolor
In case you have conflicts with other Python installations, or some other mysterious problems, it’s a good idea to test things out at this point. Open a shell, type “python3” and:
import numpy
import Bio
import joblib
import pandas
import termcolor
If you get any error messages something somewhere is wrong. Burn some incense and try again. If that doesn’t work try holy water.
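If you'd rather not type the imports by hand, the same check can be sketched as a small shell function. This is only an illustration: missing_modules is a hypothetical helper name, not part of paprica, and it assumes python3 is the interpreter you intend paprica to use.

```shell
# Hypothetical helper (not part of paprica): print the name of every
# module that python3 fails to import.
missing_modules() {
    for module in "$@"; do
        # A failed import exits non-zero; the traceback is discarded.
        python3 -c "import ${module}" >/dev/null 2>&1 || echo "${module}"
    done
}

# The modules checked above (Bio is the import name for biopython).
missing_modules numpy Bio joblib pandas termcolor
```

No output means every import worked. Any name that is printed is a module you still need to install.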
Seqmagick is a standalone program, not a module, so check the installation by typing:
seqmagick
You should get a sensible error that is clearly seqmagick yelling at you and not your computer trying to find seqmagick.
Install Homebrew and wget
Older versions of paprica needed the programs pplacer and gappa, which had dependencies that could only be acquired for OSX through a package manager such as Homebrew. These are no longer needed for paprica, but I’ve left the Homebrew step in here because if you’re doing anything sciency with your computer you probably want a package manager, and I find wget to be a much more useful file fetching utility than curl.
To download Homebrew (assuming you don’t already have it) type:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Follow the on-screen instructions.
Now install wget.
brew install wget
Install Infernal, gappa, and epa-ng
Assuming all that went okay go ahead and download the software you need to execute just the paprica-run.sh portion of paprica. First, the excellent aligner Infernal. From your home directory:
wget http://eddylab.org/infernal/infernal-1.1.1-macosx-intel.tar.gz
tar -xzvf infernal-1.1.1-macosx-intel.tar.gz
mv infernal-1.1.1-macosx-intel infernal
Then gappa:
To install gappa you need make and cmake, and a fairly up-to-date C++11 compiler.
brew install make
brew install cmake
If you try to compile with AppleClang on macOS it will probably fail. You need OpenMP, but it is a known issue that AppleClang on macOS does not work well with OpenMP. Here we provide two alternatives based on a discussion on gappa’s page, which you can read about here.
- Install gappa via conda instead of compiling it on your own (it does not need OpenMP): https://anaconda.org/bioconda/gappa
- Instead of AppleClang, use a “proper” clang:
brew install llvm libomp
You’ll need to set custom paths so that the new clang is used:
# Example of how to do this:
export PATH="$(brew --prefix llvm)/bin:$PATH"
export COMPILER=/usr/local/opt/llvm/bin/clang++
export CFLAGS="-I /usr/local/include -I/usr/local/opt/llvm/include"
export CXXFLAGS="-I /usr/local/include -I/usr/local/opt/llvm/include"
export LDFLAGS="-L /usr/local/lib -L/usr/local/opt/llvm/lib"
export CXX=${COMPILER}
# Not needed for all MacOS versions.
# If you get this error: "ld: unknown option: -platform_version"
# you'll need to add:
export CXXFLAGS="${CXXFLAGS} -mlinker-version=450"
export LDFLAGS="${LDFLAGS} -mlinker-version=450"
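The paths above hard-code /usr/local, which is where Homebrew lives on Intel Macs. On Apple Silicon Macs the default Homebrew prefix is /opt/homebrew instead. A minimal sketch that builds the same variables from whatever prefix you pass in is below. Note that set_llvm_env is a hypothetical helper name, and the sketch assumes llvm was installed with Homebrew so that it is reachable under the prefix's opt/llvm directory.

```shell
# Hypothetical helper: set the compiler variables from a Homebrew prefix.
# Pass in the output of `brew --prefix` (commonly /usr/local on Intel
# Macs and /opt/homebrew on Apple Silicon).
set_llvm_env() {
    prefix="$1"
    export PATH="${prefix}/opt/llvm/bin:${PATH}"
    export CXX="${prefix}/opt/llvm/bin/clang++"
    export CFLAGS="-I${prefix}/include -I${prefix}/opt/llvm/include"
    export CXXFLAGS="-I${prefix}/include -I${prefix}/opt/llvm/include"
    export LDFLAGS="-L${prefix}/lib -L${prefix}/opt/llvm/lib"
}

# Usage, once Homebrew and llvm are installed:
#   set_llvm_env "$(brew --prefix)"
```

If you hit the linker error described above, the two -mlinker-version lines still need to be added by hand afterwards.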
And now you should be ready to install gappa:
git clone --recursive https://github.com/lczech/gappa.git
cd gappa
make
cd ~
And finally, epa-ng:
brew install brewsci/bio/epa-ng
Add dependencies to PATH
Now comes the tricky bit: you need to add the locations of the executables for these programs to your PATH variable. This is a pretty important basic computing skill to master. Try not to screw it up. It isn’t hard to undo screw-ups, but it will freak you out because bash will suddenly be unable to find programs that it could find before. Before you continue please read the excellent summary of shell startup scripts as they pertain to OSX here:
http://hayne.net/MacDev/Notes/unixFAQ.html#shellStartup
This tutorial attempts to provide a broad solution to shell startup scripts by sourcing .profile and .bash_profile in .bashrc. I recommend you then only modify .bashrc, though this is not strictly necessary.
## Open .bashrc for editing.
nano .bashrc
At the top of the file type:
source .bash_profile
source .profile
Now navigate to the end of the file and paste the following, modifying as necessary (note: there are lots of syntactic variations for adding a location to PATH, the below commands are a little redundant but clear and easy to modify):
export PATH=/Users/your-user-name/infernal/binaries:${PATH}
export PATH=/Users/your-user-name/infernal/easel:${PATH}
export PATH=/Users/your-user-name/pplacer:${PATH}
export PATH=/Users/your-user-name/epa-ng/bin:${PATH}
export PATH=/Users/your-user-name/paprica:${PATH}
export PATH=/Users/your-user-name/gappa/bin:${PATH}
Don’t be the guy or gal who types your-user-name. Replace with your actual user name. Hit ctrl-o to write out the file, enter to save, and ctrl-x to exit nano.
Re-source .bashrc by typing:
source .bashrc
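One thing to be aware of: every time .bashrc is sourced, plain export PATH=...:${PATH} lines add the same directories again, so PATH slowly fills up with duplicates. That is harmless but untidy. One of the syntactic variations mentioned above avoids it by only adding a directory that is not already present. The sketch below is an optional alternative, and path_prepend is a hypothetical helper name, not something bash or paprica provides.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not
# already listed there.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                # already present, do nothing
        *) PATH="$1:${PATH}" ;;     # otherwise put it at the front
    esac
}

# Example with two of the locations used in this tutorial.
path_prepend "${HOME}/infernal/binaries"
path_prepend "${HOME}/paprica"
export PATH
```

Using ${HOME} here also saves you from typing your user name by hand.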
Confirm that you can execute the following programs by navigating to your home directory and executing each of the following commands:
cmalign
esl-alimerge
gappa
epa-ng
You should get an error message that is clearly from the program, not a bash error like “command not found”.
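If you would like a quicker check that does not depend on reading error messages, the shell built-in command -v reports whether a program can be found on your PATH. A minimal sketch follows; missing_programs is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the name of every program that cannot be
# found on the current PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "${prog}" >/dev/null 2>&1 || echo "${prog}"
    done
}

# The programs this tutorial expects to find.
missing_programs cmalign esl-alimerge gappa epa-ng
```

No output means bash can find all four. This only confirms that each program is on your PATH, not that it runs correctly, so running each one by hand as described above is still worthwhile.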
Get paprica
Okay, now you are ready to get paprica and do some analysis! You can clone the latest repository here:
git clone https://github.com/bowmanjeffs/paprica.git
Now make paprica-run.sh and python scripts executable.
cd paprica
chmod a+x paprica-run.sh
chmod a+x *py
At this point you should be ready to rock. Take a deep breath and type:
./paprica-run.sh test bacteria
This analyzes the file test.fasta against the bacteria database. You should see a lot of output flash by on the screen, and you should see a number of new files in the directory with the prefix “test.” Check out the paprica analysis tutorial and manual for more info on these files.
To run your own analysis, say on amazing_sample.fasta against the bacteria database, simply type:
./paprica-run.sh amazing_sample bacteria
Please, please, please, read the manual for further details. Remember that the fasta file you input should contain only reads that are properly QC’d (i.e. low quality ends and adapters and barcodes and such trimmed away) and denoised (e.g., with dada2).
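Before starting a long run it can be worth a quick sanity check that your input file actually contains the reads you expect. Every record in a fasta file starts with a header line beginning with “>”, so counting those lines gives the number of sequences. This is just a convenience sketch, and count_fasta_records is a hypothetical helper name, not a paprica command.

```shell
# Hypothetical helper: count the records in a fasta file by counting
# header lines (lines that start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage, from the directory that holds your file:
#   count_fasta_records amazing_sample.fasta
```

A count of 0 usually means the file is empty, is not fasta, or is not where you think it is.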
Note that if you’re having difficulty installing paprica or its dependencies you can now use a handy Docker image instead:
docker pull jsbowman/paprica:latest