Tutorial: Installing paprica on Mac OSX

The following is a paprica installation tutorial for novice users on Mac OSX (installation on Linux is quite a bit simpler). If you’re comfortable editing your PATH and installing things using bash, you probably don’t need to follow this tutorial; just get the dependencies as indicated in the install instructions and the linux_install.sh script. If command line operations stress you out and you haven’t dealt with a lot of weird bioinformatics program installs, use this tutorial.

Please note that this tutorial is a work in progress. If you notice errors, inconsistencies, or omissions, please leave a comment and I’ll be sure to correct them. This tutorial has been updated for the most recent version of paprica and will not work for v0.6 or earlier.

** IMPORTANT ** It is generally considered very poor practice to install anything in the root directory.  You might think, “but I’m the only user, so this makes more sense” or “but everyone in the lab wants program X, so I should install as root.”  Don’t do it.  Install to your home directory.  It will add years to your life.

This tutorial assumes you’ve followed this advice, and that you are installing all the dependencies in your home directory.

Install Python and Python packages

paprica is 90% an elaborate set of wrapper scripts for several core programs written by other groups. The scripts that execute the pipeline are bash scripts; the scripts that do the actual work are Python. Therefore you need Python up and running on your system. If you already have a mainstream Python 3 distribution going, just make sure that the modules listed below are installed (e.g., with conda install [package] rather than pip3). Note that Python 3 must be callable on your system as “python3”, which should be the default.

Install some necessary Python modules, assuming you don’t already have them:

pip3 install numpy
pip3 install biopython
pip3 install joblib
pip3 install pandas
pip3 install seqmagick
pip3 install termcolor

In case you have conflicts with other Python installations, or some other mysterious problems, it’s a good idea to test things out at this point. Open a shell, type “python3”, and enter:

import numpy
import Bio
import joblib
import pandas
import termcolor

If you get any error messages something somewhere is wrong. Burn some incense and try again. If that doesn’t work try holy water.
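If you’d rather not type each import by hand, the quick check below asks python3 to import each module in turn and reports which ones fail. It is only a small sketch, and the function name check_python_modules is made up for this tutorial. Run it from a regular shell prompt, not from inside python3. Note that Biopython is imported as Bio.

```shell
#!/usr/bin/env bash
# Try to import each Python module with python3 and report the result.
# Module names are the names you import (Biopython imports as "Bio").
check_python_modules() {
    local module missing=0
    for module in "$@"; do
        if python3 -c "import ${module}" 2>/dev/null; then
            echo "OK:      ${module}"
        else
            echo "MISSING: ${module}"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status if anything failed to import
    [ "$missing" -eq 0 ]
}

check_python_modules numpy Bio joblib pandas termcolor
```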

Seqmagick is a standalone program, not a module, so check the installation by typing:

seqmagick

You should get a sensible error that is clearly seqmagick yelling at you and not your computer trying to find seqmagick.

Install Homebrew and wget

Older versions of paprica needed the programs pplacer and gappa, which had dependencies that could only be acquired on OSX through a package manager such as Homebrew. These are no longer needed for paprica, but I’ve left the Homebrew step in here because if you’re doing anything sciency with your computer you probably want a package manager, and I find wget to be a much more useful file-fetching utility than curl.

To download Homebrew (assuming you don’t already have it) type:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Follow the on-screen instructions.

Now install wget.

brew install wget
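Where Homebrew puts things depends on your Mac. On Intel machines it uses /usr/local, and on Apple Silicon it uses /opt/homebrew. Some compiler paths later in this tutorial assume /usr/local, so it’s worth checking now. The following is a minimal sketch, and brew_prefix_report is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print Homebrew's install prefix and whether wget is on PATH.
# Intel Macs: /usr/local. Apple Silicon Macs: /opt/homebrew.
brew_prefix_report() {
    if ! command -v brew >/dev/null 2>&1; then
        echo "brew not found on PATH"
        return 1
    fi
    local prefix
    prefix="$(brew --prefix)"
    echo "Homebrew prefix: ${prefix}"
    if [ "$prefix" != "/usr/local" ]; then
        echo "Use ${prefix} wherever this tutorial says /usr/local."
    fi
    # wget should now resolve to the copy Homebrew just installed
    if command -v wget >/dev/null 2>&1; then
        echo "wget: $(command -v wget)"
    else
        echo "wget not found on PATH"
    fi
}

brew_prefix_report
```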

Install Infernal, gappa, and epa-ng

Assuming all that went okay, go ahead and download the software you need to execute just the paprica-run.sh portion of paprica. First, the excellent aligner Infernal. From your home directory:

wget http://eddylab.org/infernal/infernal-1.1.1-macosx-intel.tar.gz
tar -xzvf infernal-1.1.1-macosx-intel.tar.gz
mv infernal-1.1.1-macosx-intel infernal
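There is one gotcha with the mv step. If a directory called infernal already exists (say, from an earlier install), mv moves the new directory inside it instead of renaming it. A small helper like the following refuses to do that. It is a sketch, and safe_rename_dir is a hypothetical name rather than part of Infernal or paprica:

```shell
#!/usr/bin/env bash
# Rename an extracted directory only if the target name is free.
# A plain "mv src dest" moves src *inside* dest when dest already exists.
safe_rename_dir() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory '$src' not found" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "'$dest' already exists; remove or rename it first" >&2
        return 1
    fi
    mv "$src" "$dest"
}

# Usage, from your home directory, after the wget and tar steps above:
# safe_rename_dir infernal-1.1.1-macosx-intel infernal
```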

Then gappa:

To install gappa you need make and cmake, and a fairly up-to-date C++11 compiler.

brew install make
brew install cmake
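Before going further, you can confirm that cmake is there and that your compiler accepts C++11. The sketch below compiles and runs a tiny throwaway program. It uses $CXX if you’ve set it and otherwise falls back to the system c++ (AppleClang on a stock Mac). The helper name check_cxx11 is made up for this tutorial.

```shell
#!/usr/bin/env bash
# Check that cmake is installed and that the C++ compiler accepts C++11.
# Uses $CXX if set, otherwise the system "c++" (AppleClang on a stock Mac).
check_cxx11() {
    local cxx="${CXX:-c++}"
    local tmpdir status=1
    tmpdir="$(mktemp -d)"
    # Brace initialization, auto and range-based for all need C++11
    cat > "${tmpdir}/t.cpp" <<'EOF'
#include <vector>
int main() {
    std::vector<int> v{1, 2, 3};
    auto sum = 0;
    for (auto x : v) sum += x;
    return sum == 6 ? 0 : 1;
}
EOF
    if "$cxx" -std=c++11 "${tmpdir}/t.cpp" -o "${tmpdir}/t" 2>/dev/null \
        && "${tmpdir}/t"; then
        echo "C++11: OK (${cxx})"
        status=0
    else
        echo "C++11: FAILED (${cxx})"
    fi
    rm -rf "$tmpdir"
    return "$status"
}

if command -v cmake >/dev/null 2>&1; then
    cmake --version | head -n 1
else
    echo "cmake not found on PATH"
fi
check_cxx11
```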

If you try to do the compilation with AppleClang on OSX, it will probably fail. You need OpenMP, but it is a known issue that AppleClang on OSX does not work well with OpenMP. Here are two alternatives, based on a discussion on gappa’s page:

  1. Install gappa via conda instead of compiling it yourself (this sidesteps the OpenMP problem): https://anaconda.org/bioconda/gappa
  2. Instead of AppleClang, use a “proper” clang:
brew install llvm libomp

You’ll need to set custom paths so that the new clang is used:

# Example of how to do this. The /usr/local paths assume Homebrew's default
# prefix on Intel Macs; on Apple Silicon replace /usr/local with /opt/homebrew.
export PATH="$(brew --prefix llvm)/bin:$PATH"
export COMPILER=/usr/local/opt/llvm/bin/clang++
export CFLAGS="-I/usr/local/include -I/usr/local/opt/llvm/include"
export CXXFLAGS="-I/usr/local/include -I/usr/local/opt/llvm/include"
export LDFLAGS="-L/usr/local/lib -L/usr/local/opt/llvm/lib"
export CXX=${COMPILER}

# Not needed for all macOS versions.
# If you get the error "ld: unknown option: -platform_version",
# you'll need to add:
export CXXFLAGS="${CXXFLAGS} -mlinker-version=450"
export LDFLAGS="${LDFLAGS} -mlinker-version=450"
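With those variables exported, you can check that the compiler can actually build an OpenMP program before you start the gappa build. This is a sketch and not part of gappa. check_openmp is a made-up name, and the function simply compiles a small OpenMP test using $CXX together with your CXXFLAGS and LDFLAGS.

```shell
#!/usr/bin/env bash
# Compile and run a tiny OpenMP program with $CXX (falls back to "c++"),
# using the CXXFLAGS and LDFLAGS exported above.
check_openmp() {
    local cxx="${CXX:-c++}"
    local tmpdir status=1
    tmpdir="$(mktemp -d)"
    cat > "${tmpdir}/omp.cpp" <<'EOF'
#include <omp.h>
#include <cstdio>
int main() {
    int threads = 0;
    #pragma omp parallel
    {
        #pragma omp single
        threads = omp_get_num_threads();
    }
    std::printf("threads: %d\n", threads);
    return threads > 0 ? 0 : 1;
}
EOF
    # CXXFLAGS/LDFLAGS are left unquoted on purpose so they split into flags
    if "$cxx" ${CXXFLAGS:-} -fopenmp "${tmpdir}/omp.cpp" -o "${tmpdir}/omp" \
        ${LDFLAGS:-} 2>/dev/null && "${tmpdir}/omp" >/dev/null; then
        echo "OpenMP: OK (${cxx})"
        status=0
    else
        echo "OpenMP: FAILED (${cxx})"
    fi
    rm -rf "$tmpdir"
    return "$status"
}

check_openmp
```

If this reports FAILED, the gappa build will most likely fail too, and option 1 (conda) is the easier way out.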

And now you should be ready to install gappa:

git clone --recursive https://github.com/lczech/gappa.git
cd gappa
make
cd ~
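If the build gives you grief, or you’d rather go with option 1 above, the following is a rough sketch of the conda route. It assumes conda (e.g., Miniconda) is already installed. The environment name gappa-env and the helper install_gappa_conda are made up for the example. With this route, gappa lives inside the conda environment, so the gappa/bin PATH entry is only needed for a compiled copy.

```shell
#!/usr/bin/env bash
# Option 1 sketch: get gappa from Bioconda into its own environment.
# "gappa-env" is an arbitrary example name.
install_gappa_conda() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; install conda first or use option 2"
        return 1
    fi
    conda create -y -n gappa-env -c conda-forge -c bioconda gappa
}

# install_gappa_conda
# Afterwards, in any shell where you want gappa:
#   conda activate gappa-env
#   gappa
```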

And finally, epa-ng:

brew install brewsci/bio/epa-ng

Add dependencies to PATH

Now comes the tricky bit: you need to add the locations of the executables for these programs to your PATH variable. This is a pretty important basic computing skill to master. Try not to screw it up. It isn’t hard to undo screw-ups, but it will freak you out because bash will suddenly be unable to find programs that it could find before. Before you continue, please read the excellent summary of shell startup scripts as they pertain to OSX here:

http://hayne.net/MacDev/Notes/unixFAQ.html#shellStartup

This tutorial attempts to provide a broad solution to shell startup scripts by sourcing .profile and .bash_profile in .bashrc. I recommend that you then modify only .bashrc, though this is not strictly necessary.

Open .bashrc for editing:

nano ~/.bashrc

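At the top of the file type the following. Skip the .bash_profile line if your .bash_profile already sources .bashrc, otherwise the two files will keep sourcing each other.

[ -f ~/.bash_profile ] && source ~/.bash_profile
[ -f ~/.profile ] && source ~/.profile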

Now navigate to the end of the file and paste the following, modifying as necessary (note: there are lots of syntactic variations for adding a location to PATH; the commands below are a little redundant but clear and easy to modify):

export PATH=/Users/your-user-name/infernal/binaries:${PATH}
export PATH=/Users/your-user-name/infernal/easel:${PATH}
export PATH=/Users/your-user-name/pplacer:${PATH}
export PATH=/Users/your-user-name/epa-ng/bin:${PATH}
export PATH=/Users/your-user-name/paprica:${PATH}
export PATH=/Users/your-user-name/gappa/bin:${PATH}

Don’t be the guy or gal who types your-user-name. Replace with your actual user name. Hit ctrl-o to write out the file, enter to save, and ctrl-x to exit nano.
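If you’d rather not hard-code your user name, the following variant of those PATH lines uses $HOME, which already expands to /Users/your-user-name. It also skips directories that don’t exist and won’t add the same directory twice when you re-source .bashrc. It is just one of those syntactic variations, sketched with a made-up helper called add_to_path. The same lines work in zsh. If your Terminal runs zsh (the default since macOS Catalina), they belong in ~/.zshrc rather than ~/.bashrc.

```shell
# Variant of the PATH lines above for ~/.bashrc (also works in ~/.zshrc).
# add_to_path prepends a directory only if it exists and isn't already there.
add_to_path() {
    local dir="$1"
    [ -d "$dir" ] || return 0            # skip tools you haven't installed
    case ":${PATH}:" in
        *":${dir}:"*) ;;                  # already on PATH: do nothing
        *) PATH="${dir}:${PATH}" ;;       # prepend, like the export lines
    esac
}

for dir in \
    "$HOME/infernal/binaries" \
    "$HOME/infernal/easel" \
    "$HOME/pplacer" \
    "$HOME/epa-ng/bin" \
    "$HOME/paprica" \
    "$HOME/gappa/bin"; do
    add_to_path "$dir"
done
unset dir
export PATH
```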

Re-source .bashrc by typing:

source .bashrc

Confirm that you can execute the following programs by navigating to your home directory and executing each of the following commands:

cmalign
esl-alimerge
gappa
epa-ng

You should get an error message that is clearly from the program, not a bash error like “command not found”.
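To check everything in one go without reading each error, the loop below asks the shell where each program lives. command -v only looks the name up and doesn’t run anything, so anything marked MISSING is what bash would report as “command not found”. check_on_path is a hypothetical helper name, and seqmagick from the earlier step is included in the list.

```shell
#!/usr/bin/env bash
# Report where each program resolves on PATH, without running it.
check_on_path() {
    local prog location missing=0
    for prog in "$@"; do
        if location="$(command -v "$prog")"; then
            echo "found:   ${prog} -> ${location}"
        else
            echo "MISSING: ${prog}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_on_path cmalign esl-alimerge gappa epa-ng seqmagick
```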

Get paprica

Okay, now you are ready to get paprica and do some analysis! You can clone the latest version of the repository with:

git clone https://github.com/bowmanjeffs/paprica.git

Now make paprica-run.sh and the Python scripts executable:

cd paprica 
chmod a+x paprica-run.sh 
chmod a+x *py

At this point you should be ready to rock. Take a deep breath and type:

./paprica-run.sh test bacteria

This analyzes the file test.fasta against the bacteria database. You should see a lot of output flash by on the screen, and you should see a number of new files in the directory with the prefix “test.” Check out the paprica analysis tutorial and manual for more info on these files.

To run your own analysis, say on amazing_sample.fasta against the bacteria database, simply type:

./paprica-run.sh amazing_sample bacteria
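If you have several samples, a small loop saves some typing. This sketch assumes the fasta files sit in the paprica directory (like test.fasta does) and that you pass their full names. The helper strips the .fasta extension, since paprica-run.sh wants just the prefix. Both function names are made up for this example.

```shell
#!/usr/bin/env bash
# Run paprica-run.sh on several fasta files against one database.
# Assumes the files sit in the paprica directory, like test.fasta does.
sample_prefix() {
    # amazing_sample.fasta -> amazing_sample (what paprica-run.sh expects)
    basename "$1" .fasta
}

run_all_samples() {
    local db="$1" fasta
    shift
    for fasta in "$@"; do
        ./paprica-run.sh "$(sample_prefix "$fasta")" "$db"
    done
}

# Example with hypothetical file names, run from the paprica directory:
# run_all_samples bacteria amazing_sample.fasta another_sample.fasta
```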

Please, please, please read the manual for further details. Remember that the fasta file you input should contain only reads that are properly QC’d (i.e., low-quality ends, adapters, barcodes and such trimmed away) and denoised (e.g., with dada2).


One Response to Tutorial: Installing paprica on Mac OSX

  1. Jeff says:

    Note that if you’re having difficulty installing paprica or its dependencies you can now use a handy Docker image instead: docker pull jsbowman/paprica:latest
