Tutorial: Nanopore Analysis Pipeline

Introduction

Hi! I’m Sabeel Mansuri, an Undergraduate Research Assistant for the Bowman Lab at the Scripps Institution of Oceanography, University of California San Diego. The following is a tutorial that demonstrates a pipeline used to assemble and annotate a bacterial genome from Oxford Nanopore MinION data.

This tutorial will require the following (brief installation instructions are included below):

  1. Canu Assembler
  2. Bandage
  3. Prokka
  4. Barrnap
  5. DNAPlotter (alternatively circos)

Software Installation

Canu

Canu is a packaged correction, trimming, and assembly program that is forked from the Celera assembler codebase. Build the latest version from source by running the following:

git clone https://github.com/marbl/canu.git
cd canu/src
make

Bandage

Bandage is an assembly visualization program. Install it by visiting this link, and downloading the version appropriate for your device.

Prokka

Prokka is a gene annotation program. Install it by visiting this link, and running the installation commands appropriate for your device.

Barrnap

Barrnap is an rRNA prediction program used by Prokka. Install it by visiting this link, and running the installation commands appropriate for your device.

DNAPlotter

DNAPlotter is a gene annotation visualization program. Install it by visiting this link, and running the installation commands appropriate for your device.
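
Before moving on, it can help to confirm that each program is actually reachable from your shell. The following is a minimal sketch, not part of any of these packages: check_tools is a hypothetical helper, and the command names in the usage comment are assumptions about how the tools are exposed on your system.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and returns
# non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (command names are assumptions; adjust to your installation):
#   check_tools canu Bandage prokka barrnap
```

If a tool you just built is reported as missing, its bin directory probably still needs to be added to your PATH.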

Dataset

Download the nanopore dataset located here. This is an isolate from a sample taken from a local saline lake at South Bay Salt Works near San Diego, California.

The download will provide a tarball. Extract it:

tar -xvf nanopore.tar.gz

This will create a runs_fastq folder containing 8 FASTQ files of sequencing reads.
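
To get a feel for how much data each file holds, you can count the reads. This is a rough sketch that assumes the usual four-lines-per-record FASTQ layout (no wrapped sequence lines); count_fastq_reads is a hypothetical helper name.

```shell
# Print "FILE READS" for each FASTQ file given as an argument.
# Assumes four lines per record, so reads = lines / 4.
count_fastq_reads() {
    for fq in "$@"; do
        lines=$(wc -l < "$fq")
        echo "$fq $((lines / 4))"
    done
}

# Usage:
#   count_fastq_reads runs_fastq/*.fastq
```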

Assembly

Canu can be used directly on the data without any preprocessing. The only additional information needed is an estimate of the genome size of the sample. For the saline isolate, we estimate 3,000,000 base pairs. Then, use the following Canu command to assemble our data:

canu -p test_canu -d test_canu genomeSize=3000000 gnuplotTested=true -nanopore-raw runs_fastq/*.fastq

A quick description of all flags and parameters:

  • -p – specifies the prefix for output files; we use “test_canu”
  • -d – specifies the directory to run in and write output files to; we use “test_canu”
  • genomeSize – estimated genome size of the isolate
  • gnuplotTested – setting this to true skips gnuplot testing; gnuplot is not needed for this pipeline
  • -nanopore-raw – specifies that the data is Oxford Nanopore with no preprocessing; the read files follow this flag
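
Because genomeSize is only an estimate, a rough coverage figure is a handy sanity check before starting a long assembly. The sketch below sums the read lengths and divides by the genome size you pass in. It assumes four-line FASTQ records, and estimate_coverage is a hypothetical helper, not a Canu feature.

```shell
# Estimate sequencing coverage: total bases in the reads / genome size.
# First argument: estimated genome size in bp (must be non-zero).
# Remaining arguments: FASTQ files.
# The sequence is line 2 of every four-line FASTQ record.
estimate_coverage() {
    genome_size=$1
    shift
    awk -v g="$genome_size" '
        FNR % 4 == 2 { bases += length($0) }
        END { printf "total_bases=%.0f coverage=%.1fx\n", bases, bases / g }
    ' "$@"
}

# Usage:
#   estimate_coverage 3000000 runs_fastq/*.fastq
```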

Running this command will output various files into the test_canu directory. The assembled contigs are located in the test_canu.contigs.fasta file. These contigs can be better visualized using Bandage.
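
A quick way to see what the assembler produced is to list each contig with its length. The following is a small sketch using awk; fasta_lengths is a hypothetical helper name, and it takes the contig name to be the first word of each header line.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
# Sequence lines may be wrapped; their lengths are summed per record.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # header without the leading ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Usage:
#   fasta_lengths test_canu/test_canu.contigs.fasta
```

A contig whose length is close to the genome size estimate is a good candidate for a complete chromosome.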

Assembly Visualization

Open Bandage and a GUI window should pop up. In the toolbar, click File > Load Graph, select test_canu.contigs.gfa, and then click Draw graph. You should see something like the following:

This graph reveals that one of our contigs appears to be a whole circular chromosome! A quick comparison with the test_canu.contigs.fasta file reveals this is Contig 1. We extract only this sequence from the contigs file to examine further. Note that the first contig takes up the first 38,673 lines of the file, so use head:

head -n 38673 test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta
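
The line count above is specific to this particular assembly, so it will differ if you rerun Canu or use other data. As an alternative sketch, the record can be pulled out by position instead of by line number; fasta_record is a hypothetical helper, not part of Canu.

```shell
# Print only the Nth record of a FASTA file (default: the first).
# A record is a ">" header line plus all sequence lines up to the next header.
fasta_record() {
    awk -v want="${2:-1}" '
        /^>/ { n++ }
        n > want { exit }
        n == want { print }
    ' "$1"
}

# Usage:
#   fasta_record test_canu/test_canu.contigs.fasta 1 > test_canu/contig1.fasta
```

To find where the second record starts in your own file, grep -n '^>' on the contigs file lists the line number of every header.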

NCBI BLAST

We BLAST this contig against NCBI’s nucleotide database (linked here) with all default options. The top hit is:

Hit: Halomonas sp. hl-4 genome assembly, chromosome: I  
Organism: Halomonas sp. hl-4  
Phylogeny: Bacteria/Proteobacteria/Gammaproteobacteria/Oceanospirillales/Halomonadaceae/Halomonas  
Max score: 65370  
Query cover: 72%  
E value: 0.0  
Ident: 87%

It appears this chromosome is the genome of an organism in the genus Halomonas. We may now be interested in the gene annotation of this genome.

Gene Annotation

Prokka will take care of gene annotation; the only required input is the contig1.fasta file.

prokka --outdir circular --prefix test_prokka test_canu/contig1.fasta

The newly created circular directory contains various files with data on the gene annotation. Take a look inside test_prokka.txt for a summary of the annotation. We can take a quick look at the annotation using the DNAPlotter GUI. For a more customized circular plot, use circos.
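
Prokka also writes a GFF file alongside the summary, which is easy to inspect from the command line. The sketch below counts features by type (the third GFF column) and stops at the sequence section embedded at the end of the file; gff_feature_counts is a hypothetical helper name.

```shell
# Count annotated features by type (column 3) in a GFF3 file.
# Comment lines are skipped, and parsing stops at the ##FASTA
# section if the file has one.
gff_feature_counts() {
    awk -F '\t' '
        /^##FASTA/ { exit }
        /^#/ { next }
        NF >= 3 { count[$3]++ }
        END { for (t in count) print t "\t" count[t] }
    ' "$1" | sort
}

# Usage:
#   gff_feature_counts circular/test_prokka.gff
```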

Summary

The analysis above has taken Oxford Nanopore sequenced data, assembled contigs, identified the closest matching organism, and annotated its genome.


3 Responses to Tutorial: Nanopore Analysis Pipeline

  1. Pankaj says:

    Hi,
    I am working on 16S data from MinION. Please guide me to a working pipeline for the same; any reference would be great.

  2. Jeff says:

    That looks great, will check it out. We did play around with Nanopolish but I don’t think we’ve tried racon yet…

  3. Eric says:

    Nice! If you’re just doing nanopore you probably also want to do some polishing of the assembly before calling orfs

    https://github.com/nanoporetech/ont-assembly-polish
