Introduction
Hi! I’m Sabeel Mansuri, an Undergraduate Research Assistant for the Bowman Lab at the Scripps Institution of Oceanography, University of California San Diego. This tutorial demonstrates a pipeline for assembling and annotating a bacterial genome from Oxford Nanopore MinION data.
This tutorial will require the following (brief installation instructions are included below):
- Canu Assembler
- Bandage
- Prokka
- Barrnap
- DNAPlotter (alternatively circos)
Software Installation
Canu
Canu is a packaged correction, trimming, and assembly program forked from the Celera assembler codebase. Build the current version from source by running the following:
git clone https://github.com/marbl/canu.git
cd canu/src
make
Bandage
Bandage is an assembly visualization tool. Install it by visiting this link and downloading the version appropriate for your device.
Prokka
Prokka is a gene annotation program. Install it by visiting this link and running the installation commands appropriate for your device.
Barrnap
Barrnap is an rRNA prediction tool used by Prokka. Install it by visiting this link and running the installation commands appropriate for your device.
DNAPlotter
DNAPlotter is a gene annotation visualization tool. Install it by visiting this link and running the installation commands appropriate for your device.
Dataset
Download the nanopore dataset located here. This is an isolate from a sample taken from a local saline lake at South Bay Salt Works near San Diego, California.
The download will provide a tarball. Extract it:
tar -xvf nanopore.tar.gz
This will create a runs_fastq folder containing 8 FASTQ files of sequence reads.
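Before assembling, it can help to confirm that the extraction worked and see how many reads each file holds. The following is a minimal sketch, assuming the files use the standard four-lines-per-record FASTQ layout; count_fastq_reads is a hypothetical helper name, not part of any tool in this tutorial.

```shell
# Hypothetical helper: print "file<TAB>read count" for each FASTQ file.
# Assumes 4 lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    for f in "$@"; do
        awk -v name="$f" 'END { printf "%s\t%d\n", name, NR / 4 }' "$f"
    done
}

# Only runs if the extracted folder is present in the current directory.
if [ -d runs_fastq ]; then
    count_fastq_reads runs_fastq/*.fastq
fi
```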
Assembly
Canu can be used directly on the data without any preprocessing. The only additional information needed is an estimate of the genome size of the sample. For the saline isolate, we estimate 3,000,000 base pairs. Then, use the following Canu command to assemble our data:
canu -p test_canu -d test_canu genomeSize=3000000 gnuplotTested=true -nanopore-raw runs_fastq/*.fastq
A quick description of all flags and parameters:
- -p – specifies prefix for output files, use “test_canu” as default
- -d – specifies directory to run test and output files in, use “test_canu” as default
- genomeSize – estimated genome size of isolate
- gnuplotTested – setting to true will skip gnuplot testing; gnuplot is not needed for this pipeline
- -nanopore-raw – specifies that the files that follow are Oxford Nanopore reads with no data preprocessing
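Since the genome size is only an estimate, a rough depth-of-coverage figure is a useful sanity check before committing to a long assembly run. The sketch below divides the total number of sequenced bases by the assumed genome size; estimate_coverage is a hypothetical helper, and it again assumes four-line FASTQ records.

```shell
# Hypothetical helper: estimate_coverage GENOME_SIZE FILE...
# Sums the length of every sequence line (2nd line of each 4-line record)
# and divides by the assumed genome size.
estimate_coverage() {
    genome_size=$1
    shift
    cat "$@" | awk -v g="$genome_size" '
        NR % 4 == 2 { bases += length($0) }
        END { printf "%d bases, %.1fx coverage\n", bases, bases / g }'
}

# Only runs if the extracted folder is present in the current directory.
if [ -d runs_fastq ]; then
    estimate_coverage 3000000 runs_fastq/*.fastq
fi
```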
Running this command will output various files into the test_canu directory. The assembled contigs are located in the test_canu.contigs.fasta file. These contigs can be better visualized using Bandage.
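For a quick look at the assembly before opening a GUI, the contig names and lengths can be listed straight from the FASTA file. This is a small sketch using awk; contig_lengths is a hypothetical helper name, and the path in the usage comment assumes the test_canu prefix and directory used above.

```shell
# Hypothetical helper: print "contig name<TAB>length in bases" per record.
# Works whether sequences are wrapped over many lines or kept on one line.
contig_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END { if (name != "") print name "\t" len }' "$1"
}

# Usage:
#   contig_lengths test_canu/test_canu.contigs.fasta
```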
Assembly Visualization
Open Bandage and a GUI window should pop up. In the toolbar, click File > Load Graph, and select the test_canu.contigs.gfa file in the test_canu directory. You should see something like the following:
This graph reveals that one of our contigs appears to be a whole circular chromosome! A quick comparison with the test_canu.contigs.fasta file reveals this is Contig 1. We extract only this sequence from the contigs file to examine further. Note that the first contig takes up the first 38,673 lines of the file, so use head (with > rather than >>, so that rerunning the command overwrites contig1.fasta instead of appending a second copy):
head -n 38673 test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta
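The line count above is specific to this particular assembly and will differ on any other run. As an alternative sketch, the first record can be pulled out by stopping at the second header line instead of counting lines; first_fasta_record is a hypothetical helper name.

```shell
# Hypothetical helper: print only the first record of a FASTA file.
# Counts header lines and stops as soon as the second one appears.
first_fasta_record() {
    awk '/^>/ { n++ } n > 1 { exit } { print }' "$1"
}

# Usage:
#   first_fasta_record test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta
```

This only helps when the contig of interest is the first one in the file, as it is here.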
NCBI BLAST
We search this contig against NCBI’s nucleotide database using BLAST (linked here) with all default options. The top hit is:
Hit: Halomonas sp. hl-4 genome assembly, chromosome: I
Organism: Halomonas sp. hl-4
Phylogeny: Bacteria/Proteobacteria/Gammaproteobacteria/Oceanospirillales/Halomonadaceae/Halomonas
Max score: 65370
Query cover: 72%
E value: 0.0
Ident: 87%
It appears this chromosome is the genome of an organism in the genus Halomonas. We may now be interested in the gene annotation of this genome.
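If the hits are saved as a tab-separated table rather than read off the web page, the top hit can be picked out programmatically. The sketch below assumes the standard 12-column tabular layout (the one command-line BLAST+ writes with -outfmt 6), where column 2 is the subject ID, column 3 the percent identity, and column 12 the bit score; top_blast_hit is a hypothetical helper and the file name in the usage comment is a placeholder.

```shell
# Hypothetical helper: print "subject<TAB>percent identity<TAB>bit score"
# for the highest-scoring row of a 12-column tabular BLAST result.
top_blast_hit() {
    awk -F '\t' '
        /^#/ { next }                      # skip comment lines, if any
        ($12 + 0) > best { best = $12 + 0; sid = $2; pid = $3; bs = $12 }
        END { if (sid != "") printf "%s\t%s\t%s\n", sid, pid, bs }' "$1"
}

# Usage (hits.tsv is a placeholder name):
#   top_blast_hit hits.tsv
```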
Gene Annotation
Prokka will take care of gene annotation; the only required input is the contig1.fasta file.
prokka --outdir circular --prefix test_prokka test_canu/contig1.fasta
The newly created circular directory contains various files with data on the gene annotation. Take a look inside test_prokka.txt for a summary of the annotation. We can take a quick look at the annotation using the DNAPlotter GUI. For a more customized circular plot, use circos.
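The annotation can also be tallied directly from the GFF file that Prokka writes alongside the summary. This is a minimal sketch that counts features by type (column 3) and stops at the ##FASTA marker that separates the annotations from the embedded sequence; count_gff_features is a hypothetical helper name.

```shell
# Hypothetical helper: print "feature type<TAB>count" from a GFF3 file.
# Skips comment lines and ignores everything after the ##FASTA marker.
count_gff_features() {
    awk -F '\t' '
        /^##FASTA/ { exit }
        /^#/ { next }
        NF >= 9 { n[$3]++ }
        END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Usage:
#   count_gff_features circular/test_prokka.gff
```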
Summary
The analysis above has taken Oxford Nanopore sequenced data, assembled contigs, identified the closest matching organism, and annotated its genome.
Hi,
I am working on 16S data from MinION. Could you please guide me to a working pipeline for this? Any reference would be great.
That looks great, will check it out. We did play around with Nanopolish but I don’t think we’ve tried racon yet…
Nice! If you’re just doing nanopore you probably also want to do some polishing of the assembly before calling orfs
https://github.com/nanoporetech/ont-assembly-polish