Discover America visits SIO

Thanks to Jesse and Natalia for their help yesterday with the Discover America program run by the US State Department; SIO successfully hosted 35 foreign ambassadors and their spouses for an educational tour of the Scripps Pier. It was quite an experience. Jesse and I successfully dodged the photographers, but here’s a photo of Natalia talking science (presumably) with the ambassador of Cabo Verde and his wife. Downside: not a single diplomat or spouse wanted to go swimming, despite dolphins and balmy water temps!

Natalia talks science with His Excellency Carlos Alberto Wahnon De Carvalho Veiga and Ms. Maria Epifania Cruz Almeida of Cabo Verde. I presume they’re discussing biogeochemical cycling in mangrove forests. Credit: Scripps Communications
Posted in Uncategorized

MOSAiC in the news!

As a quick follow-up to Emelia’s post (https://www.polarmicrobes.org/training-for-mosaic-bremerhaven-utqiagvik/) on training for MOSAiC, there is a nice piece out today in the Washington Post on the US-based training for MOSAiC here: https://www.washingtonpost.com/graphics/2019/national/science/arctic-sea-ice-expedition-to-study-climate-change/?utm_term=.2552b79d5a32. It’s alarming to realize that Polarstern will depart from Tromsø, Norway on September 20 – just 100 days from now!

Posted in Uncategorized

Training for MOSAiC: Bremerhaven & Utqiagvik

A photo of me with the famous Utqiagvik whale-bone arch, and behind, the Chukchi Sea.

Hello! My name is Emelia Chamberlain and I am a first-year PhD student here in the Bowman Lab working on the MOSAiC project. I just got back from a very exciting week in Utqiagvik, Alaska for MOSAiC snow and ice training. But first, an overview… As mentioned in an earlier post, the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) project is an international effort to study the Arctic ocean-ice-atmosphere system with the goal of clarifying key climatic and ecological processes as they function in a changing Arctic. Within the larger scope of this project, our lab and collaborators from the University of Rhode Island (URI) will be studying how microbial community structure and ecophysiology control fluxes of oxygen and methane in the central Arctic Ocean.

MOSAiC begins in September 2019, when the German icebreaker RV Polarstern will sail into the Laptev Sea and be tethered to an ice floe. Once trapped in the ice, both ship and scientists will spend the next year drifting through the Arctic. The goal is to set up a central observatory and collect time-series observations across the complete seasonal cycle. This year-long time series will be both exciting and critical for the future of Arctic research, but it is logistically difficult to carry out. The cruise is split up into six “legs”, with scientists taking two-month shifts collecting observations and living the Arctic life. Resupply will be carried out by other icebreakers and aircraft. I myself will be taking part in the last two legs of this project from June – October 2020, with Jeff, Co-PI Brice Loose (URI), and his post-doc Alessandra D’Angelo (URI) representing our project on the rest of the voyage.

A representation of the central observatory taken from the MOSAiC website

Laboratory training in Bremerhaven, Germany

As one would imagine, with over 600 scientists involved and continuous measurements broken up between multiple teams, this project requires a LOT of advanced planning. However, this is the fun part, as it means we get to travel a lot in preparation! In March, Jeff and I traveled to Potsdam, Germany to participate in a MOSAiC implementation workshop. Shortly after, we took a train up to the Alfred Wegener Institute facilities in Bremerhaven with Brice, Alessandra, and other MOSAiC participants to train on some of the instrumentation we will be operating on the Polarstern. We spent a full week training on instruments like a gas chromatograph, gas-flux measurement chambers, and a membrane inlet mass spectrometer (MIMS). While many of us had operated these types of instruments before, each machine is different and several were engineered or re-designed by participating scientists specifically for MOSAiC.

A specially designed gas-flux chamber for measuring metabolic gas fluxes in both snow and ice. Photo courtesy of Brice Loose (URI)
The AWI engineered MIMS that will be onboard Polarstern. The bubbling chamber ensures precise, daily calibrations (and looks really cool).

The bulk of the training was focused on the MIMS, which will be used to take continuous underway ∆O2/Ar measurements from surface waters during MOSAiC. Water is piped from below the Polarstern and run through the mass spectrometer, where dissolved gas concentrations are measured. Argon (Ar), a biologically inert gas, is incorporated into the ocean’s mixed layer at the same rate as oxygen (O2). However, while argon concentrations are controlled only by physical processes, oxygen concentrations are also affected by biological processes (photosynthesis and respiration by biota). We can therefore compare oxygen and argon measurements in the water column to determine how much oxygen has deviated from what we would expect through physical air-sea exchange alone (i.e. deviations due to biological activity). From these oxygen fluxes, we can estimate Net Community Production (NCP), which is defined as the total amount of chemical energy produced by photosynthesis minus that which is used in respiration. This is an important balance to quantify, as it is representative of the amount of carbon removed biologically from the atmosphere (CO2) and sequestered into the ocean pool. The goal is to use these continuous MOSAiC measurements to quantify these biogeochemical budgets through time and get a better understanding of whether the Arctic is net autotrophic or heterotrophic – whether photosynthesis or respiration is the dominant process.
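For readers who want the quantity written out, the biological oxygen supersaturation is commonly defined as below. This is the standard textbook form, offered as a sketch; the exact calibration and processing used for MOSAiC are not described in this post.

```latex
\Delta(\mathrm{O_2/Ar}) =
  \frac{\left([\mathrm{O_2}]/[\mathrm{Ar}]\right)_{\mathrm{measured}}}
       {\left([\mathrm{O_2}]/[\mathrm{Ar}]\right)_{\mathrm{saturation}}} - 1
```

As a made-up example, a measured O2/Ar ratio that is 5 % higher than the ratio expected at air saturation gives ∆O2/Ar = 1.05 − 1 = 0.05, i.e. +5 %. Positive values point to net biological oxygen production, negative values to net consumption.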

A behind-the-scenes view of operating the MIMS – photo courtesy of Brice Loose (URI).
Learning how to remove and clean the equilibration tubes. These tubes bubble gases into the water for calibration.
PC: Brice Loose (URI)
We will be partially responsible for operating this instrument during our respective legs, and therefore spent a lot of time thinking about what might possibly go wrong during a year on an ice-locked vessel… and how to fix it. PC: Brice Loose (URI)

Field training in Utqiagvik, Alaska

Utqiagvik, Alaska (formerly Barrow) is located at the northern tip of Alaska, situated between the Chukchi and Beaufort seas. It boasts the northernmost point in continental North America.

After a productive week in Bremerhaven, this past week we stepped outside the laboratory with a snow and ice field training session in Utqiagvik, Alaska. One of the challenges of Arctic fieldwork is, of course, that it takes place in the frigid Arctic environment. To help scientists prepare for life on the ice and to help standardize and optimize sampling methods for MOSAiC, three snow and ice field training sessions were organized (the other two took place earlier this year in Finland). This trip was particularly exciting for me, as it was my first time in the Arctic! Not only did I learn a lot about sampling sea ice, but I was struck by the dynamic beauty of the polar landscape. No wonder researchers continue to be fascinated with the unanswered questions of this wild ecosystem.

Up close and personal with a large pressure ridge. Pressure ridges in sea ice are formed when two ice floes collide with each other. You can tell that this ridge was formed from multi-year ice by the thickness of the blocks and their deep blue color. Ice is classified as multi-year when it has survived multiple melt seasons.
Post-doc J.P. Balmonte from Uppsala University meanders his way along the pressure ridge.

The three trainings that everyone had to complete consisted of snow sampling, ice sampling and snowmobile training. Aside from that, people were able to learn or compare more advanced methods for their sampling specialities and test out gear, both scientific and personal weather protection. I was lucky in that the average -18°C weather we experienced in Utqiagvik will most likely be representative of the type of weather I will be facing in the summer months of MOSAiC. The winter teams will have to contend with considerably colder conditions.

Some days are windier than others and it’s very important to bundle up. However, on this trip I also learned that layers are very important. Working on the ice, especially coring, can be hard work and you don’t want to overheat. Beneath my big parka, should I need to remove it, I’ve got on a light puffy jacket, a fleece, and a wool thermal under-layer.
Digging snow-pits is an important aspect for sampling parameters like snow thickness and density. The goal is to get a clear vertical transect of snow to examine depth horizons and sample from. If you look closely, you can see 2 cm thick squares of snow which have been removed from the pit’s wall and weighed before discarding. The wall is built from the snow removed from the working pit and is intended to block researchers from the wind.
Note the meter-stick for snow thickness.
This is a work view I could get used to.
Coring practice! The extension pole between the corer and drill indicates that this is some pretty thick ice. PC: Jeff Bowman
One of the most exciting trainings we had was on how to operate the snowmobiles. These are a critical form of transport on the ice. They often have sleds attached with which to transport gear and samples to and from the ship. As such, we researchers are expected to be able to drive them properly (plus it was pretty fun and allowed us to reach more remote ice locations over our short week in Utqiagvik).
Once out on the ice we practiced tipping the machines over… and how to right them again.
Learning the basics! Note the sled behind ready to be attached to the machine.

While in Utqiagvik, we here at the Bowman Lab decided to make the most of this trip by also collecting some of our own sea-ice cores to sample and experiment with. The goal of our experiment is to determine the best method for melting these cores (necessary for sampling them) while placing the least stress on the resident microbial communities that we are interested in sampling. I will write up a post covering the methods and ideas behind this experiment soon – but in the meantime, please enjoy this excellent GoPro footage from beneath the ice captured by Jeff during our fieldwork. The brown gunk coating the bottom of the ice is sea-ice algae, mostly made up of diatoms. The ice here is only 68 cm thick, allowing for a lot of light penetration and an abundant photosynthetic community. At the end, you can also note the elusive Scientists in their natural sampling habitat.

What’s next?

Jeff looks to the horizon.

As Sept 2019 gets closer, preparations are likely to ramp up even more. Even though I won’t be in the field for another year, it is exciting to think that the start of MOSAiC is rapidly approaching and after these two weeks of training I am feeling much more prepared for the scientific logistics and field challenges that will accompany this research. However, there is still much more to come. In a few weeks I will be jetting off again, but this time to URI to meet up with our collaborators for more instrument training. And thus the preparations continue…

Posted in MOSAiC

Tutorial: Basic heatmaps and ordination with paprica output

The output from our paprica pipeline for microbial community structure analysis and metabolic inference has changed quite a lot over the last few months. In response to some recent requests here’s a tutorial that walks through an ordination and a few basic plots with the paprica output. The tutorial assumes that you’ve completed this tutorial which runs paprica on the samples associated with our recent seagrass paper.

For our analysis let’s bring the pertinent files into R and do some pre-processing:

## read in the edge and unique abundance tables. Note that it's going to take a bit to load the unique_tally file because you have a lot of variables!

tally <- read.csv('2017.07.03_seagrass.bacteria.edge_tally.csv', header = T, row.names = 1)
unique <- read.csv('2017.07.03_seagrass.bacteria.unique_tally.csv', header = T, row.names = 1)

## read in edge_data and taxon_map and seq_edge_map

data <- read.csv('2017.07.03_seagrass.bacteria.edge_data.csv', header = T, row.names = 1)
taxa <- read.csv('2017.07.03_seagrass.bacteria.taxon_map.csv', header = T, row.names = 1, sep = ',', as.is = T)
map <- read.csv('2017.07.03_seagrass.bacteria.seq_edge_map.csv', header = T, row.names = 1)

## convert all na's to 0, then check for low abundance samples

tally[is.na(tally)] <- 0
unique[is.na(unique)] <- 0
rowSums(tally)

## remove any low abundance samples (i.e. bad library builds), and also
## low abundance reads.  This latter step is optional, but I find it useful
## unless you have a particular interest in the rare biosphere.  Note that
## even with subsampling your least abundant reads are noise, so at a minimum
## exclude everything that appears only once.

tally.select <- tally[rowSums(tally) > 5000,]
tally.select <- tally.select[,colSums(tally.select) > 1000]

unique.select <- unique[rowSums(unique) > 5000,]
unique.select <- unique.select[,colSums(unique.select) > 1000]

If your experiment is based on factors (i.e. you want to test for differences between categories of samples) you may want to use DESeq2, otherwise I suggest normalizing by sample abundance.

## normalize

tally.select <- tally.select/rowSums(tally.select)
unique.select <- unique.select/rowSums(unique.select)

Now we’re going to do something tricky. For both unique.select and tally.select, rows are observations and columns are variables (edges or unique reads). Those likely don’t mean much to you unless you’re intimately familiar with the reference tree. We can map the edge numbers to taxa using the “taxa” dataframe, but first we need to remove the “X” added by R to make the numbers legal column names. For the unique read labels, we need to split on “_”, which divides the unique read identifier from the edge number.

## get edge numbers associated with columns, and map to taxa names.
## If the entry in taxon is empty it means the read could not be classifed below
## the level of the domain Bacteria, and is labeled as "Bacteria"

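## strip the leading "X" that R adds to numeric column names, then look up
## taxa by row name (the edge number)
tally.edges <- sub('^X', '', colnames(tally.select))
tally.lab.Row <- taxa[tally.edges, 'taxon']
tally.lab.Row[is.na(tally.lab.Row) | tally.lab.Row == ""] <- 'Bacteria'

## as.character() forces a lookup by row name rather than by row position
unique.lab.Row <- map[colnames(unique.select), 'global_edge_num']
unique.lab.Row <- taxa[as.character(unique.lab.Row), 'taxon']
unique.lab.Row[is.na(unique.lab.Row) | unique.lab.Row == ""] <- 'Bacteria'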

In the above block of code I labeled the new variables as [tally|unique].lab.Row, because we’ll first use them to label the rows of a heatmap. Heatmaps are a great way to start getting familiar with your data.

## make a heatmap of edge abundance

heat.col <- colorRampPalette(c('white', 'lightgoldenrod1', 'darkgreen'))(100)

heatmap(t(data.matrix(tally.select)),
        scale = 'none',
        col = heat.col,
        labRow = tally.lab.Row,
        margins = c(10, 10))
heatmap(t(data.matrix(unique.select)),
        scale = 'none',
        col = heat.col,
        labRow = unique.lab.Row,
        margins = c(10, 10))

Heatmaps are great for visualizing broad trends in the data, but they aren’t a good entry point for quantitative analysis. A good next step is to carry out some kind of ordination (NMDS, PCoA, PCA, CA). Not all ordination methods will work well for all types of data. Here we’ll use correspondence analysis (CA) on the relative abundance of the unique reads. CA will be carried out with the package “ca”, while “factoextra” will be used to parse the CA output and calculate key additional information. You can find a nice in-depth tutorial on correspondence analysis in R here.

library(ca)
library(factoextra)

unique.select.ca <- ca(unique.select)
unique.select.ca.var <- get_eigenvalue(unique.select.ca)
unique.select.ca.res <- get_ca_col(unique.select.ca)

species.x <- unique.select.ca$colcoord[,1]
species.y <- unique.select.ca$colcoord[,2]

samples.x <- unique.select.ca$rowcoord[,1]
samples.y <- unique.select.ca$rowcoord[,2]

dim.1.var <- round(unique.select.ca.var$variance.percent[1], 1)
dim.2.var <- round(unique.select.ca.var$variance.percent[2], 1)

plot(species.x, species.y,
     ylab = paste0('Dim 2: ', dim.2.var, '%'),
     xlab = paste0('Dim 1: ', dim.1.var, '%'),
     pch = 3,
     col = 'red')

points(samples.x, samples.y,
       pch = 19)

legend('topleft',
       legend = c('Samples', 'Unique reads'),
       pch = c(19, 3),
       col = c('black', 'red'))

At this point you’re ready to crack open the unique.select.ca object and start doing some hypothesis testing. There’s one more visualization, however, that can help with initial interpretation: a heatmap of the top unique reads contributing to the first two dimensions (which account for nearly all of the variance between samples).

## order the unique reads once by their summed contribution to Dim 1 and Dim 2,
## then reuse that order for both the values and the labels
species.contr <- unique.select.ca.res$contrib[,1:2]
species.contr.order <- order(rowSums(species.contr), decreasing = T)
species.contr.top <- species.contr[species.contr.order[1:10],]

species.contr.lab <- unique.lab.Row[species.contr.order]

heatmap(species.contr.top,
        scale = 'none',
        col = heat.col,
        Colv = NA,
        margins = c(10, 20),
        labRow = species.contr.lab[1:10],
        labCol = c('Dim 1', 'Dim 2'),
        cexCol = 1.5)

From this plot we see that quite a few different taxa are contributing approximately equally to Dim 1 (which accounts for much of the variance between samples), including several different Pelagibacter and Rhodobacteraceae strains. That makes sense as the dominant environmental gradient in the study was inside vs. outside of San Diego Bay and we would expect these strains to be organized along such a gradient. Dim 2 is different, with unique reads associated with Tropheryma whipplei and Rhodoluna lacicola contributing most. These aren’t typical marine strains, and if we look back at the original data we see that these taxa are very abundant in just two samples. These samples are the obvious outliers along Dim 2 in the CA plot.

In this tutorial we covered just the community structure output from paprica, but of course the real benefit to using paprica is its estimation of metabolic potential. These data are found in the *.ec_tally.csv and *.path_tally.csv files, and organized in the same way as the edge and unique read abundance tables. Because of this they can be plotted and analyzed in the same way.


Posted in paprica

New paper on seagrass microbial ecology

We have a new paper out today on the impacts of coastal seagrasses on the microbial community structure of San Diego Bay.  I’m excited about this paper as the first student-led study to come out of my lab.  The study was conceived by Tia Rabsatt, an undergraduate from UVI, during a SURF REU in 2017.  Tia carried out the sample collection, DNA extractions, and flow cytometry, then handed the project off to Sahra Webb.  Sahra carried out the remainder of the project as her Masters thesis.

Tia filters water just outside the mouth of San Diego Bay.  Coronado Island is in the background.

Why the interest in seagrass?  Unlike kelp, seagrasses are true flowering plants.  They’re found around the world from the tropics to the high latitudes and perform a number of important ecosystem functions.  Considerable attention has been given to their importance as nursery habitat for a number of marine organisms.  More recently we’ve come to appreciate the role they play in mediating sediment transport and pollution.  Recent work in Indonesia (which inspired Tia to carry out this study) even showed that the presence of seagrass meadows between inhabited beaches and coral reefs reduced the load of human and coral pathogens within the reefs.

Seagrass, barely visible on a murky collection day.  Confirming seagrass presence/absence was a considerable challenge during the field effort, and one we hadn’t anticipated.  There’s always something…

There are a number of good papers out on the seagrass microbiome – epibionts and other bacteria that are physically associated with the seagrass (see here and here) – but not so many on water column microbes in the vicinity of seagrass meadows.  In this study we took paired samples inside and outside of seagrass beds within and just outside of San Diego Bay.  I’ll be the first to admit that our experimental design was simple, with a limited sample set, and we look forward to a more comprehensive analysis at some point in the future.  Regardless, it worked well for a factor-type analysis using DESeq2, testing for differentially present microbial taxa while controlling for the different locations.

What we found was that (not surprisingly) the influence of seagrass is pretty minor compared to the influence of sample location (inside vs. outside of the bay).  There were, however, some taxa that were more abundant near seagrass even when we controlled for sample location.  These included some expected copiotrophs including members of the Rhodobacteraceae, Puniceispirillum, and Colwellia, as well as some unexpected genera including Synechococcus and Thioglobus (a sulfur-oxidizing gammaproteobacterium).  We spent the requisite amount of time puzzling over some abundant Rickettsiales within San Diego Bay.  We usually take these to mean SAR11 (though our analysis used paprica, which usually picks up Pelagibacter just fine), but they didn’t look like SAR11 in this case.  An unusual coastal SAR11 clade?  A parasite or endosymbiont with a wonky GC ratio?  TBD…

Posted in Research, Uncategorized

New paper on Antarctic microbial dark matter

I’m happy to report that I have a new paper out this week in Frontiers in Microbiology titled Identification of Microbial Dark Matter in Antarctic Environments. I thought that it would be interesting to see how well different Antarctic environments are represented by the available completed genomes (not very was my initial guess), got a little bored at the ISME meeting this summer, and had a go at it.

My approach was to find as many Antarctic 16S rRNA gene sequence datasets as I could on the NCBI SRA (Illumina MiSeq only), reanalyze them using consistent QC and denoising (dada2), and apply our paprica pipeline to see how well the environmental 16S rRNA sequence reads match the full-length reads in a recent build of the paprica database.

First things first, however, it was interesting to see 1) how poorly distributed the available Illumina libraries were around the Antarctic continent, and 2) just how many bad, incomplete, and incorrect submissions exist in SRA. 90 % of the effort on this project was invested in culling my list of projects, tracking down incorrect or erroneous lat/longs, sequence files that weren’t demultiplexed, etc. The demultiplexing issue is particularly irritating as I suspect it results purely from laziness. Of course the errors extend to some of my own data and I was chagrined to see that the accession number in our 2017 paper on microbial transport in the McMurdo Sound region is incorrect. Clearly we can all do better.

The collection locations for 16S rRNA libraries available on the NCBI SRA. From Bowman, 2018. Note the concentration of samples near major research bases along the western Antarctic Peninsula, in Prydz Bay, and at McMurdo Sound.

I ended up with 1,810 libraries that I felt good about, and that could be loosely grouped into the environments shown in the figure above. To get a rough idea of how well each library was represented by genomes in the paprica database I used the map ratio value calculated within paprica by Guppy. The map ratio is the fraction of bases in a query read that match the reference read within the region of alignment. This is a pretty unrefined way to assess sequence similarity, but it’s fast and easy to interpret. My analysis looked at the map ratio value for 1) individual unique reads, 2) samples, and 3) environments. One way to think about #1 is represented by the figure below:

Read map ratio as a function of read abundance for A) Bacteria and B) Archaea, calculated individually for all libraries. The orange lines define an arbitrary cutoff for reads that are reasonably abundant, but have very low map ratios (meaning we should probably think about getting those genomes).

What these plots tell us is that most unique reads were reasonably well represented by the 16S rRNA genes associated with complete genomes (> 80 % map ratio, which is still pretty distant genetically speaking!). However, there are quite a lot of reasonably abundant reads with much lower map ratios. (Looking at this now it seems painfully obvious that I should have used relative abundance. Oh well.)

I didn’t make an effort to track down all the completed genomes associated with Antarctic strains – if that’s even possible – but there is a known deficit of psychrophile genomes. Given that Antarctica tends to be chilly I’ll hazard a guess that there aren’t many complete bacterial or archaeal genomes from Antarctic isolates or metagenomes. Given the novelty of many Antarctic environments, and the number of microbiologists that do work in Antarctica, I’m a little surprised by this. Also kind of excited, however, thinking about how we might solve this for the future…

Posted in Uncategorized

AbSciCon session on life in high salt habitats

Abstract submissions are open for AbSciCon 2019!  You can check out the full selection of sessions here; however, I’d like to draw your attention toward the session Salty Goodness: Understanding life, biosignature preservation, and brines in the Solar System.  This session targets planetary scientists and microbiologists (and everyone in between), and we welcome submissions on any aspect of brines and habitability.  Full text follows; help us out by sharing this post widely!

Pure liquid water is only stable in a small fraction of the Solar System; however, salty aqueous solutions (i.e., brines) are more broadly stable. These brine systems however, prove to be some of the most challenging environments for microorganisms, where biology must overcome extreme osmotic stresses, low water activities, chemical toxicity, and depending on the location of the environment, temperature extremes, UV radiation, and intense pressure. Despite these stressors, hypersaline environments on Earth host an astounding diversity of micro- and macroorganisms. With worlds like Mars, Ceres, and outer Solar System Ocean worlds showing the potential for present-day brines, and with upcoming missions to Europa, it is timely to elucidate the potential for such aqueous systems to sustain and support life as well as the stability of these systems on host worlds.

This session is intended to encourage multidisciplinary and cross planetary discussions focused on the phase space of habitability within brines. We seek to discuss 1) the potential and stability of brines on host worlds through both laboratory and modeling experiments, 2) microbial ecology and adaptations to brines, 3) the effects of water activity and chaotropicity on habitability, 4) the ability of hypersaline systems to preserve biomolecules and 5) techniques and technology needed to detect biosignatures in these unique systems.

Posted in Uncategorized

Tutorial: Nanopore Analysis Pipeline

Introduction

Hi! I’m Sabeel Mansuri, an Undergraduate Research Assistant for the Bowman Lab at the Scripps Institution of Oceanography, University of California San Diego. The following is a tutorial that demonstrates a pipeline used to assemble and annotate a bacterial genome from Oxford Nanopore MinION data.

This tutorial will require the following (brief installation instructions are included below):

  1. Canu Assembler
  2. Bandage
  3. Prokka
  4. Barrnap
  5. DNAPlotter (alternatively circos)

Software Installation

Canu

Canu is a packaged correction, trimming, and assembly program that is forked from the Celera assembler codebase. Build the current version from source by running the following:

git clone https://github.com/marbl/canu.git
cd canu/src
make

Bandage

Bandage is an assembly visualization software. Install it by visiting this link, and downloading the version appropriate for your device.

Prokka

Prokka is a gene annotation program. Install it by visiting this link, and running the installation commands appropriate for your device.

Barrnap

Barrnap is an rRNA prediction software used by Prokka. Install it by visiting this link, and running the installation commands appropriate for your device.

DNAPlotter

DNAPlotter is a gene annotation visualization software. Install it by visiting this link, and running the installation commands appropriate for your device.

Dataset

Download the nanopore dataset located here. This is an isolate from a sample taken from a local saline lake at South Bay Salt Works near San Diego, California.

The download will provide a tarball. Extract it:

tar -xvf nanopore.tar.gz

This will create a runs_fastq folder holding 8 fastq files of sequence data.
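Before assembling, it can be worth checking how much data each file actually holds. The snippet below is a minimal sketch, not part of the original pipeline: fastq_stats is a hypothetical helper name, and it assumes ordinary four-line FASTQ records with unwrapped sequence lines. It runs on a throwaway toy file so that it works anywhere; for the real data, loop over runs_fastq/*.fastq instead.

```shell
# Summarize a FASTQ file: number of reads and total bases.
# Assumes four-line FASTQ records (sequence on line 2 of each record).
# fastq_stats is a hypothetical helper name used only for illustration.
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%d reads, %d bases\n", reads, bases }' "$1"
}

# Toy demonstration on a temporary two-read file.
tmp_fastq=$(mktemp)
printf '@read1\nACGTAC\n+\nIIIIII\n@read2\nGGCC\n+\nIIII\n' > "$tmp_fastq"
fastq_stats "$tmp_fastq"    # prints: 2 reads, 10 bases
rm -f "$tmp_fastq"
```

For the tutorial data, something like `for f in runs_fastq/*.fastq; do fastq_stats "$f"; done` would give one summary line per file.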

Assembly

Canu can be used directly on the data without any preprocessing. The only additional information needed is an estimate of the genome size of the sample. For the saline isolate, we estimate 3,000,000 base pairs. Then, use the following Canu command to assemble our data:

canu -p test_canu -d test_canu genomeSize=3000000 gnuplotTested=true -nanopore-raw runs_fastq/*.fastq

A quick description of all flags and parameters:

  • -nanopore-raw – specifies that the input is raw (uncorrected) Oxford Nanopore data; the read files follow this flag
  • -p – specifies prefix for output files, use “test_canu” as default
  • -d – specifies directory to run test and output files in, use “test_canu” as default
  • genomeSize – estimated genome size of isolate
  • gnuplotTested – setting to true will skip gnuplot testing; gnuplot is not needed for this pipeline
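The genome size estimate is easiest to reason about in terms of read coverage: total sequenced bases divided by the estimated genome size. Here is a minimal sketch of that arithmetic; expected_coverage is a hypothetical helper, and the base count in the example is made up rather than taken from this dataset.

```shell
# Rough expected coverage = total sequenced bases / estimated genome size.
# expected_coverage is a hypothetical helper; both arguments are plain integers.
expected_coverage() {
    awk -v bases="$1" -v size="$2" 'BEGIN { printf "%.1f\n", bases / size }'
}

# Example with invented numbers: 150 Mbp of reads against the 3 Mbp estimate.
expected_coverage 150000000 3000000    # prints: 50.0
```

If the coverage worked out this way looks very low, it is worth revisiting either the genome size estimate or the amount of input data before spending time on an assembly.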

Running this command will output various files into the test_canu directory. The assembled contigs are located in the test_canu.contigs.fasta file. These contigs can be better visualized using Bandage.

Assembly Visualization

Open Bandage and a GUI window should pop up. In the toolbar, click File > Load Graph, and select the test_canu.contigs.gfa file. You should see something like the following:

This graph reveals that one of our contigs appears to be a whole circular chromosome! A quick comparison with the test_canu.contigs.fasta file reveals this is Contig 1. We extract only this sequence from the contigs file to examine further. Note that the first contig takes up the first 38,673 lines of the file, so use head (with > rather than >>, so that rerunning the command overwrites the file instead of appending a second copy):

head -n 38673 test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta
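The line count above is specific to this particular assembly. As an alternative, here is a minimal sketch that selects the first record by its header line instead of by a fixed number of lines. first_fasta_record is a hypothetical helper name, and the toy file stands in for test_canu/test_canu.contigs.fasta.

```shell
# Print only the first record of a multi-FASTA file.
# first_fasta_record is a hypothetical helper name, not part of Canu.
first_fasta_record() {
    # Count header lines; print while inside record 1, stop at record 2.
    awk '/^>/ { n++ } n == 1 { print } n > 1 { exit }' "$1"
}

# Toy demonstration on a temporary two-record file.
tmp_fasta=$(mktemp)
printf '>tig00000001 len=8\nACGT\nACGT\n>tig00000002 len=4\nTTTT\n' > "$tmp_fasta"
first_fasta_record "$tmp_fasta"
rm -f "$tmp_fasta"
```

On the real assembly this would be called as `first_fasta_record test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta`. It only helps when the contig of interest is the first record, as it is here.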

NCBI BLAST

We BLAST this contig against NCBI’s nucleotide database (linked here) with all default options. The top hit is:

Hit: Halomonas sp. hl-4 genome assembly, chromosome: I  
Organism: Halomonas sp. hl-4  
Phylogeny: Bacteria/Proteobacteria/Gammaproteobacteria/Oceanospirillales/Halomonadaceae/Halomonas  
Max score: 65370  
Query cover: 72%  
E value: 0.0  
Ident: 87%

It appears this chromosome is the genome of an organism in the genus Halomonas. We may now be interested in the gene annotation of this genome.

Gene Annotation

Prokka will take care of gene annotation; the only required input is the contig1.fasta file.

prokka --outdir circular --prefix test_prokka test_canu/contig1.fasta

The newly created circular directory contains various files with data on the gene annotation. Take a look inside test_prokka.txt for a summary of the annotation. We can take a quick look at the annotation using the DNAPlotter GUI.  For a more customized circular plot use circos.
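For a quick look at the summary from the command line, here is a minimal sketch. It assumes the summary file is made of "key: value" lines (for example a line beginning "CDS:"), which is an assumption about the file layout rather than something shown in this post. summary_count is a hypothetical helper, and the toy numbers are invented.

```shell
# Pull one feature count out of a "key: value" style summary file.
# summary_count is a hypothetical helper; the file layout is assumed.
summary_count() {
    awk -F': ' -v key="$2" '$1 == key { print $2 }' "$1"
}

# Toy demonstration with invented numbers.
tmp_summary=$(mktemp)
printf 'contigs: 1\nbases: 3000000\nCDS: 2891\nrRNA: 12\n' > "$tmp_summary"
summary_count "$tmp_summary" CDS    # prints: 2891
rm -f "$tmp_summary"
```

With the tutorial output, the call would be `summary_count circular/test_prokka.txt CDS`, provided the file follows the layout assumed above.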

Summary

The analysis above has taken Oxford Nanopore sequenced data, assembled contigs, identified the closest matching organism, and annotated its genome.

Posted in Computer tutorials

South Bay Saltworks

This is a quick post of a few photos from our trip to the South Bay Saltworks earlier this week.  Thanks to PhD students Natalia, Emelia, and Srishti for getting up early to go play in the mud, and to Jesse Wilson and Melissa Hopkins for lab-side support!

Getting an early start at one of the lower salinity lakes.

A high salinity lake with the pink pigmentation clearly visible.  Biology is happening!

A high salinity MgCl2 dominated lake.  It isn’t clear whether anything is living in these lakes – the green pigmentation could be remnants of microbes that lived in a happier time.  Our new OAST project will be further investigating these and other lakes to improve life detection technologies, and better constrain the chemical conditions that are compatible with life.

Srishti and Emelia working very hard at filtering.

Hey Srishti, I think you forgot something!

It will be a long time before we’re done with our analysis for these lakes, but here are a couple of teaser microscope images that reflect the huge difference between an NaCl and MgCl2 dominated lake.

Big, happy bacteria from an NaCl lake at near-saturation.

Same prep applied to an MgCl2 lake.  No sign of large bacterial cells.  There could be life there but it isn’t obvious…

Posted in Uncategorized

Saturday morning at the office

Sometimes working weekends can be a lot of fun.  Last Saturday morning we carried out the second Scripps Institution of Oceanography visit by undergraduate biology majors from National University for our NSF-funded project CURE-ing Microbes on Ocean Plastics.  We recovered a plastic colonization experiment that we started last month, installed the next iteration of the experiment, and finally replaced the pump intake for our continuous flow membrane inlet mass spec (MIMS).  Many thanks to PhD students Natalia Erazo, Srishti Dasarathy, and Emelia Chamberlain for taking the time to work with the National University undergraduates, and to Kasia Kenitz in the Barton Lab for the diving assist!  Here are a couple of photo/video highlights from the day.

A short video of the plastic colonization experiment after one month of incubation.  Though there has been some swell it hasn’t been a particularly stormy month.  Despite that the cages that hold our plastic wafers were hanging by a thread!  I need to come up with a better system before the winter storms hit…

Chasing a school of baitfish under the pier after installation.  At the end of the video you can see the shiny new cage with the next set of plastic wafers, and to the right our newly installed pump intake for the MIMS.

Natalia and Srishti tell it like it is to National University students on the SIO pier.

Checking out microbes in the lab after field sampling on the pier.

Posted in Uncategorized