Weighted Gene Correlation Network Analysis (WGCNA) Applied to Microbial Communities

Weighted gene correlation network analysis (WGCNA) is a powerful network analysis tool that can be used to identify groups of highly correlated genes that co-occur across your samples. Thus genes are sorted into modules and these modules can then be correlated with other traits (that must be continuous variables).

Originally created to assess gene expression data in human patients, the authors of the WGCNA method and R package  have a thorough tutorial with in-depth explanations (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/). More recently, the method has been applied to microbial communities (Duran-Pinedo et al., 2011; Aylward et al., 2015; Guidi et al., 2016; Wilson et al., 2018)–the following is a walk though using microbial sequence abundances and environmental data from my 2018 work (https://www.ncbi.nlm.nih.gov/m/pubmed/29488352/).

Background: WGCNA finds how clusters of genes (or in our case abundances of operational taxonomic units–OTUs) correlates with traits (or in our case environmental variables or biochemical rates) using hierarchical clusters, novel applications of weighted adjacency functions and topological overlap measures, and a dynamic tree cutting method.

Very simply, each OTU is going to be represented by a node in a vast network and the adjacency (a score between 0 and 1) between each set of nodes will be calculated. Many networks use hard-thresholding (where a connection score [e.g. a Pearson Correlation Coefficient] between any two nodes is noted as 1 if it is above a certain threshold and noted as 0 if it is below it). This ignores the actual strength of the connection so WGCNA constructs a weighted gene (or OTU) co-occurrence adjacency matrix in lieu of ‘hard’ thresholding. Because our original matrix has abundance data the abundance of each OTU is also factored in.

For this method to work you also have to select a soft thresholding power (sft) to which each co-expression similarity is raised in order to make these scores “connection strengths”. I used a signed adjacency function:

  • Adjacency = 0.5*(1+Pearson correlation)^sft

because it preserves the sign of the connection (whether nodes are positively or negatively correlated) and this is recommendation by authors of WGCNA.

You pick your soft thresholding value by using a scale-free topology. This is based on the idea that the probability that a node is connected with k other nodes decays as a power law:

  • p(k)~ k^(-γ)

This idea is linked to the growth of networks–new nodes are more likely to be attached to already established nodes. In general, scale-free networks display a high degree of tolerance against errors (Zhang & Horvath, 2005).

You then turn your adjacency matrix into a Topological Overlap Measure (TOM) to minimize the effects of noise and spurious associations. A topological overlap of two nodes also factors in all of their shared neighbors (their relative interrelatedness)–so you are basically taking a simple co-occurrence between two nodes and placing it in the framework of the entire network by factoring in all the other nodes each is connected to. For more information regarding adjacency matrices and TOMs please see Zhang & Horvath (2005) and Langfelder & Horvath (2007 & 2008).

Start: Obtain an OTU abundance matrix (MB.0.03.subsample.fn.txt) and environmental data (OxygenMatrixMonterey.csv).

The OTU abundance matrix simply has all the different OTUs that were observed in a bunch of different samples (denoted in the Group column; e.g. M1401, M1402, etc.). These OTUs represent 16S rRNA sequences that were assessed with the universal primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA) and 926R (5′-CCGYCAATTYMTTTRAGTTT) and were created using a 97% similarity cutoff. These populations were previously subsampled to the smallest library size and all processing took place in mothur (https://www.mothur.org/). See Wilson et al. (2018) for more details.

The environmental data matrix tells you a little bit more about the different Samples, like the Date of collection, which of two site Locations it was collected from, the Depth or Zone of collection. You also see a bunch of different environmental variables like several different Upwelling indices (for different stations and different time spans), community respiration rate (CR), Oxygen Concentration, and Temperature. Again, see Wilson et al. (2018) for more details.

Code–Initial Stuff:

Read data in:

data<-read.table("MB.0.03.subsample.fn.txt",header=T,na.strings="NA")

For this particular file we have to get rid of first three columns since the OTUs don’t actually start until the 4th column:

data1 = data[-1][-1][-1]

You should turn your raw abundance values into a relative abundance matrix and potentially transform it. I recommend a Hellinger Transformation (a square root of the relative abundance)–this effectively gives low weight to variables with low counts and many zeros. If you wanted you could do the Logarithmic transformation of Anderson et al. (2006) here in stead.

library("vegan", lib.loc="~/R/win-library/3.3")
HellingerData<-decostand(data1,method = "hellinger")

You have to limit the OTUs to the most frequent ones (ones that occur in multiple samples so that you can measure co-occurance across samples). I just looked at my data file and looked for where zeros became extremely common. This was easy because mothur sorts OTUs according to abundance. If you would like a more objective way of selecting the OTUs or if your OTUs are not sorted you then this code may help:

lessdata <- data1[,colSums(data1) > 0.05]

(though you will have to decide what cutoff works best for your data).

Code–Making your relative abundance matrix:

You have to reattach the Group Name column:

RelAbun1 = data.frame(data[2],HellingerData[1:750])

Write file (this step isn’t absolutely necessary, but you may want this file later at some point):

write.table(RelAbun1, file = "MontereyRelAbun.txt", sep="\t")

Code–Visualizing your data at the sample level:

Now load the WGCNA package:

library("WGCNA", lib.loc="~/R/win-library/3.3")

Bring data in:

OTUs<-read.table("MontereyRelAbun.txt",header=T,sep="\t")
dim(OTUs);
names(OTUs);

Turn the first column (sample name) into row names (so that only OTUs make up actual columns):

datExpr0 = as.data.frame((OTUs[,-c(1)]));
names(datExpr0) = names(OTUs)[-c(1)];
rownames(datExpr0) = OTUs$Group;

Check Data for excessive missingness:

gsg = goodSamplesGenes(datExpr0[-1], verbose = 3);
gsg$allOK

You should get TRUE for this dataset given the parameters above. TRUE means that all OTUs have passed the cut. This means that when you limited your OTUs to the most common ones above that you didn’t leave any in that had too many zeros. It is still possible that you were too choosy though. If you got FALSE for your data then you have to follow some other steps that I don’t go over here.

Cluster the samples to see if there are any obvious outliers:

sampleTree = hclust(dist(datExpr0), method = "average");

sizeGrWindow(12,9)

par(cex = 0.6);
par(mar = c(0,4,2,0))

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)

The sample dendrogram doesn’t show any obvious outliers so I didn’t remove any samples. If you need to remove some samples then you have to follow some code I don’t go over here.

Now read in trait (Environmental) data and match with expression samples:

traitData = read.csv("OxygenMatrixMonterey.csv");
dim(traitData)
names(traitData)

Form a data frame analogous to expression data (relative abundances of OTUs) that will hold the Environmental traits:

OTUSamples = rownames(datExpr0);
traitRows = match(OTUSamples, traitData$Sample);
datTraits = traitData[traitRows, -1];
rownames(datTraits) = traitData[traitRows, 1];
collectGarbage()

Outcome: Now your OTU expression (or abundance) data are stored in the variable datExpr0 and the corresponding environmental traits are in the variable datTraits. Now you can visualize how the environmental traits relate to clustered samples.

Re-cluster samples:

sampleTree2 = hclust(dist(datExpr0), method = "average")

Convert traits to a color representation: white means low, red means high, grey means missing entry:

traitColors = numbers2colors(datTraits[5:13], signed = FALSE);

Plot the sample dendrogram and the colors underneath:

plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits[5:13]),
main = "Sample dendrogram and trait heatmap")

Again: white means a low value, red means a high value, and gray means missing entry. This is just initial stuff… we haven’t looked at modules of OTUs that occur across samples yet.

Save:

save(datExpr0, datTraits, file = "Monterey-dataInput.RData")

Code–Network Analysis:

Allow multi-threading within WGCNA. This helps speed up certain calculations.
Any error here may be ignored but you may want to update WGCNA if you see one.

options(stringsAsFactors = FALSE);
enableWGCNAThreads()

Load the data saved in the first part:

lnames = load(file = "Monterey-dataInput.RData");

The variable lnames contains the names of loaded variables:

lnames

Note: You have a couple of options for how you create your weighted OTU co-expression network. I went with the step-by-step construction and module detection. Please see this document for information on the other methods (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/Simulated-05-NetworkConstruction.pdf).

Choose a set of soft-thresholding powers:

powers = c(c(1:10), seq(from = 11, to=30, by=1))

Call the network topology analysis function:
Note: I am using a signed network because it preserves the sign of the connection (whether nodes are positively or negatively correlated); this is recommendation by authors of WGCNA.

sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5, networkType = "signed")

Output:

pickSoftThreshold: will use block size 750.
 pickSoftThreshold: calculating connectivity for given powers...
   ..working on genes 1 through 750 of 750
   Power SFT.R.sq slope truncated.R.sq  mean.k. median.k. max.k.
1      1   0.0299  1.47          0.852 399.0000  400.0000 464.00
2      2   0.1300 -1.74          0.915 221.0000  221.0000 305.00
3      3   0.3480 -2.34          0.931 128.0000  125.0000 210.00
4      4   0.4640 -2.41          0.949  76.3000   73.1000 150.00
5      5   0.5990 -2.57          0.966  47.2000   44.0000 111.00
6      6   0.7010 -2.52          0.976  30.1000   27.1000  83.40
7      7   0.7660 -2.47          0.992  19.8000   17.2000  64.30
8      8   0.8130 -2.42          0.986  13.3000   11.0000  50.30
9      9   0.8390 -2.34          0.991   9.2200    7.1900  40.00
10    10   0.8610 -2.24          0.992   6.5200    4.8800  32.20
11    11   0.8670 -2.19          0.987   4.7000    3.3700  26.20
12    12   0.8550 -2.18          0.959   3.4600    2.3300  21.50

This is showing you the power (soft thresholding value), the r2 for the scale independence for each particular power (we shoot for an r2 higher than 0.8), the mean number of connections each node has at each power (mean.k), the median number of connections/node (median.k), and the maximum number of connections (max.k).

Plot the results:

sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;

Scale-free topology fit index (r2) as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red");

This line corresponds to using an R^2 cut-off of h:

abline(h=0.8,col="red")

Mean connectivity as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

I picked a soft thresholding value of 10 because it was well above an r2 of 0.8 (it is a local peak for the r2) and the mean connectivity is still above 0.

So now we just calculate the adjacencies, using the soft thresholding power of 10:

softPower = 10;
adjacency = adjacency(datExpr0, power = softPower, type = "signed");

Then we transform the adjacency matrix into a Topological Overlap Matrix (TOM) and calculate corresponding dissimilarity:

Remember: The TOM you calculate shows the topological similarity of nodes, factoring in the connection strength two nodes share with other “third party” nodes. This will minimize effects of noise and spurious associations:

TOM = TOMsimilarity(adjacency, TOMType = "signed");
dissTOM = 1-TOM

Create a dendogram using a hierarchical clustering tree and then call the hierarchical clustering function:

TaxaTree = hclust(as.dist(dissTOM), method = "average");

Plot the resulting clustering tree (dendrogram):

sizeGrWindow(12,9)
plot(TaxaTree, xlab="", sub="", main = "Taxa clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04);

This image is showing us the clustering of all 750 OTUs based on the TOM dissimilarity index.

Now you have to decide the optimal module size for your system and should play around with this value a little. I wanted relatively large module so I set the minimum module size relatively high at 30:

minModuleSize = 30;

Module identification using dynamic tree cut (there a couple of different ways to figure out your modules and so you should explore what works best for you in the tutorials by the authors):

dynamicMods = cutreeDynamic(dendro = TaxaTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
table(dynamicMods)

Convert numeric labels into colors:

dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
dynamicColors
    black      blue     brown     green       red turquoise    yellow 
       49       135       113        71        64       216       102

You can see that there are a total of 7 modules (you should have seen that above too) and that now each module is named a different color. The numbers under the colors tells you how many OTUs were sorted into that module. Each OTU is in exactly 1 module, and you can see that if you add up all of the numbers from the various modules you get 750 (the number of OTUs that we limited our analysis to above).

Plot the dendrogram with module colors underneath:

sizeGrWindow(8,6)
plotDendroAndColors(TaxaTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Taxa dendrogram and module colors")

Now we will quantify co-expression similarity of the entire modules using eigengenes and cluster them based on their correlation:
Note: An eigengene is 1st principal component of a module expression matrix and represents a suitably defined average OTU community.

Calculate eigengenes:

MEList = moduleEigengenes(datExpr0, colors = dynamicColors)
MEs = MEList$eigengenes

Calculate dissimilarity of module eigengenes:

MEDiss = 1-cor(MEs);

Cluster module eigengenes:

METree = hclust(as.dist(MEDiss), method = "average");

Plot the result:

sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")

Now we will see if any of the modules should be merged. I chose a height cut of 0.30, corresponding to a similarity of 0.70 to merge:

MEDissThres = 0.30

Plot the cut line into the dendrogram:

abline(h=MEDissThres, col = "red")

You can see that, according to our cutoff, none of the modules should be merged.

If there were some modules that needed to be merged you can call an automatic merging function:

merge = mergeCloseModules(datExpr0, dynamicColors, cutHeight = MEDissThres, verbose = 3)

The merged module colors:

mergedColors = merge$colors;

Eigengenes of the new merged modules:

mergedMEs = merge$newMEs;

If you had combined different modules then that would show in this plot:

sizeGrWindow(12, 9)

plotDendroAndColors(TaxaTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

If we had merged some of the modules that would show up in the Merged dynamic color scheme.

Rename the mergedColors to moduleColors:

moduleColors = mergedColors

Construct numerical labels corresponding to the colors:

colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;

Save module colors and labels for use in subsequent parts:

save(MEs, moduleLabels, moduleColors, TaxaTree, file = "Monterey-networkConstruction-stepByStep.RData")

Code–Relating modules to external information and IDing important taxa:

Here you are going to identify modules that are significantly associate with environmental traits/biogeochemical rates. You already have summary profiles for each module (eigengenes–remeber that an eigengene is 1st principal component of a module expression matrix and represents a suitably defined average OTU community), so we just have to correlate these eigengenes with environmental traits and look for significant associations.

Defining numbers of OTUs and samples:

nTaxa = ncol(datExpr0);
nSamples = nrow(datExpr0);

Recalculate MEs (module eigengenes):

MEs0 = moduleEigengenes(datExpr0, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);

Now we will visualize it:

sizeGrWindow(10,6)

textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));

labeledHeatmap(Matrix = moduleTraitCor,
xLabels = names(datTraits),
yLabels = names(MEs),
ySymbols = names(MEs),
colorLabels = FALSE,
colors = greenWhiteRed(50),
textMatrix = textMatrix,
setStdMargins = FALSE,
cex.text = 0.5,
zlim = c(-1,1),
main = paste("Module-trait relationships"))

Each row corresponds to a module eigengene and each column corresponds to an environmental trait or biogeochemical rate (as long as it is continuous–notice that the categorical variables are gray and say NA). Each cell contains the corresponding Pearson correlation coefficient (top number) and a p-value (in parentheses). The table is color-coded by correlation according to the color legend.

You can see that the Brown module is positively correlated with many indices of upwelling while the Black module is negatively correlated with many indices of upwelling. For this work I was particularly interested in CR and so I focused on modules the positively or negatively correlated with CR. The Red module was negatively associated with CR while the Blue module was positively associated with CR.

Let’s look more at the Red module by quantifying the associations of individual taxa with CR:

First define the variable we are interested in from datTrait:

CR = as.data.frame(datTraits$CR);
names(CR) = "CR"

modNames = substring(names(MEs), 3)
TaxaModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(TaxaModuleMembership), nSamples));
names(TaxaModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
TaxaTraitSignificance = as.data.frame(cor(datExpr0, CR, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(TaxaTraitSignificance), nSamples));
names(TaxaTraitSignificance) = paste("GS.", names(CR), sep="");
names(GSPvalue) = paste("p.GS.", names(CR), sep="");

module = "red"
column = match(module, modNames);
moduleTaxa = moduleColors==module;
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(TaxaModuleMembership[moduleTaxa, column]),
abs(TaxaTraitSignificance[moduleTaxa, 1]),
xlab = paste("Module Membership in", module, "module"),
ylab = "Taxa significance for CR",
main = paste("Module membership vs. Taxa significance\n"),
cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

This graph shows you how each taxa (each red dot is an OTU that belongs in the Red module) correlated with 1) the Environmental trait of interest and 2) how important it is to the module. The taxa/OTUs that have high module membership tend to occur whenever the module is represented in the environment and are therefore often connected throughout the samples with other red taxa/OTUs. In this module, these hubs (Red OTUs that occur with other Red OTUs) are also the most important OTUs for predicting CR.

Now lets get more info about the taxa that make up the Red module:

First, merge the statistical info from previous section (modules with high assocation with trait of interest–e.g. CR or Temp) with taxa annotation and write a file that summarizes these results:

names(datExpr0)
names(datExpr0)[moduleColors=="red"]

You will have to feed in an annotation file–a file listing what Bacteria/Archaea go with each OTU (I am not providing you will this file, but it just had a column with OTUs and a column with the Taxonomy).

annot = read.table("MB.subsample.fn.0.03.cons.taxonomy",header=T,sep="\t");
dim(annot)
names(annot)
probes = names(datExpr0)
probes2annot = match(probes, annot$OTU)

Check for the number or probes without annotation (it should return a 0):

sum(is.na(probes2annot))

Create the starting data frame:

TaxaInfo0 = data.frame(Taxon = probes,
TaxaSymbol = annot$OTU[probes2annot],
LinkID = annot$Taxonomy[probes2annot],
moduleColor = moduleColors,
TaxaTraitSignificance,
GSPvalue)

Order modules by their significance for weight:

modOrder = order(-abs(cor(MEs, CR, use = "p")));

Add module membership information in the chosen order:

for (mod in 1:ncol(TaxaModuleMembership))
{
oldNames = names(TaxaInfo0)
TaxaInfo0 = data.frame(TaxaInfo0, TaxaModuleMembership[, modOrder[mod]],
MMPvalue[, modOrder[mod]]);
names(TaxaInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
paste("p.MM.", modNames[modOrder[mod]], sep=""))
}

Order the OTUs in the geneInfo variable first by module color, then by geneTraitSignificance:

TaxaOrder = order(TaxaInfo0$moduleColor, -abs(TaxaInfo0$GS.CR));
TaxaInfo = TaxaInfo0[TaxaOrder, ]

Write file:

write.csv(TaxaInfo, file = "TaxaInfo.csv")

Here is a bit of the output file I got:

Taxon TaxaSymbol LinkID moduleColor GS.TotalRate p.GS.TotalRate MM.red p.MM.red MM.blue p.MM.blue MM.green p.MM.green MM.brown p.MM.brown MM.turquoise p.MM.turquoise MM.black p.MM.black MM.yellow p.MM.yellow
Otu00711 Otu00711 Otu00711 Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);Surface_4(100); black 0.461005 0.00111 0.005028 0.973244 0.243888 0.098526 -0.07719 0.606075 -0.25274 0.086535 0.058878 0.694233 0.502027 0.000324 0.132412 0.374947
Otu00091 Otu00091 Otu00091 Bacteria(100);Bacteroidetes(100);Flavobacteriia(100);Flavobacteriales(100);Flavobacteriaceae(100);Formosa(100); black 0.378126 0.008778 -0.17243 0.246462 0.446049 0.001676 0.34467 0.017667 -0.55057 6.08E-05 0.492517 0.000437 0.615168 4.20E-06 0.367211 0.011115
Otu00082 Otu00082 Otu00082 Bacteria(100);Bacteroidetes(100);Flavobacteriia(100);Flavobacteriales(100);Flavobacteriaceae(100);NS5_marine_group(100); black -0.35649 0.013911 0.222515 0.132755 -0.06428 0.667734 0.175654 0.237601 -0.45502 0.001312 0.421756 0.003151 0.750195 1.28E-09 0.126362 0.397349
Otu00341 Otu00341 Otu00341 Bacteria(100);Bacteroidetes(100);Cytophagia(100);Cytophagales(100);Flammeovirgaceae(100);Marinoscillum(100); black -0.28242 0.054435 0.023927 0.873162 -0.07555 0.613762 0.144688 0.331879 -0.03144 0.833838 0.035147 0.814565 0.209255 0.158058 -0.06565 0.661083
Otu00537 Otu00537 Otu00537 Bacteria(100);Verrucomicrobia(100);Verrucomicrobiae(100);Verrucomicrobiales(100);Verrucomicrobiaceae(100);Persicirhabdus(100); black -0.23668 0.109211 0.123673 0.40755 -0.17362 0.243171 -0.05738 0.701628 -0.26399 0.072961 0.264411 0.072493 0.425082 0.002897 0.040045 0.789278
Otu00262 Otu00262 Otu00262 Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);Surface_1(100);Candidatus_Pelagibacter(90); black -0.23615 0.110023 0.327396 0.02468 -0.22748 0.12411 -0.13779 0.355699 -0.23708 0.108594 0.271968 0.064409 0.554592 5.23E-05 0.036113 0.809563
Otu00293 Otu00293 Otu00293 Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);SAR11_clade_unclassified(100); black 0.223427 0.131133 0.142016 0.34098 0.209327 0.157912 0.234713 0.112274 -0.53032 0.000126 0.529907 0.000128 0.627937 2.30E-06 0.390714 0.006621
Otu00524 Otu00524 Otu00524 Bacteria(100);Actinobacteria(100);Acidimicrobiia(100);Acidimicrobiales(100);OM1_clade(100);Candidatus_Actinomarina(100); black -0.20559 0.165629 0.28312 0.053809 0.016758 0.910982 0.148756 0.318316 -0.34758 0.016669 0.376043 0.009188 0.494903 0.000406 0.377597 0.00888
Otu00370 Otu00370 Otu00370 Bacteria(100);Verrucomicrobia(100);Opitutae(100);MB11C04_marine_group(100);MB11C04_marine_group_unclassified(100); black -0.20397 0.169074 0.303984 0.037771 -0.06655 0.656707 0.009401 0.949991 -0.25451 0.084275 0.097595 0.514 0.642409 1.13E-06 0.224711 0.128875

 

NOTES on output:

moduleColor is the module that the OTU was ultimately put into

GS stands for Gene Significance (for us it means taxon significance) while MM stands for module membership.

GS.Environmentaltrait = Pearson Correlation Coefficient for that OTU with the trait. GS allows incorporation of external info into the co-expression network by showing gene/OTU significance. The higher the absolute value of GS the more biologically significant the gene (or in our case taxa) to that external variable (e.g. CR).
p.GS.Environmentaltrait = P value for the preceding relationship.

MM.color = Pearson Correlation Coefficient for Module Membership–i.e. how well that OTU correlates with that particular color module (each OTU has a value for each module but only belongs to one module). If close to 0 or negative then the taxa is not part of that color module (since each OTU has to be put in a module you may get some OTUs that are close to 0, but they aren’t important to that module). If it is close to 1 then it is highly connected to that color module, but will be placed with the color module that it is most connected to throughout the samples.
p.MM.color = P value for the preceding relationship.

Modules will be ordered by their significance for the external variable you selected (e.g. CR), with the most significant ones to the left.
Each of the modules (with each OTU assigned to exactly one module) will be represented for the environmental trait you selected.
You will have to rerun this for each environmental trait you are interested in.

 

45271 Total Views 2 Views Today
This entry was posted in Computer tutorials. Bookmark the permalink.

3 Responses to Weighted Gene Correlation Network Analysis (WGCNA) Applied to Microbial Communities

  1. Dinesh says:

    Hi there,
    Thanks a lot. It would be great if you could share code for cytoscape export or sample .csv files (edge and node).

    Regards,
    Dinesh

  2. Avatar photo Jeff says:

    Glad to hear it!

  3. Larry Wilhelm says:

    This is just a note to express our gratitude for this post. We were working through the Horvath tutorial to figure out how to apply it to our epigenetic (dna methylation) data from primate neuronal tissue. This example provided exactly what we needed.

    Thank you! You have facilitated our research significantly.

    Larry Wilhelm
    Rita Cervera Juanes
    Oregon Health Sciences University

Leave a Reply

Your email address will not be published. Required fields are marked *

WordPress Anti Spam by WP-SpamShield