从qiime2中导出丰度数据用R语言绘制热图
参考:https://www.molecularecologist.com/2013/08/making-heatmaps-with-r-for-microbiome-analysis/
1.数据导出
如图所示,从qiime2中导出物种数据。
除去不必要的sample data, 得到genus水平物种的绝对丰度表,如下图所示:
2. R语言基础
2.1Load libraries
library(gplots) # for heatmap.2 # to install packages from Bioconductor: if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Heatplus") # load the vegan package for hierachical clustering if you want to use distance functions not specified in dist. library(vegan) # load the RColorBrewer package for better colour options library(RColorBrewer)
2.2 Get and Prepare Dataset
all.data <- read.csv("F:\\Zhaolab\\2019SouthChinaSea\\data_from_paper\\SCS20190807\\heatmap\\level-6.csv", row.names = 1, header = T) # load the data dim(all.data) all.data[1:3, 1:4] data.prop <- all.data/rowSums(all.data) ## 计算每个样品中各个物种的相对丰度。 data.prop[1:3, 1:3]
3. 绘制热图
一般来说,热图的默认颜色是非常可怕的。我个人很喜欢这个调色板,但是你可以尝试一下,看看什么适合你。
# colorRampPalette is in the RColorBrewer package. This creates a colour palette that shades from light yellow to red in RGB space with 100 unique colours scaleyellowred <- colorRampPalette(c("lightyellow", "red"), space = "rgb")(100)
最基础的热图如图所示:
heatmap(as.matrix(data.prop), Rowv = NA, Colv = NA, col = scaleyellowred)
显然,这个热图有很多不足。 第一,genus的标签太挤了,无法阅读。一个解决办法是去掉在这个图中丰度非常低的genera. 让我们来去掉在至少1个样本中reads的相对丰度小于1%的genera。
# determine the maximum relative abundance for each column maxab <- apply(data.prop, 2, max) head(maxab)
输出结果为:
k__Bacteria.__.__.__.__.__ 0.0018661068 k__Bacteria.p__.c__.o__.f__.g__ 0.0013746649 k__Bacteria.p__Acidobacteria.c__AT.s2.57.o__.f__.g__ 0.0043736879 k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__BPC015.f__.g__ 0.0003220329 k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__iii1.15.f__.g__ 0.0167794317 k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__iii1.15.f__mb2424.g__ 0.0002765181
# remove the genera with less than 1% as their maximum relative abundance n1 <- names(which(maxab < 0.01)) data.prop.1 <- data.prop[, -which(names(data.prop) %in% n1)] dim(data.prop.1) #[1] 24 62
这里还剩下62个genera,意味着很多的taxa的相对丰度非常低。新数据的热图需要做一些额外的调整才能显示所有的标签。
heatmap(t(as.matrix(data.prop.1)), Rowv = NA, Colv = NA, col = scaleyellowred, margins = c(5, 30))
好多了! 现在让我们来增加样本的dendrogram([生物] 系统树图,表示亲缘关系的树状图解)。热图函数将会为你做这个,但是我倾向于自己使用vegan包,因为它有更多的距离度量选择。而且这也意味着你可以做使用整个数据集做层次聚类,而不仅仅是展示丰度最高的物种的heatmap.
# calculate the Bray-Curtis dissimilarity matrix on the full dataset: data.dist <- vegdist(data.prop, method = "bray") # Do average linkage hierarchical clustering. Other options are 'complete' or 'single'. # You'll need to choose the one that best fits the needs of your situation and your data. row.clus <- hclust(data.dist, "aver") # make the heatmap with Rowv = as.dendrogram(row.clus) heatmap(t(as.matrix(data.prop.1)), Rowv = NA, Colv = as.dendrogram(row.clus), col = scaleyellowred, margins = c(5, 20))
你也可以给出现比较频繁的genera增加dendrogram([生物] 系统树图,表示亲缘关系的树状图解)。注意,这个必须在用于热图绘制的相同数据集(减少了genera的数目)上进行。
# you have to transpose the dataset to get the genera as rows data.dist.g <- vegdist(t(data.prop.1), method = "bray") col.clus <- hclust(data.dist.g, "aver")
# make the heatmap with Rowv = as.dendrogram(row.clus) heatmap(t(as.matrix(data.prop.1)), Rowv = as.dendrogram(col.clus), Colv = as.dendrogram(row.clus), col = scaleyellowred, margins = c(5, 30), )
对于某些应用场景,上面的热图就足够了,但是很多时候需要揭示与其他因素(如:实验分组、协变量的值)的热图的值。并且,有一个颜色跨度的图例会更好。gplots包中的heatmap.2能够做这些事情。
首先,让我们来对我们的数据集做一些注释(我想说明这是100%的虚构数据,而不是来源于任何的出版物):
# There are 24 samples in the dataset, so let's assign them randomly to one of fictional categories var1 <- round(runif(n = 12, min = 1, max = 4)) # this randomly samples from a uniform distribution and rounds the result to an integer value
# change the 1s and 4s to the names of colours: var1 <- replace(var1, which(var1 == 1), "skyblue") var1 <- replace(var1, which(var1 == 2), "magenta") var1 <- replace(var1, which(var1 == 3), "green") var1 <- replace(var1, which(var1 == 4), "orange")
# if this was real data, you would need the order of the values to be in the same order as the sequencing data e.g. cbind(row.names(data.prop), var1)
输出结果为:
var1 [1,] "B3_200" "green" [2,] "B3_800" "green" [3,] "B3_deep" "green" [4,] "B3_surface" "magenta" [5,] "D3_200" "magenta" [6,] "D3_800" "orange" [7,] "D3_deep" "magenta" [8,] "D3_surface" "orange" [9,] "F1_200" "skyblue" [10,] "F1_800" "orange" [11,] "F1_deep" "magenta" [12,] "F1_surface" "green" [13,] "G1_200" "green" [14,] "G1_800" "green" [15,] "G1_deep" "green" [16,] "G1_surface" "magenta" [17,] "LT2_200" "magenta" [18,] "LT2_800" "orange" [19,] "LT2_deep" "magenta" [20,] "LT2_surface" "orange" [21,] "X4_200" "skyblue" [22,] "X4_800" "orange" [23,] "X4_deep" "magenta" [24,] "X4_surface" "green"
接下来,我们可以绘制更加复杂的热图
heatmap.2(t(as.matrix(data.prop.1)), Rowv = as.dendrogram(col.clus), Colv = as.dendrogram(row.clus), col = scaleyellowred, ColSideColors = var1, # this puts in the annotation for the samples margins = c(5, 30))
你可能注意到了,
heatmap.2的一些默认设置看起来赏心悦目。每一列的绿色边界可以通过
trace = "none"来关闭,比例尺图例中的直方图可以使用
density.info = "none"删除。你可以增加标签(如
main,
xlab和
ylab),同时改变图片元素的相对大小(
lhei和
lwid)。在heatmap.2的帮助文档中还有很多非常有帮助的其他选项。
heatmap.2(as.matrix(data.prop.1), Rowv = as.dendrogram(row.clus), Colv = as.dendrogram(col.clus), col = scaleyellowred, RowSideColors = var1, margins = c(11, 5), trace = "none", density.info = "none", xlab = "genera", ylab = "Samples", main = "Heatmap example", lhei = c(2, 8)) # this makes the colour-key legend a little thinner
最后,最不直观的函数是来自Heatplus包的
annHeatmap2。然而,它允许一次注释多个变量,以及将dendrogram(树状图)着色为cluster的选项。函数中的所有选项被写成了列表,如果你不是经常使用他们,需要花点时间熟悉。
# the annHeatmap2 function needs to be wrapped in the plot function in order to display the results plot(annHeatmap2(t(as.matrix(data.prop.1)), # the colours have to be fiddled with a little more than with the other functions. #With 50 breaks in the heatmap densities, there needs to be 51 colours in the scale palette. #Error messages from the plotting should help to determine how many colours are needed for a given number of breaks col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(35), breaks = 34, # dendrogram controls how dendrograms are made and can be calculatated witin the function. #The internal calculation can be overridden and externally calculated dendrograms can be displayed. dendrogram = list( Row = list(dendro = as.dendrogram(col.clus)), Col = list(dendro = as.dendrogram(row.clus)) ), legend = 3, # this puts the colour-scale legend on the plot. The number indicates the side on which to plot it (1 = bottom, 2 = left, 3 = top, 4 = right) labels = list(row = list(ncol = 24)) # gives more space for the Genus names ))
使用ann子列表添加注释,该子列表以数据框作为输入。与其他函数不同,它可以使用多个变量进行注释,包括连续变量。
ann.dat <- data.frame(var1 = c(rep("cat1", 40), rep("cat2", 80)), var2 = rnorm(12, mean = 50, sd = 20)) plot(annHeatmap2(t(as.matrix(data.prop.1)), col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(35), breaks = 34, dendrogram = list( Row = list(dendro = as.dendrogram(col.clus)), Col = list(dendro = as.dendrogram(row.clus)) ), legend = 3, labels = list(Row=list(nrow=24), Col=list(nrow=8)), ann = list(Row = list(data = ann.dat)) ))
最后,dedrogram(s)的聚类能够基于聚类的子列表用不同的颜色来区分。
# make a data frame of variables for annotation ann.dat <- data.frame(var1 = c(rep("cat1", 40), rep("cat2", 80)), var2 = rnorm(12, mean = 50, sd = 20)) plot(annHeatmap2(t(as.matrix(data.prop.1)), col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(34), breaks = 33, dendrogram = list( Row = list(dendro = as.dendrogram(col.clus)), Col = list(dendro = as.dendrogram(row.clus)) ), legend = 3, labels = list(Row=list(nrow=24), Col=list(nrow=8)), ann = list(Row = list(data = ann.dat)), cluster = list(Col = list(cuth = 0.125, col = brewer.pal(8, "Set2"))) ))
- 点赞
- 收藏
- 分享
- 文章举报
- 【零基础 趣味玩数据】用R语言深度解读十九大报告,十行代码绘制党徽图案词云
- R语言绘制热图Heatmap
- R语言 绘制热图 pheatmap
- D3.js数据可视化(一)——绘制热图(heat map)
- 如何在百度地图上绘制建筑楼块(矢量面)数据并导出为图片
- python数据挖掘学习】十五.Matplotlib调用imshow()函数绘制热图
- Mac版R语言(七)数据导出/输出
- QT串口类的使用,折线图绘制,导出数据到wps表格。crc校验实现
- R语言 分类数据折线图绘制
- R语言数据的导入与导出
- 使用matlab绘制从KEIL memory导出的内存数据
- (KEILv5)使用matlab绘制从KEIL memory导出的内存数据
- R语言绘制热图——pheatmap
- R语言抓取暴风魔镜评论数据并绘制各省市购买量热力图
- R语言绘制热图Heatmap
- 【python数据挖掘课程】十五.Matplotlib调用imshow()函数绘制热图
- TrueTypeFont文件中符号数据导出与绘制
- R语言将数据导出到csv时出现科学计数表示
- R语言抓取pm2.5数据绘制全国pm2.5分布图
- R语言导出数据