您的位置:首页 > 其它

从qiime2中导出丰度数据用R语言绘制热图

2020-02-01 04:53 5425 查看

参考:https://www.molecularecologist.com/2013/08/making-heatmaps-with-r-for-microbiome-analysis/

1.数据导出

如图所示,从qiime2中导出物种数据。

除去不必要的sample data, 得到genus水平物种的绝对丰度表,如下图所示:

2. R语言基础

2.1Load libraries

library(gplots)  # for heatmap.2
# to install packages from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Heatplus")
# load the vegan package for hierachical clustering if you want to use distance functions not specified in dist.
library(vegan)
# load the RColorBrewer package for better colour options
library(RColorBrewer)

2.2 Get and Prepare Dataset

all.data <- read.csv("F:\\Zhaolab\\2019SouthChinaSea\\data_from_paper\\SCS20190807\\heatmap\\level-6.csv", row.names = 1, header = T)  # load the data
dim(all.data)
all.data[1:3, 1:4]
data.prop <- all.data/rowSums(all.data)  ## 计算每个样品中各个物种的相对丰度。
data.prop[1:3, 1:3]

3. 绘制热图

一般来说,热图的默认颜色是非常可怕的。我个人很喜欢这个调色板,但是你可以尝试一下,看看什么适合你。

# colorRampPalette is in the RColorBrewer package.  This creates a colour palette that shades from light yellow to red in RGB space with 100 unique colours
scaleyellowred <- colorRampPalette(c("lightyellow", "red"), space = "rgb")(100)

最基础的热图如图所示:

heatmap(as.matrix(data.prop), Rowv = NA, Colv = NA, col = scaleyellowred)


显然,这个热图有很多不足。 第一,genus的标签太挤了,无法阅读。一个解决办法是去掉在这个图中丰度非常低的genera. 让我们来去掉在至少1个样本中reads的相对丰度小于1%的genera。

# determine the maximum relative abundance for each column
maxab <- apply(data.prop, 2, max)
head(maxab)

输出结果为:

k__Bacteria.__.__.__.__.__
0.0018661068
k__Bacteria.p__.c__.o__.f__.g__
0.0013746649
k__Bacteria.p__Acidobacteria.c__AT.s2.57.o__.f__.g__
0.0043736879
k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__BPC015.f__.g__
0.0003220329
k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__iii1.15.f__.g__
0.0167794317
k__Bacteria.p__Acidobacteria.c__Acidobacteria.6.o__iii1.15.f__mb2424.g__
0.0002765181
# remove the genera with less than 1% as their maximum relative abundance
n1 <- names(which(maxab < 0.01))
data.prop.1 <- data.prop[, -which(names(data.prop) %in% n1)]
dim(data.prop.1)
#[1] 24 62

这里还剩下62个genera,意味着很多的taxa的相对丰度非常低。新数据的热图需要做一些额外的调整才能显示所有的标签。

heatmap(t(as.matrix(data.prop.1)), Rowv = NA, Colv = NA, col = scaleyellowred, margins = c(5, 30))

好多了! 现在让我们来增加样本的dendrogram([生物] 系统树图,表示亲缘关系的树状图解)。热图函数将会为你做这个,但是我倾向于自己使用vegan包,因为它有更多的距离度量选择。而且这也意味着你可以做使用整个数据集做层次聚类,而不仅仅是展示丰度最高的物种的heatmap.

# calculate the Bray-Curtis dissimilarity matrix on the full dataset:
data.dist <- vegdist(data.prop, method = "bray")

# Do average linkage hierarchical clustering. Other options are 'complete' or 'single'.
# You'll need to choose the one that best fits the needs of your situation and your data.
row.clus <- hclust(data.dist, "aver")

# make the heatmap with Rowv = as.dendrogram(row.clus)
heatmap(t(as.matrix(data.prop.1)), Rowv = NA, Colv = as.dendrogram(row.clus), col = scaleyellowred, margins = c(5, 20))


你也可以给出现比较频繁的genera增加dendrogram([生物] 系统树图,表示亲缘关系的树状图解)。注意,这个必须在用于热图绘制的相同数据集(减少了genera的数目)上进行。

# you have to transpose the dataset to get the genera as rows
data.dist.g <- vegdist(t(data.prop.1), method = "bray")
col.clus <- hclust(data.dist.g, "aver")
# make the heatmap with Rowv = as.dendrogram(row.clus)
heatmap(t(as.matrix(data.prop.1)),
Rowv = as.dendrogram(col.clus),
Colv = as.dendrogram(row.clus),
col = scaleyellowred,
margins = c(5, 30),
)


对于某些应用场景,上面的热图就足够了,但是很多时候需要揭示与其他因素(如:实验分组、协变量的值)的热图的值。并且,有一个颜色跨度的图例会更好。gplots包中的heatmap.2能够做这些事情。
首先,让我们来对我们的数据集做一些注释(我想说明这是100%的虚构数据,而不是来源于任何的出版物):

# There are 24 samples in the dataset, so let's assign them randomly to one of  fictional categories
var1 <- round(runif(n = 12, min = 1, max = 4))  # this randomly samples from a uniform distribution and rounds the result to an integer value
# change the 1s and 4s to the names of colours:
var1 <- replace(var1, which(var1 == 1), "skyblue")
var1 <- replace(var1, which(var1 == 2), "magenta")
var1 <- replace(var1, which(var1 == 3), "green")
var1 <- replace(var1, which(var1 == 4), "orange")
# if this was real data, you would need the order of the values to be in the same order as the sequencing data e.g.
cbind(row.names(data.prop), var1)

输出结果为:

var1
[1,] "B3_200"      "green"
[2,] "B3_800"      "green"
[3,] "B3_deep"     "green"
[4,] "B3_surface"  "magenta"
[5,] "D3_200"      "magenta"
[6,] "D3_800"      "orange"
[7,] "D3_deep"     "magenta"
[8,] "D3_surface"  "orange"
[9,] "F1_200"      "skyblue"
[10,] "F1_800"      "orange"
[11,] "F1_deep"     "magenta"
[12,] "F1_surface"  "green"
[13,] "G1_200"      "green"
[14,] "G1_800"      "green"
[15,] "G1_deep"     "green"
[16,] "G1_surface"  "magenta"
[17,] "LT2_200"     "magenta"
[18,] "LT2_800"     "orange"
[19,] "LT2_deep"    "magenta"
[20,] "LT2_surface" "orange"
[21,] "X4_200"      "skyblue"
[22,] "X4_800"      "orange"
[23,] "X4_deep"     "magenta"
[24,] "X4_surface"  "green"

接下来,我们可以绘制更加复杂的热图

heatmap.2(t(as.matrix(data.prop.1)),
Rowv = as.dendrogram(col.clus),
Colv = as.dendrogram(row.clus),
col = scaleyellowred,
ColSideColors = var1, # this puts in the annotation for the samples
margins = c(5, 30))


你可能注意到了,

heatmap.2
的一些默认设置看起来赏心悦目。每一列的绿色边界可以通过
trace = "none"
来关闭,比例尺图例中的直方图可以使用
density.info = "none"
删除。你可以增加标签(如
main
,
xlab
ylab
),同时改变图片元素的相对大小(
lhei
lwid
)。在heatmap.2的帮助文档中还有很多非常有帮助的其他选项。

heatmap.2(as.matrix(data.prop.1), Rowv = as.dendrogram(row.clus), Colv = as.dendrogram(col.clus), col = scaleyellowred, RowSideColors = var1, margins = c(11, 5), trace = "none", density.info = "none", xlab = "genera", ylab = "Samples", main = "Heatmap example", lhei = c(2, 8)) # this makes the colour-key legend a little thinner


最后,最不直观的函数是来自Heatplus包的

annHeatmap2
。然而,它允许一次注释多个变量,以及将dendrogram(树状图)着色为cluster的选项。函数中的所有选项被写成了列表,如果你不是经常使用他们,需要花点时间熟悉。

# the annHeatmap2 function needs to be wrapped in the plot function in order to display the results
plot(annHeatmap2(t(as.matrix(data.prop.1)),
# the colours have to be fiddled with a little more than with the other functions.
#With 50 breaks in the heatmap densities, there needs to be 51 colours in the scale palette.
#Error messages from the plotting should help to determine how many colours are needed for a given number of breaks
col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(35),
breaks = 34,
# dendrogram controls how dendrograms are made and can be calculatated witin the function.
#The internal calculation can be overridden and externally calculated dendrograms can be displayed.
dendrogram = list(
Row = list(dendro = as.dendrogram(col.clus)),
Col = list(dendro = as.dendrogram(row.clus))
),
legend = 3, # this puts the colour-scale legend on the plot. The number indicates the side on which to plot it (1 = bottom, 2 = left, 3 = top, 4 = right)
labels = list(row = list(ncol = 24)) # gives more space for the Genus names
))


使用ann子列表添加注释,该子列表以数据框作为输入。与其他函数不同,它可以使用多个变量进行注释,包括连续变量。

ann.dat <- data.frame(var1 = c(rep("cat1", 40), rep("cat2", 80)), var2 = rnorm(12,  mean = 50, sd = 20))
plot(annHeatmap2(t(as.matrix(data.prop.1)),
col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(35),
breaks = 34,
dendrogram = list(
Row = list(dendro = as.dendrogram(col.clus)),
Col = list(dendro = as.dendrogram(row.clus))
),
legend = 3,
labels = list(Row=list(nrow=24), Col=list(nrow=8)),
ann = list(Row = list(data = ann.dat))
))


最后,dedrogram(s)的聚类能够基于聚类的子列表用不同的颜色来区分。

# make a data frame of variables for annotation
ann.dat <- data.frame(var1 = c(rep("cat1", 40), rep("cat2", 80)), var2 = rnorm(12,  mean = 50, sd = 20))
plot(annHeatmap2(t(as.matrix(data.prop.1)),
col = colorRampPalette(c("lightyellow", "red"), space = "rgb")(34),
breaks = 33,
dendrogram = list(
Row = list(dendro = as.dendrogram(col.clus)),
Col = list(dendro = as.dendrogram(row.clus))
),
legend = 3,
labels = list(Row=list(nrow=24), Col=list(nrow=8)),
ann = list(Row = list(data = ann.dat)),
cluster = list(Col = list(cuth = 0.125, col = brewer.pal(8, "Set2")))
))
  • 点赞
  • 收藏
  • 分享
  • 文章举报
LonghaoJia1997 发布了2 篇原创文章 · 获赞 1 · 访问量 1171 私信 关注
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: