最近我們的空轉單細胞專輯 會更新一篇文章復現學習,這篇文章于2023年發表在高分雜志Cancer Res上,文獻標題為《Spatial Transcriptomics Depict Ligand–Receptor Cross-talk Heterogeneity at the Tumor-Stroma Interface in Long-Term Ovarian Cancer Survivors》。
數據背景和空轉數據讀取以及基本的降維聚類分群見:
代碼保存在著名的codeocean上:https:///capsule/1912679/tree/v1 今天的學習內容如下:
Make heatmap of top N differentially expressed genes for all the clusters
Make heatmap of CAF genes avg_logFC for all CAF clusters
Make heatmap of glutamine metabolic genes avg_logFC for all CAF clusters
Make heatmap of tumor genes avg_logFC for all tumor clusters
文獻中的結果如下:
Supplementary Fig. 1. Differentially expressed genes in spatial transcriptomics clusters. A. The top 20 differentially expressed genes in each cluster
CAF亞群特定基因熱圖:
Figure 3. A, Heatmap of the average log-fold change of CAF gene expression in the stroma clusters
空轉聚類亞群top基因熱圖 之前的運行結果在上面的帖子中:
library(dplyr) library(ggplot2) library(gplots) library(RColorBrewer) #################################################################################### save_dir = '../results' dir1 = paste0(save_dir, '/ST_differentially_expresssed_genes_up_and_down' ) dir1 files = list.files(dir1) files ############# save results to the folder ############### folder = paste0(save_dir, '/heatmap' ) folder if (!file.exists(folder)){ dir.create(folder) } files # 之前的差異分析結果
之前的空轉差異分析結果,見帖子:IF10+空轉文獻復現(一):4例HGSC患者的空轉數據讀取與降維聚類(文末有學習交流群)
[1] "A10_c0.xls" "A10_c1.xls" "A10_c2.xls" "A10_c3.xls" "A10_c4.xls" "A10_c5.xls" "A10_c6.xls" "A12_c0.xls" "A12_c1.xls" "A12_c2.xls" "A12_c3.xls" [12] "A12_c4.xls" "A12_c5.xls" "A12_c6.xls" "A4_c0.xls" "A4_c1.xls" "A4_c2.xls" "A4_c3.xls" "A4_c4.xls" "A4_c5.xls" "A4_c6.xls" "A4_c7.xls" [23] "A5_c0.xls" "A5_c1.xls" "A5_c2.xls" "A5_c3.xls" "A5_c4.xls" "A5_c5.xls"
得到在每個亞群中top差異基因的表達值: genes = c() clusters = c() for (i in seq(1,length(files))){ #i <- 1 fnm = files[i] clu = unlist(strsplit(fnm, '.x' ))[1] tmp1 = read.table(paste0(dir1, '/' ,fnm),header = TRUE, stringsAsFactors = F, check.names=F) tmp1 = tmp1[abs(tmp1 $avg_log2FC )>0.15,] tmp1 = tmp1[tmp1 $p_val <0.01,] genes = c(genes,tmp1 $gene [1:20]) temp = tmp1[order(tmp1 $avg_log2FC ,decreasing = FALSE),] genes = c(genes,temp $gene [1:20]) clusters= c(clusters,clu) } genes = unique(genes) # 所有亞群差異基因去重 df = data.frame(matrix(0, ncol = length(clusters), nrow = length(genes))) row.names(df) = genes colnames(df) = clusters for (i in seq(1,length(files))){ #i <- 1 fnm = files[i] tmp1 = read.table(paste0(dir1, '/' ,fnm),header = TRUE, stringsAsFactors = F, check.names=F) for (j in seq(1,length(genes))){ #j <- 1 if (genes[j] % in % tmp1 $gene ){ df[genes[j],i] = tmp1 $avg_log2FC [tmp1 $gene ==genes[j]] } } } # my_palette <- colorRampPalette(c("blue", "white", "red"))(n = 100) my_palette <- colorRampPalette(brewer.pal(10, "RdBu" ))(256) data = data.matrix(df) head(data)
繪制熱圖: 這里作者有單獨對列即亞群和行即基因進行聚類,使用的斯皮爾曼:
# correlation with complete linkage hr <- hclust(as.dist(1-cor(t(data), method= "spearman" )), method= "complete" ) hc <- hclust(as.dist(1-cor(data, method= "spearman" )), method= "complete" ) data[data<(-1)] = -1 data[data>1] = 1 filename = paste0(folder, '/avg_logFC_heatmap_spearman.pdf' ) pdf(filename,12,12) heatmap.2(data, col = rev(my_palette), Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc), trace= "none" , scale = "none" , density.info= "none" , margins=c(5,6), lhei=c(1,8), lwid=c(1,4), keysize=0.2, key.par = list(cex=0.5), cexRow=0.9,cexCol=1.2,srtCol=90, # rotate column label key.title = 'avg_logFC' , key.xlab = 'avg_logFC' ) dev.off()
結果如下:
CAF亞群中感興趣基因熱圖 拿到特定基因在caf亞群中的表達矩陣:
這些基因的功能可以簡單查詢一下:c('ACTA2','S100A4','FAP','VIM','PDGFRB','PDGFRA','CD36','POSTN','COL1A1')
ACTA2(平滑肌α - 肌動蛋白) :主要在平滑肌細胞中表達,是平滑肌細胞收縮功能的關鍵蛋白,參與血管、消化道等平滑肌器官的正常生理活動,其異常表達可能與血管疾病等有關。 S100A4(S100鈣結合蛋白A4) :是一種鈣結合蛋白,在多種細胞類型中表達,參與細胞的增殖、遷移和分化過程,與腫瘤的侵襲和轉移密切相關,可作為某些腫瘤的生物學標志物。 FAP(成纖維細胞激活蛋白) :主要在腫瘤相關成纖維細胞中表達,參與細胞 - 細胞和細胞 - 基質的相互作用,對腫瘤微環境的形成和腫瘤的生長、侵襲有重要作用,是腫瘤治療的潛在靶點。 VIM(波形蛋白) :是一種中間絲蛋白,主要存在于間充質細胞中,對于維持細胞形態、細胞運動和細胞信號傳導等過程有重要作用,在細胞分化和腫瘤發生發展中發揮關鍵作用。 PDGFRB(血小板源性生長因子受體β) :是血小板源性生長因子的受體,參與細胞的增殖、分化、存活和遷移等多種生物學過程,在組織修復、血管生成和腫瘤發展中起著重要作用。 PDGFRA(血小板源性生長因子受體α) :與PDGFRB類似,也是血小板源性生長因子的受體,參與細胞多種生物學行為的調控,在胚胎發育、組織修復和腫瘤發生等過程中發揮重要作用。 CD36(CD36分子) :是一種跨膜糖蛋白,參與多種生物學過程,包括脂肪酸的攝取和代謝、細胞黏附、炎癥反應等,在動脈粥樣硬化、糖尿病等疾病的發生發展中起著重要作用。 POSTN(periostin) :是一種細胞外基質蛋白,參與細胞黏附、遷移和存活等過程,在組織修復、血管生成和腫瘤發生發展中發揮關鍵作用,可作為某些疾病的診斷標志物和治療靶點。 COL1A1(膠原蛋白α - 1(I型)鏈) :是I型膠原蛋白的主要成分,I型膠原蛋白是細胞外基質的主要成分之一,為細胞提供結構支持和力學穩定性,參與組織的形成和修復,在骨骼、皮膚、肌腱等多種組織中廣泛分布,其異常表達與多種疾病如骨質疏松癥等有關。 ### visualize selected genes for CAF ### genes = c( 'ACTA2' , 'S100A4' , 'FAP' , 'VIM' , 'PDGFRB' , 'PDGFRA' , 'CD36' , 'POSTN' , 'COL1A1' ) # 這個下面的亞群可能需要跟自己的結果核對一下,不一定跟你跑的結果一樣 CAF_clusters = c( "A4_c0" , "A4_c1" , "A4_c3" , "A4_c4" , "A4_c6" , "A4_c7" , "A5_c0" , "A5_c1" , "A5_c2" , "A5_c3" , "A5_c4" , "A5_c5" , "A10_c0" , "A10_c1" , "A10_c4" , "A12_c2" , "A12_c3" , "A12_c4" ) df = data.frame(matrix(0, ncol = length(CAF_clusters), nrow = length(genes))) row.names(df) = genes colnames(df) = CAF_clusters for (i in seq(1,length(CAF_clusters))){ fnm = paste0(dir1, '/' ,CAF_clusters[i], '.xls' ) tmp1 = read.table(fnm, header = TRUE, stringsAsFactors = F, check.names=F) for (j in seq(1,length(genes))){ if (genes[j] % in % tmp1 $gene ){ if (tmp1 $p_val [tmp1 $gene ==genes[j]]<0.01){ df[genes[j],i] = tmp1 $avg_log2FC [tmp1 $gene ==genes[j]] } } } } my_palette <- colorRampPalette(brewer.pal(10, "RdBu" ))(256) data = data.matrix(df) data[data<(-1)] = -1 data[data>1] = 1 head(data)
繪圖:
filename = paste0(folder, '/avg_logFC_CAF_genes.pdf' ) pdf(filename,11,11) heatmap.2(data, col = rev(my_palette), dendrogram = 'row' , Colv= FALSE, Rowv= TRUE, distfun = dist, hclustfun = hclust, trace= "none" , scale = "none" , density.info= "none" , margins=c(5,7), lhei=c(1,6), lwid=c(1,4), keysize=0.4, key.par = list(cex=0.5), cexRow=1.1,cexCol=1,srtCol=90, # rotate column label key.title = 'avg_logFC' , key.xlab = 'avg_logFC' ) dev.off()
今天分享到這~(note:額,感覺文獻中的圖出來不是作者給的這個代碼啊,文獻中的圖明顯好看一些啊,有一些基因表達也會有一點出入,可能我的分析結果跟作者的會有差別)
(未完待續)