學(xué)習(xí)資料來源:
scanpy主頁:https://scanpy./en/stable/ 官網(wǎng):https://scanpy-tutorials./en/latest/spatial/integration-scanorama.html【注意教程有兩個版本,這里是latest版本的學(xué)習(xí)筆記】 這篇教程主要介紹怎么使用scanpy進行多個Visium空間轉(zhuǎn)錄組數(shù)據(jù)分析,以及與單細胞數(shù)據(jù)的整合分析。
教程將使用Scanorama 進行數(shù)據(jù)整合與label transfer。
加載函數(shù)庫
# 安裝包 pip install scanorama# conda 版本:conda install -c bioconda scanorama -y import scanpy as scimport anndata as animport pandas as pdimport numpy as npimport matplotlib as mplimport matplotlib.pyplot as pltimport seaborn as snsimport scanorama sc.logging.print_versions() sc.set_figure_params(facecolor="white" , figsize=(8 , 8 )) sc.settings.verbosity = 3 outdir = '/Pub/Users/project/scanpy/stRNA/'
讀取數(shù)據(jù) 教程將使用兩個小鼠大腦(矢狀)空間數(shù)據(jù),來源:10x genomics website [3]
adata_spatial_anterior = sc.datasets.visium_sge(sample_id="V1_Mouse_Brain_Sagittal_Anterior" ) adata_spatial_posterior = sc.datasets.visium_sge(sample_id="V1_Mouse_Brain_Sagittal_Posterior" ) adata_spatial_anterior.var_names_make_unique() adata_spatial_posterior.var_names_make_unique()
兩個數(shù)據(jù)的情況:
adata_spatial_anterior AnnData object with n_obs × n_vars = 2695 × 32285 obs: 'in_tissue' , 'array_row' , 'array_col' var: 'gene_ids' , 'feature_types' , 'genome' uns: 'spatial' obsm: 'spatial' adata_spatial_posterior AnnData object with n_obs × n_vars = 3355 × 32285 obs: 'in_tissue' , 'array_row' , 'array_col' var: 'gene_ids' , 'feature_types' , 'genome' uns: 'spatial' obsm: 'spatial'
QC并可視化 sc.pp.calculate_qc_metrics(adata_spatial_anterior, inplace=True ) sc.pp.calculate_qc_metrics(adata_spatial_posterior, inplace=True )for name, adata in [ ("anterior" , adata_spatial_anterior), ("posterior" , adata_spatial_posterior), ]: fig, axs = plt.subplots(1 , 4 , figsize=(12 , 3 )) fig.suptitle(f"Covariates for filtering: {name} " ) sns.distplot(adata.obs["total_counts" ], kde=False , ax=axs[0 ]) sns.distplot( adata.obs["total_counts" ][adata.obs["total_counts" ] < 20000 ], kde=False , bins=40 , ax=axs[1 ], ) sns.distplot(adata.obs["n_genes_by_counts" ], kde=False , bins=60 , ax=axs[2 ]) sns.distplot( adata.obs["n_genes_by_counts" ][adata.obs["n_genes_by_counts" ] < 4000 ], kde=False , bins=60 , ax=axs[3 ], ) plt.savefig(outdir + "01-sns.distplot_" +name+".png" )
結(jié)果圖:
anterior:
posterior:
標準化 使用normalize_total
函數(shù)對數(shù)據(jù)進行標準化,然后高變基因選擇。
還可以選擇SCTransform [4] or GLM-PCA [5] 進行數(shù)據(jù)標準化。
for adata in [ adata_spatial_anterior, adata_spatial_posterior, ]: sc.pp.normalize_total(adata, inplace=True ) sc.pp.log1p(adata) sc.pp.highly_variable_genes(adata, flavor="seurat" , n_top_genes=2000 , inplace=True )
數(shù)據(jù)整合 現(xiàn)在使用Scanorama對兩個數(shù)據(jù)進行整合,Scanorama對每個數(shù)據(jù)集會返回兩個list對象,一個是整合后的embeddings,一個是corrected counts。
Note:空間數(shù)據(jù)也可以使用前面教程分享的 BBKNN [6] or Ingest [7] 進行數(shù)據(jù)整合。
adatas = [adata_spatial_anterior, adata_spatial_posterior] adatas_cor = scanorama.correct_scanpy(adatas, return_dimred=True )
兩個數(shù)據(jù)連接在一起,并保存在整合后的embeddings:adata_spatial.obsm['scanorama_embedding']
中。
此外,我們將計算UMAP,以可視化的結(jié)果和定性評估數(shù)據(jù)整合的結(jié)果。
Note:我們使用un_merge = “unique”
策略連接這兩個數(shù)據(jù)集,以便在連接的 anndata 對象中保持兩個圖像來自 visium 數(shù)據(jù)集。
adata_spatial = sc.concat( adatas_cor, label="library_id" , uns_merge="unique" , keys=[ k for d in [ adatas_cor[0 ].uns["spatial" ], adatas_cor[1 ].uns["spatial" ], ] for k, v in d.items() ], index_unique="-" , ) sc.pp.neighbors(adata_spatial, use_rep="X_scanorama" ) sc.tl.umap(adata_spatial) sc.tl.leiden(adata_spatial, key_added="clusters" ) sc.pl.umap( adata_spatial, color=["clusters" , "library_id" ], palette=sc.pl.palettes.default_20 ) plt.savefig(outdir + "02-spatial.png" )
結(jié)果圖:
我們還可以在空間坐標系中可視化聚類結(jié)果。為此,我們首先需要將cluster顏色保存在字典中。然后我們可以繪制Anterior and Posterior的Visium組織的矢狀圖,并排在一起。
clusters_colors = dict( zip([str(i) for i in range(18 )], adata_spatial.uns["clusters_colors" ]) ) fig, axs = plt.subplots(1 , 2 , figsize=(15 , 10 ))for i, library in enumerate( ["V1_Mouse_Brain_Sagittal_Anterior" , "V1_Mouse_Brain_Sagittal_Posterior" ] ): ad = adata_spatial[adata_spatial.obs.library_id == library, :].copy() sc.pl.spatial( ad, img_key="hires" , library_id=library, color="clusters" , size=1.5 , palette=[ v for k, v in clusters_colors.items() if k in ad.obs.clusters.unique().tolist() ], legend_loc=None , show=False , ax=axs[i], ) plt.tight_layout() plt.savefig(outdir + "02-spatial-1.png" )
結(jié)果圖:從聚類圖中,我們可以清楚地看到兩個組織的皮層分層(參見Allen 腦圖譜 [8] )。此外,由于兩個組織中的簇之間存在明顯的連續(xù)性,因此數(shù)據(jù)集合整合似乎工作得很好。
與單細胞數(shù)據(jù)進行整合 scanorama還可以對 scRNA-seq 數(shù)據(jù)集和空間轉(zhuǎn)錄組學(xué)數(shù)據(jù)集之間進行數(shù)據(jù)整合。這樣的整合特別有用,因為它允許我們將從 scRNA-seq 數(shù)據(jù)集中確定的細胞類型標簽轉(zhuǎn)移到 Visium 數(shù)據(jù)集。
本次將使用來自Tasic et al. [9] 的數(shù)據(jù), 這個數(shù)據(jù)利用 smart-seq 技術(shù)對小鼠皮層進行了分析。
數(shù)據(jù)編號為:GSE115746 ,可以直接下載處理好的h5ad格式:https://hmgubox./f/4ef254675e2a41f89835/?dl=1
由于數(shù)據(jù)集是從小鼠皮層cortex產(chǎn)生的,我們將提取Visium數(shù)據(jù)集的子集,以便只選擇皮層的spots部分。注意,整合也可以在整個大腦切片上進行,但是它會產(chǎn)生假陽性的細胞類型分配,因此應(yīng)該更加謹慎地進行解釋。
dir = '/Pub/Users/project/scanpy/data/GSE115746/' adata_cortex = sc.read(dir+"./adata_processed.h5ad" )
提取子集:cortex
adata_anterior_subset = adata_spatial_anterior[ adata_spatial_anterior.obsm["spatial" ][:, 1 ] < 6000 , : ] adata_posterior_subset = adata_spatial_posterior[ (adata_spatial_posterior.obsm["spatial" ][:, 1 ] < 4000 ) & (adata_spatial_posterior.obsm["spatial" ][:, 0 ] < 6000 ), :, ]
使用Scanorama進行整合:
adatas_anterior = [adata_cortex, adata_anterior_subset] adatas_posterior = [adata_cortex, adata_posterior_subset]# Integration. adatas_cor_anterior = scanorama.correct_scanpy(adatas_anterior, return_dimred=True ) adatas_cor_posterior = scanorama.correct_scanpy(adatas_posterior, return_dimred=True )
連接數(shù)據(jù)集并將整合的embeddings分配到 anndata 對象中:
Note:這里采用join="outer"
與uns_merge="first"
的策略將兩個數(shù)據(jù)連接起來,這是因為我們希望保留visium數(shù)據(jù)的 obsm['coords']
以及 images。
adata_cortex_anterior = sc.concat( adatas_cor_anterior, label="dataset" , keys=["smart-seq" , "visium" ], join="outer" , uns_merge="first" , ) adata_cortex_posterior = sc.concat( adatas_cor_posterior, label="dataset" , keys=["smart-seq" , "visium" ], join="outer" , uns_merge="first" , )
在上一步,我們已經(jīng)將每個visium數(shù)據(jù)集與scRNA-seq數(shù)據(jù)集集成在一個共同的embedding中。在這樣的embedding空間中,我們可以計算樣本之間的距離,并使用這些距離作為權(quán)重,用于將細胞類型標簽從scRNA-seq數(shù)據(jù)集轉(zhuǎn)移到Visium數(shù)據(jù)集。
這種方法與Seurat中的TransferData
函數(shù)非常相似(see paper:https://www./cell/fulltext/S0092-8674(19)30559-8)。 這里,我們用一個簡單的python函數(shù)重新實現(xiàn)了標簽傳 遞函數(shù)。
首先,讓我們在相同的embedding空間中計算visium數(shù)據(jù)集和scRNA-seq數(shù)據(jù)集之間的余弦距離:
from sklearn.metrics.pairwise import cosine_distances distances_anterior = 1 - cosine_distances( adata_cortex_anterior[adata_cortex_anterior.obs.dataset == "smart-seq" ].obsm[ "X_scanorama" ], adata_cortex_anterior[adata_cortex_anterior.obs.dataset == "visium" ].obsm[ "X_scanorama" ], ) distances_posterior = 1 - cosine_distances( adata_cortex_posterior[adata_cortex_posterior.obs.dataset == "smart-seq" ].obsm[ "X_scanorama" ], adata_cortex_posterior[adata_cortex_posterior.obs.dataset == "visium" ].obsm[ "X_scanorama" ], )
然后,讓我們將標簽從scRNA-seq數(shù)據(jù)集傳播到visium數(shù)據(jù)集:
def label_transfer (dist, labels) : lab = pd.get_dummies(labels).to_numpy().T class_prob = lab @ dist norm = np.linalg.norm(class_prob, 2 , axis=0 ) class_prob = class_prob / norm class_prob = (class_prob.T - class_prob.min(1 )) / class_prob.ptp(1 ) return class_prob class_prob_anterior = label_transfer(distances_anterior, adata_cortex.obs.cell_subclass) class_prob_posterior = label_transfer(distances_posterior, adata_cortex.obs.cell_subclass)
class_prob_[anterior-posterior]
對象是一個numpy形狀數(shù)組(cell_type, visium_spots)
,包含每個spot分配給每個細胞類型的權(quán)重。這個值本質(zhì)上告訴我們,從表達值的角度來看,這些spots與來自scRNA-seq數(shù)據(jù)集的所有其他注釋細胞類型有多相似。
將class_prob_[anterior-posterior]
對象轉(zhuǎn)換成dataframe,并分配到對應(yīng)的anndata數(shù)據(jù)中。
cp_anterior_df = pd.DataFrame( class_prob_anterior, columns=np.sort(adata_cortex.obs.cell_subclass.unique()) ) cp_posterior_df = pd.DataFrame( class_prob_posterior, columns=np.sort(adata_cortex.obs.cell_subclass.unique()) ) cp_anterior_df.index = adata_anterior_subset.obs.index cp_posterior_df.index = adata_posterior_subset.obs.index adata_anterior_subset_transfer = adata_anterior_subset.copy() adata_anterior_subset_transfer.obs = pd.concat( [adata_anterior_subset.obs, cp_anterior_df], axis=1 ) adata_posterior_subset_transfer = adata_posterior_subset.copy() adata_posterior_subset_transfer.obs = pd.concat( [adata_posterior_subset.obs, cp_posterior_df], axis=1 )
然后,我們能夠探索細胞類型如何從scRNA-seq數(shù)據(jù)集轉(zhuǎn)移到visium數(shù)據(jù)集。讓我們先看看神經(jīng)元皮層。
sc.pl.spatial( adata_anterior_subset_transfer, img_key="hires" , color=["L2/3 IT" , "L4" , "L5 PT" , "L6 CT" ], size=1.5 , ) plt.savefig(outdir + "03-adata_anterior_subset_transfer.png" ) sc.pl.spatial( adata_posterior_subset_transfer, img_key="hires" , color=["L2/3 IT" , "L4" , "L5 PT" , "L6 CT" ], size=1.5 , ) plt.savefig(outdir + "03-adata_posterior_subset_transfer.png" )
結(jié)果圖如下:
anterior:
posterior:
這種方法似乎是有效的,因為在前矢狀slide和后矢狀slide中,連續(xù)的皮層神經(jīng)元層都可以被正確識別。
同樣,我們可視化星形膠質(zhì)細胞(astrocytes)和少突膠質(zhì)細胞(oligodendrocytes)。
sc.pl.spatial( adata_anterior_subset_transfer, img_key="hires" , color=["Oligo" , "Astro" ], size=1.5 ) plt.savefig(outdir + "03-adata_anterior_subset_transfer_Oligo_Astro.png" ) sc.pl.spatial( adata_posterior_subset_transfer, img_key="hires" , color=["Oligo" , "Astro" ], size=1.5 ) plt.savefig(outdir + "03-adata_posterior_subset_transfer_Oligo_Astro.png" )
結(jié)果圖:
anterior:
posterior:
在本次教程中,我們展示了如何使用scanpy整合分析多張slices切片數(shù)據(jù),展示了注釋后的單細胞數(shù)據(jù)在空間數(shù)據(jù)上的映射,表明,這種利用Scanorama整合數(shù)據(jù)的方法是有用的,并且為探索性分析提供了一個直接的工具。
然而,對于 label transfer 分析,我們建議分析人員探索更有原則的方法,基于細胞類型反卷積,這可能會提供更準確和可解釋的結(jié)果。參見最近的方法,例如:
paper(https://www./content/10.1101/2019.12.13.874495v1) code(https://github.com/almaan/stereoscope) paper(https://www./content/10.1101/2020.02.21.940650v1) code(https://github.com/theislab/AutoGeneS) paper(https://www./articles/s41467-018-08023-x) - code(https://github.com/xuranw/MuSiC) paper(https://www./articles/s41587-019-0114-2) - webtool(https://cibersortx./) code(https://github.com/rosedu1/deconvSeq) paper(https://www./content/10.1101/2020.11.15.378125v1) code(https://github.com/BayraktarLab/cell2location) 參考資料 [1] Scanorama paper: https://www./articles/s41587-019-0113-3
[2] Scanorama code: https://github.com/brianhie/scanorama
[3] 10x genomics website: https://support./spatial-gene-expression/datasets/
[4] SCTransform: https://genomebiology./articles/10.1186/s13059-019-1874-1
[5] GLM-PCA: https://genomebiology./articles/10.1186/s13059-019-1861-6
[6] BBKNN: https://scanpy./en/stable/external/scanpy.external.pp.bbknn.html
[7] Ingest: https://scanpy./en/stable/api/scanpy.tl.ingest.html
[8] Allen 腦圖譜: https://mouse./experiment/thumbnails/100042147?image_type=atlas
[9] Tasic et al.: https://www./articles/s41586-018-0654-5