Cell-to-cell communication#
Effective biological functions, such as the immune response, rely heavily on the communication between cells. When cells are dissociated for single-cell sequencing, the spatial relationships between them are disrupted, leading to loss of important information. Consequently, a number of algorithms have been devised to deduce intercellular communication from gene expression data.Typically, these techniques comprise a repository of recognized ligand-receptor pairings and a computation mechanism for assessing these pairings according to their gene expression patterns.
Dimitrov et al. [Dimitrov et al., 2022] conducted a comparative study of 16 ligand-receptor databases and 7 methods, which encompassed Cell-PhoneDB [Efremova et al., 2020], CellChatDB [Jin et al., 2021], and NATMI [Hou et al., 2020]. The LIANA package [Türei et al., 2021] was utilized to provide a unified interface to all methods, and they consolidated the ligand-receptor data from all databases into Omnipathdb. The researchers evaluated the methods and databases in relation to their correlation with spatial proximity and cytokine signaling, due to the abscence of a gold standard dataset that could be used for benchmarking purposes. Despite significant variations observed among both the methods and databases, no single technique emerged as consistently superior to the others. Methods like NicheNet [Browaeys et al., 2019], cytotalk [Hu et al., 2021], and SoptSC [Wang et al., 2019] go a step forward by also factoring in gene regulatory networks within a cell (intracellular communication). This enables the exploitation of the expression of downstream target genes of a receptor for predicting cell-cell interactions.
Initially, the techniques outlined above were developed to evaluate intercellular communication between distinct cell types. To achieve this objective, CellPhoneDB and CellChatDB deploy a permutation test that randomizes cell type labels. While this approach is valuable in comprehending the steady state across different tissues, with the single-cell research domain now embracing perturbation experiments and extensive atlases, it has become necessary to implement methods for assessing varied patterns of intercellular communication. To takle this problem we have incorporated a straightforward technique by considering a ligand-receptor pair as potentially perturbed if there is differential expression in at least one of the ligand and receptor. By using differentially expressed ligands/receptors (from comparisons between conditions) as input, we account for pseudoreplication bias and dropouts of lowly expressed genes.
See also
The Cell-cell communication chapter of the single-cell best practice book [Heumos et al., 2023].
Very recently, LIANA+ [Dimitrov et al., n.d.] and MultiNicheNet [Browaeys et al., n.d.] became available. Both are extensions to the original Liana and NicheNet packages that explicitly address differential cell2cell communication between conditions.
1. Import the required libraries#
import pandas as pd
import scanpy as sc
import atlas_protocol_scripts as aps
2. Load and sanitise the input data#
# Define paths
path_adata = "/home/fotakis/myScratch/atlas_tmp/atlas-integrated-annotated.h5ad"
path_ccdb = "/home/fotakis/myScratch/atlas_tmp/differential_expression/omnipathdb.tsv"
deseq2_path_prefix = "/home/fotakis/myScratch/atlas_tmp/differential_expression/"
# Load in the scRNAseq data
adata = sc.read_h5ad(path_adata)
# Load in the cellChatDB database
ccdb = pd.read_csv(path_ccdb, sep="\t")
# Keep primary sites and LUAD-LUSC conditions only
adata_primary_tumor = adata[(adata.obs["origin"] == "tumor_primary") & (adata.obs["condition"] != "NSCLC NOS")].copy()
# Sanity check
adata_primary_tumor.obs
sample | uicc_stage | sex | ever_smoker | driver_genes | condition | age | patient | tissue | origin | ... | ROS_mutation | origin_fine | study | platform | platform_fine | cell_type_major | batch | _predictions | _leiden | _cell_type_tumor_predicted | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAACCTGCATTCTCAT-1_0-9 | Lambrechts_Thienpont_2018_6653_BT1375 | I | male | yes | NaN | LUSC | 60.0 | Lambrechts_Thienpont_2018_6653_7 | lung | tumor_primary | ... | NaN | tumor_primary | Lambrechts_Thienpont_2018 | 10x | 10x_3p_v2 | B cell | NaN | NaN | NaN | NaN |
AAACCTGGTGCGAAAC-1_0-9 | Lambrechts_Thienpont_2018_6653_BT1375 | I | male | yes | NaN | LUSC | 60.0 | Lambrechts_Thienpont_2018_6653_7 | lung | tumor_primary | ... | NaN | tumor_primary | Lambrechts_Thienpont_2018 | 10x | 10x_3p_v2 | T cell regulatory | NaN | NaN | NaN | NaN |
AAACCTGTCAGCGACC-1_0-9 | Lambrechts_Thienpont_2018_6653_BT1375 | I | male | yes | NaN | LUSC | 60.0 | Lambrechts_Thienpont_2018_6653_7 | lung | tumor_primary | ... | NaN | tumor_primary | Lambrechts_Thienpont_2018 | 10x | 10x_3p_v2 | Tumor cells | NaN | NaN | NaN | NaN |
AAACGGGCAGTCACTA-1_0-9 | Lambrechts_Thienpont_2018_6653_BT1375 | I | male | yes | NaN | LUSC | 60.0 | Lambrechts_Thienpont_2018_6653_7 | lung | tumor_primary | ... | NaN | tumor_primary | Lambrechts_Thienpont_2018 | 10x | 10x_3p_v2 | T cell CD8 | NaN | NaN | NaN | NaN |
AAAGATGAGACTTTCG-1_0-9 | Lambrechts_Thienpont_2018_6653_BT1375 | I | male | yes | NaN | LUSC | 60.0 | Lambrechts_Thienpont_2018_6653_7 | lung | tumor_primary | ... | NaN | tumor_primary | Lambrechts_Thienpont_2018 | 10x | 10x_3p_v2 | Club | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13850_5-18 | UKIM-V_P3_tumor_primary | I | male | yes | NaN | LUAD | 64.0 | UKIM-V_P3 | lung | tumor_primary | ... | NaN | tumor_primary | UKIM-V | BD-Rhapsody | BD-Rhapsody | Neutrophils | NaN | NaN | NaN | NaN |
790899_5-18 | UKIM-V_P3_tumor_primary | I | male | yes | NaN | LUAD | 64.0 | UKIM-V_P3 | lung | tumor_primary | ... | NaN | tumor_primary | UKIM-V | BD-Rhapsody | BD-Rhapsody | Neutrophils | NaN | NaN | NaN | NaN |
451020_5-18 | UKIM-V_P3_tumor_primary | I | male | yes | NaN | LUAD | 64.0 | UKIM-V_P3 | lung | tumor_primary | ... | NaN | tumor_primary | UKIM-V | BD-Rhapsody | BD-Rhapsody | Neutrophils | NaN | NaN | NaN | NaN |
424575_5-18 | UKIM-V_P3_tumor_primary | I | male | yes | NaN | LUAD | 64.0 | UKIM-V_P3 | lung | tumor_primary | ... | NaN | tumor_primary | UKIM-V | BD-Rhapsody | BD-Rhapsody | Neutrophils | NaN | NaN | NaN | NaN |
215691_5-18 | UKIM-V_P3_tumor_primary | I | male | yes | NaN | LUAD | 64.0 | UKIM-V_P3 | lung | tumor_primary | ... | NaN | tumor_primary | UKIM-V | BD-Rhapsody | BD-Rhapsody | Neutrophils | NaN | NaN | NaN | NaN |
25892 rows × 38 columns
3. Define the Immune cell types#
# Define immune cell types (to be used for ploting)
immune_cells = [
"B cell",
"cDC1",
"cDC2",
"DC mature",
"Macrophage",
"Macrophage alveolar",
"Mast cell",
"Monocyte",
"Neutrophils",
"NK cell",
"pDC",
"Plasma cell",
"T cell CD4",
"T cell CD8",
"T cell regulatory",
]
4. Perform the cell-to-cell communication analysis#
# Sanity check
ccdb
source | target | source_genesymbol | target_genesymbol | is_directed | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | sources | references | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Q9Y219 | P46531 | JAG2 | NOTCH1 | 1 | 1 | 1 | 1 | 1 | 0 | Baccin2019;CellCall;CellChatDB;CellPhoneDB;Cel... | Baccin2019:1100613311006130;CellChatDB:2235346... |
1 | O00548 | P46531 | DLL1 | NOTCH1 | 1 | 1 | 0 | 1 | 1 | 0 | Baccin2019;CellCall;CellChatDB;CellPhoneDB;Cel... | Baccin2019:1006133;Baccin2019:98194281;CellCha... |
2 | P05019 | P08069 | IGF1 | IGF1R | 1 | 1 | 0 | 1 | 1 | 0 | Baccin2019;CA1;CellCall;CellChatDB;CellPhoneDB... | Baccin2019:1852007;Baccin2019:2877871;CA1:8408... |
3 | P78504 | P46531 | JAG1 | NOTCH1 | 1 | 1 | 1 | 1 | 1 | 0 | ACSN;Baccin2019;BioGRID;CellCall;CellChatDB;Ce... | ACSN:22330899;ACSN:22363130;Baccin2019:7697721... |
4 | P41221 | Q14332 | WNT5A | FZD2 | 1 | 1 | 1 | 1 | 1 | 0 | Baccin2019;CellCall;CellChatDB;CellPhoneDB_Cel... | Baccin2019:9389482;CellTalkDB:24032637;Cellink... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1889 | Q9BZZ2 | P16150 | SIGLEC1 | SPN | 1 | 0 | 0 | 0 | 0 | 0 | Baccin2019;CellChatDB;CellPhoneDB;HPRD;Wang;co... | CellChatDB:11238599;HPRD:11238599;connectomeDB... |
1890 | P04216 | COMPLEX:P05107_P11215 | THY1 | ITGAM_ITGB2 | 1 | 0 | 0 | 0 | 0 | 0 | CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... | CellChatDB:15850796;Cellinker:15850796 |
1891 | P04216 | COMPLEX:P05107_P20702 | THY1 | ITGAX_ITGB2 | 1 | 0 | 0 | 0 | 0 | 0 | CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... | CellChatDB:15850796;Cellinker:15850796;Cellink... |
1892 | P04216 | COMPLEX:P05106_P06756 | THY1 | ITGAV_ITGB3 | 1 | 0 | 0 | 0 | 0 | 0 | CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... | CellChatDB:18346467;CellChatDB:29212879;Cellin... |
1893 | Q9H7M9 | Q5DX21 | VSIR | IGSF11 | 1 | 1 | 0 | 1 | 1 | 0 | CellChatDB;Cellinker | Cellinker:32589946 |
1894 rows × 12 columns
# Run the analysis function (using the receptor ligand-list and the LUAD-LUSC data)
ccdba = aps.tl._cell2cell.CpdbAnalysis(
ccdb,
adata_primary_tumor,
pseudobulk_group_by=["patient"],
cell_type_column="cell_type_major",
)
5. Use the DE results to infer cell-to-cell communications#
# Load in DE results (using LUAD as reference)
de_res_tumor_cells_luad_lusc = (
pd.read_csv(
(deseq2_path_prefix + "/IHWallGenes.tsv").format(comparison="luad_lusc"),
sep=",",
)
.fillna(1)
.pipe(aps.tl._fdr.fdr_correction)
.assign(group="LUAD")
)
# Sanity check
de_res_tumor_cells_luad_lusc
gene_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | weight | fdr | group | |
---|---|---|---|---|---|---|---|---|---|---|
0 | KHDC1L | 141.381429 | -28.881840 | 2.983734 | -9.679764 | 3.675625e-22 | 5.832158e-18 | 1.120682 | 6.556213e-18 | LUAD |
1 | TMPRSS11A | 90.227543 | -28.253740 | 3.184618 | -8.871940 | 7.188262e-19 | 5.724042e-15 | 1.116533 | 6.410852e-15 | LUAD |
2 | SMOC1 | 867.322568 | 23.208725 | 2.727486 | 8.509199 | 1.751387e-17 | 9.263156e-14 | 1.120682 | 1.041316e-13 | LUAD |
3 | NEFH | 276.263273 | 23.864643 | 3.075381 | 7.759899 | 8.499738e-15 | 4.511141e-11 | 0.837606 | 3.790246e-11 | LUAD |
4 | IL37 | 222.427307 | 23.570012 | 3.136695 | 7.514283 | 5.722354e-14 | 2.391063e-10 | 0.851127 | 2.041393e-10 | LUAD |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17832 | TRGV10 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000 | 1.000000e+00 | LUAD |
17833 | TRGV3 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000 | 1.000000e+00 | LUAD |
17834 | TRGV4 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000 | 1.000000e+00 | LUAD |
17835 | TRGV5 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000 | 1.000000e+00 | LUAD |
17836 | VAV3-AS1 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+00 | 1.000000e+00 | 1.000000 | 1.000000e+00 | LUAD |
17837 rows × 10 columns
# Save the results
ccdb_res = ccdba.significant_interactions(de_res_tumor_cells_luad_lusc, max_pvalue=0.1)
6. Visualisation#
# Load in the results
ccdb_res = ccdba.significant_interactions(de_res_tumor_cells_luad_lusc, max_pvalue=0.1)
# Keep only the immune cells (we'll use this for plotting)
ccdb_res = ccdb_res.loc[lambda x: x["cell_type_major"].isin(immune_cells)]
# Create a list of unique top-upregulated genes
top_genes = (
ccdb_res.loc[:, ["source_genesymbol", "fdr"]]
.drop_duplicates()
.sort_values("fdr")["source_genesymbol"][:30]
.tolist()
)
# Plot the results
ccdba.plot_result(
ccdb_res.loc[lambda x: x["source_genesymbol"].isin(top_genes)],
title="LUAD vs LUSC: tumor cells, top 30 DE ligands",
aggregate=False,
cluster="heatmap",
label_limit=80,
)