Cell-to-cell communication#

Effective biological functions, such as the immune response, rely heavily on the communication between cells. When cells are dissociated for single-cell sequencing, the spatial relationships between them are disrupted, leading to loss of important information. Consequently, a number of algorithms have been devised to deduce intercellular communication from gene expression data.Typically, these techniques comprise a repository of recognized ligand-receptor pairings and a computation mechanism for assessing these pairings according to their gene expression patterns.

Dimitrov et al. [Dimitrov et al., 2022] conducted a comparative study of 16 ligand-receptor databases and 7 methods, which encompassed Cell-PhoneDB [Efremova et al., 2020], CellChatDB [Jin et al., 2021], and NATMI [Hou et al., 2020]. The LIANA package [Türei et al., 2021] was utilized to provide a unified interface to all methods, and they consolidated the ligand-receptor data from all databases into Omnipathdb. The researchers evaluated the methods and databases in relation to their correlation with spatial proximity and cytokine signaling, due to the abscence of a gold standard dataset that could be used for benchmarking purposes. Despite significant variations observed among both the methods and databases, no single technique emerged as consistently superior to the others. Methods like NicheNet [Browaeys et al., 2019], cytotalk [Hu et al., 2021], and SoptSC [Wang et al., 2019] go a step forward by also factoring in gene regulatory networks within a cell (intracellular communication). This enables the exploitation of the expression of downstream target genes of a receptor for predicting cell-cell interactions.

Initially, the techniques outlined above were developed to evaluate intercellular communication between distinct cell types. To achieve this objective, CellPhoneDB and CellChatDB deploy a permutation test that randomizes cell type labels. While this approach is valuable in comprehending the steady state across different tissues, with the single-cell research domain now embracing perturbation experiments and extensive atlases, it has become necessary to implement methods for assessing varied patterns of intercellular communication. To takle this problem we have incorporated a straightforward technique by considering a ligand-receptor pair as potentially perturbed if there is differential expression in at least one of the ligand and receptor. By using differentially expressed ligands/receptors (from comparisons between conditions) as input, we account for pseudoreplication bias and dropouts of lowly expressed genes.

See also

1. Import the required libraries#

import pandas as pd
import scanpy as sc

import atlas_protocol_scripts as aps

2. Load and sanitise the input data#

# Define paths
path_adata = "/home/fotakis/myScratch/atlas_tmp/atlas-integrated-annotated.h5ad"
path_ccdb = "/home/fotakis/myScratch/atlas_tmp/differential_expression/omnipathdb.tsv"
deseq2_path_prefix = "/home/fotakis/myScratch/atlas_tmp/differential_expression/"
# Load in the scRNAseq data
adata = sc.read_h5ad(path_adata)

# Load in the cellChatDB database
ccdb = pd.read_csv(path_ccdb, sep="\t")
# Keep primary sites and LUAD-LUSC conditions only
adata_primary_tumor = adata[(adata.obs["origin"] == "tumor_primary") & (adata.obs["condition"] != "NSCLC NOS")].copy()
# Sanity check
adata_primary_tumor.obs
sample uicc_stage sex ever_smoker driver_genes condition age patient tissue origin ... ROS_mutation origin_fine study platform platform_fine cell_type_major batch _predictions _leiden _cell_type_tumor_predicted
AAACCTGCATTCTCAT-1_0-9 Lambrechts_Thienpont_2018_6653_BT1375 I male yes NaN LUSC 60.0 Lambrechts_Thienpont_2018_6653_7 lung tumor_primary ... NaN tumor_primary Lambrechts_Thienpont_2018 10x 10x_3p_v2 B cell NaN NaN NaN NaN
AAACCTGGTGCGAAAC-1_0-9 Lambrechts_Thienpont_2018_6653_BT1375 I male yes NaN LUSC 60.0 Lambrechts_Thienpont_2018_6653_7 lung tumor_primary ... NaN tumor_primary Lambrechts_Thienpont_2018 10x 10x_3p_v2 T cell regulatory NaN NaN NaN NaN
AAACCTGTCAGCGACC-1_0-9 Lambrechts_Thienpont_2018_6653_BT1375 I male yes NaN LUSC 60.0 Lambrechts_Thienpont_2018_6653_7 lung tumor_primary ... NaN tumor_primary Lambrechts_Thienpont_2018 10x 10x_3p_v2 Tumor cells NaN NaN NaN NaN
AAACGGGCAGTCACTA-1_0-9 Lambrechts_Thienpont_2018_6653_BT1375 I male yes NaN LUSC 60.0 Lambrechts_Thienpont_2018_6653_7 lung tumor_primary ... NaN tumor_primary Lambrechts_Thienpont_2018 10x 10x_3p_v2 T cell CD8 NaN NaN NaN NaN
AAAGATGAGACTTTCG-1_0-9 Lambrechts_Thienpont_2018_6653_BT1375 I male yes NaN LUSC 60.0 Lambrechts_Thienpont_2018_6653_7 lung tumor_primary ... NaN tumor_primary Lambrechts_Thienpont_2018 10x 10x_3p_v2 Club NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13850_5-18 UKIM-V_P3_tumor_primary I male yes NaN LUAD 64.0 UKIM-V_P3 lung tumor_primary ... NaN tumor_primary UKIM-V BD-Rhapsody BD-Rhapsody Neutrophils NaN NaN NaN NaN
790899_5-18 UKIM-V_P3_tumor_primary I male yes NaN LUAD 64.0 UKIM-V_P3 lung tumor_primary ... NaN tumor_primary UKIM-V BD-Rhapsody BD-Rhapsody Neutrophils NaN NaN NaN NaN
451020_5-18 UKIM-V_P3_tumor_primary I male yes NaN LUAD 64.0 UKIM-V_P3 lung tumor_primary ... NaN tumor_primary UKIM-V BD-Rhapsody BD-Rhapsody Neutrophils NaN NaN NaN NaN
424575_5-18 UKIM-V_P3_tumor_primary I male yes NaN LUAD 64.0 UKIM-V_P3 lung tumor_primary ... NaN tumor_primary UKIM-V BD-Rhapsody BD-Rhapsody Neutrophils NaN NaN NaN NaN
215691_5-18 UKIM-V_P3_tumor_primary I male yes NaN LUAD 64.0 UKIM-V_P3 lung tumor_primary ... NaN tumor_primary UKIM-V BD-Rhapsody BD-Rhapsody Neutrophils NaN NaN NaN NaN

25892 rows × 38 columns

3. Define the Immune cell types#

# Define immune cell types (to be used for ploting)
immune_cells = [
    "B cell",
    "cDC1",
    "cDC2",
    "DC mature",
    "Macrophage",
    "Macrophage alveolar",
    "Mast cell",
    "Monocyte",
    "Neutrophils",
    "NK cell",
    "pDC",
    "Plasma cell",
    "T cell CD4",
    "T cell CD8",
    "T cell regulatory",
]

4. Perform the cell-to-cell communication analysis#

# Sanity check
ccdb
source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition sources references
0 Q9Y219 P46531 JAG2 NOTCH1 1 1 1 1 1 0 Baccin2019;CellCall;CellChatDB;CellPhoneDB;Cel... Baccin2019:1100613311006130;CellChatDB:2235346...
1 O00548 P46531 DLL1 NOTCH1 1 1 0 1 1 0 Baccin2019;CellCall;CellChatDB;CellPhoneDB;Cel... Baccin2019:1006133;Baccin2019:98194281;CellCha...
2 P05019 P08069 IGF1 IGF1R 1 1 0 1 1 0 Baccin2019;CA1;CellCall;CellChatDB;CellPhoneDB... Baccin2019:1852007;Baccin2019:2877871;CA1:8408...
3 P78504 P46531 JAG1 NOTCH1 1 1 1 1 1 0 ACSN;Baccin2019;BioGRID;CellCall;CellChatDB;Ce... ACSN:22330899;ACSN:22363130;Baccin2019:7697721...
4 P41221 Q14332 WNT5A FZD2 1 1 1 1 1 0 Baccin2019;CellCall;CellChatDB;CellPhoneDB_Cel... Baccin2019:9389482;CellTalkDB:24032637;Cellink...
... ... ... ... ... ... ... ... ... ... ... ... ...
1889 Q9BZZ2 P16150 SIGLEC1 SPN 1 0 0 0 0 0 Baccin2019;CellChatDB;CellPhoneDB;HPRD;Wang;co... CellChatDB:11238599;HPRD:11238599;connectomeDB...
1890 P04216 COMPLEX:P05107_P11215 THY1 ITGAM_ITGB2 1 0 0 0 0 0 CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... CellChatDB:15850796;Cellinker:15850796
1891 P04216 COMPLEX:P05107_P20702 THY1 ITGAX_ITGB2 1 0 0 0 0 0 CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... CellChatDB:15850796;Cellinker:15850796;Cellink...
1892 P04216 COMPLEX:P05106_P06756 THY1 ITGAV_ITGB3 1 0 0 0 0 0 CellChatDB;CellPhoneDB;CellPhoneDB_Cellinker;C... CellChatDB:18346467;CellChatDB:29212879;Cellin...
1893 Q9H7M9 Q5DX21 VSIR IGSF11 1 1 0 1 1 0 CellChatDB;Cellinker Cellinker:32589946

1894 rows × 12 columns

# Run the analysis function (using the receptor ligand-list and the LUAD-LUSC data)
ccdba = aps.tl._cell2cell.CpdbAnalysis(
    ccdb,
    adata_primary_tumor,
    pseudobulk_group_by=["patient"],
    cell_type_column="cell_type_major",
)

5. Use the DE results to infer cell-to-cell communications#

# Load in DE results (using LUAD as reference)
de_res_tumor_cells_luad_lusc = (
    pd.read_csv(
        (deseq2_path_prefix + "/IHWallGenes.tsv").format(comparison="luad_lusc"),
        sep=",",
    )
    .fillna(1)
    .pipe(aps.tl._fdr.fdr_correction)
    .assign(group="LUAD")
)
# Sanity check
de_res_tumor_cells_luad_lusc
gene_id baseMean log2FoldChange lfcSE stat pvalue padj weight fdr group
0 KHDC1L 141.381429 -28.881840 2.983734 -9.679764 3.675625e-22 5.832158e-18 1.120682 6.556213e-18 LUAD
1 TMPRSS11A 90.227543 -28.253740 3.184618 -8.871940 7.188262e-19 5.724042e-15 1.116533 6.410852e-15 LUAD
2 SMOC1 867.322568 23.208725 2.727486 8.509199 1.751387e-17 9.263156e-14 1.120682 1.041316e-13 LUAD
3 NEFH 276.263273 23.864643 3.075381 7.759899 8.499738e-15 4.511141e-11 0.837606 3.790246e-11 LUAD
4 IL37 222.427307 23.570012 3.136695 7.514283 5.722354e-14 2.391063e-10 0.851127 2.041393e-10 LUAD
... ... ... ... ... ... ... ... ... ... ...
17832 TRGV10 0.000000 1.000000 1.000000 1.000000 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 LUAD
17833 TRGV3 0.000000 1.000000 1.000000 1.000000 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 LUAD
17834 TRGV4 0.000000 1.000000 1.000000 1.000000 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 LUAD
17835 TRGV5 0.000000 1.000000 1.000000 1.000000 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 LUAD
17836 VAV3-AS1 0.000000 1.000000 1.000000 1.000000 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 LUAD

17837 rows × 10 columns

# Save the results
ccdb_res = ccdba.significant_interactions(de_res_tumor_cells_luad_lusc, max_pvalue=0.1)

6. Visualisation#

# Load in the results
ccdb_res = ccdba.significant_interactions(de_res_tumor_cells_luad_lusc, max_pvalue=0.1)
# Keep only the immune cells (we'll use this for plotting)
ccdb_res = ccdb_res.loc[lambda x: x["cell_type_major"].isin(immune_cells)]
# Create a list of unique top-upregulated genes
top_genes = (
    ccdb_res.loc[:, ["source_genesymbol", "fdr"]]
    .drop_duplicates()
    .sort_values("fdr")["source_genesymbol"][:30]
    .tolist()
)
# Plot the results
ccdba.plot_result(
    ccdb_res.loc[lambda x: x["source_genesymbol"].isin(top_genes)],
    title="LUAD vs LUSC: tumor cells, top 30 DE ligands",
    aggregate=False,
    cluster="heatmap",
    label_limit=80,
)