atlas_protocol_scripts.tl.CpdbAnalysis

class atlas_protocol_scripts.tl.CpdbAnalysis(cpdb, adata, *, pseudobulk_group_by, cell_type_column, min_obs=10)

Class that handles comparative CellPhoneDB analysis.

Parameters
  • cpdb – pandas data frame with CellPhoneDB interactions. Required columns: source_genesymbol, target_genesymbol. You can get this table from OmniPath: https://omnipathdb.org/interactions/?fields=sources,references&genesymbols=1&databases=CellPhoneDB

  • adata – AnnData object with the target cells. It is used to derive the mean fraction of cells expressing each gene. Should contain counts in X.

  • pseudobulk_group_by (list[str]) – Columns used to group cells into pseudobulk samples (e.g. a patient identifier). The mean fraction of expressing cells is computed per pseudobulk sample, i.e. by patient.

  • cell_type_column (str) – Column in anndata that contains the cell-type annotation.

  • min_obs (default: 10) – Only consider samples with at least min_obs cells for pseudobulk analysis.
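
The interplay of pseudobulk_group_by, cell_type_column and min_obs can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name mean_fraction_expressed, the cell record layout, and the choice to average per-patient fractions are assumptions made for illustration only.

```python
from collections import defaultdict


def mean_fraction_expressed(cells, gene, min_obs=10):
    """Hypothetical helper: mean per-patient fraction of cells expressing `gene`.

    `cells` is a list of dicts with keys 'patient', 'cell_type' and
    'counts' (a dict mapping gene symbol -> raw count).
    """
    # Count all cells and expressing cells per (patient, cell type) group.
    n_cells = defaultdict(int)
    n_expressing = defaultdict(int)
    for cell in cells:
        key = (cell["patient"], cell["cell_type"])
        n_cells[key] += 1
        if cell["counts"].get(gene, 0) > 0:
            n_expressing[key] += 1

    # Drop groups with fewer than min_obs cells, keep per-patient fractions.
    fractions = defaultdict(list)
    for (patient, cell_type), n in n_cells.items():
        if n >= min_obs:
            fractions[cell_type].append(n_expressing[(patient, cell_type)] / n)

    # Average the per-patient fractions for each cell type.
    return {ct: sum(vals) / len(vals) for ct, vals in fractions.items()}


# Toy data: P1 has 4 T cells (2 expressing), P2 has 2 (both expressing),
# P3 has a single T cell and is dropped when min_obs=2.
cells = (
    [{"patient": "P1", "cell_type": "T cell", "counts": {"CXCR3": c}} for c in (3, 0, 1, 0)]
    + [{"patient": "P2", "cell_type": "T cell", "counts": {"CXCR3": c}} for c in (2, 5)]
    + [{"patient": "P3", "cell_type": "T cell", "counts": {"CXCR3": 0}}]
)
result = mean_fraction_expressed(cells, "CXCR3", min_obs=2)
```

Averaging per-patient fractions rather than pooling all cells keeps patients with many cells from dominating the estimate, which is the usual motivation for pseudobulk grouping.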

Methods table

plot_result(cpdb_res, *[, pvalue_col, ...])

Plot cpdb results as heatmap.

significant_interactions(de_res, *[, ...])

Generates a data frame of differential CellPhoneDB interactions.

Methods

CpdbAnalysis.plot_result(cpdb_res, *, pvalue_col='fdr', fc_col='log2FoldChange', group_col='group', title='CPDB analysis', aggregate=True, clip_fc_at=(-5, 5), label_limit=100, cluster='dotplot', de_genes_mode='ligand')

Plot cpdb results as heatmap.

Parameters
  • cpdb_res – result of significant_interactions. May be further filtered or modified.

  • pvalue_col (default: 'fdr') – column in cpdb_res that contains the pvalue of ligands (or receptors) used for the upper panel of the plot

  • fc_col (default: 'log2FoldChange') – column in cpdb_res that contains the log fold change of ligands (or receptors) used for the upper panel of the plot

  • group_col (default: 'group') – column to be used for the y axis of the heatmap

  • title (default: 'CPDB analysis') – main title of the plot

  • aggregate (default: True) – whether to merge multiple targets of the same ligand into a single column

  • clip_fc_at (default: (-5, 5)) – (lower, upper) bounds at which log fold changes are clipped; values outside this range are set to the nearest bound

  • label_limit (default: 100) – Maximum length before a gene symbol gets truncated (plays a role when using aggregate=True)

  • cluster (Literal['heatmap', 'dotplot', None] (default: 'dotplot')) – whether to cluster the heatmap or the dotplot or neither

  • de_genes_mode (Literal['ligand', 'receptor'] (default: 'ligand')) – Whether the provided DE genes are ligands (default) or receptors. If 'receptor', the dotplot is shown at the top (sources are the expressed ligands) and the DE heatmap at the bottom (targets are the DE receptors). Otherwise, the arrangement is reversed.
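
As a small illustration of what clip_fc_at does to the plotted values, the following sketch clips log fold changes to a (lower, upper) range. The helper name clip_fold_change is hypothetical and the actual plotting code may clip differently.

```python
def clip_fold_change(log2_fc, clip_fc_at=(-5, 5)):
    """Hypothetical helper: restrict a log2 fold change to [lower, upper]."""
    lower, upper = clip_fc_at
    return max(lower, min(upper, log2_fc))


# Extreme values are pulled to the nearest bound so that they do not
# dominate the color scale; values inside the range are unchanged.
clipped = [clip_fold_change(fc) for fc in (-7.2, -1.5, 0.0, 3.1, 9.8)]
```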

CpdbAnalysis.significant_interactions(de_res, *, pvalue_col='pvalue', fc_col='log2FoldChange', gene_symbol_col='gene_id', max_pvalue=0.1, min_abs_fc=1, adjust_fdr=True, min_frac_expressed=0.1, de_genes_mode='ligand', complex_policy='explode')

Generates a data frame of differential CellPhoneDB interactions.

This function extracts all known ligands (or receptors, respectively) from a list of differentially expressed genes and, across all cell types, finds the receptors (or ligands, respectively) that are expressed above a certain cutoff.
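
The procedure described above can be sketched in plain Python for the ligand mode. This is a simplified illustration under stated assumptions, not the library's code: the function names, the list-of-dicts input, and the use of Benjamini-Hochberg for the FDR adjustment are all assumptions (the documentation does not name the adjustment method), and the gene pairs are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_interactions_sketch(de_res, interactions, frac_expressed,
                                    max_pvalue=0.1, min_abs_fc=1,
                                    min_frac_expressed=0.1):
    # 1. Keep only DE genes that are known ligands in the interaction table.
    ligands = {ligand for ligand, _ in interactions}
    de_ligands = [row for row in de_res if row["gene_id"] in ligands]
    # 2. Adjust p-values AFTER restricting to ligand genes.
    fdr = benjamini_hochberg([row["pvalue"] for row in de_ligands])
    # 3. Apply the significance and effect-size thresholds.
    hits = {
        row["gene_id"]
        for row, q in zip(de_ligands, fdr)
        if q < max_pvalue and abs(row["log2FoldChange"]) >= min_abs_fc
    }
    # 4. Pair each hit with cell types expressing the matching receptor.
    result = []
    for ligand, receptor in interactions:
        if ligand not in hits:
            continue
        for cell_type, frac in frac_expressed.get(receptor, {}).items():
            if frac >= min_frac_expressed:
                result.append((ligand, receptor, cell_type))
    return result


de_res = [
    {"gene_id": "CXCL9", "pvalue": 0.001, "log2FoldChange": 2.5},
    {"gene_id": "CCL5", "pvalue": 0.04, "log2FoldChange": 0.4},    # fold change too small
    {"gene_id": "GAPDH", "pvalue": 0.0001, "log2FoldChange": 3.0},  # not a ligand
    {"gene_id": "TGFB1", "pvalue": 0.6, "log2FoldChange": -1.8},    # not significant
]
interactions = [("CXCL9", "CXCR3"), ("CCL5", "CCR5"), ("TGFB1", "TGFBR1")]
frac_expressed = {
    "CXCR3": {"T cell": 0.45, "B cell": 0.02},
    "CCR5": {"T cell": 0.3},
    "TGFBR1": {"Fibroblast": 0.5},
}
result = significant_interactions_sketch(de_res, interactions, frac_expressed)
```

Adjusting after the database filter means the multiple-testing correction only accounts for genes that could yield an interaction, so non-ligand genes such as GAPDH in this toy input do not inflate the adjusted p-values.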

Parameters
  • de_res (DataFrame) – Table of differentially expressed genes

  • pvalue_col (default: 'pvalue') – column in de_res that contains the pvalue or false discovery rate

  • fc_col (default: 'log2FoldChange') – column in de_res that contains the log fold change

  • gene_symbol_col (default: 'gene_id') – column in de_res that contains the gene symbol

  • max_pvalue (default: 0.1) – Only consider genes in de_res with a p-value lower than max_pvalue (after FDR adjustment)

  • min_abs_fc (default: 1) – Only consider genes in de_res with at least this absolute log fold change

  • adjust_fdr (default: True) – Adjust the p-values in pvalue_col for the false discovery rate. The FDR adjustment happens after the input table is filtered for genes that are in the ligand/receptor database.

  • min_frac_expressed (default: 0.1) – Minimum fraction of cells that need to express the receptor (or ligand) for it to be considered a potential interaction

  • de_genes_mode (Literal['ligand', 'receptor'] (default: 'ligand')) – Whether the provided DE genes are ligands (default) or receptors. In case of ligand, cell types that express the corresponding receptors above the threshold will be identified. In case of receptor, cell types that express the corresponding ligands above the threshold will be identified.

  • complex_policy (Literal['ignore', 'explode'] (default: 'explode')) –

    How to handle protein:protein complexes. Currently implemented options are

    • ignore: Do nothing, i.e. treat complexes as if they were single genes. This usually means that they will be removed from the result, because there is no corresponding gene symbol (e.g. ITGA8_ITGB1) in the DE genes list or in the anndata object used to compute fractions/gene expression.

    • explode: Split complexes into individual genes, essentially discarding the information that the genes form a complex.

    Future options could include aggregate, i.e. aggregating the metrics of a complex to a single value (e.g. by taking the minimum, as performed in the original CellPhoneDB publication).
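
The explode policy can be pictured with the following minimal sketch. It assumes that complex members are joined by an underscore, as in the ITGA8_ITGB1 example above; the function name explode_complexes and the FN1 pairing are illustrative and not taken from the library.

```python
def explode_complexes(interactions, sep="_"):
    """Hypothetical helper: split complexes into one pair per member gene."""
    exploded = []
    for source, target in interactions:
        # Every member of a source complex is paired with every member
        # of the target complex; single genes pass through unchanged.
        for s in source.split(sep):
            for t in target.split(sep):
                if (s, t) not in exploded:
                    exploded.append((s, t))
    return exploded


pairs = explode_complexes([("FN1", "ITGA8_ITGB1"), ("CXCL9", "CXCR3")])
# The complex ITGA8_ITGB1 becomes two separate receptor entries, each of
# which can now be matched against gene symbols in the expression data.
```

With the ignore policy, by contrast, the symbol ITGA8_ITGB1 would be looked up as-is and typically find no match.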

Return type

DataFrame