Cell-type annotation of “seed” datasets#

scVI and scANVI are variational autoencoders that embed single-cell expression data into a low-dimensional latent space while removing batch effects. We will do this in the chapter Integrate data and perform doublet detection. While scVI is an unsupervised method that only considers the count data, scANVI is a “semi-supervised” method that also takes into account known cell-type labels from one or more datasets.

In an independent benchmark, the semi-supervised variant scANVI has outperformed scVI and other methods for atlas-level data integration [Luecken et al., 2022].

In order to leverage the scANVI algorithm for building the atlas, we are going to prepare cell-type labels for two datasets with very different characteristics:

  • Lambrechts_Thienpont_2018_6653, which has been sequenced using the droplet-based, UMI-corrected 10x Genomics 3’ v2 protocol

  • Maynard_Bivona_2020, which has been sequenced using the well-based full-length Smart-seq2 protocol [Picelli et al., 2013].
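To make the role of the seed labels concrete, the following minimal sketch shows one way the labels might later be arranged for a semi-supervised model: cells from the two seed datasets keep their annotation, and every other cell receives a placeholder. The toy table, the dataset column name, the "unknown" placeholder, and the helper make_seed_labels are illustrative assumptions and not part of the original workflow.

```python
import pandas as pd

SEED_DATASETS = ["Lambrechts_Thienpont_2018_6653", "Maynard_Bivona_2020"]


def make_seed_labels(obs, seed_datasets, unlabeled="unknown"):
    """Keep cell-type labels for seed datasets and mask all other cells."""
    is_seed = obs["dataset"].isin(seed_datasets)
    # Series.where keeps values where the condition holds and replaces the rest.
    return obs["cell_type"].astype(object).where(is_seed, unlabeled)


# Toy stand-in for the .obs table of a merged atlas (hypothetical values;
# the "dataset" column name is an assumption).
obs = pd.DataFrame(
    {
        "dataset": [
            "Lambrechts_Thienpont_2018_6653",
            "Maynard_Bivona_2020",
            "some_other_dataset",
        ],
        "cell_type": ["T cell", "B cell", None],
    }
)
obs["seed_label"] = make_seed_labels(obs, SEED_DATASETS)
print(obs)
```

Keeping the placeholder in a separate column leaves the original annotation untouched, which makes it easy to compare predicted and curated labels later.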

1. Import the required libraries#

import scanpy as sc

2. Load the input data#

lambrechts2018 = sc.read_h5ad("../../data/input_data_zenodo/lambrechts_2018_luad_6653.h5ad")
maynard2020 = sc.read_h5ad("../../data/input_data_zenodo/maynard2020.h5ad")

3. Define and create output directory#

out_dir = "../../results/seed_annotation"
!mkdir -p {out_dir}
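The !mkdir line relies on IPython shell syntax and therefore only works inside a notebook on a system that provides mkdir. As a portable alternative, a minimal sketch using only the standard library could look like this; the helper name ensure_dir is an assumption, and the demonstration writes to a temporary directory so it has no side effects.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create the directory and any missing parents; do nothing if it exists."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demonstrate on a temporary location instead of the real results folder.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dir(Path(tmp) / "results" / "seed_annotation")
    created_ok = created.is_dir()
    # Calling it a second time is safe because of exist_ok=True.
    ensure_dir(created)
```

In the notebook, calling ensure_dir(out_dir) would take the place of the shell command.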

4. Preprocess each dataset individually#

TODO: Preprocess each dataset, either based on scVI or simply using normalize_total/log1p. This step will be added once filtering is complete.
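Until this step is filled in, the arithmetic behind the simpler option can be illustrated with plain NumPy: each cell is scaled to a common total count and then log-transformed. In scanpy, the corresponding functions are sc.pp.normalize_total and sc.pp.log1p. This is only a sketch under assumptions: the target sum of 10,000, the helper name normalize_log1p, and the toy matrix are not taken from the original workflow.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty cells; their rows stay all-zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)


# Toy matrix: 3 cells x 3 genes with very different sequencing depths.
counts = np.array([[1, 1, 2], [10, 10, 20], [0, 0, 0]])
normalized = normalize_log1p(counts)
print(normalized)
```

The first two cells have the same relative expression but a tenfold difference in depth, so they end up with identical normalized profiles.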

5. Annotate cell types for each dataset individually#

Seed datasets can be annotated based on unsupervised clustering and marker genes, as shown in the section Cell-type annotation. For the sake of this tutorial, we simply re-use the cell-type annotations from [Salcher et al., 2022].

lambrechts2018.obs["cell_type"] = lambrechts2018.obs["cell_type_salcher"]
maynard2020.obs["cell_type"] = maynard2020.obs["cell_type_salcher"]
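Because .obs is a pandas DataFrame, the copy above fails with a KeyError if the source column is absent and silently carries over any missing labels. A slightly more defensive version might look like the sketch below. The helper name copy_labels, the toy table, and the decision to treat missing labels as an error are assumptions made for illustration.

```python
import pandas as pd


def copy_labels(obs, source="cell_type_salcher", target="cell_type"):
    """Copy an annotation column, failing early if it is missing or incomplete."""
    if source not in obs.columns:
        raise KeyError(f"column {source!r} not found in obs")
    n_missing = int(obs[source].isna().sum())
    if n_missing > 0:
        raise ValueError(f"{n_missing} cells have no label in {source!r}")
    obs[target] = obs[source]
    # Return per-label cell counts as a quick sanity check.
    return obs[target].value_counts()


# Toy stand-in for an AnnData .obs table (hypothetical labels).
toy_obs = pd.DataFrame({"cell_type_salcher": ["T cell", "T cell", "B cell"]})
label_counts = copy_labels(toy_obs)
print(label_counts)
```

With the real objects, the call would be copy_labels(lambrechts2018.obs) and copy_labels(maynard2020.obs); the returned counts give a quick overview of how many cells carry each label.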

6. Store annotated AnnData objects#

lambrechts2018.write_h5ad(f"{out_dir}/lambrechts_2018_luad_6653_annotated.h5ad")
maynard2020.write_h5ad(f"{out_dir}/maynard2020_annotated.h5ad")