Next steps#

Now that you built a single-cell atlas and performed some analyses you should consider making it available to a wider audience to maximize its utility. This section contains our suggestions on how to share your data and code and make your analysis reproducible.

Sharing the atlas#

The cell-x-gene platform offers an interactive web browser to explore single-cell datasets. You can request your dataset to be added. Besides the slick web interface, cell-x-gene provides download links in h5ad (scverse) and rds (Seurat) format. Before uploading data to cell-x-gene, the metadata must be reannotated to match standardized ontology terms (e.g. the “cell ontology” for cell-types).

Sharing the model#

The integration with scANVI generated a pre-trained model that can be used to project additional data onto the atlas using scArches as we have shown in scarches . To enable others to use this functionality it is required that you share the scvi model and the AnnData object that was used to generate it. The pre-trained model can be shared via the scvi model hub on huggingface. For more details, see the scvi-hub upload tutorial.

Sharing the code#

Sharing the code for your entire analysis is a prerequiste for others to reproduce your work. We recommend uploading the code on e.g. GitHub.

Sharing the environment#

Single-cell analyses require an increasingly complex environment of software packages. As results may differ slightly between different software versions, for reproducibility it is required to declare the exact software versions used. You can export all dependencies of in a conda environment using

conda env export > environment.yml

To go one step further, you can build a container (e.g. using apptainer) to obtain a single, sharable file that also abstracts the operating system in addition to the software packages.

A note on reproducibility#

Sharing the data, code and environment is a necessary, but not sufficient condition for reproducing an analysis [Heil et al., 2021].

Some single cell analysis algorithms (in particular scVI/scANVI and UMAP) will yield slightly different results on different operating systems and different hardware, trading off computational reproducibility for a significantly faster runtime. In particular, results will differ when changing the number of cores, or when running on a CPU/GPU of a different architecture. See also scverse/scanpy#2014 for a discussion.

To circumvent this, declare the hardware used for the analysis and consider sharing intermediate results, such as the scVI and UMAP embeddings.