Beginner guide

The atlasapprox-disease Python API provides access to over 600 disease-related single-cell datasets. The datasets currently come from the CELLxGENE Census, the initial data source, and cover diseases such as COVID-19, diabetes, acute kidney failure, and gastritis, along with metadata such as cell type, developmental stage, and sex. The API lets users quickly explore cellular and gene expression patterns in disease contexts.

Follow this tutorial to get started with the basics of using the API.

Installation

(Optional) To ensure consistent dependencies, we recommend setting up a virtual environment:

python -m venv ./venv
source ./venv/bin/activate

Then, install the atlasapprox-disease package using pip:

pip install atlasapprox-disease
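
To confirm that the installation succeeded, one option is to look up the installed version with the Python standard library. This is only a minimal sketch: the helper name installed_version is ours and not part of the package, and the distribution name is assumed to be the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the pip install above worked, otherwise None.
print(installed_version("atlasapprox-disease"))
```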

Python quick start

Below are two examples of common operations you can perform with the atlasapprox_disease Python API:

# Import the package and initialise the API
import atlasapprox_disease

api = atlasapprox_disease.API()

Querying cell metadata

The metadata function lets you explore cell metadata across datasets by applying filters on attributes like tissue, disease, or developmental stage. For example, the following filters cells from lung tissue at the adult stage:

api.metadata(
    tissue="lung",
    development_stage="adult"
)
      unique_id                         dataset_id                            cell_type                           tissue_general  disease  development_stage_general  sex     cell_count
0     98ac1a55676e61a854d68c3f3e5f791a  01209dce-3575-4bed-b1df-129f57fbc031  CD4-positive, alpha-beta T cell     lung            normal   adult                      male    1993
1     01b9218f253b6c07a021b1b4f3954871  01209dce-3575-4bed-b1df-129f57fbc031  CD4-positive, alpha-beta thymocyte  lung            normal   adult                      male    3056
2     bbff9e63b378470377555ed3f97bedb2  01209dce-3575-4bed-b1df-129f57fbc031  CD8-positive, alpha-beta T cell     lung            normal   adult                      male    2391
3     2540425305dd88c7aedd91fcebea46d0  01209dce-3575-4bed-b1df-129f57fbc031  CD8-positive, alpha-beta thymocyte  lung            normal   adult                      male    3350
4     b387d4572395da54759d1b9fdd9b4a66  01209dce-3575-4bed-b1df-129f57fbc031  immature alpha-beta T cell          lung            normal   adult                      male    171
...   ...                               ...                                   ...                                 ...             ...      ...                        ...     ...
1413  2430bd747f2aad4f2ead5a33ff9ef3b8  f72958f5-7f42-4ebb-98da-445b0c6de516  type II pneumocyte                  lung            normal   adult                      male    30969
1414  548cac77e1039631066deb4d65bdaa39  f72958f5-7f42-4ebb-98da-445b0c6de516  unknown                             lung            normal   adult                      female  866
1415  bb0f087ae00b96f2552bd1e84e8fa105  f72958f5-7f42-4ebb-98da-445b0c6de516  unknown                             lung            normal   adult                      male    1358
1416  1c4e1440f60ee38746e5200b635337c7  f72958f5-7f42-4ebb-98da-445b0c6de516  vein endothelial cell               lung            normal   adult                      female  2028
1417  077eee6802a2c541daf0fd33c379f677  f72958f5-7f42-4ebb-98da-445b0c6de516  vein endothelial cell               lung            normal   adult                      male    5882

1418 rows × 8 columns



The output is a pandas.DataFrame with over 1400 rows. Each row is a unique combination of dataset, cell type, tissue, disease, developmental stage, and sex, together with the number of cells (cell_count) in that combination. It gives a quick overview of which cell types and conditions are available, and you can use this information later when querying other API functions.
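
Because the result is an ordinary pandas.DataFrame, standard pandas operations can be used to summarise it. The sketch below is illustrative only: to run without a network connection it uses a small stand-in DataFrame built from a few rows of the output above (with only some of the columns), where in practice you would use the return value of api.metadata(...) directly.

```python
import pandas as pd

# Stand-in for the DataFrame returned by api.metadata(...): a few rows copied
# from the output above, keeping only the columns used in this example.
meta = pd.DataFrame(
    {
        "dataset_id": [
            "01209dce-3575-4bed-b1df-129f57fbc031",
            "01209dce-3575-4bed-b1df-129f57fbc031",
            "f72958f5-7f42-4ebb-98da-445b0c6de516",
            "f72958f5-7f42-4ebb-98da-445b0c6de516",
            "f72958f5-7f42-4ebb-98da-445b0c6de516",
        ],
        "cell_type": [
            "CD4-positive, alpha-beta T cell",
            "CD8-positive, alpha-beta T cell",
            "type II pneumocyte",
            "vein endothelial cell",
            "vein endothelial cell",
        ],
        "sex": ["male", "male", "male", "female", "male"],
        "cell_count": [1993, 2391, 30969, 2028, 5882],
    }
)

# Total number of cells per cell type, summed over datasets and sexes.
cells_per_type = (
    meta.groupby("cell_type")["cell_count"].sum().sort_values(ascending=False)
)
print(cells_per_type)

# Cell types available in each dataset.
types_per_dataset = meta.groupby("dataset_id")["cell_type"].unique()
print(types_per_dataset)
```

A summary like this makes it easier to decide which cell types have enough cells to be worth querying in later steps.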

Querying average gene expression

The average function retrieves average gene expression across cell types, tissues, and diseases. For example, to query immune-related genes in COVID-19:

api.average(
    features="ACE2,TLR4,NLRP3,MBL2,IL6",
    disease="COVID-19"
)
     cell_count  cell_type                        tissue_general      disease   dataset_id                            ACE2      TLR4      NLRP3     MBL2  IL6
0    42850       B cell                           blood               COVID-19  01ad3cd7-3929-4654-84c0-6db05bd5fd59  0.000000  0.020387  0.014871  0.0   0.246985
1    111297      CD4-positive, alpha-beta T cell  blood               COVID-19  01ad3cd7-3929-4654-84c0-6db05bd5fd59  0.000000  0.011792  0.016089  0.0   0.000325
2    64766       CD8-positive, alpha-beta T cell  blood               COVID-19  01ad3cd7-3929-4654-84c0-6db05bd5fd59  0.000000  0.016499  0.018025  0.0   0.000705
3    113753      classical monocyte               blood               COVID-19  01ad3cd7-3929-4654-84c0-6db05bd5fd59  0.000000  0.636274  0.534583  0.0   0.006524
4    4776        conventional dendritic cell      blood               COVID-19  01ad3cd7-3929-4654-84c0-6db05bd5fd59  0.000000  0.123744  0.248952  0.0   0.002513
...  ...         ...                              ...                 ...       ...                                   ...       ...       ...       ...   ...
565  118         mast cell                        respiratory system  COVID-19  f156606a-dd9a-49fd-bc40-0e069b6cf07c  0.000000  0.090158  0.000000  0.0   0.000000
566  1108        mature NK T cell                 respiratory system  COVID-19  f156606a-dd9a-49fd-bc40-0e069b6cf07c  0.000000  0.042362  0.104304  0.0   0.013103
567  7438        myeloid cell                     respiratory system  COVID-19  f156606a-dd9a-49fd-bc40-0e069b6cf07c  0.000000  0.736049  0.948961  0.0   0.164485
568  4           neutrophil                       respiratory system  COVID-19  f156606a-dd9a-49fd-bc40-0e069b6cf07c  0.000000  0.000000  4.633920  0.0   0.000000
569  241         unknown                          respiratory system  COVID-19  f156606a-dd9a-49fd-bc40-0e069b6cf07c  0.001793  0.250800  0.319913  0.0   0.081020

570 rows × 10 columns



The output is a pandas.DataFrame with columns such as cell_type, tissue_general, disease, dataset_id, and the expression levels of the queried genes (in counts per 10k). This helps you explore gene activity in specific conditions and identify key genes for further analysis.
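
Averages computed from very few cells can be misleading. In the output above, for instance, the neutrophil row has the highest NLRP3 value but is based on only 4 cells. One possible way to handle this, sketched below, is to filter on cell_count before ranking. The stand-in DataFrame is copied from a few rows of the output above so that the example runs offline, and the cutoff of 100 cells is an arbitrary choice for illustration, not a recommendation from the API.

```python
import pandas as pd

# Stand-in for the DataFrame returned by api.average(...): rows copied from
# the output above, restricted to the columns used in this example.
avg = pd.DataFrame(
    {
        "cell_count": [42850, 113753, 4776, 7438, 4],
        "cell_type": [
            "B cell",
            "classical monocyte",
            "conventional dendritic cell",
            "myeloid cell",
            "neutrophil",
        ],
        "tissue_general": [
            "blood",
            "blood",
            "blood",
            "respiratory system",
            "respiratory system",
        ],
        "NLRP3": [0.014871, 0.534583, 0.248952, 0.948961, 4.633920],
    }
)

MIN_CELLS = 100  # arbitrary cutoff chosen for this example

# Keep only averages backed by enough cells, then rank by NLRP3 expression.
reliable = avg[avg["cell_count"] >= MIN_CELLS]
top_nlrp3 = reliable.sort_values("NLRP3", ascending=False).head(3)
print(top_nlrp3[["cell_type", "tissue_general", "cell_count", "NLRP3"]])
```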

Next steps

This tutorial introduced the basics of the atlasapprox-disease API. To learn more, explore additional functions like dotplot for visualizing gene expression, or query differential gene expression data.

Visit the official documentation for further details.
