Note
Go to the end to download the full example code.
Beginner guide
The atlasapprox-disease Python API provides access to over 600 disease-related single-cell datasets. Currently, it includes datasets from the CELLxGENE Census as its initial source, covering diseases such as COVID-19, diabetes, acute kidney failure, and gastritis, along with metadata like cell type, developmental stage, and sex. This API enables users to quickly explore cellular and gene expression patterns in disease contexts.
Follow this tutorial to get started with the basics of using the API.
Installation
(Optional) To ensure consistent dependencies, we recommend setting up a virtual environment:
python -m venv ./venv
source ./venv/bin/activate
Then, install the atlasapprox-disease package using pip:
pip install atlasapprox-disease
Python quick start
Below are 2 examples of common operations you can do with the atlasapprox_disease Python API:
# Import the package and initialise the API
import atlasapprox_disease
api = atlasapprox_disease.API()
Loading atlasapprox_disease from: /home/docs/checkouts/readthedocs.org/user_builds/cell-atlas-approximations-disease-api/checkouts/latest/Python/atlasapprox_disease/__init__.py
Querying cell metadata
The metadata function lets you explore cell metadata across datasets by applying filters
on attributes like tissue, disease, or developmental stage.
For example, the following filters cells from lung tissue at the adult stage:
api.metadata(
tissue="lung",
development_stage="adult"
)
The output is a pandas.DataFrame with over 1400 unique combinations of cell types, diseases,
and other columns such as sex, cell_count and the dataset it comes from.
You can get a quick overview of what cell types and conditions are available
and use this information later for querying other API functions.
Querying average gene expression
The average function retrieves average gene expression across cell types, tissues, and diseases.
For example, to query immune-related genes in COVID-19:
api.average(
features="ACE2,TLR4,NLRP3,MBL2,IL6",
disease="COVID-19"
)
The output is a pandas.DataFrame with columns such as cell_type, tissue_general,
disease, dataset_id, and the expression levels of the queried genes (in counts per 10k).
This helps you explore gene activity in specific conditions and identify key genes
for further analysis.
Next steps
This tutorial introduced the basics of the atlasapprox-disease API.
To learn more, explore additional functions like dotplot for visualizing gene expression,
or query differential gene expression data.
Visit the official documentation for further details.
Total running time of the script: (0 minutes 27.095 seconds)