.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "python/gallery/explore_differential_gene_exp.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_python_gallery_explore_differential_gene_exp.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_python_gallery_explore_differential_gene_exp.py:

.. _differential-gene-exp:

Differential gene expression analysis
=====================================

This tutorial shows a basic way to perform differential gene expression analysis with the atlasapprox-disease API.
You will use the ``metadata`` and ``differential_gene_expression`` functions to find datasets for a specific cell type,
analyze gene expression changes in a disease context, and identify genes that recur as differentially expressed across datasets.
The tutorial uses memory B cells as an example, but the same code applies to any cell type, disease, or tissue of interest.

.. GENERATED FROM PYTHON SOURCE LINES 17-24

Contents
--------
- Overview metadata with filters
- Perform differential gene expression analysis for a specific cell type across datasets
- Find frequently occurring differentially expressed genes
- Examples for further exploration

.. GENERATED FROM PYTHON SOURCE LINES 26-34

Installation
------------
Install the required packages using ``pip``:

.. code-block:: bash

   pip install atlasapprox-disease pandas

.. GENERATED FROM PYTHON SOURCE LINES 36-40

Import libraries and initialize the API
---------------------------------------

Import the necessary libraries and create an API instance:

.. GENERATED FROM PYTHON SOURCE LINES 40-46

.. code-block:: Python

    import atlasapprox_disease as aad
    import pandas as pd

    # Initialize the API
    api = aad.API()

.. GENERATED FROM PYTHON SOURCE LINES 47-52

Overview datasets with cell type-specific data
----------------------------------------------

A good starting point is the ``metadata`` function, which gives an overview of all the data relevant to what you want to explore,
such as a cell type, disease, or tissue. This example focuses on datasets that contain memory B cells:

.. GENERATED FROM PYTHON SOURCE LINES 52-58

.. code-block:: Python

    cell_metadata = api.metadata(cell_type="memory B cell")

    # Display the result
    cell_metadata

.. csv-table::
   :header: "", "unique_id", "dataset_id", "cell_type", "tissue_general", "disease", "development_stage_general", "sex", "cell_count"

   0, a925cc9db06ddad8450db673a12c769c, 0041b9c3-6a49-4bf7-8514-9bc7190067a7, memory B cell, skin of body, normal, adult, male, 9
   1, 54cd76493a728a81e3f835b1b461c004, 03d5794d-cde9-4769-a1a9-b3899d2b1d87, memory B cell, esophagogastric junction, normal, adult, female, 84
   2, 5edd95b990ab65367627bd85b3005b69, 03d5794d-cde9-4769-a1a9-b3899d2b1d87, memory B cell, esophagogastric junction, normal, adult, male, 2
   3, c3ca6e7e995ff56d7063a69997096af8, 03d5794d-cde9-4769-a1a9-b3899d2b1d87, memory B cell, esophagus, Barrett esophagus, adult, female, 80
   4, 70f50e140f68a84d87cc3853e7f08aab, 03d5794d-cde9-4769-a1a9-b3899d2b1d87, memory B cell, esophagus, Barrett esophagus, adult, male, 8
   ..., ..., ..., ..., ..., ..., ..., ..., ...
   181, 453bdd0f78d96b935a8fd217b4ed0cff, f01bdd17-4902-40f5-86e3-240d66dd2587, memory B cell, exocrine gland, normal, adult, male, 4
   182, cb49352fd25f7642e90e30b991b05df0, f6dafdd1-d746-407e-8019-4470e02d4cbd, memory B cell, lung, normal, adult, female, 356
   183, a43c02c3be1257db4b3636139bfcc403, f6dafdd1-d746-407e-8019-4470e02d4cbd, memory B cell, lung, normal, adult, male, 316
   184, 0034a338d54a3f31c52fded5366c488d, f6dafdd1-d746-407e-8019-4470e02d4cbd, memory B cell, respiratory system, normal, adult, female, 361
   185, 8d7f5cb441c725b466f7f51aa01b3025, f6dafdd1-d746-407e-8019-4470e02d4cbd, memory B cell, respiratory system, normal, adult, male, 348

186 rows × 8 columns
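
A metadata table like the one above can be summarized with plain pandas before choosing a disease to analyze.
The sketch below is only an illustration: it rebuilds the first five displayed rows by hand instead of calling the API,
and the names ``cell_metadata_sample`` and ``min_cells`` (including the threshold of 50 cells) are hypothetical choices, not part of atlasapprox-disease.

.. code-block:: python

    import pandas as pd

    # First five rows of the metadata table shown above, copied by hand
    # (hypothetical stand-in for the full ``cell_metadata`` DataFrame).
    cell_metadata_sample = pd.DataFrame(
        {
            "tissue_general": [
                "skin of body",
                "esophagogastric junction",
                "esophagogastric junction",
                "esophagus",
                "esophagus",
            ],
            "disease": [
                "normal",
                "normal",
                "normal",
                "Barrett esophagus",
                "Barrett esophagus",
            ],
            "sex": ["male", "female", "male", "female", "male"],
            "cell_count": [9, 84, 2, 80, 8],
        }
    )

    # Total number of memory B cells available per disease label.
    cells_per_disease = (
        cell_metadata_sample.groupby("disease")["cell_count"]
        .sum()
        .sort_values(ascending=False)
    )

    # Keep only groups with enough cells; the threshold is an arbitrary assumption.
    min_cells = 50
    well_sampled = cell_metadata_sample[cell_metadata_sample["cell_count"] >= min_cells]

    print(cells_per_disease)
    print(well_sampled)

Applied to the full ``cell_metadata`` DataFrame, the same ``groupby`` call would show which diseases have enough memory B cells to make a comparison against normal samples worthwhile.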



.. GENERATED FROM PYTHON SOURCE LINES 59-60

The DataFrame contains 186 rows, each representing a unique combination of metadata attributes
(e.g. tissue, disease, sex, and development stage) involving memory B cells.

.. GENERATED FROM PYTHON SOURCE LINES 63-65

To see the full list of unique diseases without truncation:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: Python

    cell_metadata.disease.unique()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array(['normal', 'Barrett esophagus', 'gastric intestinal metaplasia',
           'gastritis', 'breast carcinoma',
           'invasive ductal breast carcinoma',
           'invasive lobular breast carcinoma', 'COVID-19',
           'post-COVID-19 disorder', 'common variable immunodeficiency',
           'Crohn disease', 'B-cell non-Hodgkin lymphoma', 'influenza'],
          dtype=object)

.. GENERATED FROM PYTHON SOURCE LINES 69-71

Memory B cell data is available for 12 diseases in addition to normal samples, including COVID-19, post-COVID-19 disorder, breast carcinoma, and Crohn disease.
Any of them can be explored further. The following sections use COVID-19 to perform differential gene expression analysis on memory B cells.

.. GENERATED FROM PYTHON SOURCE LINES 73-78

Perform differential gene expression analysis for memory B cells in COVID-19
----------------------------------------------------------------------------

To understand how memory B cells respond to COVID-19, query the top 10 up-regulated and top 10 down-regulated genes (20 per comparison)
across all datasets that contain both diseased and normal conditions.
This analysis identifies the genes whose expression differs most between COVID-19 and healthy samples.

.. GENERATED FROM PYTHON SOURCE LINES 78-89

.. code-block:: Python

    df_genes = api.differential_gene_expression(
        differential_axis="disease",
        disease="covid",
        cell_type="memory B cell",
        top_n=10,  # Top 10 up- and down-regulated genes to query
    )

    # Display the results
    df_genes

.. csv-table::
   :header: "", "tissue_general", "cell_type", "regulation", "gene", "unit", "baseline_expr", "state_expr", "baseline_fraction", "state_fraction", "metric", "dataset_id", "differential_axis", "state", "baseline"

   0, blood, IgG memory B cell, up, HLA-DRB5, cptt, 2.201501, 7.332582, 0.205931, 0.831040, 0.625109, de2c780c-1747-40bd-9ccf-9588ec186cee, disease, COVID-19, normal
   1, blood, memory B cell, up, HLA-DRB5, cptt, 2.234262, 7.035844, 0.246106, 0.826568, 0.580462, 4c4cd77c-8fee-4836-9145-16562a8782fe, disease, COVID-19, normal
   2, blood, IgG-negative class switched memory B cell, up, HLA-DRB5, cptt, 2.965737, 7.341037, 0.235019, 0.811541, 0.576522, de2c780c-1747-40bd-9ccf-9588ec186cee, disease, COVID-19, normal
   3, nose, memory B cell, up, RPL17, cptt, 2.389493, 8.601742, 0.409091, 0.941176, 0.532086, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   4, nose, memory B cell, up, TRAC, cptt, 0.705393, 5.557863, 0.181818, 0.705882, 0.524064, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ..., ...
   175, nose, memory B cell, down, NDUFA1, cptt, 3.241598, 0.526369, 0.477273, 0.117647, -0.359626, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   176, nose, memory B cell, down, BBLN, cptt, 4.268116, 1.021769, 0.613636, 0.235294, -0.378342, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   177, nose, memory B cell, down, DDT, cptt, 2.917153, 0.000000, 0.431818, 0.000000, -0.431818, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   178, nose, memory B cell, down, HLA-DQA2, cptt, 5.280043, 0.000000, 0.522727, 0.000000, -0.522727, edc8d3fe-153c-4e3d-8be0-2108d30f8d70, disease, COVID-19, normal
   179, blood, memory B cell, down, HLA-DRB5, cptt, 3.983789, 0.933865, 0.811111, 0.159544, -0.651567, 59b69042-47c2-47fd-ad03-d21beb99818f, disease, COVID-19, normal

180 rows × 14 columns
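
One way to check what the ``metric`` column holds is to recompute it from the other columns.
The sketch below rebuilds four of the displayed rows by hand (rows 0, 3, 177 and 179) rather than calling the API.
It compares ``metric`` with the difference between ``state_fraction`` and ``baseline_fraction``.
It also derives a log2 fold change from the two expression columns. The ``log2_fold_change`` column and the pseudocount of 1 are assumptions made for illustration; the API does not return them.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Rows 0, 3, 177 and 179 of the result table above, copied by hand.
    de_sample = pd.DataFrame(
        {
            "tissue_general": ["blood", "nose", "nose", "blood"],
            "gene": ["HLA-DRB5", "RPL17", "DDT", "HLA-DRB5"],
            "regulation": ["up", "up", "down", "down"],
            "baseline_expr": [2.201501, 2.389493, 2.917153, 3.983789],
            "state_expr": [7.332582, 8.601742, 0.000000, 0.933865],
            "baseline_fraction": [0.205931, 0.409091, 0.431818, 0.811111],
            "state_fraction": [0.831040, 0.941176, 0.000000, 0.159544],
            "metric": [0.625109, 0.532086, -0.431818, -0.651567],
        }
    )

    # Difference in the fraction of cells expressing the gene (state minus baseline).
    de_sample["fraction_diff"] = (
        de_sample["state_fraction"] - de_sample["baseline_fraction"]
    )

    # Compare with the reported metric, allowing for rounding in the displayed values.
    matches_metric = np.isclose(
        de_sample["fraction_diff"], de_sample["metric"], atol=1e-5
    )

    # Hypothetical extra column: log2 fold change of expression with a pseudocount,
    # so that genes with zero expression in one condition stay finite.
    pseudocount = 1.0
    de_sample["log2_fold_change"] = np.log2(
        (de_sample["state_expr"] + pseudocount)
        / (de_sample["baseline_expr"] + pseudocount)
    )

    print(matches_metric.all())
    print(de_sample[["gene", "regulation", "metric", "fraction_diff", "log2_fold_change"]])

For these four rows the recomputed fraction difference agrees with ``metric`` to within display rounding, and the sign of the sketched log2 fold change agrees with the ``regulation`` label.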



.. GENERATED FROM PYTHON SOURCE LINES 90-94

The resulting DataFrame lists the top 10 up- and down-regulated genes for memory B cells in COVID-19 across all relevant datasets.
Key columns include ``gene``, ``regulation``, the expression columns (``baseline_expr`` and ``state_expr``), and ``metric``.
In the rows shown, ``metric`` equals ``state_fraction`` minus ``baseline_fraction``, i.e. the change in the fraction of cells expressing the gene, rather than a fold change.
The results also include memory B cell subtypes such as IgG memory B cells.
Up-regulated genes may indicate activation of immune memory or antibody production pathways in response to COVID-19, while down-regulated genes could suggest suppression of other functions.
Since the query spans multiple datasets and tissues, variations in gene expression may reflect dataset-specific or tissue-specific differences.

.. GENERATED FROM PYTHON SOURCE LINES 97-102

Find frequently occurring differentially expressed genes
--------------------------------------------------------

Since memory B cells are present in multiple datasets, identify which genes appear most frequently among the top differentially expressed genes across these datasets.
This highlights genes that are consistently affected by COVID-19 in memory B cells.

.. GENERATED FROM PYTHON SOURCE LINES 102-110

.. code-block:: Python

    # Count the frequency of up-regulated genes across datasets
    up_gene_counts = df_genes[df_genes["regulation"] == "up"]["gene"].value_counts()

    # Display the results
    print("Frequency of up-regulated genes across datasets:")
    print(up_gene_counts)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Frequency of up-regulated genes across datasets:
    gene
    XAF1        4
    HLA-DRB5    3
    HLA-DQA2    3
    LY6E        3
    RPS4Y1      3
               ..
    PRDX1       1
    ANXA4       1
    S100A10     1
    S100A11     1
    SNHG9       1
    Name: count, Length: 64, dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 111-114

The output shows how often each gene appears among the top up-regulated genes: XAF1 appears 4 times, while HLA-DRB5, HLA-DQA2, LY6E, and RPS4Y1 appear 3 times each.
Each count is one row of ``df_genes``, so a gene can be counted more than once per dataset when several tissues or cell subtypes are returned.
A gene can also appear in both directions: in the table above, HLA-DRB5 is up-regulated in three comparisons and down-regulated in another.
You can repeat this analysis for down-regulated genes or for other diseases to compare results across conditions.

.. GENERATED FROM PYTHON SOURCE LINES 116-145

Examples for further exploration
--------------------------------

1. Analyze down-regulated genes: Repeat the frequency analysis for down-regulated genes to identify consistently suppressed pathways.

   .. code-block:: python

       down_gene_counts = df_genes[df_genes["regulation"] == "down"]["gene"].value_counts()
       print(down_gene_counts)

2. Explore other diseases: Use the diseases from the metadata (e.g., influenza) to compare memory B cell responses across conditions.

   .. code-block:: python

       df_influenza = api.differential_gene_expression(
           disease="influenza",
           cell_type="memory B cell",
           top_n=10
       )

3. Explore specific tissues: Query differential gene expression for a specific tissue (e.g., kidney) to analyze expression changes across all diseases and cell types in that tissue.

   .. code-block:: python

       df_kidney = api.differential_gene_expression(
           tissue="kidney",
           top_n=10
       )
       print(df_kidney)

.. GENERATED FROM PYTHON SOURCE LINES 147-155

Next steps
----------

This tutorial introduced differential gene expression analysis with the atlasapprox-disease API.
To learn more, explore additional functions such as ``average`` to retrieve gene expression levels, or ``dotplot`` to visualize expression patterns.
Visit the official documentation for further details.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 8.201 seconds)

.. _sphx_glr_download_python_gallery_explore_differential_gene_exp.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: explore_differential_gene_exp.ipynb <explore_differential_gene_exp.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: explore_differential_gene_exp.py <explore_differential_gene_exp.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: explore_differential_gene_exp.zip <explore_differential_gene_exp.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_