Integrative Computational Framework for Linking Cell Surface Proteins to Downstream Transcriptional Programs in Cells
The composition of the tumor microenvironment has an important impact on treatment response. We therefore aim to develop a “systems biology” approach in which single-cell multi-omics data collected from patients’ tumors are combined with our novel computational framework for systems-level characterization of the tumor microenvironment and the underlying regulatory mechanisms associated with therapy resistance. To achieve this, we will link surface proteins to downstream transcriptional responses by integrating transcriptomic and proteomic data from human cancer specimens at the single-cell level, using single-cell multi-omics technologies and machine learning strategies. Together, these approaches will provide insight into the underlying causes of therapy resistance and inform new combination therapy strategies.
Complex signaling and transcriptional programs control the development and physiology of specialized cell types. Genetic perturbations in these programs cause human cancers to arise from a diverse set of specialized cell types and developmental states. Understanding these complex systems and their potential to drive cancer is critical for the development of immunotherapies and the identification of druggable targets.
Recent innovations such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have coupled the relatively sparse single-cell RNA-sequencing (scRNA-seq) signal with robust detection of highly abundant and well-characterized surface proteins using index sorting and barcoded antibodies, providing better cell type discrimination. In this project, we developed the first computational framework, SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network), to exploit single-cell proteomic (scADT-seq) and corresponding scRNA-seq datasets, both obtained using CITE-seq, to link the expression of surface proteins with inferred transcription factor (TF) activities. SPaRTAN provides a conceptually novel and mechanistically inspired approach for integrating cell-specific transcriptomic and proteomic data with regulatory genomics resources, providing a significant advance in the modeling of cell-specific signaling and gene regulatory programs. The cell-surface phenotype is well known to immunologists through flow cytometry, but it is the signaling downstream of cell-surface receptors and co-receptors that drives transcriptional and chromatin state changes. Thus, it is important to connect the "cell-surface phenotype" to downstream transcriptional programs and the resulting transcriptomic phenotypes. SPaRTAN models this flow of information at single-cell resolution. We applied SPaRTAN to peripheral blood mononuclear cell (PBMC) and malignant mesothelioma CITE-seq datasets to predict the coupling of signaling receptors with context-specific TFs. We validated the predictions against prior knowledge and by flow cytometry and immunohistochemical analyses.
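To make the idea of linking surface proteins to TF activities concrete, the following is a minimal sketch on synthetic data, not the published SPaRTAN implementation. It assumes a bilinear form in which gene expression Y (genes x cells) is approximated by D W P^T, where D is a TF-target gene prior (genes x TFs), P holds surface protein levels (cells x proteins), and W is a learned TF-by-protein interaction matrix. The function name fit_bilinear, the two-step ridge solver, the penalty lam, and all matrix sizes are assumptions made for illustration only.

```python
import numpy as np


def fit_bilinear(D, P, Y, lam=1e-3):
    """Estimate W (TFs x proteins) such that Y ~= D @ W @ P.T.

    D: genes x TFs prior matrix (hypothetical TF-target gene prior)
    P: cells x proteins matrix (normalized surface protein levels)
    Y: genes x cells matrix (normalized gene expression)
    lam: ridge penalty (an assumption of this sketch)
    """
    n_tfs, n_prot = D.shape[1], P.shape[1]
    # Step 1: ridge-regress expression on the prior; A ~= W @ P.T (TFs x cells).
    A = np.linalg.solve(D.T @ D + lam * np.eye(n_tfs), D.T @ Y)
    # Step 2: ridge-regress A.T on protein levels; the solution is W.T.
    W_t = np.linalg.solve(P.T @ P + lam * np.eye(n_prot), P.T @ A.T)
    return W_t.T


# Synthetic example: all sizes and values are made up for illustration.
rng = np.random.default_rng(0)
n_genes, n_tfs, n_cells, n_prot = 200, 8, 150, 5

D = (rng.random((n_genes, n_tfs)) < 0.2).astype(float)  # binary TF-target prior
P = rng.normal(size=(n_cells, n_prot))                   # protein levels per cell
W_true = rng.normal(size=(n_tfs, n_prot))                # ground-truth interactions
noise = 0.01 * rng.normal(size=(n_genes, n_cells))
Y = D @ W_true @ P.T + noise                             # simulated expression

W_hat = fit_bilinear(D, P, Y)

# Inferred per-cell TF activity under this model: one column per cell.
tf_activity = W_hat @ P.T

print("max abs error in W:", float(np.abs(W_hat - W_true).max()))
print("TF activity matrix shape:", tf_activity.shape)
```

In this sketch, each entry of W_hat scores how strongly a surface protein is coupled to a TF, and projecting W_hat through a cell's protein profile gives that cell's inferred TF activities. A real analysis would start from normalized CITE-seq counts and a curated TF-target prior rather than random matrices.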
Our model pipeline, including preprocessing and downstream analysis, is implemented in Python and is compatible with the scVerse and Scanpy ecosystems. Our Python package is open source and available on our GitHub, along with a Jupyter notebook. SPaRTAN greatly enhanced the utility of CITE-seq datasets for revealing relationships between TFs and cell-surface receptors in diverse cellular states. We will continue to expand the framework and apply it to cancer.
We deposited our model on GitHub and made the repository accessible to the public:
Notice of Award:
Dr. Osmanbeyoglu has received (9/5/22) a Notice of Award (NOA) from the National Institutes of Health for continuing work related to this grant.
Computational Methods for Delineating Cell Context-Specific Regulatory Programs
Signaling-regulated transcription factors (TFs) orchestrate the developmental and differentiation trajectories of cells as well as their activation states. Understanding TF activities at the single-cell level represents a formidable challenge. Single-cell multi-omics technologies now measure different modalities such as RNA, surface proteins, and chromatin states. Moreover, emerging spatial technologies offer highly multiplexed profiling of RNAs and proteins while preserving the spatial context of the tissue. Consequently, there is a tremendous need for computational methods that can integrate these measurements and infer the underlying cell type- and state-specific transcriptional programs. In response to this critical need, we developed SPaRTAN (Single-cell Proteomic and RNA-based Transcription factor Activity Network), which integrates parallel single-cell proteomic and transcriptomic data, based on cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), with cis-regulatory information (e.g., TF–target-gene priors) to predict cell-specific TF and surface protein activities. To the best of our knowledge, we are the first group to use CITE-seq data with cis-regulatory information to link cell-surface receptors to TFs and construct cell-specific, signaling-linked regulatory programs. My research program develops interpretable machine learning approaches and computational tools to identify and characterize signaling-regulated TFs and spatial transcriptional heterogeneity for a more precise understanding of cellular states. Here, we propose to advance our modeling efforts using context-specific chromatin accessibility data and simultaneously extend SPaRTAN to handle multiple cell types and/or samples using multi-task and interpretable deep learning approaches based on single-cell multi-omics datasets (Goal 1).
We will further develop computational methods for delineating spatially informed, cell context-specific transcriptional programs using spatial transcriptomics datasets (Goal 2). These methods will be integrated into software packages to make them widely accessible to the research community. We will exploit our methods to delineate cell context-specific TF activities that are both specific to humans and relevant to disease. Together, the proposed frameworks have the potential to fill an important gap in knowledge by defining the cell context-specific regulators that drive cellular identity, and to uncover new targets and approaches for advancing therapy.