SC2SG Research

Collaborating with clinical researchers, the center seeks to bridge the gap between scientific discoveries and real-world clinical applications, ultimately improving patient outcomes. Through workshops, training programs, and collaborations, we also aim to equip the scientific community with the resources and knowledge needed to utilize the latest genomics technologies.

Focus Areas

Single-Cell Genomics

The center’s research in single-cell genomics focuses on developing cutting-edge statistical and AI-driven tools to analyze and interpret high-dimensional single-cell data. Faculty members have pioneered methods to address key challenges in the field, including removing batch effects to ensure consistent data quality, integrating multi-omics data for a more comprehensive understanding of cellular processes, and automatically annotating cell types. Their work also enables integrating single-cell data with bulk RNA-seq to infer cell type compositions, leveraging spatial omics data to recover missing cell locations, and inferring circadian phase. These approaches provide powerful insights into cellular heterogeneity and its role in health and disease.

Spatial Omics

The center’s research in spatial omics focuses on developing advanced statistical and AI tools to integrate gene expression data with high-resolution histology images, offering deeper insights into tissue architecture and disease progression. Faculty members have created methods to detect spatial domains and spatially variable genes by combining gene expression with histology, while also leveraging high-resolution images to decipher complex tumor ecosystems. These tools enhance the spatial resolution of gene expression data and enable the inference of super-resolution tissue architecture.

Furthermore, by integrating multi-modal spatial omics data, the center’s methods allow for the detailed reconstruction of fine-grained tissue structures. These innovative approaches have been applied across various human disease studies, providing new insights into disease mechanisms and tissue organization.

Software and Computational Tools

DESC (Deep Embedding for Single-cell Clustering)

DESC is an unsupervised deep learning algorithm for clustering scRNA-seq data. It constructs a non-linear mapping from the original scRNA-seq data space to a low-dimensional feature space by using a deep neural network to iteratively learn cluster-specific gene expression representations and cluster assignments. This iterative procedure moves each cell toward its nearest cluster, balances biological and technical differences between clusters, and reduces the influence of batch effects. DESC also supports soft clustering by assigning each cell a probability of belonging to each cluster, which makes it easier to identify cells clustered with high confidence and to interpret the results.
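
DESC builds on the deep embedded clustering (DEC) idea of comparing each cell's embedding with cluster centroids through a Student's t kernel and then sharpening those soft assignments into a target distribution. The sketch below is a minimal NumPy illustration of that soft-clustering step, not DESC's own code: it assumes the embeddings `z` and the centroids have already been produced by the encoder, and the function names `soft_assign`, `target_distribution`, and `kl_loss` are hypothetical. The row-wise maximum of `q` plays the role of the per-cell clustering confidence.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """DEC-style soft assignment: q[i, j] ~ probability that cell i belongs to cluster j."""
    # Squared Euclidean distance from every embedded cell to every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Student's t kernel with `alpha` degrees of freedom, then normalise each row.
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target P: square q and divide by cluster frequency f_j = sum_i q[i, j]."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_loss(p, q, eps=1e-12):
    """KL(P || Q), the quantity minimised when refining the embedding and centroids."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Toy example (made-up numbers): three 2-D embeddings and two centroids.
z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
q = soft_assign(z, centroids)
p = target_distribution(q)
confidence = q.max(axis=1)   # high values mark confidently clustered cells
labels = q.argmax(axis=1)    # hard labels, if needed
```

Squaring `q` in the target distribution pushes confident cells even closer to their cluster, which is what drives the iterative "move each cell toward its nearest cluster" behavior described above.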

ItClust (Iterative Transfer learning algorithm for scRNA-seq Clustering)

ItClust is an iterative transfer learning algorithm for scRNA-seq clustering. It starts by building a training neural network to extract gene-expression signatures from a well-labeled source dataset, and then initializes the target network with the parameters estimated by the training network. The target network uses information in the target dataset to iteratively fine-tune these parameters in an unsupervised manner, so that target-data-specific gene-expression signatures are captured. Once fine-tuning is finished, the target network returns the clustered cells in the target data. ItClust has been shown to be a powerful tool for scRNA-seq clustering and cell type classification. It accurately extracts information from source data and applies it to cluster cells in target data, it is robust to strong batch effects between source and target data, and it can separate cell types in the target that were not seen in the source. Furthermore, it provides confidence scores that facilitate cell type assignment. With the increasing popularity of scRNA-seq in biomedical research, we expect ItClust to make better use of the vast number of existing well-annotated scRNA-seq datasets and to help researchers accurately cluster and annotate cells in scRNA-seq data.
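
The core pattern in ItClust is "initialize from a labeled source, then refine without labels on the target." The sketch below illustrates only that pattern and deliberately swaps the neural networks for something much simpler: cluster centroids computed from the labeled source cells serve as the initialization, and k-means-style updates on the unlabeled target cells serve as the unsupervised fine-tuning. The function names `source_centroids` and `fine_tune`, and the softmax confidence score, are hypothetical and are not part of ItClust.

```python
import numpy as np


def source_centroids(x_src, labels):
    """Mean profile of each annotated cell type in the labelled source data."""
    types = np.unique(labels)
    return types, np.stack([x_src[labels == t].mean(axis=0) for t in types])


def fine_tune(x_tgt, init_centroids, n_iter=10):
    """Refine source-initialised centroids on unlabelled target cells (k-means-style)."""
    c = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        d2 = ((x_tgt[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for k in range(len(c)):
            if np.any(assign == k):          # leave empty clusters where they are
                c[k] = x_tgt[assign == k].mean(axis=0)
    # Final assignment plus a softmax over negative squared distances as a confidence score.
    d2 = ((x_tgt[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 - (-d2).max(axis=1, keepdims=True)   # shift for numerical stability
    prob = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return d2.argmin(axis=1), prob


# Toy data (made up): the target batch is the source shifted by +1 in every gene.
x_src = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
labels = np.array(["A", "A", "B", "B"])
x_tgt = x_src + 1.0
types, c0 = source_centroids(x_src, labels)
assign, prob = fine_tune(x_tgt, c0)
predicted = types[assign]   # source cell-type name carried over to each target cell
```

Because the refinement starts from the source centroids, each target cluster inherits a source label even though the target batch is shifted; a real transfer-learning model does the same thing with network weights instead of centroids.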

CarDEC (Count adapted regularized Deep Embedded Clustering)

CarDEC is a joint deep learning computational tool that is useful for analyses of single-cell RNA-seq data. CarDEC can be used to:

  • Cluster cells.
  • Correct for batch effects in the full gene expression space, allowing the investigator to remove batch effects from downstream analyses such as pseudotime and coexpression analysis. Batch correction is also possible in a low-dimensional embedding space.
  • Denoise gene expression.

sciPENN (single cell imputation Protein Embedding Neural Network)

sciPENN is a deep learning computational tool that is useful for analyses of CITE-seq data. sciPENN can be used to:

  • Transfer cell-type labels from a reference CITE-seq dataset to a query scRNA-seq dataset.
  • Predict proteins in a query scRNA-seq dataset from a reference CITE-seq dataset.
  • Integrate the scRNA-seq and CITE-seq data into a shared latent space.
  • Combine multiple CITE-seq datasets with different protein panels by imputing missing proteins for each CITE-seq dataset.

SpaGCN (Spatial Graph Convolutional Network)

SpaGCN is a graph convolutional network that integrates gene expression and histology to identify spatial domains and spatially variable genes. To jointly model all spots in a tissue slide, SpaGCN combines gene expression, spatial locations, and histological pixel intensities across spots into an undirected weighted graph. Each vertex in the graph holds the gene expression of one spot, and the edge weight between two vertices is determined by the spatial proximity of their coordinates and the similarity of the corresponding histology. To aggregate the gene expression of each spot with that of its neighboring spots, SpaGCN applies a convolutional layer based on the edge weights specified by the graph. The aggregated gene expression is then fed into a deep embedding clustering algorithm to cluster the spots into spatial domains. After spatial domains are identified, genes enriched in each domain can be detected by differential expression analysis between domains. SpaGCN is applicable to both in situ transcriptomics with single-cell resolution (seqFISH, seqFISH+, MERFISH, STARmap, and FISSEQ) and spatial barcoding-based transcriptomics (Spatial Transcriptomics, SLIDE-seq, SLIDE-seqV2, HDST, 10x Visium, DBiT-seq, Stereo-seq, and PIXEL-seq) data.
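
The graph construction described above can be sketched in a few lines. The version below is modeled loosely on the SpaGCN paper's description: histology is turned into a third coordinate from each spot's mean RGB color, edge weights come from a Gaussian kernel on the resulting 3-D distance, and expression is smoothed over the weighted graph. Treat the details as assumptions: the function names are hypothetical, the length scale `l` is fixed by hand here (SpaGCN tunes it), and the row-normalized weighted average stands in for SpaGCN's learnable graph convolutional layer.

```python
import numpy as np


def histology_z(rgb, x, y, s=1.0):
    """Turn per-spot mean RGB into a third coordinate on the same scale as x and y."""
    var = rgb.var(axis=0)                      # variance of each colour channel across spots
    z = (rgb * var).sum(axis=1) / var.sum()    # variance-weighted mean intensity per spot
    z = (z - z.mean()) / z.std()               # standardise (assumes colours vary across spots)
    return z * max(x.std(), y.std()) * s       # `s` up- or down-weights histology


def edge_weights(x, y, z, l):
    """Gaussian kernel on 3-D distance: close spots with similar histology get weight near 1."""
    coords = np.stack([x, y, z], axis=1)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * l ** 2))


def aggregate(expr, w):
    """Graph-convolution-style smoothing: weighted average of expression over neighbouring spots."""
    return (w @ expr) / w.sum(axis=1, keepdims=True)


# Toy slide (made up): spots 0 and 1 are adjacent and similar in colour; spot 2 is far away.
x = np.array([0.0, 1.0, 10.0])
y = np.zeros(3)
rgb = np.array([[100.0, 100.0, 100.0], [110.0, 100.0, 90.0], [200.0, 50.0, 50.0]])
expr = np.array([[1.0], [1.0], [10.0]])
w = edge_weights(x, y, histology_z(rgb, x, y), l=1.0)
smoothed = aggregate(expr, w)
```

Folding histology into the distance means two spots that are physically close but look different on the H&E image are weakly connected, so smoothing tends to respect tissue boundaries.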

TESLA (Tumor Edge Structure and Lymphocyte multi-level Annotation)

TESLA is a machine learning framework for multi-level tissue annotation on the histology image at pixel-level resolution in Spatial Transcriptomics (ST). By integrating information from the high-resolution histology image, TESLA can impute gene expression at superpixels and fill in missing gene expression in tissue gaps. This increased resolution makes it possible to treat gene expression data as images, which enables their integration with histological features for joint tissue segmentation and for annotation of different cell types directly on the histology image at pixel-level resolution. Additionally, TESLA can detect unique structures of the tumor immune microenvironment, such as tertiary lymphoid structures (TLSs), and separate a tumor into core and edge to examine their cellular compositions, expression features, and molecular processes. TESLA has been evaluated on five cancer datasets. Our results consistently showed that TESLA can generate high-quality super-resolution gene expression images, which facilitate downstream multi-level tissue annotation.
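
To make "impute gene expression at superpixels" concrete, the sketch below shows one simple way to estimate expression at image locations that were never measured, including gaps between spots. It is a hypothetical stand-in, not TESLA's procedure: each superpixel takes a weighted average of its k spatially nearest spots, with Gaussian weights on both spatial distance and RGB color difference in the histology image. The function name and the parameters `k`, `sigma_xy`, and `sigma_rgb` are all assumptions.

```python
import numpy as np


def impute_superpixels(px_xy, px_rgb, spot_xy, spot_rgb, spot_expr,
                       k=3, sigma_xy=1.0, sigma_rgb=10.0):
    """Estimate expression at each superpixel from its k nearest measured spots."""
    # Squared spatial and colour distances, shape (n_superpixels, n_spots).
    d2_xy = ((px_xy[:, None, :] - spot_xy[None, :, :]) ** 2).sum(axis=2)
    d2_rgb = ((px_rgb[:, None, :] - spot_rgb[None, :, :]) ** 2).sum(axis=2)
    # Spots that are close AND look alike on the H&E image get the largest weights.
    w = np.exp(-d2_xy / (2.0 * sigma_xy ** 2) - d2_rgb / (2.0 * sigma_rgb ** 2))
    # Zero out everything except the k spatially nearest spots.
    nearest = np.argsort(d2_xy, axis=1)[:, :k]
    keep = np.zeros(w.shape, dtype=bool)
    np.put_along_axis(keep, nearest, True, axis=1)
    w = np.where(keep, w, 0.0)
    # Weighted average; yields NaN if every weight underflows, so choose sigmas accordingly.
    return (w @ spot_expr) / w.sum(axis=1, keepdims=True)


# Toy data (made up): two measured spots; one superpixel beside spot 0, one in the gap.
spot_xy = np.array([[0.0, 0.0], [10.0, 0.0]])
spot_rgb = np.array([[50.0, 50.0, 50.0], [200.0, 200.0, 200.0]])
spot_expr = np.array([[0.0], [10.0]])
px_xy = np.array([[1.0, 0.0], [5.0, 0.0]])
px_rgb = np.array([[50.0, 50.0, 50.0], [125.0, 125.0, 125.0]])
est = impute_superpixels(px_xy, px_rgb, spot_xy, spot_rgb, spot_expr, k=2)
```

The superpixel next to spot 0 that shares its color inherits spot 0's value, while the superpixel in the gap, equidistant in space and color, lands halfway between the two spots.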

iStar (Inferring Super-resolution Tissue Architecture)

iStar (Inferring Super-Resolution Tissue Architecture by Integrating Spatial Transcriptomics and Histology) enhances the spatial resolution of spatial transcriptomic data from spot level to near-single-cell level.

MISO (Multi-modal Spatial Omics)

MISO is a deep learning-based method developed for the integration and clustering of multi-modal spatial omics data. MISO requires minimal hyperparameter tuning and can be applied to any number of omic and imaging data modalities from any multi-modal spatial omics experiment. MISO has been evaluated on datasets from experiments including spatial transcriptomics (transcriptomics and histology), spatial epigenome-transcriptome co-profiling (chromatin accessibility, histone modification, and transcriptomics), spatial CITE-seq (transcriptomics, proteomics, and histology), and spatial transcriptomics and metabolomics (transcriptomics, metabolomics, and histology).

iSCALE (Inferring Spatially resolved Cellular Architectures for Large-sized tissue Environments)

This software package implements iSCALE, a novel framework designed to integrate multiple daughter captures and use H&E information from large tissue samples, enabling the prediction of gene expression in large tissues at near-single-cell resolution.

Publications

Selected Publications from SC2SG Researchers

Abedini A, Levinsohn J, Klötzer KA, Dumoulin B, Ma Z, Frederick J, Dhillon P, Balzer MS, Shrestha R, Liu H, Vitale S, Bergeson AM, Devalaraja-Narashimha K, Grandi P, Bhattacharyya T, Hu E, Pullen SS, Boustany-Kari C, Guarnieri P, Karihaloo A, Traum D, Yan H, Coleman K, Palmer M, Sarov-Blat L, Morton L, Hunter CA, Kaestner KH, Li M, Susztak K. Single-cell multi-omic and spatial profiling of human kidneys implicates the fibrotic microenvironment in kidney disease progression. Nat Genet. 2024 Aug;56(8):1712–1724. doi:10.1038/s41588-024-01802-x.

Coleman K, Schroeder A, Loth M, Zhang D, Park JH, Sung JY, Blank N, Cowan AJ, Qian X, Chen J, Jiang J, Yan H, Samarah LZ, Clemenceau JR, Jang I, Kim M, Barnfather I, Rabinowitz JD, Deng Y, Lee EB, Lazar A, Gao J, Furth EE, Hwang TH, Wang L, Thaiss CA, Hu J, Li M. Resolving tissue complexity by multimodal spatial omics modeling with MISO. Nat Methods. 2025 Mar;22(3):530–538. doi:10.1038/s41592-024-02574-2.

Govek KW, Nicodemus P, Lin Y, et al. CAJAL enables analysis and integration of single-cell morphological data using metric geometry. Nat Commun. 2023;14(1):3672. doi:10.1038/s41467-023-39424-2.

Guo P, Mao L, Chen Y, Lee CN, Cardilla A, Li M, Bartosovic M, Deng Y, et al. Multiplexed spatial mapping of chromatin features, transcriptome and proteins in tissues. Nat Methods. 2025 Mar;22(3):520–529. doi:10.1038/s41592-024-02576-0.

Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, Lee EB, Shinohara RT, Li M. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021 Nov;18(11):1342–1351. doi:10.1038/s41592-021-01255-8.

Perlman BS, Burget N, Zhou Y, Schwartz GW, Petrovic J, Modrusan Z, Faryabi RB. Enhancer-promoter hubs organize transcriptional networks promoting oncogenesis and drug resistance. Nat Commun. 2024;15(1):8070. doi:10.1038/s41467-024-52375-6.

Wilson PC, Verma A, Yoshimura Y, Muto Y, Li H, Malvin NP, Dixon EE, Humphreys BD. Mosaic loss of Y chromosome is associated with aging and epithelial injury in chronic kidney disease. Genome Biol. 2024 Jan 29;25(1):36. doi:10.1186/s13059-024-03173-2.

Zhang Z, Mathew D, Lim TL, et al. Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol. 2024 Nov 26;42(11). doi:10.1038/s41587-024-02463-1.

Zhang D, Wang X, Shivashankar GV, Uhler C. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat Biotechnol. 2024;42(1):22–31. doi:10.1038/s41587-023-02019-9.