Bioinformatics Portal for Predicting Binding Regions and Modes in Protein-Nucleic Acid Interactions
Protein-nucleic acid interactions (PNIs) are essential for biological processes, including gene regulation, DNA repair, and viral infection. Changes in the binding regions and modes of PNIs are vital for understanding their mechanisms of action and for detecting abnormalities. Computational methods, particularly those leveraging machine learning (ML) and deep learning (DL), have become powerful tools for predicting PNI binding sites and structural features. However, systematic evaluation is needed to ensure the reliability of these bioinformatics resources and to promote innovation. Here, we present a comprehensive toolbox for PNIs. It includes curated databases detailing interaction types, sources, cross-domain interactions, and potential applications. We then investigated ML- and DL-based tools that predict binding sites and conformational dynamics, with the aim of uncovering the molecular mechanisms underlying PNIs. Additionally, we discuss potential applications in PNI-related drug research. This study introduces a suite of advanced predictive tools that use computational modeling to advance the design of nucleic acid therapeutics. We have integrated these tools into a user-friendly online platform, accessible for academic use at http://rv.agroda.cn/pni_portal.
circASbase: A Comprehensive Database of Alternative Splicing Events in circRNAs
Although extensive evidence has underscored the critical role of alternative splicing in generating mature circular RNA (circRNA) isoforms and augmenting their functional diversity, specialized databases of circRNA alternative splicing events remain scarce. To bridge this gap, we developed circASbase, a pioneering and comprehensive database that catalogues 452,129 alternative splicing events in 884,047 full-length circRNAs from 581 samples across 13 species, and provides rich annotations to facilitate understanding of circRNA splicing regulation. Our findings reveal substantial differences between circRNAs and linear transcripts in the distribution and occurrence of alternative splicing events, highlighting the unique regulatory landscape of circRNAs. These circRNA-specific splicing events give rise to functional differences among circRNAs by affecting IRES sites, m6A sites, ORFs, protein features, miRNA targets, and more. In summary, circASbase not only meets the research community's urgent need for a dedicated data repository but also represents a significant advance in our understanding of circRNA biology. With its user-friendly interfaces and web-based visualization tools, circASbase is poised to become an indispensable resource for researchers exploring the regulatory mechanisms and functional roles of alternative splicing events in circRNAs. This database will continue to drive new insights and discoveries, setting the stage for further advances in circRNA research. circASbase is freely available at http://reprod.njmu.edu.cn/cgi-bin/circASbase/.
HGCPep: Hypergraph Deep Learning Identifies Cancer-associated Non-coding Peptides
Small peptides encoded by non-coding RNAs (ncRNAs), known as non-coding peptides (ncPEPs), are emerging as critical regulators and biomarkers in cancer and hold immense promise for immunotherapy. However, systematic identification of ncPEPs is hampered by computational methods that typically analyze peptides based on sequence alone. This approach overlooks the fundamental biological principle that multiple distinct peptides can be translated from a single ncRNA transcript and thus share a common transcriptional origin. Here, we address this limitation by developing HGCPep, a deep learning framework that leverages hypergraphs to model these intrinsic relationships. In our model, each ncRNA is represented as a hyperedge connecting the cohort of peptides it encodes, thereby enriching peptide feature representations with transcriptional context. We demonstrate that HGCPep, which integrates a hypergraph neural network with a convolutional neural network, outperforms state-of-the-art methods in identifying cancer-associated ncPEPs. Furthermore, dimensionality reduction of the learned embeddings reveals distinct clustering of ncPEPs by cancer type, illustrating how the model effectively deciphers complex biological associations. Our work introduces a new method for ncPEP analytics and provides a powerful tool for discovering novel therapeutic targets in oncology. The dataset and source code of our proposed method can be found via https://github.com/Longwt123/HGCPep_Github.
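To illustrate how a hyperedge can pass transcriptional context between peptides that share an ncRNA, here is a minimal sketch of one round of two-stage mean aggregation (peptide to hyperedge, then hyperedge back to peptide). It is not HGCPep's implementation: the function name, the toy feature vectors, and the plain averaging are assumptions for illustration, whereas a hypergraph neural network typically adds degree normalization, learnable weights, and nonlinearities.

```python
from collections import defaultdict

def hypergraph_mean_propagate(features, hyperedges):
    """One round of two-stage mean aggregation on a hypergraph (illustrative only).

    features:   dict peptide_id -> list[float] feature vector
    hyperedges: dict ncrna_id -> list of peptide_ids encoded by that ncRNA
    Returns a dict peptide_id -> feature vector enriched with hyperedge context.
    """
    dim = len(next(iter(features.values())))
    # Stage 1: each hyperedge (ncRNA) summarizes the peptides it encodes.
    edge_msg = {
        rna: [sum(features[p][d] for p in peps) / len(peps) for d in range(dim)]
        for rna, peps in hyperedges.items()
    }
    # Record which hyperedges each peptide belongs to.
    incident = defaultdict(list)
    for rna, peps in hyperedges.items():
        for p in peps:
            incident[p].append(rna)
    # Stage 2: each peptide averages the messages of all hyperedges containing it.
    out = {}
    for p, vec in features.items():
        rnas = incident.get(p)
        if not rnas:
            out[p] = list(vec)  # peptide with no known source ncRNA keeps its features
            continue
        out[p] = [sum(edge_msg[r][d] for r in rnas) / len(rnas) for d in range(dim)]
    return out
```

For example, two peptides from the same hypothetical transcript "lncA" end up sharing the averaged context vector, while a peptide from a different transcript is unaffected by them.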
Profiling Cell-state Fingerprints Based on Deep Learning Model with Meta-programs of Pan-cancer
Cell states within cancer have garnered significant attention, yet the shared mechanisms through which malignant cells assert dominance across cancer types remain elusive. In this study, we employed label-free multiplexed single-cell RNA sequencing (scRNA-seq) to analyze cell states in 159,372 cells from 245 cell lines spanning 14 tissue types, integrating both public and proprietary datasets. We identified 21 meta-programs (MPs), encompassing 16 biological processes, that represent characteristics shared across the pan-cancer landscape. Subsequently, we developed a deep learning model, StateNet, to generate cell-state fingerprints that delineate the individuality of each cell line based on these MPs. Using StateNet, we pinpointed ACAT2 as a potential mediator bridging hypoxia and the lipid metabolism pathway, and we showed through perturbation experiments that epithelial-mesenchymal transition programs are vital for classifying cell lines. StateNet not only elucidates the overall manifold structure of scRNA-seq data but also provides cell-state fingerprints of cell clusters, revealing prognosis-related programs and distinguishing patients with different survival outcomes. Applying these prognosis-related programs to 3210 cancer samples, we constructed Cox models and identified risk-associated programs and genes for different cancer types. StateNet thus emerges as a novel and efficient tool for cancer profiling, revealing both the shared commonalities and the distinct individualities of pan-cancer cells across large datasets.
Spanve: A Statistical Method for Downstream-friendly Spatially Variable Genes in Large-scale Data
Depicting gene expression in a spatial context through spatial transcriptomics helps infer cellular mechanisms. Identifying spatially variable genes is a crucial step in using spatial transcriptome data to understand intricate spatial dynamics. In this study, we developed Spanve, a nonparametric statistical method that detects spatially variable genes in large-scale spatial transcriptomics datasets by quantifying expression differences between each spot or cell and its local neighbors, thereby identifying spatial dependencies in gene expression without distributional assumptions. Compared with existing methods, Spanve yields fewer false positives, leading to more accurate identification of spatially variable genes. Furthermore, Spanve improves the performance of downstream spatial transcriptomics analyses, including spatial domain detection and cell type deconvolution. These results demonstrate the broad potential of Spanve for advancing our understanding of spatial gene expression patterns within complex tissue microenvironments. Spanve is publicly available at https://github.com/zjupgx/Spanve and https://ngdc.cncb.ac.cn/biocode/tool/BT7724.
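The core idea of comparing each spot with its local neighbors can be illustrated with a generic permutation test. The sketch below is an assumption-laden stand-in, not Spanve's actual statistic: it uses brute-force k-nearest neighbors, the mean absolute spot-to-neighbor difference as the test statistic, and random relabeling of spots as the null. The function names and defaults (`k=4`, `n_perm=999`) are hypothetical.

```python
import math
import random

def knn(coords, k):
    """Indices of the k nearest neighbours of each spot (brute force, Euclidean)."""
    nbrs = []
    for i, (xi, yi) in enumerate(coords):
        d = sorted((math.hypot(xi - xj, yi - yj), j)
                   for j, (xj, yj) in enumerate(coords) if j != i)
        nbrs.append([j for _, j in d[:k]])
    return nbrs

def local_difference(expr, nbrs):
    """Mean absolute expression difference between each spot and its neighbours."""
    total = sum(abs(expr[i] - expr[j]) for i, js in enumerate(nbrs) for j in js)
    return total / sum(len(js) for js in nbrs)

def spatial_pvalue(expr, coords, k=4, n_perm=999, seed=0):
    """Permutation p-value: are neighbours more similar than under random placement?

    A small p-value means neighbouring spots differ less than expected by chance,
    i.e. the gene's expression is spatially structured.
    """
    rng = random.Random(seed)
    nbrs = knn(coords, k)
    observed = local_difference(expr, nbrs)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between expression and location
        if local_difference(shuffled, nbrs) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

On a 10 x 10 grid, a smooth gradient along one axis gives a very small p-value, whereas a checkerboard pattern (neighbors always differ) gives a p-value near 1.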
Enhanced Functional Potential of Pseudogene-associated lncRNA Genes in Mammals
The functional significance of long non-coding RNAs (lncRNAs) remains a subject of debate, largely due to the complexity and cost of validation experiments. However, emerging evidence suggests that pseudogenes, once viewed as genomic relics, may contribute to the origin of functional lncRNA genes. In this study spanning eight species, we systematically identified pseudogene-associated lncRNA genes using our PacBio long-read sequencing data together with published RNA-seq data. Our investigation revealed that pseudogene-associated lncRNA genes exhibit stronger functional attributes than their non-pseudogene-associated counterparts. Notably, these pseudogene-associated lncRNAs show protein-binding proficiency, positioning them as potent regulators of gene expression. In particular, pseudogene-associated sense lncRNAs retain protein-binding capabilities inherited from the parent genes of their pseudogenes and therefore show even greater protein-binding proficiency. Through detailed functional characterization, we elucidated the unique advantages and conserved roles of pseudogene-associated lncRNA genes, particularly in gene expression regulation and DNA repair. Leveraging cross-species expression profiling, we demonstrated the prominent contribution of pseudogene-associated lncRNA genes to aging-related transcriptome changes across nine human tissues and eight mouse tissues. Overall, our findings demonstrate enhanced functional attributes of pseudogene-associated lncRNA genes and shed light on their close, conserved association with aging.
Function and Development of Deep-sea Mussel Bacteriocytes Revealed by SnRNA-seq and Spatial Transcriptomics
Deep-sea chemosynthetic ecosystems are among the most unusual ecosystems on Earth; there, most megafauna form close symbiotic associations with chemosynthetic microbes to obtain nutrition and shelter from the toxic environment. Despite the diverse forms of symbiotic organs in these deep-sea holobionts, the function and development of bacteriocytes, the host cells harboring symbionts, remain largely uncharacterized. Here, we conducted an in situ decolonization assay together with state-of-the-art single-nucleus and spatial transcriptomics to reveal the function and development of deep-sea mussel bacteriocytes. The bacteriocytes appear to optimize immune processes to facilitate recognition, engulfment, and elimination of endosymbionts. They also interact directly with the endosymbionts in carbohydrate and ammonia metabolism, exchanging metabolic intermediates via transporters such as SLC37A2 and RHBG-A. Bacteriocytes arise from three different proliferating cell types, and their successive developmental trajectory was delineated by multi-omics data and 3D reconstruction analyses. The molecular functions and developmental processes of bacteriocytes were found to be guided by the same set of molluscan-conserved transcription factors and may be influenced by endosymbionts through sterol metabolism. The coordination between the functions and development of bacteriocytes, and between host and symbionts, highlights the phenotypic plasticity of symbiotic cells and underpins host-symbiont interdependence in adaptation to the deep sea.
Macrophages in Hematopoiesis and Related Blood Diseases
Emerging evidence indicates that macrophages play important roles in hematopoiesis in addition to their immune functions. Well-known non-immune functions of macrophages include roles in hematopoiesis, especially quality control of hematopoietic stem/progenitor cells (HSCs/HSPCs) and support of erythropoiesis and megakaryopoiesis. Several studies, mostly using mouse models, have explored the roles of macrophages in hematopoiesis in different organs, such as the yolk sac (YS), fetal liver (FL), bone marrow (BM), and spleen (SP). We have recently documented the potential roles and underlying mechanisms of macrophages in myeloproliferative neoplasm (MPN), aplastic anemia (AA), and idiopathic thrombocytopenic purpura (ITP). In this article, we review the origin of macrophages, introduce their roles in HSCs/HSPCs, erythropoiesis, and megakaryopoiesis in the four hematopoietic organs, and summarize recent advances regarding macrophages in MPN, AA, and ITP. Finally, we outline unresolved questions that future studies should address to explore in greater depth the role of macrophages in both normal and disordered hematopoiesis.
Single-cell Isoform Sequencing Reveals Transcriptional Dysregulation in ASD Mouse Cortex Development
RNA splicing is pivotal in neural development, yet the role of isoform diversity across cell types remains unclear. Here, we combined metabolic RNA labeling and single-cell full-length transcriptome sequencing to capture transcriptional dynamics in developing mouse cortices. We observed predetermined cell states supported by nascent RNAs and characterized the driving isoforms of transcription factors that regulate the development of deep- and upper-layer neurons. Additionally, we investigated isoform regulation associated with autism spectrum disorder (ASD) during the embryonic development of BTBR T+ Itpr3tf mice. Our findings indicated premature emergence of callosal projection neurons (CPNs) with an immature identity in ASD-affected cortices. These CPNs exhibited abnormal transcript usage, and nearly 60% of the associated RNA-binding proteins have been reported as ASD risk genes. We identified isoform switching events modulating neurogenesis and ASD development. Finally, we observed reduced isoform diversity in ASD, potentially linked to dysregulated H3K27ac levels. Collectively, our study represents a significant advance in understanding the molecular basis of cortical development and function.
Mouse Forebrain Region Features and Cholinergic Neurons Subtyping: Integrated Analysis with Spatial Multi-omics
The forebrain regions display distinct yet under-characterized gene expression patterns. In this study, we analyzed region-specific feature genes in forebrain regions and Isocortex subregions at both the transcriptomic and proteomic levels. The key finding was a low correlation but high functional similarity between mRNA and protein expression, providing new insights into the relationship between the forms of gene expression in neuronal pathways and neuronal activity states. Cholinergic neurons (CNs) play a vital role in forebrain sensory and motor regulation. Using joint spatial transcriptome and immunofluorescence analysis (STIF), which overcomes the resolution limitations of the 10X Visium system, we identified feature genes specific to CNs and CN subtypes in the striatum and basal forebrain, providing crucial insights into the heterogeneity and functional diversity of these neuronal populations. The spatial distribution and expression patterns of the identified feature genes were validated using either external datasets or rolling circle amplification fluorescence in situ hybridization (RCA-FISH), and the results were consistent.
Harnessing Large Cohorts and AI to Bridge Genomic Discovery and Clinical Practice
Noise2read: Accurately Rectify Millions of Erroneous Short Reads Through Graph Learning on Edit Distances
Although the per-base error rate of short-read sequencing data is very low (0.1%-0.5%), erroneous reads can make up as much as 10%-15% of a dataset, amounting to millions of reads. Because current methods correct only some errors while introducing many new ones, we address this problem by restoring erroneous reads to their original states without introducing any non-existent reads, thereby preserving data integrity. The novelty originates from a computable rule derived from the polymerase chain reaction (PCR) error mechanism: a rare read is erroneous if it has a neighbouring read of high abundance. Following this principle, we construct a graph linking every pair of reads separated by a tiny edit distance to detect a reliable subset of erroneous reads; we then use these read pairs as training data to learn the error mechanisms and identify possible remaining hard-case errors between pairs of high-abundance reads. The proposed approach, noise2read, can rectify erroneous reads in short-read sequencing data whenever PCR is involved. Compared with state-of-the-art methods on tens of evaluation datasets with unique molecular identifier (UMI)-based ground truth, noise2read performs significantly better on 19 metrics. Case studies showed that noise2read can greatly improve short-read quality and substantially affect genome abundance quantification, isoform identification, single nucleotide polymorphism (SNP) profiling, and genome editing efficiency estimation. Noise2read is publicly available at https://github.com/JappyPing/noise2read and https://ngdc.cncb.ac.cn/biocode/tool/7951.
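The rule "a rare read is erroneous if it has a neighbouring read of high abundance" can be sketched directly with read counts and Levenshtein distance. This covers only the first, rule-based stage; the graph-learning stage for hard cases is not represented. The thresholds `rare_max`, `high_min`, and `max_dist` are hypothetical choices, not noise2read's parameters, and the brute-force pairwise search is for clarity rather than scale.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rectify(reads, rare_max=2, high_min=10, max_dist=1):
    """Replace rare reads that sit within max_dist of a high-abundance read.

    Returns (corrected_reads, fixes). Reads are only ever mapped onto sequences
    already present in the input, so no new sequences are created.
    """
    counts = Counter(reads)
    abundant = [s for s, c in counts.items() if c >= high_min]
    fixes = {}
    for seq, c in counts.items():
        if c > rare_max:
            continue  # not rare enough to be treated as a PCR/sequencing error
        hits = [s for s in abundant if edit_distance(seq, s) <= max_dist]
        if hits:
            fixes[seq] = max(hits, key=counts.__getitem__)  # most abundant neighbour
    return [fixes.get(r, r) for r in reads], fixes
```

With 20 copies of a toy read `ACGTACGT` and a single `ACGTACGA`, the singleton is mapped back to its abundant neighbour, while an unrelated read seen three times is left untouched.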
Deep Transfer Learning Links Benign Glands to Prostate Cancer Progression via Transcriptomics
The field effect describes the phenomenon whereby environmental exposures, infection, and genetic predisposition cause molecular changes in cells that predispose them to developing cancer. Although this is a well-established concept in pathology, it remains underexplored with high-resolution omics. We used the Diagnostic Evidence Gauge of Single Cells (DEGAS) deep transfer learning framework to analyze prostate cancer spatial transcriptomics data and identify cells and tissues that are highly associated with cancer progression. DEGAS highlighted morphologically benign glands with reduced expression of MSMB, a differentiation marker that is decreased in aggressive tumors. These glands showed upregulation of genes associated with antigen presentation and aggressive neoplasms. Integration of single-cell transcriptomics and, separately, deep learning image analysis each revealed altered immune-cell infiltration, suggesting a complex interplay in the tumor environment that facilitates aggressiveness. We used immunohistochemistry to quantify MSMB protein (PSP-94) expression in morphologically normal and tumor tissues from patients with and without 5-year distant metastasis. Samples from patients who developed metastasis consistently showed lower fractions of positively stained cells, indicating a subtle yet significant "field effect" in seemingly benign regions. These protein-level results validate the transcriptomic findings and further underscore that inflammatory or immune-related changes in ostensibly normal tissue may contribute to aggressive disease progression.
CanID: A Robust and Accurate RNA-seq Expression-based Diagnostic Classification Scheme for Pediatric Malignancies
Cancer subtype classification is critical for precision therapy, and histopathology testing is increasingly augmented with omics-based machine learning classifiers. However, for pediatric cancers, analytical challenges remain regarding the scope and precision of current classifiers as well as evolving subtype standardization. To address these challenges, we built Cancer Identification (CanID), a stacked ensemble machine learning classification scheme that uses transcriptomic features derived from gene-level RNA sequencing count data as its sole input. CanID was developed primarily from 3203 pediatric cancer samples covering 13 solid tumor subtypes and 38 hematologic malignancy subtypes, with subtype labels curated without the use of RNA-seq data. In independent testing on three independent or external datasets, accuracy was 99% for solid tumors and 92%-93% for hematologic malignancies. Notably, CanID classified subtypes that are challenging for clinical histology evaluation and was robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potentially mislabeled training samples, and classes unobserved in training. The high accuracy, robustness, and biological interpretability of this transcriptome-based classification scheme make it a valuable approach to advance tumor diagnosis and clinically meaningful stratification of tumor types. CanID can be accessed on GitHub at https://github.com/chenlab-sj/CanID.
scATAnno: Automated Cell Type Annotation for Single-cell ATAC Sequencing Data
Recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key analysis task is to determine cell type identity from the epigenetic data. We introduce scATAnno, a Python package designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow builds reference atlases from publicly available datasets, enabling accurate cell type annotation by integrating query data with reference atlases without using scRNA-seq data. To enhance annotation accuracy, we incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect cell populations in the query data that are distinct from all cell types in the reference data. We benchmarked scATAnno against five other published cell annotation approaches, demonstrating superior performance across multiple datasets and metrics. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cells (PBMCs), triple-negative breast cancer (TNBC), and basal cell carcinoma (BCC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a useful tool for building scATAC-seq references and annotating cell types in scATAC-seq data, and can aid the interpretation of new scATAC-seq datasets from complex biological systems. scATAnno is available online at https://scatanno-main.readthedocs.io/.
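A KNN-based uncertainty score can be illustrated with a simple majority vote over the nearest reference cells: the less the neighbors agree, the higher the uncertainty, and cells above a cutoff are left unassigned. This sketch assumes cells are already embedded in a shared low-dimensional space; the exact scoring used by scATAnno, its weighted distance-based score, and the cutoff value `max_uncertainty=0.3` are not reproduced here and the function name is hypothetical.

```python
import math
from collections import Counter

def knn_annotate(query, ref_points, ref_labels, k=5, max_uncertainty=0.3):
    """Majority-vote label transfer with a simple KNN uncertainty score.

    uncertainty = 1 - (fraction of the k nearest reference cells carrying the
    winning label); query cells above max_uncertainty are labelled 'unknown'.
    Returns a list of (label, uncertainty) tuples, one per query cell.
    """
    results = []
    for q in query:
        # Rank reference cells by Euclidean distance in the shared embedding.
        dists = sorted((math.dist(q, r), lab) for r, lab in zip(ref_points, ref_labels))
        top = [lab for _, lab in dists[:k]]
        label, votes = Counter(top).most_common(1)[0]
        uncertainty = 1 - votes / k
        results.append((label if uncertainty <= max_uncertainty else "unknown", uncertainty))
    return results
```

A query cell inside a reference cluster receives that cluster's label with zero uncertainty, whereas a cell lying between two clusters gets split votes and is reported as "unknown", which is how such scores flag populations absent from the reference.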
Computational Analyses and Challenges of Single-cell ATAC-seq
Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) has emerged as a powerful technique to study cell-specific epigenetic landscapes and to provide a multidimensional portrait of gene regulation. However, low genomic coverage per cell results in intrinsic data sparsity and missing data, presenting unique methodological challenges. Consequently, numerous computational methods and techniques have been developed to address these challenges. This review provides a concise overview of published workflows for scATAC-seq analysis, from preprocessing through downstream analysis, including quality control, alignment, peak calling, dimension reduction, clustering, gene regulation score calculation, cell-type annotation, and multiomics integration. Additionally, we survey key scATAC-seq databases that offer curated, accessible resources; discuss emerging deep-learning methods and artificial intelligence (AI) foundation models tailored to scATAC-seq data; and highlight recent advances in spatial ATAC-seq technologies and associated computational approaches. Our objective is to equip readers with a clear understanding of current scATAC-seq methodologies so they can select appropriate tools and construct customized workflows for exploring gene regulation and cellular diversity.
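As a concrete example of a dimension-reduction step, many scATAC-seq workflows apply term frequency-inverse document frequency (TF-IDF) weighting to the sparse cell-by-peak matrix before singular value decomposition (latent semantic indexing). The sketch below implements one common TF-IDF variant; tools differ in the exact formula, and the scale factor of 1e4 is an assumption rather than a universal default.

```python
import math

def tfidf(matrix, scale=1e4):
    """TF-IDF for a cells x peaks count matrix given as a list of lists.

    TF  = count / total counts in the cell   (corrects for sequencing depth)
    IDF = n_cells / number of cells in which the peak is accessible
    out = log1p(TF * IDF * scale)            (rare peaks get higher weight)
    """
    n_cells = len(matrix)
    n_peaks = len(matrix[0])
    cells_with_peak = [sum(1 for row in matrix if row[j] > 0) for j in range(n_peaks)]
    out = []
    for row in matrix:
        total = sum(row)
        new_row = []
        for j, c in enumerate(row):
            if c == 0 or total == 0:
                new_row.append(0.0)  # inaccessible peak stays zero, preserving sparsity
            else:
                tf = c / total
                idf = n_cells / cells_with_peak[j]
                new_row.append(math.log1p(tf * idf * scale))
        out.append(new_row)
    return out
```

In a two-cell toy matrix, a peak open in only one cell receives a larger weight than a peak open in both, which is the intended effect of the IDF term.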
A Telomere-to-Telomere Diploid Reference Genome and Centromere Structure of the Chinese Quartet
Recent advances in sequencing technologies have enabled the complete assembly of human genomes from telomere to telomere (T2T), resolving previously inaccessible regions such as centromeres and segmental duplications. Here, we present an updated, higher-quality, haplotype-phased T2T assembly of the Chinese Quartet (T2T-CQ), a family cohort comprising monozygotic twins and their parents, generated using high-coverage ONT ultralong and PacBio HiFi sequencing. The T2T-CQ assembly serves as a crucial reference genome for integrating publicly available multi-omics data and advances the utility of the Quartet reference materials. The T2T-CQ assembly scores highly on multiple metrics of continuity and completeness, with Genome Continuity Inspector (GCI) scores of 77.76 (maternal) and 76.41 (paternal), 21-mer quality values (QV) > 66, and Clipping Reveals Assembly Quality (CRAQ) scores > 99.6 for both haplotypes, enabling complete annotation of centromeric regions. Within these regions, we identified novel 13-mer higher-order repeat patterns on chromosome 17 that exhibit a monophyletic origin and emerged approximately 230 thousand years ago. Overall, this work establishes an essential genomic resource for the Han Chinese population and advances the development of a T2T pan-Chinese reference genome, which will facilitate future investigations into both population-specific structural variants and the evolutionary dynamics of centromeres.
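To put "21-mer QV > 66" in perspective, QV is a Phred-scaled error rate, QV = -10 log10(E). The sketch below assumes the k-mer formulation popularized by Merqury, in which E is estimated from the fraction of assembly k-mers not found in the read set; whether exactly this tool or formula produced the reported values is an assumption. Under it, QV 66 corresponds to an error rate of about 2.5 x 10^-7, roughly one error per 4 Mb.

```python
import math

def kmer_error_rate(asm_only_kmers, total_asm_kmers, k=21):
    """Per-base error rate implied by assembly k-mers absent from the reads.

    A base error corrupts up to k overlapping k-mers, so the probability that a
    k-mer is error-free, (1 - E)**k, is matched to the fraction of supported k-mers.
    """
    return 1 - (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)

def qv(error_rate):
    """Phred-scaled consensus quality: QV = -10 * log10(error rate)."""
    return -10 * math.log10(error_rate)

def error_rate_from_qv(q):
    """Inverse of qv(): E = 10 ** (-QV / 10)."""
    return 10 ** (-q / 10)
```

For example, `1 / error_rate_from_qv(66)` is about 3.98 million, i.e. roughly one base error per 4 Mb of assembled sequence.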
MoRNiNG: A Database of RNA Modification Sites Associated with RNA Secondary Structure Dynamics
RNA structures are essential building blocks of functional RNA molecules. Profiling secondary structures in vivo and in real time remains challenging because RNAs exhibit dynamic structures and complex conformations. Besides the canonical stem-loop secondary structure, the non-canonical RNA G-quadruplex (rG4) structure has attracted interest for its potential as a drug target. Early studies have demonstrated that RNAs can form distinct secondary structures. However, how distinct RNA structures formed from the same RNA sequence function within the transcriptome is poorly understood, and the factors driving and regulating structural transitions remain to be investigated. Inspired by an HOXB9 segment able to form multiple structures, we found that many RNA segments across the transcriptome exhibit multi-faceted structure-forming potential. In the case of HOXB9, we demonstrate that N6-methyladenosine (m6A) modification influences RNA structure and binding to RNA-binding proteins (RBPs). Therefore, we collected RNA modification sites naturally occurring within the putative G-quadruplex-forming sequences (PQSs) of transcripts and developed MoRNiNG, a database of RNA modifications in natural rG4s. MoRNiNG is organized into reliability tiers determined by the resolution of RNA modification sites and is designed to accommodate various large datasets. We experimentally validated the influence of m6A, 5-methylcytosine (m5C), and adenosine-to-inosine (A-to-I) editing on rG4-forming sequences, providing evidence to support the modification switch concept. The diversity and transitions of secondary structures formed by the same RNA segment offer valuable insights into the regulation of RNA structure dynamics. MoRNiNG is freely accessible at https://www.cityu.edu.hk/bms/morning.
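Collecting modification sites within putative G-quadruplex-forming sequences can be sketched with the widely used G3+N1-7 motif (four runs of at least three guanines separated by loops of 1-7 nucleotides) and a simple overlap check. MoRNiNG's actual PQS definition, loop limits, and coordinate conventions may differ; the pattern, the 0-based positions, and the function names here are assumptions for illustration.

```python
import re

# Canonical G4 motif: (G3+ N1-7) x 3 followed by G3+.
# The lookahead lets finditer report overlapping candidates.
PQS_PATTERN = re.compile(
    r"(?=(G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}))"
)

def find_pqs(rna):
    """Return (start, sequence) for each putative G-quadruplex-forming sequence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    return [(m.start(), m.group(1)) for m in PQS_PATTERN.finditer(rna)]

def modified_pqs(rna, mod_sites):
    """Return PQSs overlapping at least one modification site (0-based positions)."""
    hits = []
    for start, seq in find_pqs(rna):
        end = start + len(seq)
        sites = [p for p in mod_sites if start <= p < end]
        if sites:
            hits.append((start, seq, sites))
    return hits
```

In the toy sequence `AAGGGAGGGAGGGAGGGAA`, the single PQS starts at position 2, so a hypothetical modified adenosine at position 5 (inside a loop) is reported, while one at position 18 is not.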
DBP: Adaptive and Interpretable Factor Analysis for Single-cell RNA-seq Data with Deep Beta Processes
Factor analysis is a method that condenses multiple variables into a few latent factors. It can be used to extract the underlying sources of biological variation in high-dimensional data and distill them into interpretable gene programs. However, existing factorization methods lack adaptability in selecting the optimal number of factors and interpretability in capturing biological variation. To address these concerns, we propose Deep Beta Process (DBP), a deep probabilistic framework for adaptive and interpretable factor analysis of single-cell transcriptomic data. DBP achieves adaptive selection of factors through a stick-breaking Beta process and performs batch correction using an adversarial learning strategy. We validate the flexible factor extraction and robust batch correction capabilities of DBP on simulated datasets. We also demonstrate its superior performance in dimensionality reduction and biological interpretability while explaining biological variation from both cell and gene perspectives using factor and loading matrices. The application of DBP to a gastric adenocarcinoma dataset reveals malignant epithelial cell heterogeneity, providing valuable insights for investigating the molecular mechanisms of disease onset and progression. DBP is available at https://github.com/labomics/DBP and https://ngdc.cncb.ac.cn/biocode/tool/BT007954.
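The adaptive choice of the number of factors rests on the stick-breaking construction of the Beta process: draw v_j ~ Beta(alpha, 1) and set pi_k = v_1 * v_2 * ... * v_k, so factor usage probabilities shrink with k and only a finite number remain non-negligible. The sketch below only samples these prior weights; DBP learns them inside a deep variational model with adversarial batch correction, which is not shown. The pruning `threshold=0.01` and function names are hypothetical.

```python
import random

def stick_breaking_weights(alpha, k_max, seed=0):
    """Stick-breaking construction of Beta-process (IBP) factor probabilities.

    v_j ~ Beta(alpha, 1) and pi_k = prod_{j<=k} v_j, so pi_1 >= pi_2 >= ... .
    Larger alpha keeps more factors active.
    """
    rng = random.Random(seed)
    weights, running = [], 1.0
    for _ in range(k_max):
        running *= rng.betavariate(alpha, 1.0)
        weights.append(running)
    return weights

def active_factors(weights, threshold=0.01):
    """Indices of factors whose usage probability exceeds a (hypothetical) threshold."""
    return [k for k, w in enumerate(weights) if w > threshold]
```

Because the weights are non-increasing, the active factors always form a prefix of the truncation, so the effective number of factors is read off as the length of that prefix rather than fixed in advance.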
Distinct Co-methylation Patterns in African and European Populations and Their Genetic Associations
Human populations have substantial genetic diversity, but the extent of epigenetic diversity remains unclear, as population-specific DNA methylation (DNAm) has only been studied for ∼3.0% of CpGs. This study quantifies DNAm using whole-genome bisulfite sequencing (WGBS) and analyzes it alongside whole-genome genotype data to reveal a comprehensive picture of population-specific DNAm. Using a "co-methylated region" (CMR) approach, 36,657 CMRs were identified in 62 lymphoblastoid B cell line (LCL) WGBS samples and validated in array datasets from 326 LCL samples. Between individuals of European and African ancestry, 101 CMRs exhibited population-specific DNAm patterns (Pop-CMRs), including 91 Pop-CMRs not found in previous investigations; these spanned genes (e.g., CCDC42, GYPE, MAP3K20, and OBI1) related to diseases (e.g., malaria infection and diabetes) whose prevalence and incidence differ between populations. Over half of the Pop-CMRs were associated with genetic variants, displaying population-specific allele frequencies and primarily mapping to genes involved in metabolic and infectious processes. Additionally, subsets of Pop-CMRs could be applicable to East Asian populations and peripheral blood-based tissues. This study provides insights into genome-wide DNAm differences between populations and explores their associations with genetic variants and their biological relevance, advancing our understanding of the epigenetic roles in population specificity.
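The notion of a co-methylated region can be made concrete with a greedy scan that joins neighboring CpGs when they are physically close and their methylation levels are correlated across samples. This is a hypothetical variant for illustration: the study's exact CMR procedure, correlation measure, and cutoffs (`r_min=0.5`, `max_gap=1000`, `min_cpgs=3` here) are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation treated as "not co-methylated"
    return sxy / math.sqrt(sxx * syy)

def co_methylated_regions(positions, beta, r_min=0.5, max_gap=1000, min_cpgs=3):
    """Group adjacent CpGs into co-methylated regions (greedy, one chromosome).

    positions: sorted CpG coordinates
    beta:      per-CpG lists of methylation levels across the same samples
    Returns (start, end) coordinates of regions with at least min_cpgs CpGs.
    """
    if not positions:
        return []
    regions, current = [], [0]
    for i in range(1, len(positions)):
        close = positions[i] - positions[i - 1] <= max_gap
        if close and pearson(beta[i - 1], beta[i]) >= r_min:
            current.append(i)  # extend the running region
        else:
            if len(current) >= min_cpgs:
                regions.append((positions[current[0]], positions[current[-1]]))
            current = [i]  # start a new candidate region
    if len(current) >= min_cpgs:
        regions.append((positions[current[0]], positions[current[-1]]))
    return regions
```

Population-specific regions would then be sought by comparing region-level methylation between ancestry groups, a step not shown here.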
Regulation of Alternative Polyadenylation Events by PABPC1 Affects Erythroid Progenitor Cell Expansion
Erythropoiesis is precisely regulated by multilayered networks and is crucial for maintaining steady-state hemoglobin levels and ensuring effective oxygen transport. Alternative polyadenylation (APA) is a post-transcriptional regulatory mechanism that generates multiple mRNA isoforms with distinct 3'-untranslated regions (3'-UTRs) from a single gene. While APA plays a vital role in various cellular processes, its underlying mechanisms in erythropoiesis remain largely unexplored. In this study, we employed an integrative approach, combining bioinformatics analyses and experimental validation, to systematically investigate the role of APA in erythropoiesis. We mapped the APA landscape during erythroid differentiation and identified significant APA shifts essential for the differentiation of erythroid cells from burst-forming unit erythroid (BFU-E) to colony-forming unit erythroid (CFU-E). Notably, our findings highlighted polyadenylate-binding protein cytoplasmic 1 (PABPC1) as the primary regulator of APA during these stages. Functional analyses revealed that PABPC1 knockdown disrupts erythroid progenitor cell proliferation and differentiation. These results implicate PABPC1 as an essential modulator of cell fate through APA regulation. Furthermore, we found that decreased PABPC1 levels increased usage of the proximal polyadenylation sites in the TSC22D1 gene. This shift led to elevated TSC22D1 expression, uncovering a novel mechanism by which APA influences erythroid progenitor expansion and differentiation. Our findings provide novel insights into APA regulation in early erythropoiesis and suggest potential therapeutic strategies for erythropoietic disorders.
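A shift toward proximal polyadenylation sites is often summarized with a distal-usage index such as the Percentage of Distal polyA site Usage Index (PDUI) popularized by DaPars: the fraction of a gene's 3'-end signal attributed to the long (distal-site) isoform. Whether this study used PDUI or another APA metric is an assumption, and the read counts below are toy values rather than data from the study.

```python
def pdui(distal_reads, proximal_reads):
    """Distal polyA site usage: 0 = all proximal (short 3'-UTR), 1 = all distal (long)."""
    total = distal_reads + proximal_reads
    if total == 0:
        raise ValueError("no reads support either polyA site")
    return distal_reads / total

def delta_pdui(condition, control):
    """Change in distal usage; negative values indicate a shift toward proximal sites.

    condition, control: (distal_reads, proximal_reads) tuples.
    """
    return pdui(*condition) - pdui(*control)
```

For toy counts of (10, 30) in a knockdown sample versus (30, 10) in a control, the index drops from 0.75 to 0.25, a change of -0.5 that would be read as 3'-UTR shortening through increased proximal-site usage.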
