Journal of Chemical Information and Modeling

MAGIC: A Multimodal Adaptive GRN Inference Constructor
Yao D, Zhang B, Zhan X and Wang W
Gene regulatory networks (GRNs) provide critical insights into the molecular mechanisms that govern cellular processes and disease pathogenesis, facilitating the identification of key regulatory factors and the discovery of potential therapeutic targets. Although numerous methods have been proposed to infer GRNs from single-cell RNA sequencing (scRNA-seq) data, GRN inference remains challenging due to the inherent sparsity of scRNA-seq data and the naturally sparse connectivity of GRNs. To address these challenges, this study proposes the Multimodal Adaptive GRN Inference Constructor (MAGIC), a method that improves GRN inference by aligning and integrating gene expression data, sequence information, and semantic features. Specifically, gene expression features reflect gene activity within cells, gene sequence features offer structural insights at the DNA level, and gene semantic features encapsulate functional meaning by leveraging biological knowledge bases. Furthermore, a consensus similarity network is constructed from multimodal gene similarity networks and integrated with a known GRN to form a dual-topology network. To address the sparse connectivity of GRNs, a shared graph attention weight alignment module is employed. A Knowledge-Aware Multimodal Fusion Module is then introduced to integrate the multimodal features by leveraging prior knowledge, thereby alleviating the inherent sparsity of scRNA-seq data. Finally, the fused features are used to infer GRNs. MAGIC achieved an average AUROC of 0.839 across seven scRNA-seq data sets using four types of ground-truth networks, outperforming other state-of-the-art models. Further analysis of two spatial transcriptomic data sets, bladder and breast cancer, demonstrates the robustness of MAGIC and its ability to uncover potential associations between transcription factors (TFs) and their target genes. MAGIC is publicly available at https://github.com/ydkvictory/MAGIC.
Materials Dual-Source Knowledge Retrieval-Augmented Generation for Local Large Language Models in Photocatalysts
Takahara W, Yamaguchi Y, Ogano M, Kakami F, Harashima Y, Takayama T, Takasuka S, Kudo A and Fujii M
Large language models (LLMs) have the potential to serve as collaborative assistants in scientific research. However, adapting them to specialized domains is difficult because it requires the integration of domain-specific knowledge. We propose Materials Dual-Source Knowledge Retrieval-Augmented Generation (MDSK-RAG), a retrieval-augmented generation (RAG) framework that enables domain specialization of LLMs for materials development under fully offline (no-Internet) operation to ensure data confidentiality. The framework unifies two complementary knowledge sources, experimental CSV data (practical knowledge) and scientific PDF literature (theoretical insights), by converting tabular records into template-based text, retrieving relevant passages from each source, summarizing them with a local LLM, and merging the summaries with the user query prior to generation. As a case study, we applied the framework to metal-sulfide photocatalysts using 740 in-house experimental records and 20 scientific PDFs. We evaluated the framework on a benchmark consisting of 14 expert-defined questions and used two-sided Wilcoxon signed-rank tests for paired comparisons. Models with fewer than 10 billion parameters were executed on a laptop, whereas larger models were run on a dedicated local server; the cloud-based LLM (GPT-4o) was evaluated via the cloud service. For practical deployment, gemma-2-9b-it (<10 billion parameters) was chosen as the primary local model; we additionally tested Qwen2.5-7B-Instruct and a larger gemma-2-27b-it to assess model choice and scalability. 
For gemma-2-9b-it, the framework increased the median cosine similarity to expert reference answers from 0.63 to 0.71, an absolute increase of 0.08 (a relative gain of 12.70%; Wilcoxon signed-rank test statistic = 14.0, two-sided p-value = 1.34 × 10) and improved the median expert 5-point rating from 2 to 3, an absolute increase of 1 point (a relative gain of 50.00%; Wilcoxon signed-rank test statistic = 3.5, two-sided p-value = 7.00 × 10). For reasoning-type questions, incomplete context retrieved by MDSK-RAG sometimes disrupted the model's reasoning process and led to incorrect conclusions, indicating remaining room for improvement. Comparable, statistically significant improvements in cosine similarity to expert reference answers were observed for the other local models (Qwen2.5-7B-Instruct and the larger gemma-2-27b-it) when the framework was used. With the framework, gemma-2-9b-it also outperformed the cloud-based GPT-4o. In this case study, the framework effectively incorporated practical experimental knowledge and theoretical literature into local LLM responses, improving accuracy for domain-specific queries. The framework presented here offers a practical and extensible adaptation of local LLMs to domain-specific scientific research.
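The reported relative gains follow directly from the medians quoted above, and the similarity metric itself is simple to state. The sketch below is illustrative only: the function names are hypothetical, and the abstract does not say how answers were embedded before the cosine similarity was computed, so the vectors here are assumed inputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def relative_gain_percent(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0

# Medians quoted in the abstract for gemma-2-9b-it.
similarity_gain = relative_gain_percent(0.63, 0.71)
rating_gain = relative_gain_percent(2, 3)
print(f"cosine similarity gain: {similarity_gain:.2f}%")
print(f"expert rating gain: {rating_gain:.2f}%")
```

Running the sketch reproduces the 12.70% and 50.00% figures from the stated medians.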
ThermoPred: AI-Enhanced Quantum Chemistry Data Set and ML Toolkit for Thermochemical Properties of API-Like Compounds and Their Degradants
Santos DP, Dias-Silva JR, Júnior LHKQ and de Oliveira HCB
In this work, we present an open-access quantum-chemistry database of more than 14,500 API-like molecules and their degradation products, all optimized at the M06-2X/6-31G(d) level. The data set delivers a comprehensive suite of thermochemical and quantum descriptors (including Gibbs free energy, enthalpy, electronic energy, vibrational frequencies and Cartesian geometries) tailored for large-scale modeling. Leveraging these data, we trained and validated three machine-learning models (XGBoost, Random Forest and Multi-Layer Perceptron) to enable rapid, accurate prediction of Gibbs free energy and enthalpy. These models are bundled in ThermoPred, an open-source Python package that offers a scalable, computationally efficient alternative to traditional quantum-chemical calculations. All data sets, models and source code are freely available to support reproducibility and foster community-driven development.
SynKit: A Graph-Based Python Framework for Rule-Based Reaction Modeling and Analysis
Phan TL, González Laffitte ME, Weinbauer K, Merkle D, Andersen JL, Fagerberg R, Gatter T and Stadler PF
Computational modeling of chemical reactions is fundamental to modern synthetic chemistry but is often hindered by a fragmented software ecosystem and the complexity of accurately representing the reaction mechanisms. To address this, we introduce SynKit, an open-source Python library that provides a unified, chemically intuitive framework for reaction informatics. SynKit performs core tasks such as reaction canonicalization and transformation classification, while other functionalities─such as synthetic route construction through rule composition─are supported through integration with external libraries. The newly introduced extends the traditional net-change representation of the by explicitly modeling the sequence of bond-forming and bond-breaking events, capturing transient intermediates, and providing deeper mechanistic insight. Designed for easy installation and broad compatibility, SynKit integrates smoothly into existing computational workflows for exploring complex . For more advanced network analyses, it interfaces with specialized tools (e.g., MØD) to support exhaustive mechanism enumeration and kinetics-aware studies. By combining advanced mechanistic modeling with an accessible, modular design, SynKit supports more reproducible and rigorous research in automated synthesis planning.
ProSECFPs: A Novel Fingerprint-Based Protein Representation Method for Missense Mutation Pathogenicity Prediction
Poles C, Di Stefano M, Piazza L, Bononi G, Poli G, Macchia M, Tuccinardi T and Giordano A
Developing effective computational representations of protein sequences is crucial for advancing diverse areas of computational biology and bioinformatics. Ideal representations must be computationally efficient, scalable, informative, flexible across contexts, and broadly applicable. To address these requirements, we propose Protein Sequence Extended-Connectivity Fingerprints (ProSECFPs), a novel fingerprinting method inspired by Extended-Connectivity Fingerprints (ECFPs), commonly used in chemoinformatics to represent small molecules. ProSECFPs effectively capture the complex physicochemical characteristics, sequence-specific details, and structural attributes intrinsic to protein sequences. We demonstrate the effectiveness and versatility of ProSECFPs by evaluating their performance in predicting the pathogenicity of missense mutations by applying a diverse set of machine learning (ML) and deep learning (DL) algorithms. Notably, our results indicate that ProSECFPs, especially their frequency-aware variants, achieve competitive or superior accuracy compared with established protein sequence descriptors. This enhanced performance arises from their ability to comprehensively integrate amino acid composition and detailed sequence information. ProSECFPs thus provide a robust, adaptable, and highly informative computational representation of proteins, serving as a powerful foundation for addressing interdisciplinary challenges in bioinformatics, genomics, and protein engineering.
Deep Learning vs Classical Methods in Potency and ADME Prediction: Insights from a Computational Blind Challenge
Fischer Y, Southiratn T, Triki D and Cedeno R
Reliable prediction of compound potency and the ADME profile is crucial in drug discovery. With the recent surge of AI and deep learning frameworks, it remains unclear whether these modern techniques offer statistically significant improvement over the well-established classical methods. The 2025 ASAP-Polaris-OpenADMET Antiviral Challenge provided a unique benchmarking opportunity to address this question, with over 65 teams of computational scientists worldwide. Our submissions were among the top performers in terms of Pearson correlation, ranking first in pIC50 prediction for SARS-CoV-2 Mpro and fourth in aggregated ADME. In this work, we present a retrospective analysis of our modeling strategies and highlight our lessons learned. Through rigorous statistical benchmarking, we demonstrate that while classical methods remain highly competitive for predicting potency, modern deep learning algorithms significantly outperformed traditional machine learning in ADME prediction. We also illustrate the importance of appropriate data curation and the benefits of leveraging public datasets via feature augmentation. Finally, we outline current limitations and identify future opportunities including the integration of structure-guided modeling. Overall, these results not only provide practical guidance for building robust predictive models but also offer valuable insights into the field of computational drug discovery.
Thermodynamics of PAM Recognition by Cas9 of Streptococcus pyogenes
Bhattacharya S, Goyal K and Satpati P
The CRISPR/Cas9 system from Streptococcus pyogenes (Cas9) requires a canonical 5'-NGG-3' PAM sequence in target DNA for effective genome editing. Base-specific interactions between the guanines (second and third positions) and the arginine dyad (R1333 and R1335) ensure specificity. We evaluated the PAM recognition strength of Cas9 by using alchemical free energy calculations, revealing the energetics that influence genome editing accuracy. Cas9 does not discriminate at the first position of the NGG sequence, but it penalizes mutations in the second and third positions. Cas9 imposes a higher penalty for guanine mutation in the third PAM position than in the second because R1335 is conformationally more rigid than R1333. This rigidity prevents the R1335 side chain from readjusting to form new protein-DNA interactions in noncanonical PAMs. A guanine-to-cytosine substitution in either the second or third position of the canonical PAM disrupts direct protein-PAM interactions and leads to solvent exposure, owing to strong electrostatic repulsion between the arginine dyad's guanidinium groups and the amine group of cytosine. Interestingly, the strength of Cas9 in disfavoring a single cytosine substitution (by >10 kcal/mol) is comparable to that of disfavoring double base substitutions in the NGG sequence. The ability of Cas9 to differentiate between noncanonical and canonical PAMs (ΔΔG) is directly related to the number of direct interactions between Cas9 and the PAM sequence, as well as the degree of solvent exposure. Loss of direct interactions and increased solvent exposure enhance ΔΔG. The calculated ΔΔG adequately explains the observed differences in DNA cleavage activity of Cas9 across various DNA substrates with different PAM sequences.
This study connects thermodynamics, structures, and activity to elucidate PAM selectivity in Cas9 and may also apply to other CRISPR/Cas systems, offering valuable insights for the rational design of Cas9 variants with modified PAM specificities.
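To give a sense of scale for the free-energy penalties quoted above, the sketch below converts a ΔΔG value into the equilibrium preference ratio exp(ΔΔG/RT). This is a textbook relation, not a calculation taken from the study; the temperature of 298.15 K and the function name are assumptions made for illustration.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)

def discrimination_factor(ddg_kcal_per_mol, temperature_k=298.15):
    """Equilibrium preference ratio exp(ddG / RT) implied by a free-energy penalty.

    Assumes simple two-state equilibrium binding; temperature is an assumed value.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))

# A penalty of 10 kcal/mol, the lower bound quoted for a single G-to-C substitution.
factor = discrimination_factor(10.0)
print(f"preference for the canonical PAM: {factor:.1e}")
```

Under these assumptions, a 10 kcal/mol penalty corresponds to a preference on the order of 10^7 for the canonical PAM.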
Probabilistic Isolation of Crystalline Inorganic Phases
Ritchie D, Gaultois MW, Gusev VV, Kurlin V, Rosseinsky MJ and Dyer MS
We present Probabilistic Isolation of Crystalline Inorganic Phases (PICIP), a tool to accelerate materials discovery by automating the process of isolating unknown crystalline inorganic phases that have been experimentally detected. PICIP can be used by any lab worker, is well suited to both traditional as well as automated high-throughput exploratory workflows, and is a novel approach to isolating unknown phases based on experimental information from sampled compositions. PICIP infers the composition of an unknown phase in a mixed phase sample from the average composition of the sample and the weighted average composition of the known phases in that sample, relying on experimental phase identification and quantification. We implement a novel algorithm that infers the probability density for the unknown phase over a linear representation of compositional phase space. The accuracy of the suggested target compositions can be increased by systematically combining information from different sampled compositions across multiple experiments. This allows for the effective adoption of an iterative sampling strategy that suggests target compositions that converge to the composition of the unknown phase. The linear representation used for compositional phase space can exploit chemical constraints such as charge neutrality to reduce the dimension of the space, while implicitly ensuring only valid compositions are suggested. Simulated exploration of phase fields shows that after four sequential samples, or two batches of five samples, the median purity of the unknown crystalline phase is above 90%. PICIP's probabilistic construction makes it robust to moderate levels of experimental error in phase quantification (13 wt %), and allows for the identification of scenarios where there are significant levels of experimental error.
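The mass-balance idea behind this inference can be sketched as follows. This is a deterministic point estimate only, not PICIP's probabilistic algorithm: it assumes exact phase fractions that sum to one, compositions and fractions expressed on the same basis, and hypothetical function and variable names.

```python
def infer_unknown_composition(sample, known_phases, unknown_fraction):
    """Solve sample = sum(w_i * known_i) + w_u * unknown for the unknown composition.

    sample           -- dict mapping element -> fraction in the bulk sample
    known_phases     -- list of (phase_fraction, composition dict) for identified phases
    unknown_fraction -- phase fraction w_u of the unknown phase (must be > 0)
    """
    if unknown_fraction <= 0:
        raise ValueError("the unknown phase must be present in the sample")
    unknown = {}
    for element, total in sample.items():
        # Subtract the contribution of every known phase to this element.
        known_part = sum(w * comp.get(element, 0.0) for w, comp in known_phases)
        unknown[element] = (total - known_part) / unknown_fraction
    return unknown

# Toy example: 60% of a known phase A0.5B0.5 plus 40% of an unknown phase.
sample = {"A": 0.38, "B": 0.42, "C": 0.20}
known = [(0.6, {"A": 0.5, "B": 0.5})]
estimate = infer_unknown_composition(sample, known, unknown_fraction=0.4)
print({element: round(value, 3) for element, value in estimate.items()})
```

In practice the phase fractions carry experimental error, which is why a probability density over compositions, as described in the abstract, is more informative than a single point estimate like this one.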
Selective Activation of GPCRs: Molecular Dynamics Shows Siponimod Binds but Fails To Activate S1PR2, Unlike S1PR1
Soniya K, Avadhani K, Nanduru C and Halder A
G protein-coupled receptors (GPCRs) are central to drug discovery, accounting for nearly 40% of approved pharmaceuticals due to their regulatory role in diverse physiological processes. Given the high structural similarity among homologues, achieving receptor selectivity while minimizing off-target effects remains a major challenge in designing drugs that target GPCRs. Sphingosine-1-phosphate receptors (S1PRs), comprising five subtypes, are therapeutically important GPCRs critical for immune and cardiovascular functions. Siponimod, an FDA-approved drug for multiple sclerosis, selectively modulates S1PR1 over S1PR2, unlike earlier S1PR modulators. However, the molecular basis for this selectivity is unclear, as cellular and biochemical assays provide limited insights. In this study, we used long-time-scale molecular dynamics simulations to investigate how S1P and Siponimod binding affects the structural dynamics of S1PR1 and S1PR2. Both ligands exhibited strong active site binding in both receptors. Crucially, while S1P and Siponimod induced similar activation-linked conformational changes in S1PR1, Siponimod failed to trigger these rearrangements in S1PR2. Specifically, Siponimod binding to S1PR2 led to altered side-chain dynamics of key TM7 residues (viz., Y, F, F) and a drift of transmembrane helix 6 (TM6) toward orientations observed in the inactive state. These unique structural features differentiate Siponimod's behavior from S1P and explain its inability to modulate S1PR2. Our findings elucidate molecular determinants of Siponimod's selectivity toward S1PR1 and highlight these residues as potential differentiators for selective modulator design. This study demonstrates how structural and dynamic insights from atomistic simulations aid rational drug design for targets with high homology.
A QM-AI Approach for the Acceleration of Accurate Assessments of Halogen-π Interactions by Training Neural Networks
Engelhardt MU, Mier F, Zimmermann MO and Boeckler FM
Noncovalent interactions, such as halogen bonds (XB), play a crucial role in molecular recognition and drug design, yet halogen···π contacts remain comparatively underexplored. Here, we report a proof-of-concept QM-AI approach that integrates high-level quantum mechanical (QM) calculations with neural networks (NNs) to predict halogen···π interaction energies. Nearly 1.4 million MP2/TZVPP single-point calculations on halobenzene-benzene complexes were carried out to generate exhaustive training data, which were represented by simple geometric descriptors as input features for machine learning. The resulting neural network model is specifically designed to capture σ-hole-driven halogen···π interactions under well-defined geometric constraints. It reproduced reference interaction energies with excellent accuracy (R² = 0.998, RMSE = 0.16 kJ/mol) and maintained strong performance on independent, randomly generated and PDB-derived test sets. In a previous benchmarking study, we demonstrated that "gold standard" CCSD(T) energies of this interaction can be appropriately represented by MP2/TZVPP calculations, which are more efficient by 2 orders of magnitude (∼10²). Consequently, we herein exploit a methodological "extension" from CCSD(T) → MP2 → NNs. Our approach maintains accuracy close to CCSD(T) benchmarks while achieving a runtime acceleration of up to 8 orders of magnitude (∼10⁸) compared to MP2 calculations. This study demonstrates the feasibility of fast, accurate neural network models based on QM data for halogen···π interactions in a QM-AI approach.
Rapid and Accurate Protein Structure Database Search Using Inverse Folding Model and Contrastive Learning
Lyu Q, Wei H, Chen S, Peng Z and Yang J
Protein structure database search has become increasingly challenging due to the growing number of experimental and computational structures. We introduce mTM-align2, a novel two-step approach for rapid and accurate protein structure database search. In the first step, protein structures are transformed into embeddings using a pretrained inverse folding model (ESM-IF) and 3D Zernike polynomials. The ESM-IF embeddings are further optimized through a contrastive learning network, which is trained on ∼7 million structure pairs. Structures with similar embeddings are returned on the fly in this step. The second step employs a rapid structure alignment program to refine the top candidates, ensuring high precision and producing high-quality alignments. Extensive benchmarks reveal that mTM-align2 performs competitively with other leading methods, completing monomeric structure search in seconds with over 90% precision for the top 10 hits. The t-SNE visualization of the mTM-align2 embeddings for thousands of structures demonstrates that the embeddings are structurally informed, capturing global structural features. A web server for mTM-align2 is accessible at https://yanglab.qd.sdu.edu.cn/mTM-align/.
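The two-step pattern described above (a cheap embedding prefilter followed by a more expensive refinement of the shortlist) can be sketched generically. This is not the mTM-align2 implementation: the embeddings are toy vectors, cosine similarity is an assumed similarity measure, and `align_score` is a hypothetical stand-in for a structure alignment program.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def two_step_search(query_id, embeddings, align_score, k=10):
    """Shortlist the k nearest embeddings, then re-rank them with a costlier score.

    embeddings  -- dict mapping structure id -> embedding vector
    align_score -- callable (query_id, hit_id) -> score, higher is better (hypothetical)
    """
    query_vec = embeddings[query_id]
    candidates = [sid for sid in embeddings if sid != query_id]
    # Step 1: fast prefilter by embedding similarity.
    candidates.sort(key=lambda sid: cosine(query_vec, embeddings[sid]), reverse=True)
    shortlist = candidates[:k]
    # Step 2: refine only the shortlist with the expensive scoring function.
    rescored = [(sid, align_score(query_id, sid)) for sid in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy data: "c" is dissimilar to the query and never reaches the alignment step.
toy_embeddings = {"q": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.8, 0.6], "c": [0.0, 1.0]}
toy_alignment = {"a": 0.5, "b": 0.7, "c": 0.9}
hits = two_step_search("q", toy_embeddings, lambda q, s: toy_alignment[s], k=2)
print(hits)
```

The design choice is that the expensive scorer runs on only k candidates, so search time is dominated by the embedding comparison rather than by pairwise alignment.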
Predicting and Analyzing Nitrate Adsorption on High-Entropy Alloys Based on Pair Distribution Function Using a Hybrid Machine Learning Framework
Huynh TN, He X and Nguyen KD
Incorporating five or more metals into a single structure creates a new family of alloys, high-entropy alloys (HEAs), which hold several exceptional properties, such as outstanding stabilities and continuous electronic structures, making them promising catalysts for a range of chemical conversions. Due to the high-dimensional design space, machine learning algorithms are frequently used for the optimization of high-entropy alloys to achieve enhanced catalytic performance. The precision of machine learning depends on the structural features of high-entropy alloys, and this work aims to explore whether pair distribution function (PDF) data of HEAs can be adopted as input features for the probabilistic optimization of HEA-based catalysts through machine learning. Here, we first address the challenge of the high dimensionality of PDF data through principal component analysis (PCA), and then use the PCA-reduced PDF as input features to predict the Gibbs free energy of nitrate adsorption on the FeCoNiCuZn HEA surfaces via a hybrid framework comprising a transformer-based model and a fine-tuned large language model (LLM). The results show that the as-constructed hybrid framework can accurately predict the Gibbs free energy of nitrate adsorption using PCA-reduced PDF data, with performance significantly superior to that of conventional algorithms such as random forest, support vector regression, and gradient boosting. In the meantime, the use of LLM can further improve the prediction accuracy and extract interpretable insights from the data set, eventually allowing for the predictive design of HEA-based catalysts with optimized activity and selectivity.
Mechanistic Disruption of the TREM2-DAP12 Transmembrane Complex by Alzheimer's Disease Mutations: A Multiscale Simulation Study
Zhong Z, Ulmschneider M and Lorenz CD
Triggering receptor expressed on myeloid cell 2 (TREM2) is an immunomodulatory receptor that plays a critical role in microglial activation through its association with the adaptor protein DNAX-activation protein 12 (DAP12). Variants in TREM2 have been implicated as genetic risk factors for Alzheimer's disease (AD), most notably the extracellular domain variant R47H. However, recent studies highlight that transmembrane domain (TMD) mutations, including W191X in isoform-219 of TREM2, may also increase AD risk. Nonetheless, the molecular mechanisms underlying the TREM2-DAP12 complex formation and the role of specific TMD mutations in disease pathology remain unclear. Here, we employ multiscale molecular dynamics (MD) simulations to investigate the structural and dynamic effects of key TREM2 TMD mutations on the complex formed with DAP12 within a lipid bilayer composed of a POPC:cholesterol (80:20) mixture. Specifically, we analyzed four mutations in isoform-230 (K186A, K186X, W194A, W194X) and three constructs in isoform-219 (wild type, W191A, W191X). Our previous studies identified that K186 forms a critical salt bridge with DAP12 residue D50 in isoform-230. In this study, we extend this understanding by combining atomistic simulations with unsupervised machine learning approaches to analyze conformational changes across mutant complexes. Across variants, we observe isoform- and mutation-specific effects on helix orientation, hydrogen bonding, and electrostatic interactions that destabilize complex formation. This study provides atomistic-level insight into how disease mutations perturb membrane protein signaling interfaces and introduces a robust simulation and data-driven framework for studying transmembrane complexes involved in neurodegeneration and immunoreceptor function.
Computational Pipeline for Accelerating the Design of Glycomimetics
Xiao Y, Lee AH, Mahmoud S, Sameem B, Wentworth D, Wang X, Miller GD, Grant OC, Foley BL and Woods RJ
To accelerate the rational design of glycomimetic inhibitors, based on derivatization of a carbohydrate ligand, we introduce a computational pipeline that automates the creation and modeling of analogs and computes their interaction energies. Putative glycomimetics are assembled by grafting small drug-like moieties onto the native carbohydrate scaffold in the presence of the receptor protein, with the moieties chosen from a virtual library of more than 1500 molecular fragments, selected for their synthetic accessibility. The method is illustrated for the case of glycomimetics but is generalizable to any bound ligand. A genetic algorithm (GA) was developed to identify the most likely orientation of the appended moieties in the receptor binding site. For validation, curated experimental data sets were assembled from the literature, consisting of 119 glycomimetics, with reported solution binding free energies, including 46 with corresponding high-resolution crystal structures of the glycomimetic complexes. These data sets were subdivided for protocol testing and "real-world" performance validation. The GA search resulted in an average root-mean-squared deviation (RMSD) of 1.5 Å for the added moieties, compared to their crystallographic data. The GA-generated structures were then subjected to molecular dynamics (MD) simulation, and the performance was evaluated for three post-MD approaches to computing interaction energies: the scoring function from AutoDock Vina-Carb, as well as the generalized Born and Poisson-Boltzmann surface area (GBSA/PBSA) implementations within the AMBER molecular mechanical (MM) force field. For the Test data set of structures with reported energies, the highest coefficient of determination (R² = 0.67) was obtained with MM-PBSA when ligand conformational entropies were included. Current limitations of the protocol and experimental data sets are discussed.
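The RMSD used to compare predicted moiety placements with crystallographic positions can be computed as in the minimal sketch below. It assumes that both coordinate sets are already in the same receptor frame (no superposition is performed) and that atoms are listed in matching order; the function name and toy coordinates are illustrative and not taken from the pipeline.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two matched lists of (x, y, z) points.

    No fitting is done: both sets are assumed to share the receptor's frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: a two-atom moiety displaced by 1.5 Å along z.
predicted = [(0.0, 0.0, 1.5), (1.0, 0.0, 1.5)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(f"RMSD = {rmsd(predicted, crystal):.2f} Å")
```

Skipping the superposition step matters here: fitting the ligand onto the crystal pose would hide placement errors relative to the receptor, which is what a docking-style comparison is meant to measure.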
Computational Insights into the Regioselective Hydroxylation of Nirmatrelvir Metabolized by Cytochrome P450 3A4
Tang Y, Fang J, Liu G, Tang Y and Li W
Nirmatrelvir, a covalent inhibitor of SARS-CoV-2 main protease (Mpro), is the active pharmaceutical ingredient in Paxlovid, the first oral antiviral drug granted emergency use authorization by the U.S. Food and Drug Administration (FDA) for the treatment of COVID-19. Previous studies revealed that the hepatic metabolism of nirmatrelvir is predominantly mediated by cytochrome P450 3A4 (CYP3A4) and that nirmatrelvir can be metabolized to generate multiple products. However, the precise molecular mechanism underlying the regioselective metabolism of nirmatrelvir by CYP3A4 remains unclear. In this study, we applied an integrated computational strategy combining molecular docking, molecular dynamics (MD) simulations, and quantum chemical (QC) calculations to investigate the mechanisms of interaction between CYP3A4 and nirmatrelvir. Our simulations characterized the spatial proximity between nirmatrelvir's carbon atoms and the reactive iron(IV)-oxo species (Compound I, Cpd I), with the C23 position showing the greatest accessibility, suggesting a strong preference for this site. Density functional theory (DFT) and quantum mechanics/molecular mechanics (QM/MM) calculations further demonstrated that the transition state for C23 hydroxylation possesses the lowest activation energy barrier, consistent with the experimentally observed regioselectivity in metabolite formation.
Solvent-Site Prediction for Fragment Docking and Its Implication on Fragment-Based Drug Discovery
Almena Rodriguez L, Spanke VA and Kersten C
Accurate posing and scoring of low-affinity fragments remains a major challenge in fragment-based virtual screening. Whether including structural or predicted water molecules during docking improves docking performance is frequently discussed, but the evidence so far is inconclusive. We present a comprehensive statistical evaluation of the effect of including crystallographic or predicted water molecules on the docking performance of fragment redocking. Further, cross-docking of fragments into binding sites occupied by larger ligands and vice versa was examined. These cross-dockings imitate realistic use cases of fragment hit identification and fragment growing or synthon-based virtual screenings, respectively. To this end, a new benchmark data set called Frag2Lead, containing 103 fragment-protein and corresponding lead-protein complexes, was compiled. Inclusion of water molecules during docking had a generally positive impact on docking performance, but the preferred combination of docking tool and water model varied across the different targets. A consensus approach over multiple solvent models and docking tools turned out to be beneficial for both re- and cross-dockings. Implementing constraints by template docking or pharmacophore features is advantageous for pose prediction in fragment growing approaches.
Mechanistic Insights into the Allosteric Regulation of P53 Y220C by Small-Molecule Stabilizers
Wen Y, Niu B, Meng J, Chen D, Li X, Zhang S, Ågren H, Zheng M and Teng D
The p53 Y220C mutation is a recurrent hotspot alteration that induces local unfolding and long-range functional disruption, compromising its tumor suppressor activity. While small-molecule stabilizers targeting this mutation have shown therapeutic promise, their underlying allosteric regulatory mechanisms remain poorly defined. Here, we investigate two p53 Y220C stabilizers with near-identical scaffolds but over 60-fold difference in activity, serving as a model to dissect the structural basis of differential efficacy. Through microsecond-scale molecular dynamics simulations and residue interaction network analysis, we reveal that the more active compound not only engages the mutation-induced cavity but also restores long-range cooperative networks and DNA-binding interfaces by rewiring key allosteric communication pathways disrupted by the mutation. Our results uncover a multilayered allosteric rescue mechanism involving dynamic pocket engagement, hydrophobic core reconstruction, and intramolecular signal reactivation. These findings move beyond conventional binding-affinity explanations and highlight the importance of network-level conformational regulation in mutant p53 rescue. This work establishes a mechanistic foundation for rational stabilizer design, proposing a new strategy centered on allosteric network restoration and mutation-adaptable anchoring. It offers broader implications for targeting conformationally unstable transcription factors previously considered "undruggable".
How KRAS Mutations Impair Intrinsic GTP Hydrolysis: Experimental and Computational Investigations
Song LF, Rabara D, Bali SK, Pei J, Lau EY, Kirshner D, Lightstone FC, McCormick F, Stephen AG and Yang Y
Oncogenic KRAS mutations impair GTP hydrolysis and increase the active GTP-bound KRAS population, which leads to growth-factor-independent cell proliferation and survival of cancer cells. Despite notable successes of small-molecule inhibitors in the treatment of KRASG12C cancer, many of these small-molecule inhibitors preferentially bind to inactive (GDP-bound) mutant KRAS, whose availability is limited by the slow rate of intrinsic GTP hydrolysis. A better understanding of how KRAS mutations impair intrinsic hydrolysis is important for designing more effective small-molecule therapeutics. In this work, experimental and computational approaches were utilized to investigate how the most important oncogenic mutations affect the intrinsic hydrolysis of GTP. We found that Q61H, G12V, and G12R mutations impair intrinsic hydrolysis by around 7-fold, 9-fold, and more than 20-fold, respectively, whereas G12A, G12C, G12D, and G13D have less effect. Based on mechanistic investigations, we propose that KRAS mutations impair intrinsic hydrolysis by disrupting the interactions needed to align the nucleophilic water molecule with GTP for nucleophilic attack. These results can assist small-molecule inhibitor design and also benefit the development of other therapeutic strategies, such as rescuing hydrolysis.
Relative BAT: An Automated Tool for Relative Binding Free Energy Calculations by the Separated Topologies Approach
Heinzelmann G, Huggins DJ and Gilson MK
Absolute (ABFE) and relative (RBFE) binding free energy calculations with all-atom molecular dynamics (MD) can significantly reduce costs in the early stages of drug discovery. We introduce a new implementation of the Binding Affinity Tool (BAT.py) software, which adds RBFE calculations using separated topologies (SepTop) to the already established, fully automated ABFE workflow. SepTop combines the advantages of ABFE and RBFE, being applicable to ligands that have very little or no similarity, while at the same time avoiding common challenges of ABFE calculations, such as occluded binding sites and problematic conformational changes of the receptor upon ligand binding. Three different thermodynamic paths for the relative calculations were implemented into the BAT software using the AMBER and OpenMM simulation engines, and here we test them on the BRD4(2) benchmark system. We discuss their correlation with ABFE, standard RBFE, and experimental results, and also their associated computational cost.
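The comparison between relative and absolute results rests on a simple identity: the relative binding free energy of ligand B with respect to ligand A equals the difference of their absolute binding free energies. The sketch below illustrates that bookkeeping and the affinity ratio it implies. It is not BAT.py code; the function names, the temperature of 298.15 K, and the example energies are all assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def relative_from_absolute(dg_bind_a, dg_bind_b):
    """Relative binding free energy of B versus A (kcal/mol), computed
    from two absolute binding free energies: ddG = dG_B - dG_A."""
    return dg_bind_b - dg_bind_a


def affinity_ratio(ddg, temperature_k=298.15):
    """Ratio Kd(B) / Kd(A) implied by a relative binding free energy.
    Values below 1 mean that B binds more tightly than A."""
    return math.exp(ddg / (R_KCAL * temperature_k))


# Hypothetical absolute binding free energies for two ligands (kcal/mol).
ddg = relative_from_absolute(-8.0, -9.4)
print(f"ddG(A->B) = {ddg:.2f} kcal/mol, Kd ratio = {affinity_ratio(ddg):.3f}")
```

A relative calculation that agrees with the corresponding pair of absolute calculations should reproduce this difference within statistical error.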
Boosting Drug Discovery: Expanding the Applicability of Fragment Dissolved Molecular Dynamics to Accelerate Binding Mode Elucidation
Peralta-Moreno MN, Granadino-Roldán JM, Tomas MS and Rubio-Martinez J
The use of small organic molecules has become one of the most popular strategies in computer-aided drug design (CADD) to facilitate the identification of potential drug-like compounds in the early stages of drug development. In this scenario, novel computational approaches such as the fragment dissolved molecular dynamics (fdMD) methodology have emerged as a new framework for the modeling of ligand-receptor interactions. Consisting of molecular dynamics (MD) simulations of the target protein solvated with multiple copies of the same fragment, the original approach is able to identify the most favorable binding site for the system studied in a reasonable simulation time scale (0.2-1 μs). In the present work, we have introduced the use of Gaussian accelerated molecular dynamics (GaMD) to facilitate system exploration, accelerate binding site identification and additionally enhance binding mode elucidation. For this purpose, up to 12 different systems with crystallographic information available have been employed for validation.
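GaMD speeds up sampling by adding a harmonic boost to the potential energy whenever it falls below a threshold. A minimal sketch of the commonly published form of that boost, ΔV = ½ k (E − V)² for V < E and zero otherwise, is given below. The function names and the example energies are hypothetical, the lower-bound choice E = V_max is an assumption, and none of the values come from the abstract, which does not report the GaMD parameters used.

```python
def gamd_force_constant(k0, v_max, v_min):
    """Harmonic force constant k = k0 / (V_max - V_min), with 0 < k0 <= 1."""
    if not 0.0 < k0 <= 1.0:
        raise ValueError("k0 must lie in (0, 1]")
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    return k0 / (v_max - v_min)


def gamd_boost(potential, threshold, k):
    """Boost energy added to the system: 0.5 * k * (E - V)**2 when the
    potential V is below the threshold E, and zero otherwise."""
    if potential >= threshold:
        return 0.0
    return 0.5 * k * (threshold - potential) ** 2


# Hypothetical energies (kcal/mol); the threshold is set to V_max (lower bound).
v_min, v_max = -120.0, -90.0
k = gamd_force_constant(k0=0.6, v_max=v_max, v_min=v_min)
for v in (-115.0, -100.0, -90.0):
    print(f"V = {v:.1f}  boost = {gamd_boost(v, v_max, k):.3f}")
```

Because the boost grows with the depth below the threshold, low-energy basins are raised the most, which lowers the effective barriers between them.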
SiteMatcher: A Web Server for Structure-Based Drug Design Using Protein-Ligand Interaction Patterns
Ke D, Zhou W, Zhang Z, Jin C, Wu Y, Pan X, Wang X, Xiao X and Ji C
With the rapid growth of structural data in the Protein Data Bank, efficient mining and utilization of protein-ligand interaction pattern information from these structures can advance rational drug design. Here, we present SiteMatcher, a web server that integrates a curated repository of protein-ligand interaction motifs from the Protein Data Bank with an automated fragment-grafting engine to facilitate structure-based ligand design. The engine identifies motifs similar to user-defined local protein sites, extracts ligand fragments from these motifs, and merges them (either directly or via suitable linkers) with the seed compound to produce ligands complementary to the target pocket. By leveraging site similarity-based fragment reuse, SiteMatcher helps users explore diverse ligand design strategies efficiently. We hope that SiteMatcher will serve as a practical tool for medicinal chemists. The server is freely available at https://sitematcher.xundrug.cn.