Deep continual multitask out-of-hospital incident severity assessment from changing clinical features
When developing machine learning models to support emergency medical triage, it is important to consider how changes over time in the input features can negatively affect the models' performance. The objective of this study was to assess the effectiveness of novel deep continual learning pipelines in maximizing model performance when input features change over time, including the emergence of new features and the disappearance of existing ones.
CURENet: combining unified representations for efficient chronic disease prediction
Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient's health, which is crucial for informed therapeutic decision-making. Yet most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across multiple data modalities, often focusing on a single data type or overlooking these complexities. In this paper, we present CURENet (Combining Unified Representations for Efficient chronic disease prediction), a multimodal model that integrates unstructured clinical notes, lab tests, and patients' time-series data by utilizing large language models (LLMs) for clinical text processing and textual lab tests, as well as transformer encoders for longitudinal sequential visits. CURENet captures the intricate interactions between different forms of clinical data, yielding a more reliable predictive model for chronic illnesses. We evaluated CURENet using the public MIMIC-III and private FEMH datasets, where it achieved over 94% accuracy in predicting the top 10 chronic conditions in a multi-label framework. Our findings highlight the potential of multimodal EHR integration to enhance clinical decision-making and improve patient outcomes.
VANER2: towards more general biomedical named entity recognition using multi-task large language model encoders
Biomedical named entity recognition (BioNER) serves as the foundation for many downstream tasks, such as relation extraction, question answering, and clinical text analysis. BioNER was previously dominated by BERT-based models pretrained on large biomedical corpora. However, BERT-based models finetuned on specific BioNER datasets exhibit limited ability to generalize to other datasets. With the recent advances of large language models (LLMs), several works have fine-tuned autoregressive LLMs that are not inherently suitable for BioNER tasks, which limits model performance. In this study, building upon our previous work VANER, we utilized LLMs with the causal attention mask removed as a text encoder for sequence labeling. Using 39 BioNER datasets, we trained a multi-task NER model that extracts all entity types with one LLM forward pass. We also proposed a token-wise loss rescaling technique to deal with the data imbalance between different tags and entity types. Extensive experiments on independent test datasets demonstrated that our VANER2 model achieved the best generalization results compared with BERT-based baselines and several recent BioNER methods. VANER2 is freely available at https://github.com/ZhuLab-Fudan/VANER2.
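The abstract above does not specify how the token-wise loss rescaling is computed. As a rough illustration of the general idea only, the following minimal sketch weights each token's negative log-likelihood by the inverse frequency of its gold tag, so that rare tags contribute more to the loss. The function names and the inverse-frequency scheme are assumptions made for illustration; they are not taken from VANER2.

```python
import math
from collections import Counter


def inverse_frequency_weights(tags):
    """Hypothetical scheme: weight = total / (num_distinct_tags * count)."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: total / (len(counts) * n) for tag, n in counts.items()}


def rescaled_token_loss(probs, gold_tags, weights):
    """Weighted mean of per-token negative log-likelihoods.

    probs: one dict per token, mapping tag -> predicted probability.
    gold_tags: the gold tag of each token.
    weights: mapping tag -> rescaling weight.
    """
    weighted = [weights[t] * -math.log(p[t]) for p, t in zip(probs, gold_tags)]
    return sum(weighted) / sum(weights[t] for t in gold_tags)


# Toy sequence: three "O" tokens and one rare entity token.
gold = ["O", "O", "O", "B-Gene"]
weights = inverse_frequency_weights(gold)
uniform = [{"O": 0.5, "B-Gene": 0.5} for _ in gold]
loss = rescaled_token_loss(uniform, gold, weights)
```

In this toy sequence the single `B-Gene` token receives three times the weight of each `O` token, which is the kind of imbalance correction the abstract alludes to.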
Large language models in healthcare: a systematic evaluation on medical Q/A datasets
This study systematically evaluates the performance of state-of-the-art large language models (LLMs) in medical and healthcare applications, focusing on their accuracy in answering domain-specific questions. Using benchmark medical question-answering datasets (PubMedQA, MedQA, and MedMCQA), we assess a diverse set of LLMs, including GPT-4, Med-PaLM-2, OpenBioLLM, BioMistral, MediTron, MedAlpaca, and AlpaCare. Our analysis highlights the varying capabilities of these models across different datasets, emphasizing the impact of model scale, domain-specific fine-tuning, and dataset-specific challenges. Larger models, such as OpenBioLLM-70B and Med-PaLM-2, consistently outperformed smaller models, showcasing the benefits of extensive training data and computational resources. However, smaller models, like BioMistral-7B, demonstrated competitive performance on specific datasets, suggesting their potential for resource-constrained environments. Beyond accuracy, we explore the broader implications of LLMs in healthcare, including their applications in medical diagnosis, patient care, clinical decision support, and drug discovery. Despite their promise, LLMs face critical challenges, such as the need for explainability, robust data security, bias mitigation, and hallucination reduction. We conclude that while challenges remain, LLMs hold significant potential to transform healthcare by enhancing efficiency, improving patient outcomes, and facilitating advancements in medical research. Addressing the limitations and promoting responsible innovation will be essential to unlocking their full potential for a patient-centered and equitable healthcare future.
Text-based prediction of immunohistochemical biomarkers in breast cancer using a generative large language model: a retrospective study
Immunohistochemical (IHC) biomarkers such as estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki-67 are essential for the classification and treatment of breast cancer. While radiomics-based models have demonstrated potential in non-invasive biomarker prediction, the utility of large language models (LLMs) for this task using only textual clinical data remains largely unexplored. This study aimed to evaluate the performance of ChatGPT-4o, a generative LLM, in predicting key IHC biomarkers based solely on structured radiological and pathological reports.
A multimodal approach for cardiac signals classification using deep learning with explainable AI methods
Cardiovascular diseases remain a leading cause of mortality worldwide, necessitating accurate and timely diagnosis. Electrocardiogram (ECG) and phonocardiogram (PCG) signals provide complementary information about cardiac function, capturing electrical and mechanical activity, respectively. In this study, we propose a multimodal deep learning framework that integrates ECG and PCG using a dual-branch CNN-BiLSTM-SE architecture with cross-modal attention. Our preprocessing pipeline includes wavelet denoising, adaptive filtering, and normalization, with parameters tuned for each dataset's noise profile. We evaluate the model on multiple datasets: MIT-BIH Arrhythmia (47 subjects), PTB Diagnostic ECG (290 subjects), PhysioNet PCG Challenge 2016 (3126 subjects), PhysioNet PCG Challenge 2022 (942 subjects), and a custom multimodal dataset (500 subjects). The model achieves an overall accuracy of 97.0%, F1-scores ranging from 94.3% to 98.1%, and AUC values above 0.982 for all classes, outperforming single-modality and existing multimodal methods. Explainable AI techniques (SHAP, Grad-CAM, Integrated Gradients) reveal that the model focuses on clinically relevant features such as irregular R-R intervals in atrial fibrillation and systolic murmurs in valvular disease. The proposed approach offers a feasible, interpretable, and accurate decision-support system for cardiac diagnosis.
Reasoning with large language models in medicine: a systematic review of techniques, challenges and clinical integration
Large Language Models (LLMs) have emerged as transformative tools in healthcare, demonstrating unprecedented capabilities in medical reasoning tasks that require complex inference, pattern recognition, and decision-making under uncertainty. This comprehensive review examines the current state of LLM applications in medical reasoning across diverse clinical contexts, including diagnostic reasoning, clinical decision support, medical imaging analysis, drug discovery, and patient management. We systematically analyze the methodological approaches used to adapt and evaluate LLMs, comparing their performance against traditional clinical decision support systems and human clinicians. We further provide a critical comparative analysis of architectural adaptations, fine-tuning techniques, and domain-specific evaluation protocols. Our review encompasses models such as GPT-4, PaLM, Med-PaLM, and BioGPT, highlighting how model design and training paradigms influence reasoning capabilities, generalization, and clinical applicability. We assess their ability to process multimodal data, generate hypotheses, and provide evidence-based recommendations. Distinct adaptation methods such as prompt engineering, few-shot learning, and reinforcement learning with human feedback are examined for their impact on medical accuracy and robustness. We identify key technical challenges including hallucinations, inherited biases, and accountability issues, along with ethical and deployment barriers. Unlike prior reviews, we emphasize open research problems in Artificial Intelligence (AI), including symbolic integration and context-aware reasoning, framing LLMs as computational systems that push the boundaries of interdisciplinary computer science. While LLMs offer promise for enhancing diagnostic accuracy and decision-making, substantial challenges remain. The review concludes by outlining future directions, including hybrid neuro-symbolic models, rigorous evaluation frameworks, and human-in-the-loop systems to ensure safe, transparent, and fair integration into healthcare. Rather than replacing clinicians, LLMs are best positioned as collaborative AI agents advancing the frontiers of intelligent, assistive technologies in medicine.
Construction of performance score dynamic prediction system for clinical departments using explainable machine learning
Accurate evaluation of clinical departmental performance is essential for public hospital management. However, existing approaches primarily rely on static, retrospective annual assessments and lack interpretability, limiting their ability to support early intervention and informed decision-making. To address these gaps, the study aimed to present a dynamic framework for predicting annual departmental performance score based on real-world hospital data using explainable machine learning.
Detecting self-harm in social media using term weighting schemes based on the distance between words and personal pronouns
Self-harm is an increasing public health problem with high prevalence rates in adolescents. Furthermore, it can be an indicator of different mental health disorders (e.g., depression). Recently, diverse computational methods have leveraged user-generated data from social media to study and identify this issue from a text classification perspective. In this context, previous work has shown that personal statements (phrases containing first-person pronouns) contain valuable information for modeling the profiles of authors, including some mental health conditions. Motivated by these discoveries, this paper examines the relevance of personal statements to the self-harm detection task on social media. Furthermore, we adapted approaches that pay special attention to words in this type of sentence to reveal characteristics of users, specifically self-harming behavior. Currently, these approaches assign the same level of importance to all words contained in personal phrases, without distinguishing which are more associated with the personal contexts of authors. Hence, we introduced a novel weighting factor that exploits the proximity between personal pronouns and words to quantify their relevance to the task. This novel weighting factor, inspired by findings from author profiling and depression detection studies, is evaluated here for the first time in the context of self-harm detection. Our experimental results demonstrated significant improvements over state-of-the-art methods in self-harm detection, including transformer-based methods and pretrained language models for mental healthcare. This refined approach also surpasses the previous weighting factor when applied to depression detection, by a margin of more than 3.9%.
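The abstract does not give the formula for the proximity-based weighting factor. To make the idea concrete, here is a minimal sketch under an assumed scheme: each token is weighted by 1 / (1 + d), where d is its distance in tokens to the nearest first-person pronoun, and text without any such pronoun gets zero weight. The pronoun list, the decay function, and the function name are all hypothetical and are not the authors' definition.

```python
# Hypothetical pronoun list, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def proximity_weights(tokens):
    """Weight each token by its closeness to the nearest first-person pronoun.

    Assumed scheme: weight = 1 / (1 + distance in tokens).
    Returns all zeros when the text contains no first-person pronoun.
    """
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in FIRST_PERSON]
    if not positions:
        return [0.0] * len(tokens)
    return [
        1.0 / (1 + min(abs(i - p) for p in positions))
        for i in range(len(tokens))
    ]


example = proximity_weights(["I", "feel", "so", "alone"])
```

Such weights could then scale a conventional term-weighting score (for example, term frequency) before classification, so that words close to a personal pronoun count for more.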
A hybrid deep learning approach for accurate diagnosis of tibiofibula open and closed fractures using x-ray images
Open fractures are critical injuries that require prompt and accurate diagnosis to optimize treatment outcomes. Traditional methods often rely on manual interpretation of radiological images, which can be prone to human error. With advancements in deep learning, there is a significant opportunity to enhance the precision of fracture classification through automated systems.
Association between high-dose calcium supplementation during pregnancy and risk of placental disorders and low birth weight in Iranian women
Although calcium is crucial for maternal and fetal health, the association between high-dose calcium supplementation during pregnancy and adverse outcomes is not fully understood. This study examined the association between calcium supplementation during pregnancy and adverse pregnancy outcomes among Iranian women.
Disulfidptosis-associated gene signatures in sepsis: a diagnostic model based on an LLM-assisted bioinformatics analysis
This study investigated the involvement of disulfidptosis in the pathophysiology of sepsis by applying a bioinformatics analysis assisted by large language models (LLMs).
CTGFusionNet: fusion of deep learning models for predicting fetal distress-a multimodal approach
Cardiotocography (CTG) is a widely used technique for fetal monitoring. This study presents CTGFusionNet, a novel multimodal adaptive framework designed for prenatal analysis. The framework integrates attention-based adaptive Bi-Directional Convolutional Neural Networks (Bi-CNN) with Long Short-Term Memory (LSTM) networks to improve the accuracy of fetal distress prediction. The methodology begins with an initial data preprocessing phase, followed by signal segmentation and enhancement. Thereafter, the fetal heart rate (FHR) and uterine contraction (UC) signals are transformed into two-dimensional representations using embedding layers and subsequently integrated through concatenation. The spatial features of the synchronized signals are extracted using the proposed adaptive Bi-CNN. Multi-head attention is then applied to emphasize the most relevant information, and the temporal features are captured using an LSTM network. In the final stage, the most relevant features from the perinatal clinical data are identified using the Relief, Lasso, and Information Gain algorithms and then integrated with the processed signals. Classification results are then obtained using a fully connected layer and a sigmoid function. The results demonstrate that CTGFusionNet leads to significant improvements in performance measures, namely accuracy, sensitivity, and specificity, with values of 97.85%, 97.07%, and 98.65%, respectively. This suggests that CTGFusionNet, a multimodal approach that combines FHR, UC, and clinical data, provides a more reliable and precise method for the early detection and prediction of fetal distress. The proposed approach has the potential to significantly improve prenatal care outcomes by enabling accurate interventions.
GradCAM as an explicability method to evaluate the performance of deep learning models in classifying pediatric arteriovenous malformations (AVM) in arterial spin labeling sequences (ASL)
The study investigates the usefulness of Convolutional Neural Networks (CNNs) in accurately detecting arteriovenous malformations in pediatric medical imaging, particularly using arterial spin labeling sequences. It also aims to offer diagnostic explanations comparable to expert analysis.
Observe, align, and enhance: a hierarchical retrieval-augmented vision-language model for generating radiology reports
Radiology Report Generation (RRG) aims to automatically generate diagnostic narratives from radiological images, supporting clinicians in making diagnoses and relieving radiologists of reporting pressure. Previous approaches mainly use generative architectures that translate extracted visual features into coherent text. However, these approaches often struggle to align the generated text precisely with the visual data, especially when producing thorough diagnostic narratives. To address these problems, we propose a new hierarchical retrieval-enhanced framework for radiology report generation called Observe, Align, and Enhance (OAE). It comprises three stages: Observe, which leverages retrieval techniques to enhance visual feature comprehension by identifying similar images and associated reports; Align, where the retrieved contextual reports guide the generation of an initial diagnostic report, ensuring semantic consistency; and Enhance, an iterative refinement process that incorporates additional textual information to improve semantic coherence and diagnostic precision. A comprehensive evaluation on the IU-XRAY and MIMIC-CXR datasets shows that the proposed framework outperforms current state-of-the-art approaches, with improved diagnostic accuracy and report quality.
Reconstructing brain causal dynamics for subject and task fingerprints using fMRI time-series data
Recently, there has been revived interest in causation models in systems neuroscience, driven by their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, we present a novel method that leverages causal dynamics to achieve effective fMRI-based subject and task fingerprinting.
MOLiNAS: multi-objective lightweight neural architecture search for whole-slide multi-class blood cell segmentation
Blood cell analysis plays a key role in clinical diagnosis and hematological research. The accurate identification and quantification of different blood cell types is essential for the diagnosis of various diseases. The conventional manual method of blood cell analysis is both laborious and time-consuming, highlighting the need for automated segmentation techniques. In this paper, the blood cell segmentation problem is considered as a multi-class segmentation problem to detect the different types of blood cells in a given image. Two new multi-objective lightweight neural architecture search (NAS) algorithms (MOLiNAS) are designed to tackle the challenge of whole-slide multi-class blood cell segmentation problems. Our approaches integrate the most advantageous aspects of different approaches to search for the best U-shaped network architecture. The performance of our approaches is compared with lightweight networks and NAS studies in the literature. Our best solution (MOLiNASv2_sol3) achieves an IoU of 87.33 ± 1.53%, F1 score of 91.69 ± 1.20%, Precision of 93.50 ± 1.15%, and Recall of 91.34 ± 0.01%, outperforming lightweight networks such as EfficientNet, MobileNetv2, and MobileNetv3 across all segmentation metrics. Moreover, our approaches demonstrate highly competitive performance by utilizing up to 7.38 times fewer FLOPs and up to 4.03 times fewer trainable parameters than existing NAS studies while requiring only 0.07 million parameters. Additionally, ablation studies and cross-dataset evaluations demonstrate the robustness and generalizability of our approach.
Evaluating topological and graph-theoretical approaches to extract complex multimodal brain connectivity patterns in multiple sclerosis
Brain networks, or graphs, derived from magnetic resonance imaging (MRI) offer a powerful framework for representing the structural, morphological, and functional organization of the brain. Graph-theoretical metrics have been widely employed to characterize properties such as efficiency, integration, and communication within these networks. More recently, topological data analysis techniques, such as persistent homology and Betti curves, have emerged as complementary approaches for capturing higher-order network patterns. In this study, we present a comparative analysis of these feature-generation methodologies in the context of neurodegenerative disease. Specifically, we evaluate the effectiveness of Betti curves and graph-theoretical metrics in extracting features for distinguishing people with multiple sclerosis (PwMS) from healthy volunteers (HV). Features are derived from structural connectivity, morphological gray matter, and resting-state functional networks, using both single layer and multilayer graph architectures. Our experiments, conducted on a cohort of PwMS and HV, demonstrate that features extracted using Betti curves generally outperform those based on graph-theoretical metrics. Furthermore, we show that multimodal data in terms of feature concatenation and multilayer graph architectures provide a more comprehensive representation of alterations in complex brain mechanisms associated with MS, leading to improved classification performance. These findings highlight the potential of topological features and multimodal integration for enhancing the understanding and diagnosis of neurodegenerative disorders.
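For readers unfamiliar with Betti curves, the zeroth Betti curve simply tracks how many connected components a network has as a filtration threshold is swept. The sketch below computes it with a union-find structure, assuming edge weights are treated as distances and an edge enters the filtration once the threshold reaches its weight. The function names and the toy graph are illustrative assumptions; the study's own pipeline (and higher-order Betti numbers) would normally rely on a dedicated topological data analysis library.

```python
def _find(parent, x):
    """Return the root of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def betti0_curve(n_nodes, edges, thresholds):
    """Number of connected components at each threshold.

    edges: list of (u, v, weight) tuples; an edge is included in the
    graph at threshold t when weight <= t (assumed sublevel filtration).
    """
    curve = []
    for t in thresholds:
        parent = list(range(n_nodes))
        components = n_nodes
        for u, v, w in edges:
            if w <= t:
                ru, rv = _find(parent, u), _find(parent, v)
                if ru != rv:
                    parent[ru] = rv  # merge the two components
                    components -= 1
        curve.append(components)
    return curve


# Toy 4-node chain with increasing edge weights.
toy_edges = [(0, 1, 0.2), (1, 2, 0.5), (2, 3, 0.9)]
curve = betti0_curve(4, toy_edges, [0.0, 0.3, 0.6, 1.0])
```

The resulting vector of component counts, sampled on a fixed threshold grid, is the kind of fixed-length feature that can be fed to a classifier alongside or instead of graph-theoretical metrics.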
Structured reflective reasoning for precise medical knowledge graph retrieval augmented generation
Integrating large language models (LLMs) with medical knowledge graphs presents a promising frontier in healthcare AI, enabling more accurate clinical decision support, patient-specific recommendations, and interpretable diagnostic reasoning. However, the complexity of multi-step reasoning over medical ontologies and patient data graphs reveals the limitations of current Chain-of-Thought-based approaches. These methods struggle with incomplete subgraph retrieval, inefficient multi-hop reasoning across clinical entities, and challenges in contextualizing longitudinal patient data. To address these limitations, we propose SRR-RAG, a structured reasoning retrieval framework tailored for the medical domain. SRR-RAG enhances traditional retrieval-augmented generation by explicitly encoding clinical relationships, temporal constraints, and multi-hop dependencies within medical queries. This structured approach supports comprehensive reasoning across complex medical graphs, enabling accurate and interpretable responses for tasks like differential diagnosis or treatment planning. To mitigate semantic ambiguity and cognitive bias in structured query generation, we introduce type-aware pre-anchoring and reflective reasoning strategies. These mechanisms improve the alignment between natural language queries and graph-based medical knowledge, enhancing retrieval precision and clinical relevance. Extensive experiments on benchmark datasets and simulated electronic health records demonstrate that SRR-RAG significantly outperforms existing Graph RAG approaches in retrieval accuracy, reasoning completeness, and computational efficiency.
Neuro-signaling techniques in video gaming endorsements: a cognitive and neural dynamics approach
Technological advancements have significantly transformed the gaming industry, shifting from computer to mobile games. According to a global games market report, the gaming market was projected to reach $180.1 billion by 2021. Consequently, time and money invested in mobile games have risen significantly. The gaming strategy plays a vital role in enhancing conceptual thinking, problem-solving abilities, and cognitive skills. However, comprehensive studies comparing neural dynamics and cognitive skills while playing different types of games using electroencephalography (EEG) signals are still lacking. This study investigates the cognitive and neural effects of action and puzzle video games on brain activity, focusing on memory processing, cognitive load, and alterations in brain rhythms.
Semi-supervised abdominal multi-organ segmentation via dual-task de-biased consistency
Abdominal multi-organ segmentation suffers from class imbalance and from the difficulty of learning dynamic organs, both of which seriously degrade segmentation quality. We present a dual-task de-biased consistency semi-supervised framework for segmenting multiple abdominal organs. First, a multi-class Hausdorff distance loss is proposed as the unsupervised loss. This loss is more sensitive to shapes and boundaries because it focuses on the distance of the maximum error. It effectively captures boundary errors of small classes and enhances overall segmentation performance by balancing the learning of each class. Second, a dual de-biasing strategy is proposed for the two task branches of pixel and contour prediction, dynamically adjusting for data bias and learning bias. It shifts the focus of attention in a timely manner to help the model learn small and difficult classes. Finally, experiments conducted on two different datasets demonstrate that the proposed model achieves optimal performance and can effectively improve the accuracy of small classes. On the Synapse dataset with 10% labels, our model achieves an improvement of 6.9 in average DSC.
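The statement that the Hausdorff distance focuses on "the distance of the maximum error" can be illustrated with the classical, non-differentiable definition on two sets of boundary points. This is a minimal sketch of the standard metric only; the multi-class loss used in the paper is presumably a differentiable variant whose exact form the abstract does not give. The function names and toy coordinates are illustrative.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy boundaries: one predicted point lies far from the reference contour.
reference = [(0, 0), (1, 0)]
predicted = [(0, 0), (4, 0)]
distance = hausdorff(reference, predicted)
```

Here a single outlying boundary point determines the whole distance, which is why a Hausdorff-style term penalizes the worst boundary error rather than the average overlap that the Dice score measures.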
