Attention-based synthetic data generation for calibration-enhanced survival analysis: A case study for chronic kidney disease using electronic health records
Access to real-world healthcare data is constrained by privacy regulations and data imbalances, hindering the development of fair and reliable clinical prediction models. Synthetic data offers a potential solution, yet existing methods often fail to maintain calibration or enable subgroup-specific augmentation. This study introduces Masked Clinical Modelling (MCM), an attention-based synthetic data generation framework designed to enhance survival model calibration in both global and stratified analyses.
Pre-coding skin cancer from free-text pathology reports using noise-robust neural networks
Population-based cancer registries receive numerous free-text pathology reports from which cancer cases are manually coded according to international standards. Skin cancer is the most frequent cancer in Caucasian populations, and its incidence is increasing. We developed an AI-based method to identify skin cancer, locate relevant key terms in pathological reports, and suggest coding for the main clinical variables.
A joint learning framework for analyzing data from national geriatric centralized networks: A new toolbox deciphering real-world complexity
We propose JLNet, along with a companion R software package, as a systematic joint learning framework for analyzing data from national geriatric centralized networks, such as Medicare Claims. JLNet addresses key challenges in real-world, large-scale healthcare datasets, including hospital-level clustering and heterogeneity, patient-level variability from high-dimensional covariates, and losses to follow-up, while promoting easy implementation to ultimately support decision-making.
TopicForest: embedding-driven hierarchical clustering and labeling for biomedical literature
The rapid expansion of biomedical literature necessitates effective approaches for organizing and interpreting complex research topics. Existing embedding-based topic modeling techniques provide flat clusters at a single granularity, ignoring the hierarchical structure of research subjects. Our objective is instead to create a forest of topic trees, each of which starts from a broad area and drills down to narrow specialties.
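One way such a topic tree can be grown (purely as an illustration of the general idea, not the authors' method) is to recursively bisect document embeddings until clusters become small. A minimal NumPy sketch with a hand-rolled 2-means split:

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Minimal 2-means clustering used as the splitting step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def build_tree(X, idx=None, min_size=2, depth=0, max_depth=3):
    """Recursively bisect embeddings into a topic tree (a nested dict)."""
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) <= min_size or depth >= max_depth:
        return {"docs": idx.tolist()}          # leaf: document indices
    labels = two_means(X[idx])
    if labels.min() == labels.max():           # degenerate split, stop here
        return {"docs": idx.tolist()}
    return {"children": [build_tree(X, idx[labels == k], min_size,
                                    depth + 1, max_depth)
                         for k in range(2)]}
```

Each leaf holds document indices; intermediate nodes could then be labeled, e.g. by summarizing their member documents.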
A method for characterizing disease progression from acute kidney injury to chronic kidney disease
Patients with acute kidney injury (AKI) are at high risk of developing chronic kidney disease (CKD), but identifying those at greatest risk remains challenging. We used electronic health record (EHR) data to dynamically track AKI patients' clinical evolution and characterize AKI-to-CKD progression.
Unveiling novel bladder cancer associations from multicentred primary and secondary care electronic health records by machine learning: a case-control study
The rising incidence and mortality of bladder cancer (BC) underscore the importance of identifying associated features. Current reliance on haematuria as a primary indicator for BC proves inadequate. While mining electronic health records (EHRs) offers the potential to identify BC-related signals, traditional data-driven methods struggle with high-dimensional datasets. This study aims to uncover novel BC-associated clinical signals by developing a Parsimony-driven cAtegory-balaNced binary Signal extractor for Primary Care EHRs (PanSPICE) tailored to extremely high-dimensional data linked across multiple centres.
MDD-MARF: a multimodal depression detection model based on multi-level attention mechanism and residual fusion
Depression is a serious mental disorder that significantly affects patients' work ability and social functioning. With the rapid development of artificial intelligence, researchers have begun to explore automatic depression detection methods based on multimodal data. However, multimodal data are often accompanied by a large amount of noise. Existing methods usually lack sufficient feature screening after extraction and are directly applied to downstream tasks, which may limit the model's generalization ability. In addition, current multimodal fusion strategies still face several challenges.
Multi-objective optimization formulation for Alzheimer's disease trial patient selection
Clinical trial recruitment faces critical challenges with screen failure rates exceeding 80% in Alzheimer's disease (AD) trials. Traditional patient selection relies on expert consensus without systematic evaluation of trade-offs between statistical power, recruitment feasibility, safety, and cost. We developed a multi-objective optimization framework to systematically identify optimal eligibility criteria configurations that balance competing objectives in AD clinical trial design.
Turning dialogues into event data: Lessons from GPT-based recognition of nursing actions
To assess the feasibility of using a large language model (LLM) to generate structured event logs from conversational data in home-based nursing care, with the goal of reducing the documentation burden and enabling process analysis.
Corrigendum to "Drug repositioning with metapath guidance and adaptive negative sampling enhancement" [J. Biomed. Inform. 171 (2025) 104916]
Predicting drug-target interactions based on multivariate information fusion and graph contrast learning
Drug-target interaction (DTI) prediction is of great significance in stimulating innovation and research in the medical field. Traditional experimental methods for identifying DTIs are time-consuming and costly, so machine learning methods have been extensively applied to improve prediction. However, the sparsity of inter-node connections often results in insufficiently learned node representations. Furthermore, many methods do not take into account the topological similarity between nodes when integrating similarities. This study proposes a model that integrates multiple sources of information and utilizes Graph Contrastive Learning (GCL) to predict potential drug-target interactions (MGCLDTI). First, MGCLDTI employs the DeepWalk algorithm to extract global topological representations from a heterogeneous graph that incorporates multi-view information on drugs, targets, and diseases. Subsequently, a densification strategy is implemented to alleviate the noise arising from the sparsity of the DTI matrix. Furthermore, a GCL model with node masking is applied to enhance local structural awareness and optimize the embeddings of drugs and targets. Finally, DTI scores are predicted using the LightGBM algorithm. Comparisons against state-of-the-art methods demonstrate that MGCLDTI achieves superior predictive performance, ablation studies confirm the effectiveness of each component, and case studies provide further evidence of MGCLDTI's accuracy in identifying potential DTIs.
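The node-masking and contrastive-objective ingredients of GCL can be sketched in a few lines of NumPy. This is a generic InfoNCE-style loss between two augmented views, under assumed conventions (random row masking, cosine similarity, a temperature tau), not MGCLDTI's exact formulation:

```python
import numpy as np

def mask_nodes(X, rate, rng):
    """Zero out a random subset of node feature rows (masking augmentation)."""
    keep = rng.random(len(X)) >= rate
    return X * keep[:, None]

def nt_xent(Z1, Z2, tau=0.5):
    """InfoNCE-style contrastive loss; row i of Z1 and Z2 form a positive pair."""
    Z1 = Z1 / (np.linalg.norm(Z1, axis=1, keepdims=True) + 1e-12)
    Z2 = Z2 / (np.linalg.norm(Z2, axis=1, keepdims=True) + 1e-12)
    sim = Z1 @ Z2.T / tau                        # cosine similarity / temperature
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))               # positives sit on the diagonal
```

Matched views yield a lower loss than mismatched ones, which is what pushes embeddings of the same node (under different maskings) together.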
Multi-scale cancer driver gene prediction by flexible data selection and network topology guidance
Efficient and comprehensive prioritization of cancer driver genes across individual patients, cancer cohorts, and pan-cancer analyses is crucial for advancing cancer diagnosis and treatment. Existing methods are effective, but they appear to have reached a plateau in accuracy and lack broad-scale joint analysis, the flexibility to adapt to different cancers, and interpretability.
Study on multimodal spatially-constrained contrastive learning for knee osteoarthritis severity grading
To address the limitations of single-modal feature coverage and class distribution imbalance in knee osteoarthritis (KOA) classification, this study proposes a Multimodal Spatially-constrained Contrastive Learning (MSCL) model. First, dynamic and static plantar pressure data and human keypoint trajectories are synchronously acquired. The dynamic plantar pressure and keypoint data are then fed into a multimodal spatial-temporal fusion branch, where graph convolutional networks and Transformers extract spatial-temporal representations of human keypoints and dynamic pressure patterns, respectively, followed by cross-attention fusion. Subsequently, static plantar pressure is processed through a pyramid CNN architecture to generate coarse-grained spatial constraint vectors, which serve as anatomical priors to regularize the fused representations. Finally, a contrastive learning framework is integrated to establish an explicit mapping between the enhanced representations and the Kellgren-Lawrence (KL) grading system, enabling precise KOA severity stratification. Experimental results demonstrate that the MSCL model achieves 0.94 macro-average accuracy in KL grading, with a 7% improvement in F1-scores for imbalanced categories with limited samples. This work establishes a novel paradigm for accurate KOA assessment through multimodal gait analysis.
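The cross-attention fusion step, in which tokens from one modality attend to tokens from another, reduces to a single scaled dot-product attention head. The sketch below is a generic illustration with placeholder projection matrices, not the paper's trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """One attention head: `queries` (e.g. keypoint tokens) attend over
    `context` (e.g. plantar-pressure tokens); rows are tokens."""
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # scaled dot-product
    return softmax(scores) @ V               # context-weighted values
```

The output has one fused row per query token, ready for the downstream constraint and contrastive stages.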
Caption-augmented reasoning model with hierarchical rank LoRA finetuning for medical visual question answering
Medical Visual Question Answering (VQA) is a quintessential application scenario of biomedical Multimodal Large Language Models (MLLMs). Previous studies mainly focused on input image-question pairs, neglecting the rich medical knowledge of the relevant captions of the pretrained datasets. This limits the model's reasoning capability and causes overfitting. This paper aims to effectively utilize the captions of pretrained datasets to solve the above issues.
LLM-DQR: Large language model-based automated generation of data quality rules for electronic health records
To develop and evaluate LLM-DQR, an automated approach using large language models to generate electronic health record data quality rules, addressing the limitations of current manual and automated methods that suffer from low efficiency, limited flexibility, and inadequate coverage of complex business logic.
SemNovel: a new approach to detecting semantic novelty of biomedical publications using embeddings from large language models
The rapid growth of scientific literature necessitates robust methods to identify novel contributions. However, there is currently no widely recognized measurement of novelty in biomedical research. Existing approaches typically quantify novelty using isolated article features, such as keywords, MeSH terms, or references, potentially losing important context and nuance from the semantic content of the text.
Domain adaptation of stable diffusion for ultrasound inpainting: a synthetic data approach for enhanced thyroid nodule segmentation
To enhance the cross-domain generalization of thyroid-nodule segmentation models by augmenting limited ultrasound training data with synthetic images generated by a fine-tuned Stable Diffusion model.
Scalable scientific interest profiling using large language models
Research profiles highlight scientists' research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.
Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction
Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.
Multimodal large language models and mechanistic modeling for glucose forecasting in type 1 diabetes patients
Management of type 1 diabetes remains a significant challenge, as blood glucose levels can fluctuate dramatically and are highly individual. We introduce an innovative approach that combines multimodal large language models (mLLMs), mechanistic modeling of individual glucose metabolism, and machine learning (ML) to forecast blood glucose levels.
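The abstract does not name the mechanistic model; a common choice for individual glucose-insulin dynamics is Bergman's minimal model, sketched here with forward-Euler integration and illustrative parameter values (not the authors' calibration):

```python
import numpy as np

def simulate_minimal_model(G0, insulin, Gb=90.0, Ib=10.0,
                           p1=0.03, p2=0.02, p3=1e-5, dt=1.0):
    """Forward-Euler simulation of Bergman's minimal model.
    G: plasma glucose (mg/dL), X: remote insulin action,
    insulin: plasma insulin level per time step (array).
    Parameter values here are illustrative placeholders."""
    G, X = G0, 0.0
    trace = []
    for I_t in insulin:
        dG = -p1 * (G - Gb) - X * G        # glucose effectiveness + insulin action
        dX = -p2 * X + p3 * (I_t - Ib)     # remote insulin compartment
        G, X = G + dt * dG, X + dt * dX
        trace.append(G)
    return np.array(trace)
```

Under basal insulin, glucose relaxes toward the basal level Gb; in a hybrid pipeline, such simulated trajectories could serve as one input stream alongside mLLM and ML features.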
Clinical pathway-aware large language models for reliable and transparent medical dialogue
Large language models (LLMs) offer promising potential in answering real-time medical queries, but they often produce lengthy, generic, and even hallucinatory responses. We aim to develop a reliable and interpretable medical dialogue system that incorporates clinical reasoning and then mitigates the risk of hallucination.
