Effective contextual feature fusion and individualized information for automated sleep staging
Polysomnography (PSG)-based sleep stage interpretation is crucial for diagnosing sleep disorders. Over the past decade, scholars have shown that machine learning offers a faster and more convenient alternative to manual visual interpretation of sleep stages and patterns. However, neglecting contextual features and individual case differences has hindered the application of these models to new sleep staging cases. In this paper, we propose a sleep staging model that integrates contextual feature fusion and an individualized framework. The model incorporates weighted features from multiple epochs into the scoring process, enabling accurate scoring of 30-second epoch signals. Our individualized framework is tailored for emerging cases in real-world scenarios. It aggregates unique case information to derive individualized pseudo-labels, significantly enhancing automatic sleep staging accuracy through non-independent training. This strategy effectively addresses model degradation caused by differences between training cases and single new cases. To demonstrate our approach's efficacy, we evaluated our automated sleep staging algorithm on the Sleep-EDF-20 and Sleep-EDF-78 datasets, achieving accuracies of 85.3% and 80.8%, respectively. Furthermore, our individualized framework achieved 79.1% accuracy on the UCD dataset. These results underscore its potential as an effective tool for sleep stage classification, supporting physicians and neurologists in diagnosing sleep disorders. The proposed framework is lightweight and suitable for integration into clinical decision support systems for sleep medicine, with a clear pathway toward working alongside routine laboratory scoring workflows to support practical deployment.
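As an illustration of the contextual fusion idea, a minimal sketch in PyTorch is shown below: embeddings of a center epoch and its neighbors are weighted by a learned attention score and pooled before classification. This is not the authors' implementation; the epoch encoder, embedding size, context length, and five-class output are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ContextualEpochFusion(nn.Module):
    """Weight the embeddings of neighboring 30-s epochs and fuse them
    into a single context-aware representation for the center epoch."""

    def __init__(self, embed_dim=128, n_classes=5):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, 1)      # one scalar weight per epoch
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, epoch_embeddings):
        # epoch_embeddings: (batch, context_len, embed_dim)
        weights = torch.softmax(self.scorer(epoch_embeddings), dim=1)  # (B, L, 1)
        fused = (weights * epoch_embeddings).sum(dim=1)                # (B, D)
        return self.classifier(fused)

# Toy usage: 4 sequences of 5 consecutive epochs, each already embedded to 128-D.
model = ContextualEpochFusion()
logits = model(torch.randn(4, 5, 128))
print(logits.shape)  # torch.Size([4, 5]) -> one stage prediction per sequence
```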
Colorectal disease diagnosis with deep triple-stream fusion and attention refinement
Colorectal cancer constitutes a significant proportion of global cancer-related mortality, underscoring the imperative for robust and early-stage diagnostic methodologies. In this study, we propose a novel end-to-end deep learning framework that integrates multiple advanced mechanisms to enhance the classification of colorectal disease from histopathologic and endoscopic images. Our model, named TripleFusionNet, leverages a unique triple-stream architecture by combining the strengths of EfficientNetB3, ResNet50, and DenseNet121, enabling the extraction of rich, multi-level feature representations from input images. To augment discriminative feature modeling, a Multi-Scale Attention Module is integrated, which concurrently performs spatial and channel-wise recalibration, thereby enabling the network to emphasize diagnostically salient regions. Additionally, we incorporate a Squeeze-Excite Refinement Block (SERB) to selectively enhance informative channel activations while attenuating noise and redundant signals. Feature representations from the individual backbones are adaptively fused through a Progressive Gated Fusion mechanism that dynamically learns context-aware weighting for optimal feature integration and redundancy mitigation. We validate our approach on two colorectal benchmarks: CRCCD_V1 (14 classes) and LC25000 (binary). On CRCCD_V1, the best performance is obtained by conventional classifiers trained on our 256-D TripleFusionNet embeddings: an SVM with an RBF kernel reaches 96.63% test accuracy with a macro F1 of 96.62%, with the stacking ensemble close behind. Under five-fold cross-validation, it yields comparable out-of-fold means (0.964, with small standard deviations), confirming stability across partitions. End-to-end image-based baselines, including TripleFusionNet, are competitive but are slightly surpassed by the embedding-based classifiers, highlighting the utility of the learned representation. On LC25000, our method attains 100% accuracy. Beyond accuracy, the approach maintains strong precision, recall, F1, and ROC-AUC, and the fused embeddings transfer effectively to multiple conventional learners (e.g., Random Forest, XGBoost). These results confirm the potential of the model for real-world deployment in computer-aided diagnosis workflows, particularly within resource-constrained clinical settings.
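A minimal sketch of the gated fusion idea follows, assuming per-stream feature vectors have already been extracted by the three backbones. The feature dimensions, the 256-D fused embedding, and the 14-class head mirror the description above, but the gating design itself is illustrative rather than the paper's exact Progressive Gated Fusion module.

```python
import torch
import torch.nn as nn

class GatedTripleFusion(nn.Module):
    """Fuse feature vectors from three backbones (e.g. EfficientNetB3,
    ResNet50, DenseNet121) with learned, input-dependent gates."""

    def __init__(self, dims=(1536, 2048, 1024), fused_dim=256, n_classes=14):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, fused_dim) for d in dims)
        self.gate = nn.Linear(fused_dim * 3, 3)       # one gate per stream
        self.head = nn.Linear(fused_dim, n_classes)

    def forward(self, feats):
        # feats: list of three tensors with shapes (B, dims[i])
        z = [p(f) for p, f in zip(self.proj, feats)]                    # 3 x (B, fused_dim)
        gates = torch.softmax(self.gate(torch.cat(z, dim=1)), dim=1)   # (B, 3)
        fused = sum(g.unsqueeze(1) * zi for g, zi in zip(gates.unbind(dim=1), z))
        return self.head(fused), fused                                  # logits, 256-D embedding

model = GatedTripleFusion()
feats = [torch.randn(2, 1536), torch.randn(2, 2048), torch.randn(2, 1024)]
logits, embedding = model(feats)
print(logits.shape, embedding.shape)  # torch.Size([2, 14]) torch.Size([2, 256])
```

The returned 256-D vector is the kind of fused embedding on which the conventional classifiers (SVM, stacking ensemble) mentioned above would be trained.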
Benchmarking pathology foundation models for predicting microsatellite instability in colorectal cancer histopathology
The rapid evolution of pathology foundation models necessitates rigorous benchmarking for clinical tasks. We evaluated three leading foundation models, UNI, Virchow2, and CONCH, for predicting microsatellite instability status from colorectal cancer whole-slide images, an essential routine clinical test. Our comprehensive framework assessed stain, tissue, and resolution invariance using datasets from The Cancer Genome Atlas (TCGA, USA; n = 409) and the Pathology Artificial Intelligence Platform (PAIP, South Korea; training n = 47, testing n = 21 and n = 78). We developed an efficient pipeline with minimal preprocessing, omitting stain normalization, color augmentation, and tumor segmentation. To improve contextual encoding, we applied a five-crop strategy per patch, averaging embeddings from the center and four peripheral crops. We compared three slide-level aggregation and four efficient adaptation strategies. CONCH, using 2-cluster aggregation and ProtoNet adaptation, achieved the top balanced accuracies (0.775 and 0.778) in external validation on PAIP. Conversely, UNI, with mean-averaging aggregation and ANN adaptation, excelled in TCGA cross-validation (0.778) but not in external validation (0.764), suggesting potential overfitting. The proposed five-crop augmentation enhances robustness to scale in UNI and CONCH and reflects the intrinsic invariance achieved by Virchow2 through large-scale pretraining. For prescreening, CONCH demonstrated specificities of 0.65 and 0.45 at sensitivities of 0.90 and 0.94, respectively, highlighting its effectiveness in identifying stable cases and minimizing the number of rapid molecular tests needed. Interestingly, a fine-tuned ResNet34 adaptation achieved superior performance (0.836) in the smaller internal validation cohort, suggesting that current pathology foundation model training recipes may not generalize sufficiently without task-specific fine-tuning. Interpretability analyses using CONCH's multimodal embeddings identified plasma cells as key morphological features differentiating microsatellite instability from stability, validated by pathologists (accuracy up to 92.4%). This study underscores the feasibility and clinical significance of adapting foundation models to enhance diagnostic efficiency and patient outcomes.
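The five-crop embedding strategy can be sketched in a few lines. The encoder below is a stand-in for a foundation model such as UNI, Virchow2, or CONCH, and the patch and crop sizes are assumptions for illustration.

```python
import torch
from torchvision.transforms.functional import five_crop

def five_crop_embedding(patch, encoder, crop_size):
    """Average the encoder embeddings of the center and four corner crops
    of a patch (B, C, H, W) to obtain a more scale-robust representation."""
    crops = five_crop(patch, crop_size)                           # (tl, tr, bl, br, center)
    embeddings = torch.stack([encoder(c) for c in crops], dim=0)  # (5, B, D)
    return embeddings.mean(dim=0)                                 # (B, D)

# Toy usage with a stand-in encoder (a real pipeline would use a pathology foundation model).
encoder = torch.nn.Sequential(
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(3, 512)
)
patch = torch.rand(2, 3, 256, 256)
print(five_crop_embedding(patch, encoder, crop_size=[224, 224]).shape)  # torch.Size([2, 512])
```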
Trends and applications of variational autoencoders in medical imaging analysis
Automated medical imaging analysis plays a crucial role in modern healthcare, with deep learning emerging as a widely adopted solution. However, traditional supervised learning methods often struggle to achieve optimal performance due to increasing challenges such as data scarcity and variability. In response, generative artificial intelligence has gained significant attention, particularly Variational Autoencoders (VAEs), which have been extensively utilized to address various challenges in medical imaging. This review analyzed 118 articles published in the Web of Science database between 2018 and 2024. Bibliometric analysis was conducted to map research trends, while a curated compilation of datasets and evaluation metrics was extracted to underscore the importance of standardization in deep learning workflows. VAEs have been applied across multiple healthcare applications, including anomaly detection, segmentation, classification, synthesis, registration, harmonization, and clustering. Findings suggest that VAE-based models are increasingly applied in medical imaging, with Magnetic Resonance Imaging emerging as the dominant modality and image synthesis as a primary application. The growing interest in this field highlights the potential of VAEs to enhance medical imaging analysis by overcoming existing limitations in data-driven healthcare solutions. This review serves as a valuable resource for researchers looking to integrate VAE models into healthcare applications, offering an overview of current advancements.
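For readers unfamiliar with the model family under review, a minimal VAE with the reparameterization trick and an ELBO-style loss is sketched below. The dimensions are toy values and the architecture is deliberately generic rather than any specific model from the surveyed literature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal variational autoencoder for flattened images."""

    def __init__(self, in_dim=64 * 64, hidden=256, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def elbo_loss(x, recon, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

x = torch.rand(8, 64 * 64)
model = VAE()
recon, mu, logvar = model(x)
print(elbo_loss(x, recon, mu, logvar).item())
```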
A CNN-Transformer fusion network for diabetic retinopathy image classification
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, yet current diagnosis relies on labor-intensive and subjective fundus image interpretation. Here we present a convolutional neural network-transformer fusion model (DR-CTFN) that integrates ConvNeXt and Swin Transformer algorithms with a lightweight attention block (LAB) to enhance feature extraction. To address dataset imbalance, we applied standardized preprocessing and extensive image augmentation. On the Kaggle EyePACS dataset, DR-CTFN outperformed ConvNeXt and Swin Transformer in accuracy by 3.14% and 8.39%, respectively, and improved the area under the curve (AUC) by 1% and 26.08%. External validation on APTOS 2019 Blindness Detection and a clinical DR dataset yielded accuracies of 84.45% and 85.31%, with AUC values of 95.22% and 95.79%, respectively. These results demonstrate that DR-CTFN enables rapid, robust, and precise DR detection, offering a scalable approach for early diagnosis and prevention of vision loss, thereby enhancing the quality of life for DR patients.
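The abstract does not detail the lightweight attention block (LAB), so the sketch below shows one plausible channel-plus-spatial variant purely for illustration; the kernel size and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class LightweightAttention(nn.Module):
    """Channel attention followed by spatial attention on a feature map."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                            # reweight channels
        attn = self.spatial(x.mean(dim=1, keepdim=True))   # (B, 1, H, W) spatial map
        return x * attn

feat = torch.randn(2, 64, 32, 32)
print(LightweightAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```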
ESAM2-BLS: Enhanced segment anything model 2 for efficient breast lesion segmentation in ultrasound imaging
Ultrasound imaging, as an economical, efficient, and non-invasive diagnostic tool, is widely used for breast lesion screening and diagnosis. However, the segmentation of lesion regions remains a significant challenge due to factors such as noise interference and the variability in image quality. To address this issue, we propose a novel deep learning model named enhanced segment anything model 2 (SAM2) for breast lesion segmentation (ESAM2-BLS). This model is an optimized version of the SAM2 architecture. ESAM2-BLS customizes and fine-tunes the pre-trained SAM2 model by introducing an adapter module, specifically designed to accommodate the unique characteristics of breast ultrasound images. The adapter module directly addresses ultrasound-specific challenges, including speckle noise, low-contrast boundaries, shadowing artifacts, and anisotropic resolution, through targeted architectural elements such as channel attention mechanisms, specialized convolution kernels, and optimized skip connections. This optimization significantly improves segmentation accuracy, particularly for low-contrast and small lesion regions. Compared to traditional methods, ESAM2-BLS fully leverages the generalization capabilities of large models while incorporating multi-scale feature fusion and axial dilated depthwise convolution to effectively capture multi-level information from complex lesions. During the decoding process, the model enhances the identification of fine boundaries and small lesions through depthwise separable convolutions and skip connections, while maintaining a low computational cost. In five-fold cross-validation on two datasets comprising over 1600 patients, ESAM2-BLS achieves average Dice scores of 0.9077 and 0.8633, and visualization of the segmentation results together with interpretability analysis supports these findings. These results represent a significant improvement in segmentation accuracy and robustness. This model provides an efficient, reliable, and specialized automated solution for early breast cancer screening and diagnosis.
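A minimal sketch of an adapter of the kind described, inserted after a frozen encoder block, is given below. The bottleneck design, channel attention, and feature dimensions are illustrative assumptions and do not reproduce the SAM2 internals.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable adapter placed after a frozen encoder block:
    down-project, non-linearity, channel attention, up-project, residual add."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.down = nn.Conv2d(channels, mid, 1)
        self.act = nn.GELU()
        self.se = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(mid, mid, 1), nn.Sigmoid())
        self.up = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        h = self.act(self.down(x))
        h = h * self.se(h)          # channel attention on the bottleneck features
        return x + self.up(h)       # residual: frozen features + adapter update

# Only the adapter parameters would be trained; the backbone stays frozen.
feat = torch.randn(1, 256, 64, 64)
adapter = BottleneckAdapter(256)
print(adapter(feat).shape)  # torch.Size([1, 256, 64, 64])
```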
Multi-representational deep transfer learning for classifying hemorrhagic metastases and non-neoplastic intracranial hematomas in multi-modal brain MRI scans
With the increasing incidence of malignant tumors, the occurrence of brain metastases (BMs) has also risen; BMs are the most common malignant brain tumors in adults. BMs are associated with hemorrhage, cystic necrosis, and calcification, which lead to significant diagnostic challenges when differentiating hemorrhagic brain metastases (HBM) from non-neoplastic intracranial hematomas (nn-ICH). This study addressed the limitations of small sample sizes, limited imaging features, and underutilized machine learning techniques reported in previous radiomic studies and introduced a novel multi-representation deep transfer learning (MRDTL) framework. Compared to existing radiomics feature analysis methods, MRDTL utilizes multi-modal MRI scans and offers two key merits: (1) a multi-representation fusion (MRF) module that extracts typical feature combinations by explicitly learning the complementarity between multi-modal sequences and multiple representations; and (2) a neighborhood embedding (NE) module that applies metric learning and clustering on cross-center data to enhance transferable representations and improve model generalization. On the self-constructed HBMRI dataset, MRDTL outperformed five baseline methods in AUC, F1-score, and accuracy, improving accuracy to 94.5% and 93.5% in co-site and separate-site testing, respectively, and overall provided more reliable diagnostic insights.
Multistain multicompartment automatic segmentation in renal biopsies with thrombotic microangiopathies and other vasculopathies
Automatic tissue segmentation is a necessary step for the bulk analysis of whole slide images (WSIs) from paraffin histology sections in kidney biopsies. However, existing models often fail to generalize across the main nephropathological staining methods and to capture the severe morphological distortions in arteries, arterioles, and glomeruli common in thrombotic microangiopathy (TMA) or other vasculopathies. Therefore, we developed an automatic multi-staining segmentation pipeline covering six key compartments: Artery, Arteriole, Glomerulus, Cortex, Medulla, and Capsule/Other. This framework enables downstream tasks such as counting and labeling at the instance, WSI, or biopsy level. Biopsies (n = 158) from seven centers (Cologne, Turin, Milan, Weill-Cornell, Mainz, Maastricht, and Budapest) were classified by expert nephropathologists into TMA (n = 87) or Mimickers (n = 71). Ground truth expert segmentation masks were provided for all compartments, along with expert binary TMA classification labels for the Glomerulus, Artery, and Arteriole compartments. The biopsies were divided into training (n = 79), validation (n = 26), and test (n = 53) subsets. We benchmarked six deep learning models for semantic segmentation (U-Net, FPN, DeepLabV3+, Mask2Former, SegFormer, SegNeXt) and five models for classification (ResNet-34, DenseNet-121, EfficientNet-v2-S, ConvNeXt-Small, Swin-v2-B). We obtained robust segmentation results across all compartments. On the test set, the best models achieved Dice coefficients of 0.903 (Cortex), 0.834 (Medulla), 0.816 (Capsule/Other), 0.922 (Glomerulus), 0.822 (Artery), and 0.553 (Arteriole). The best classification models achieved accuracies of 0.724 and 0.841 for the Glomerulus and the combined Artery and Arteriole compartments, respectively. Furthermore, we release NePathTK (NephroPathology Toolkit), a powerful open-source end-to-end pipeline integrated with QuPath, enabling accurate segmentation for decision support in nephropathology and large-scale analysis of kidney biopsies.
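The per-compartment Dice coefficients reported above can be computed with a short helper of the following form; this is a generic evaluation sketch, not part of the released NePathTK pipeline.

```python
import numpy as np

def dice_per_class(pred, target, n_classes):
    """Dice coefficient for each class of an integer-labelled segmentation mask."""
    scores = []
    for c in range(n_classes):
        p, t = (pred == c), (target == c)
        denom = p.sum() + t.sum()
        scores.append(2.0 * np.logical_and(p, t).sum() / denom if denom else np.nan)
    return scores

# Toy usage with 6 compartments (Artery, Arteriole, Glomerulus, Cortex, Medulla, Capsule/Other).
pred = np.random.randint(0, 6, size=(512, 512))
target = np.random.randint(0, 6, size=(512, 512))
print(dice_per_class(pred, target, n_classes=6))
```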
Multimodal framework for TACE treatment response prediction in patients with hepatocellular carcinoma
Transarterial chemoembolization (TACE) is a first-line treatment for intermediate-stage hepatocellular carcinoma (HCC) that can cause side effects. An accurate prediction of TACE response is important to improve clinical outcomes and avoid unnecessary toxicity. This study pursues a dual objective: to propose a standardized evaluation pipeline that enables reproducible benchmarking of state-of-the-art approaches on publicly available datasets, including both internal and external validation, and to introduce a novel multimodal framework that integrates clinical variables with radiomic and deep features extracted from CT scans using the Vision Transformer MedViT to predict treatment response. Experiments were conducted on two publicly available datasets: HCC-TACE-Seg, used for training and internal validation, and the WAW-TACE cohort, used for external validation. The results demonstrated that the proposed method outperforms existing approaches. Independent validation on the external WAW-TACE dataset achieved promising results, confirming the robustness of the model and its potential to support treatment planning.
Med-SCoT: Structured chain-of-thought reasoning and evaluation for enhancing interpretability in medical visual question answering
Most existing medical visual question answering (Med-VQA) methods emphasize answer accuracy while neglecting the reasoning process, limiting interpretability and reliability in clinical settings. To address this issue, we introduce Med-SCoT, a vision-language model that performs structured chain-of-thought (SCoT) reasoning by explicitly decomposing inference into four stages: Summary, Caption, Reasoning, and Conclusion. To facilitate training, we propose a multi-model collaborative correction (CoCo) annotation pipeline and construct three Med-VQA datasets with structured reasoning chains. We further develop SCoTEval, a comprehensive evaluation framework combining metric-based scores and large language model (LLM) assessments to enable fine-grained analysis of reasoning quality. Experimental results demonstrate that Med-SCoT achieves advanced answer accuracy while generating structured, clinically aligned and logically coherent reasoning chains. Moreover, SCoTEval exhibits high agreement with expert judgments, validating its reliability for structured reasoning assessment. The code, data, and models are available at: https://github.com/qiaodongxing/Med-SCoT.
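The four-stage SCoT output can be represented and parsed with a small helper. The tag-based response format below is an assumption made for illustration only; the actual schema ships with the linked repository.

```python
import re
from dataclasses import dataclass

@dataclass
class SCoT:
    summary: str
    caption: str
    reasoning: str
    conclusion: str

def parse_scot(text: str) -> SCoT:
    """Extract the four SCoT stages from a tagged model response."""
    fields = {}
    for tag in ("summary", "caption", "reasoning", "conclusion"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        fields[tag] = match.group(1).strip() if match else ""
    return SCoT(**fields)

response = ("<summary>Chest X-ray, frontal view.</summary>"
            "<caption>Opacity in the right lower lobe.</caption>"
            "<reasoning>The opacity pattern is consistent with consolidation.</reasoning>"
            "<conclusion>Findings suggest pneumonia.</conclusion>")
print(parse_scot(response).conclusion)  # Findings suggest pneumonia.
```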
CRAD: Cognitive Aware Feature Refinement with Missing Modalities for Early Alzheimer's Progression Prediction
Accurate diagnosis and early prediction of Alzheimer's disease (AD) often require multiple neuroimaging modalities, but in many cases, only one or two modalities are available. Such missing modalities hinder diagnostic accuracy and pose a critical challenge in clinical practice. Multimodal knowledge distillation (KD) offers a promising solution by aligning complete knowledge from multimodal data with that of partial modalities. However, current methods focus on aligning high-level features, which limits their effectiveness due to insufficient transfer of reliable knowledge. In this work, we propose a novel Consistency Refinement-driven Multi-level Self-Attention Distillation framework (CRAD) for early Alzheimer's progression prediction, which enables the cross-modal transfer of more robust shallow knowledge and uses self-attention to refine features. We develop a multi-level distillation module to progressively distill cross-modal discriminative knowledge, enabling lightweight yet reliable knowledge transfer. Moreover, we design a novel self-attention distillation module (PF-CMAD) to transfer disease-relevant intermediate knowledge, which leverages feature self-similarity to capture cross-modal correlations without introducing trainable parameters, enabling interpretable and efficient distillation. We also incorporate a consistency-evaluation-driven confidence regularization strategy within the distillation process. This strategy dynamically refines knowledge using adaptive distillation controllers that assess teacher confidence. Comprehensive experiments demonstrate that our method achieves superior accuracy and robust cross-dataset generalization performance using only MRI for AD diagnosis and early progression prediction. The code is available at https://github.com/LiuFei-AHU/CRAD.
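The parameter-free, self-similarity-based distillation idea can be sketched as matching the pairwise feature self-similarity of the student (partial-modality) and teacher (multimodal) branches. The feature shapes and the MSE objective are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def self_similarity(feat):
    """Pairwise cosine self-similarity of token/region features (B, N, D)."""
    feat = F.normalize(feat, dim=-1)
    return feat @ feat.transpose(1, 2)           # (B, N, N)

def similarity_distillation_loss(student_feat, teacher_feat):
    """Match the self-similarity structure of student and teacher features.
    No trainable parameters are introduced, and the feature dimensions of
    student (D_s) and teacher (D_t) need not agree."""
    return F.mse_loss(self_similarity(student_feat),
                      self_similarity(teacher_feat).detach())

student = torch.randn(4, 49, 128)   # e.g. MRI-only branch features
teacher = torch.randn(4, 49, 256)   # e.g. multimodal branch features
print(similarity_distillation_loss(student, teacher).item())
```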
DuetMatch: Harmonizing semi-supervised brain MRI segmentation via decoupled branch optimization
The limited availability of annotated data in medical imaging makes semi-supervised learning increasingly appealing for its ability to learn from imperfect supervision. Recently, teacher-student frameworks have gained popularity for their training benefits and robust performance. However, jointly optimizing the entire network can hinder convergence and stability, especially in challenging scenarios. To address this for medical image segmentation, we propose DuetMatch, a novel dual-branch semi-supervised framework with asynchronous optimization, where each branch optimizes either the encoder or decoder while keeping the other frozen. To improve consistency under noisy conditions, we introduce Decoupled Dropout Perturbation, enforcing regularization across branches. We also design Pairwise CutMix Cross-Guidance to enhance model diversity by exchanging pseudo-labels through augmented input pairs. To mitigate confirmation bias from noisy pseudo-labels, we propose Consistency Matching, refining labels using stable predictions from frozen teacher models. Extensive experiments on benchmark brain MRI segmentation datasets, including ISLES2022 and BraTS, show that DuetMatch consistently outperforms state-of-the-art methods, demonstrating its effectiveness and robustness across diverse semi-supervised segmentation scenarios.
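A minimal sketch of the decoupled (asynchronous) branch optimization follows: one step updates only the encoder while the decoder is frozen, and the next does the reverse. The toy model and supervised loss stand in for the full semi-supervised framework described above.

```python
import torch
import torch.nn as nn

# Toy segmentation model with an explicit encoder/decoder split.
encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(16, 2, 3, padding=1))

opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)

def train_step(x, y, update="encoder"):
    """Optimize one branch while the other stays frozen (asynchronous update)."""
    for p in encoder.parameters():
        p.requires_grad_(update == "encoder")
    for p in decoder.parameters():
        p.requires_grad_(update == "decoder")
    logits = decoder(encoder(x))
    loss = nn.functional.cross_entropy(logits, y)
    (opt_enc if update == "encoder" else opt_dec).zero_grad()
    loss.backward()
    (opt_enc if update == "encoder" else opt_dec).step()
    return loss.item()

x, y = torch.randn(2, 1, 64, 64), torch.randint(0, 2, (2, 64, 64))
print(train_step(x, y, "encoder"), train_step(x, y, "decoder"))
```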
Anatomy-informed deep learning and radiomics for neurofibroma segmentation in whole-body MRI
Neurofibromatosis type 1 (NF1) is a genetic disorder characterized by the development of multiple neurofibromas (NFs) throughout the body. Accurate segmentation of these tumors in whole-body magnetic resonance imaging (WB-MRI) is critical for quantifying tumor burden and clinical decision-making. This study aims to develop a pipeline for NF segmentation in fat-suppressed T2-weighted WB-MRI that incorporates anatomical context and radiomics to improve accuracy and specificity.
PET/CT-based deep learning model predicts distant metastasis after SBRT for early-stage NSCLC: A multicenter study
Distant metastasis (DM) is the most frequent recurrence mode following stereotactic body radiation therapy (SBRT) for early-stage non-small cell lung cancer (NSCLC). Assessing DM risk prior to treatment initiation is critical. This study aimed to develop and validate a deep learning fusion model, based on 18F-FDG PET/CT images, to predict DM risk. A total of 566 patients from 5 hospitals were allocated into a training set (n = 347), an internal test set (n = 139), and an external test set (n = 80). Deep learning features were extracted from CT, PET, and fusion images using a variational autoencoder. Metastasis-free survival (MFS) prognostic models were developed via fully connected networks. The fusion model demonstrated superior predictive capability compared to the CT or PET models alone, achieving C-indices of 0.864 (training), 0.819 (internal test), and 0.782 (external test). The model successfully stratified patients into high- and low-risk groups with significantly differentiated MFS (e.g., training set: HR=8.425, p < 0.001; internal test set: HR=6.828, p < 0.001; external test set: HR=4.376, p = 0.011). It was also identified as an independent prognostic factor for MFS (HR=14.387, p < 0.001). In conclusion, the 18F-FDG PET/CT deep learning-based fusion model provides a robust prediction of distant metastasis risk and MFS in early-stage NSCLC patients receiving SBRT. This tool may offer objective data to inform individualized treatment decisions.
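The reported C-indices measure how well the predicted risk scores order the observed metastasis-free survival times. A small from-scratch computation is shown for illustration; it is not the study's evaluation code, and the toy values below are invented.

```python
import numpy as np

def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are ordered
    consistently with their observed survival times (ties count as 0.5)."""
    num, den = 0.0, 0.0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable if patient i had the event before time j.
            if events[i] and times[i] < times[j]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den

times = np.array([12.0, 30.0, 24.0, 8.0])      # months to event or censoring
events = np.array([1, 0, 1, 1])                # 1 = distant metastasis observed
risks = np.array([0.9, 0.1, 0.4, 0.8])         # model-predicted risk scores
print(round(concordance_index(times, events, risks), 3))  # 0.833
```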
Hallucinated domain generalization network with domain-aware dynamic representation for medical image segmentation
Due to variations in medical image acquisition protocols, segmentation models often exhibit degraded performance when applied to unseen domains. We argue that such degradation primarily stems from overfitting to source domains and insufficient dynamic adaptability to target domains. To address this issue, we propose a hallucinated domain generalization network with domain-aware dynamic representation for medical image segmentation, which introduces a novel "hallucination during training, dynamic representation during testing" scheme to effectively improve generalization. Specifically, we design an uncertainty-aware dynamic hallucination module that achieves adaptive transformation through Bézier curves and estimates potential domain shift by introducing the uncertainty-aware offset variable driven by channel-wise variance, generating diverse synthetic images. This approach breaks the limitations of source domain distributions while preserving original anatomical structures, effectively alleviating the model's overfitting to the specific styles of source domains. Furthermore, we develop a domain-aware dynamic representation module that treats source domain knowledge as a foundation for understanding unknown domains. Concretely, we obtain unbiased estimates of global style prototypes through domain-wise statistical aggregation and the momentum update strategy. Then, input features are mapped to the unified source domain space through global style prototypes and similarity weights, mitigating performance degradation caused by domain shift during the testing phase. Extensive experiments on four heterogeneously distributed fundus image datasets and six multi-center prostate MRI datasets demonstrate that our approach outperforms state-of-the-art methods.
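The Bézier-based intensity transformation used to hallucinate new styles can be illustrated with a cubic curve that remaps normalized intensities while leaving the spatial anatomy untouched. The control points below are arbitrary, and the uncertainty-aware offset driven by channel-wise variance is omitted from this sketch.

```python
import numpy as np

def bezier_intensity_transform(image, p1=0.3, p2=0.7):
    """Remap normalized intensities through a cubic Bézier curve with end points
    (0, 0) and (1, 1) and randomizable inner control points (p1, p2).
    Different control points hallucinate different image 'styles' while the
    anatomy (spatial layout) is left untouched."""
    t = np.linspace(0.0, 1.0, 1000)
    # x-control points fixed at thirds, so x(t) = t and the lookup axis is monotone.
    x = 3 * (1 - t) ** 2 * t * (1 / 3) + 3 * (1 - t) * t ** 2 * (2 / 3) + t ** 3
    y = 3 * (1 - t) ** 2 * t * p1 + 3 * (1 - t) * t ** 2 * p2 + t ** 3
    flat = np.interp(image.ravel(), x, y)          # look up the curve
    return flat.reshape(image.shape)

img = np.random.rand(128, 128)                     # intensities assumed in [0, 1]
augmented = bezier_intensity_transform(img, p1=np.random.rand(), p2=np.random.rand())
print(augmented.min() >= 0.0, augmented.max() <= 1.0)  # True True
```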
AtlasSeg: Atlas prior guided dual-U-Net for tissue segmentation in fetal brain MRI
Accurate automatic tissue segmentation in fetal brain MRI is a crucial step in clinical diagnosis but remains challenging, particularly due to the dynamically changing anatomy and tissue contrast during fetal development. Existing segmentation networks can only implicitly learn age-related features, leading to a decline in accuracy at extreme gestational ages (GAs). To improve segmentation performance throughout gestation, we introduce AtlasSeg, a dual-U-shape convolution network that explicitly integrates GA-specific information as guidance. By providing a publicly available fetal brain atlas with segmentation labels corresponding to relevant GAs, AtlasSeg effectively extracts age-specific patterns in the atlas branch and generates precise tissue segmentation in the segmentation branch. Multi-scale spatial attention feature fusions are constructed during both encoding and decoding stages to enhance feature flow and facilitate better information interactions between the two branches. We compared AtlasSeg with ten well-established networks on a seven-tissue segmentation task using our in-house dataset and two public datasets, and it achieved the highest average Dice similarity coefficient. The improvement was particularly evident in extreme early or late GA cases, where training data were scarce. Furthermore, AtlasSeg exhibited minimal performance degradation on low-quality images with contrast changes and noise, attributed to its anatomical shape priors. Overall, AtlasSeg demonstrated enhanced segmentation accuracy, better consistency across fetal ages, and robustness to perturbations, making it a powerful tool for reliable fetal brain MRI tissue segmentation, particularly suited for diagnostic assessments during early gestation.
Path and bone-contour regularized unpaired MRI-to-CT translation
Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing the MRI-to-CT translation. Current unpaired MRI-to-CT translation methods, which predominantly rely on cycle consistency and contrastive learning frameworks, frequently encounter challenges in accurately translating anatomical features that are highly discernible on CT but less distinguishable on MRI, such as bone structures. This limitation renders these approaches less suitable for applications in radiation therapy, where precise bone representation is essential for accurate treatment planning. To address this challenge, we propose a path- and bone-contour regularized approach for unpaired MRI-to-CT translation. In our method, MRI and CT images are projected to a shared latent space, where the MRI-to-CT mapping is modeled as a continuous flow governed by neural ordinary differential equations. The optimal mapping is obtained by minimizing the transition path length of the flow. To enhance the accuracy of translated bone structures, we introduce a trainable neural network to generate bone contours from MRI and implement mechanisms to directly and indirectly encourage the model to focus on bone contours and their adjacent regions. Evaluations conducted on three datasets demonstrate that our method outperforms existing unpaired MRI-to-CT translation approaches, achieving lower overall error rates. Moreover, in a downstream bone segmentation task, our approach exhibits superior performance in preserving the fidelity of bone structures. Our code is available at: https://github.com/kennysyp/PaBoT.
Efficient frequency-decomposed transformer via large vision model guidance for surgical image desmoking
Surgical image restoration plays a vital clinical role in improving visual quality during surgery, particularly in minimally invasive procedures where the operating field is frequently obscured by surgical smoke. However, progress on surgical image desmoking remains limited in terms of both algorithm development and customized learning strategies. In this regard, this work addresses the desmoking task from both theoretical and practical perspectives. First, we analyze the intrinsic characteristics of surgical smoke degradation: (1) spatial localization and dynamics, (2) distinguishable frequency-domain patterns, and (3) the entangled representation of anatomical content and degradative artifacts. These observations motivate us to propose an efficient frequency-aware Transformer framework, named SmoRestor, which aims to separate and restore true anatomical structures from complex degradations. Specifically, we introduce a high-order Fourier-embedded neighborhood attention transformer that enhances the model's ability to capture structured degradation patterns across both the spatial and frequency domains. In addition, we utilize the semantic priors encoded by large vision models to disambiguate content from degradation through targeted guidance. Moreover, we propose an innovative transfer learning paradigm that injects knowledge from large models into the main network, enabling it to effectively distinguish meaningful content from ambiguous corruption. Experimental results on both public and in-house datasets demonstrate substantial improvements in quantitative performance and visual quality. The source code will be made available.
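The frequency-domain view of smoke degradation can be illustrated by splitting an image into low- and high-frequency bands with an FFT mask: smoke tends to be smooth and low-frequency, while anatomical edges and instrument boundaries carry more high-frequency energy. The radial cutoff below is arbitrary, and this is not the SmoRestor decomposition itself.

```python
import numpy as np

def frequency_decompose(image, cutoff=0.1):
    """Split a 2-D image into low- and high-frequency components using a
    circular low-pass mask in the shifted Fourier domain."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = radius <= cutoff * min(h, w)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low
    return low, high

img = np.random.rand(256, 256)
low, high = frequency_decompose(img)
print(np.allclose(low + high, img))  # True: the two bands reconstruct the input
```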
BoneVisionNet: A deep learning approach for the classification of bone tumours from radiographs using a triple fusion attention network of transformer and CNNs with XAI visualizations
Diagnosis of bone tumours presents numerous challenges due to the complexity of the pathology and the varying morphologies of bone tumours. Current methods rely on manual techniques that are time-consuming and prone to errors. Hence, there is a need for more accurate and automated methods to assist medical professionals. The proposed work aims to address this challenge by developing a deep learning-based architecture for bone tumour classification using radiographs. The proposed BoneVisionNet is built from a combination of three specialized deep learning networks. The approach leverages a Convolution-Enhanced Image Transformer for global feature extraction, which is further refined using a Global Context Block (GCB). In parallel, the Attention Boosted Mid-Level Feature Extraction Network (ABMLFE-Net) targets mid-level features and DenseNet-169 focuses on local feature extraction. The feature maps from the ABMLFE-Net and DenseNet-169 are fused using element-wise multiplication, followed by an Efficient Channel Attention (ECA) layer for feature refinement. The global features refined by the GCB are concatenated with the enhanced feature maps from the ECA layer, resulting in a refined multi-scale feature map. BoneVisionNet attained an accuracy of 84.35% when tested on the BTXRD dataset, outperforming CNN- and transformer-based networks for classifying bone tumours from radiographs. To the best of our knowledge, this study represents the first application of a triple-track architecture for the classification of bone tumours from X-ray images. XAI visualisations using Grad-CAM, LIME, and SHAP further validate the performance of the model by ensuring transparency in the decision-making process.
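A compact sketch of the fusion-then-refinement step is given below: two stream outputs are combined by element-wise multiplication and passed through an Efficient Channel Attention (ECA) layer. The kernel size follows the common ECA default, and the feature shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient Channel Attention: a 1-D convolution over the channel
    descriptor from global average pooling, with no dimensionality reduction."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = x.mean(dim=(2, 3))                     # (B, C) global average pool
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # 1-D conv across channels
        return x * self.sigmoid(y)[:, :, None, None]

# Fusion as described: element-wise product of two stream outputs, then ECA refinement.
stream_a, stream_b = torch.randn(2, 512, 14, 14), torch.randn(2, 512, 14, 14)
refined = ECALayer()(stream_a * stream_b)
print(refined.shape)  # torch.Size([2, 512, 14, 14])
```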
Deep spatiotemporal clutter filtering of transthoracic echocardiographic images: Leveraging contextual attention and residual learning
This study presents a deep autoencoder network for filtering reverberation clutter from transthoracic echocardiographic (TTE) images. Given the spatiotemporal nature of this type of clutter, the filtering network employs 3D convolutional layers to suppress it throughout the cardiac cycle. The design of the network incorporates two key features that contribute to the effectiveness of the filter: (1) an attention mechanism for focusing on cluttered regions and leveraging contextual information, and (2) residual learning for preserving fine image structures. A diverse set of artifact patterns was simulated and superimposed onto ultra-realistic synthetic TTE sequences from six ultrasound vendors, generating input for the filtering network. The corresponding artifact-free sequences served as ground-truth. The performance of the filtering network was evaluated using unseen synthetic and in vivo artifactual sequences. Results from the in vivo dataset confirmed the network's strong generalization capabilities, despite being trained solely on synthetic data and simulated artifacts. The suitability of the filtered sequences for downstream processing was assessed by computing segmental strain curves. A significant reduction in the discrepancy between the strain profiles of the cluttered and clutter-free segments was observed after filtering. The trained network processes a TTE sequence in a fraction of a second, enabling real-time clutter filtering and potentially improving the precision of clinically relevant indices derived from TTE sequences. The source code of the proposed method and example video files of the filtering results are available at: https://github.com/MahdiTabassian/Deep-Clutter-Filtering/tree/main.
Coronary artery calcification segmentation with sparse annotations in intravascular OCT: Leveraging self-supervised learning and consistency regularization
Assessing coronary artery calcification (CAC) is crucial in evaluating the progression of atherosclerosis and planning percutaneous coronary intervention (PCI). Intravascular Optical Coherence Tomography (OCT) is a commonly used imaging tool for evaluating CAC at the micrometer scale and in three dimensions for optimizing PCI. While existing deep learning methods have proven effective in OCT image analysis, they are hindered by the lack of large-scale, high-quality labels needed to train deep neural networks that can reach human-level performance in practice. In this work, we propose an annotation-efficient approach for segmenting CAC in intravascular OCT images, leveraging self-supervised learning and consistency regularization. We employ a transformer encoder paired with a simple linear projection layer for self-supervised pre-training on unlabeled OCT data. Subsequently, a transformer-based segmentation model is fine-tuned on sparsely annotated OCT pullbacks with a contrast loss using a combination of unlabeled and labeled data. We collected 2,549,073 unlabeled OCT images from 7,108 OCT pullbacks for pre-training, and 1,106,347 sparsely annotated OCT images from 3,025 OCT pullbacks for model training and testing. The proposed approach consistently outperformed existing sparsely supervised methods on both internal and external datasets. In addition, extensive comparisons under full, partial, and sparse annotation schemes substantiated its high annotation efficiency. With an 80% reduction in image labeling effort, our method has the potential to expedite the development of deep learning models for processing large-scale medical image data.
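The consistency-regularization ingredient can be sketched as enforcing agreement between predictions on a clean and a perturbed view of the same unlabeled frame, with the clean-view prediction serving as a pseudo-label. The toy model, noise perturbation, and loss weighting are illustrative assumptions and do not reproduce the paper's training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 2, 3, padding=1))   # 2 classes: calcium / background

def consistency_loss(unlabeled, noise_std=0.1):
    """Encourage the model to give the same segmentation for a clean view and
    a perturbed view of an unlabeled OCT frame (pseudo-label from the clean view)."""
    with torch.no_grad():
        pseudo = model(unlabeled).argmax(dim=1)          # (B, H, W) pseudo-labels
    perturbed = unlabeled + noise_std * torch.randn_like(unlabeled)
    return F.cross_entropy(model(perturbed), pseudo)

labeled, masks = torch.randn(2, 1, 96, 96), torch.randint(0, 2, (2, 96, 96))
unlabeled = torch.randn(4, 1, 96, 96)
loss = F.cross_entropy(model(labeled), masks) + 0.5 * consistency_loss(unlabeled)
print(loss.item())
```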
