Further observations on the security of Speck32-like ciphers using machine learning
With the widespread deployment of the Internet of Things across various industries, the security of communications between devices is a critical concern. Lightweight cryptography has emerged as a specialized solution to the security requirements of resource-constrained environments. Consequently, comprehensive security evaluation of lightweight cryptographic primitives, from the structure of ciphers to their cryptographic components, has become imperative. In this article, we focus on the security evaluation of rotation parameters in the Speck32-like lightweight cipher family. We establish a machine learning-driven security evaluation framework for the rotation parameter selection principles at the core of Speck32's design architecture. To assess the security of different parameters, we develop neural differential distinguishers under two distinct input difference models: (1) low-Hamming-weight input differences and (2) input differences taken from optimal differential characteristics. Our methodology evaluates the security of 256 rotation parameter pairs, using the accuracy of the neural distinguishers as the evaluation criterion. Our results illustrate that the parameter pair (7,3) resists machine learning-aided distinguishing attacks better than the standard (7,2) configuration. To our knowledge, this represents the first comprehensive study applying machine learning techniques to the security assessment of Speck32-like ciphers. Furthermore, we investigate why the accuracy of neural distinguishers differs across rotation parameters. Our experimental results demonstrate that bit bias in output differences and truncated differences is an important factor affecting distinguisher accuracy.
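The bit-bias factor identified above can be probed with a small Monte Carlo experiment. The sketch below is an illustrative stand-in, not the paper's actual setup: it implements a generic Speck32-like round with adjustable rotation amounts, samples round keys independently (a simplification of the real key schedule), and estimates the per-bit bias of the output difference after a few rounds. The input difference, round count, and trial count are all assumptions made for the example.

```python
import random

MASK = 0xFFFF  # Speck32 operates on 16-bit words

def ror(x, r): return ((x >> r) | (x << (16 - r))) & MASK
def rol(x, r): return ((x << r) | (x >> (16 - r))) & MASK

def speck_round(x, y, k, alpha, beta):
    # generic Speck-like round with rotation parameters (alpha, beta)
    x = ((ror(x, alpha) + y) & MASK) ^ k
    y = rol(y, beta) ^ x
    return x, y

def output_diff_bias(alpha, beta, in_diff=(0x0040, 0x0000),
                     rounds=3, trials=2000, seed=0):
    """Per-bit bias |Pr(bit = 1) - 1/2| of the output difference over
    random keys and plaintexts, for one fixed input difference."""
    rng = random.Random(seed)
    counts = [0] * 32
    for _ in range(trials):
        keys = [rng.randrange(1 << 16) for _ in range(rounds)]
        x0, y0 = rng.randrange(1 << 16), rng.randrange(1 << 16)
        x1, y1 = x0 ^ in_diff[0], y0 ^ in_diff[1]
        for k in keys:
            x0, y0 = speck_round(x0, y0, k, alpha, beta)
            x1, y1 = speck_round(x1, y1, k, alpha, beta)
        diff = ((x0 ^ x1) << 16) | (y0 ^ y1)
        for b in range(32):
            counts[b] += (diff >> b) & 1
    return [abs(c / trials - 0.5) for c in counts]

bias_72 = output_diff_bias(7, 2)   # standard Speck32 rotations
bias_73 = output_diff_bias(7, 3)   # the alternative pair studied above
```

Strongly biased output-difference bits are exactly the signal a neural distinguisher can exploit, which is why per-bit bias profiles of different (α, β) pairs are informative.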
Next generation sequencing under attack: investigating insider threats and organizational behaviour
Next generation sequencing (NGS) has become a cornerstone of modern genomics, enabling high-throughput analysis of DNA and RNA with wide applications across medicine, research, and biotechnology. However, the growing adoption of NGS technologies has introduced significant cyber-biosecurity risks, particularly those arising from insider threats and organizational shortcomings. While technical vulnerabilities have received attention, the human and behavioral dimensions of cybersecurity in NGS environments remain underexplored. This study investigates the role of human factors and organizational behavior in shaping cyber-biosecurity risks in NGS workflows. A mixed-method approach was employed, combining survey data from 120 participants across four countries with statistical analyses including chi-square tests, cross-tabulations, and cluster analysis. The study assessed cybersecurity training availability, employee engagement, training effectiveness, and awareness of insider threats. Findings reveal substantial gaps in training frequency and participation, with 36% of respondents reporting no access to NGS-specific cybersecurity training. Only a minority of participants felt confident in detecting cyber threats, and 32.5% had never applied cybersecurity knowledge in practice. Chi-square results indicate significant associations between training frequency and threat recognition, training relevance, and knowledge application. Cluster analysis further categorized organizations into "robust," "moderate," and "emergent" cybersecurity maturity profiles. The study offers an evidence-based framework to enhance cyber-biosecurity in NGS settings by addressing human-centric risks. It recommends role-specific training, frequent policy updates, and improved organizational communication to mitigate insider threats. These insights support the development of targeted interventions and policies to strengthen the cybersecurity culture in genomics organizations.
Predicting university students' academic performance: a case study from Saint Cloud State University
Predicting students' performance is one of the essential educational data mining approaches aimed at observing learning outcomes. Predicting grade point average (GPA) helps to monitor academic performance and assists advisors in identifying students at risk of failure, major changes, or dropout. To enhance prediction performance, this study employs a long short-term memory (LSTM) model using a rich set of academic and demographic features. The dataset, drawn from 29,455 students at Saint Cloud State University (SCSU) over eight years (2016-2024), was carefully preprocessed by eliminating irrelevant and missing data, encoding categorical variables, and normalizing numerical features. Feature importance was determined using a permutation-based method to identify the variables with the greatest impact on term GPA prediction. Furthermore, model hyperparameters, including the number of LSTM layers, units per layer, batch size, learning rate, and activation functions, were fine-tuned through experimental validation with the Adam optimizer and learning rate scheduling. Two experiments were conducted, at the college and department levels. The proposed model outperformed traditional machine learning models such as linear regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and support vector regressor (SVR), and surpassed two deep learning models, the recurrent neural network (RNN) and the convolutional neural network (CNN), achieving a mean absolute percentage error (MAPE) of 9.54, a mean absolute error (MAE) of 0.0059, a root mean square error (RMSE) of 0.0001, and an R² score of 99%.
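The permutation-based feature importance used above is model-agnostic and simple to sketch: shuffle one feature column at a time and measure how much the error grows. The toy model and MAE metric below are illustrative stand-ins, not the study's LSTM or data.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """Increase in error when one feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # break the feature-target link
            scores.append(metric(y, model.predict(Xp)))
        # positive value: error grows once the feature is destroyed
        importances[j] = np.mean(scores) - baseline
    return importances

class ToyModel:
    """Stand-in for a trained model: predictions use only feature 0."""
    def predict(self, X):
        return 2.0 * X[:, 0]

mae = lambda y, p: np.mean(np.abs(y - p))
X = np.random.default_rng(1).normal(size=(200, 3))
y = 2.0 * X[:, 0]
imp = permutation_importance(ToyModel(), X, y, mae)   # only imp[0] is large
```

Because only the prediction function is needed, the same loop applies unchanged to an LSTM regressor.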
Self-learning type-2 fuzzy systems with adaptive rule reduction for time series forecasting
In rapidly changing scenarios, uncertainty and chaotic oscillations often obstruct time series prediction. Type-1 fuzzy systems face challenges in handling high levels of uncertainty, making Type-2 fuzzy systems a better solution. Nonetheless, the complexity of Type-2 fuzzy models can produce an overwhelming number of rules, compromising interpretability and computational efficiency. We present a Self-Learning Type-2 Fuzzy System with adaptive rule reduction that optimizes the rule base as forecast accuracy begins to deteriorate after adaptation. Our model combines participatory learning (PL) and Kernel Recursive Least Squares (KRLS) for online learning with an adaptive rule-reduction strategy that eliminates redundant rules and gains computational efficiency. Our approach incorporates a compatibility measure rooted in Type-2 fuzzy sets, paving the way for an improved treatment of uncertainty. Complex datasets, including the Mackey-Glass chaotic time series and the Taiwan Capitalization Weighted Stock Index (TAIEX), are used to evaluate the model, which demonstrates superior forecasting performance compared to state-of-the-art models. Experiments show that our solution obtains lower error measures while maintaining a small rule base, proving to be a scalable approach amenable to online deployment in fast-paced environments, such as financial markets and industrial processes, that demand highly accurate time series forecasts in the presence of uncertainty.
A path aggregation network with deformable convolution for visual object detection
One of the main challenges encountered in visual object detection is the multi-scale issue, and many approaches have been proposed to tackle it. In this article, we propose a novel neck that performs effective fusion of multi-scale features for a single-stage object detector. This neck, named the deformable convolution and path aggregation network (DePAN), integrates a path aggregation network with a deformable convolution block added to the feature fusion branch to improve the flexibility of feature point sampling. The deformable convolution block is implemented by repeatedly stacking a deformable convolution cell. The DePAN neck can be plugged in and easily applied to various object detection models. We apply the proposed neck to the baseline models YOLOv6-N and YOLOv6-T, and test the improved models on the COCO2017 and PASCAL VOC2012 datasets, as well as a medical image dataset. The experimental results verify the effectiveness and applicability of DePAN in real-world object detection.
Evaluating machine learning models for predictive accuracy in cryptocurrency price forecasting
Our research investigates the predictive performance and robustness of machine learning classification models and technical indicators for algorithmic trading in the volatile cryptocurrency market. The main aim is to identify reliable approaches for informed decision-making and profitable strategy development. With the increasing global adoption of cryptocurrency, robust trading models are essential for navigating its unique challenges and seizing investment opportunities. This study contributes to the field by offering a novel comparison of models, including logistic regression, random forest, and gradient boosting, under different data configurations and resampling techniques to address class imbalance. Historical data from cryptocurrency exchanges and data aggregators is collected, preprocessed, and used to train and evaluate these models. The impact of class imbalance, resampling techniques, and hyperparameter tuning on model performance is investigated, and the methodology emphasizes hyperparameter tuning and backtesting to ensure realistic model assessment. Results highlight the importance of addressing class imbalance and identify random forest, XGBoost, and gradient boosting as consistently outperforming models, indicating promising avenues for future research, particularly in sentiment analysis, reinforcement learning, and deep learning. This study provides valuable guidance for navigating the complex landscape of algorithmic trading in cryptocurrencies. By leveraging the findings and recommendations presented, practitioners can develop more robust and profitable trading strategies tailored to the unique characteristics of this emerging market.
Zero-shot cross-lingual stance detection via adversarial language adaptation
Stance detection has been widely studied as the task of determining whether a social media post is positive, negative, or neutral towards a specific issue, such as support for vaccines. Research in stance detection has often been limited to a single language and, where more than one language has been studied, has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This article makes the first such effort by introducing a novel approach to zero-shot cross-lingual stance detection, multilingual translation-augmented bidirectional encoder representations from Transformers (BERT) (MTAB), aiming to enhance the performance of a cross-lingual classifier in the absence of explicit training data for target languages. Our technique employs translation augmentation to improve zero-shot performance and pairs it with adversarial learning to further boost model efficacy. Through experiments on datasets labeled for stance towards vaccines in four languages (English, German, French, and Italian), we demonstrate the effectiveness of our proposed approach, showcasing improved results in comparison to a strong baseline model as well as ablated versions of our model. These experiments also demonstrate the contribution of individual model components, not least the translation-augmented data and the adversarial learning component, to the improved performance of the model. We have made our source code accessible on GitHub: https://github.com/amcs18pd05/MTAB-cross-lingual-vaccine-stance-detection-2.
Precise distance measurement with stereo camera: experimental results
Today, image processing is used in many areas, especially artificial intelligence, because images contain a great deal of information. Many distance measurement studies have used image processing techniques; however, none has reached the high sensitivity and accuracy rates required for engineering use. The motivation of this study is to obtain experimental results for an image processing method that can measure distances with high sensitivity and can also be used in engineering fields. In the study, the distances of 19 different target objects were measured using a Total Station, a Laser Meter, and the developed prototype (Image Meter). The Total Station measurements were used as a reference against which the Laser Meter and Image Meter results were compared. The comparison showed that the developed Image Meter had a smaller error rate in 11 of the 19 cases, with an average error of 1.24% across the 19 measurements. The experimental results were also compared with theoretical calculations. These comparisons indicate that the results of the developed Image Meter are acceptable and could be further improved with mechanical arrangements.
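Under a rectified pinhole camera model, the measurement principle behind such a stereo prototype reduces to one line: depth = focal length × baseline / disparity. The numbers below are illustrative assumptions, not the paper's calibration values.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Rectified pinhole stereo: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / disparity_px

# illustrative numbers (not the paper's calibration):
# focal length 1,200 px, baseline 0.12 m, measured disparity 18 px
z = depth_from_disparity(1200.0, 0.12, 18.0)   # 8.0 m
```

The formula also explains why sub-pixel disparity estimation matters: at large distances a one-pixel disparity error shifts the computed depth substantially.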
Multicriteria scheduling of two-subassembly products with batch availability and precedence constraints
This article studies multicriteria problems of scheduling a set of n products on a fabrication facility, focusing on batch availability and precedence constraints. Each product is composed of two distinct subassemblies: a common subassembly, shared across all products, and a unique subassembly specific to each product. The common subassemblies are processed together in batches, with each batch requiring an initial setup, while unique subassemblies are handled individually. A common subassembly becomes available only upon the completion of its entire batch (i.e., batch availability), whereas a unique subassembly becomes available immediately after its processing. The product completion time is determined by the availability of both subassemblies. Strict (weak) precedence means that if a product precedes another, then the latter can start only after the former is completed (the latter cannot start earlier than the former). We propose O(n)-time algorithms to simultaneously optimize makespan and maximum cost, as well as to lexicographically optimize two maximum costs and makespan, under strict or weak precedence constraints.
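As a minimal illustration of the batch-availability rule, the sketch below assumes a single machine and a given operation sequence; the article's algorithms and cost functions are not reproduced. A product finishes only when both its batch of common subassemblies and its own unique subassembly are done.

```python
def completion_times(schedule, p_common, p_unique, setup):
    """schedule: list of ("batch", [product indices]) and ("unique", j)
    operations, run in order on one machine. Batch availability: every
    common subassembly in a batch becomes available only when the whole
    batch (setup included) finishes; a product is complete once both
    its common and its unique subassembly are available."""
    n = len(p_common)
    t = 0
    common_avail = [None] * n
    unique_done = [None] * n
    for op, arg in schedule:
        if op == "batch":
            t += setup + sum(p_common[j] for j in arg)
            for j in arg:
                common_avail[j] = t      # the whole batch appears at once
        else:                            # "unique"
            t += p_unique[arg]
            unique_done[arg] = t
    return [max(common_avail[j], unique_done[j]) for j in range(n)]

# three products; product 2's unique part is done before its batch,
# so its completion is pinned to the batch-availability time
schedule = [("batch", [0, 1]), ("unique", 0), ("unique", 2),
            ("unique", 1), ("batch", [2])]
finish = completion_times(schedule, p_common=[2, 2, 2],
                          p_unique=[3, 1, 2], setup=1)   # [8, 11, 14]
```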
The accuracy-bias trade-offs in AI text detection tools and their impact on fairness in scholarly publication
Artificial intelligence (AI) text detection tools are considered a means of preserving the integrity of scholarly publication by identifying whether a text is written by humans or generated by AI. This study evaluates three popular tools (GPTZero, ZeroGPT, and DetectGPT) through two experiments: first, distinguishing human-written abstracts from those generated by ChatGPT o1 and Gemini 2.0 Pro Experimental; second, evaluating AI-assisted abstracts where the original text has been enhanced by these large language models (LLMs) to improve readability. Results reveal notable trade-offs in accuracy and bias, disproportionately affecting non-native speakers and certain disciplines. This study highlights the limitations of detection-focused approaches and advocates a shift toward ethical, responsible, and transparent use of LLMs in scholarly publication.
Multi-task advanced convolutional neural network for robust lymphoblastic leukemia diagnosis, classification, and segmentation
Acute lymphoblastic leukemia (ALL) is a hematologic malignancy characterized by the overproduction of immature lymphocytes, a type of white blood cell. Accurate and timely diagnosis of ALL is crucial for effective management. This article introduces a novel multi-task advanced convolutional neural network (MTA-CNN) framework for ALL detection in medical imaging data that simultaneously performs segmentation, classification, and disease detection. The MTA-CNN is based on a deep learning architecture that can handle multiple tasks simultaneously, allowing it to learn more comprehensive and generalizable features. By combining segmentation, classification, and disease detection tasks, the MTA-CNN effectively leverages the complementary information from each task to improve overall performance. The proposed framework employs CNNs to extract informative features from medical images. These features capture the spatial and temporal characteristics of the data, which are essential for accurate ALL diagnosis. The cascaded structure of the MTA-CNN allows the model to learn features at different levels of abstraction, from low-level to high-level, enabling it to capture both fine-grained and coarse-grained information. To ensure the reliability of the detection results, non-maximum suppression is employed to eliminate redundant detections, focusing only on the most likely candidates. Additionally, the MTA-CNN's ability to accurately localize key landmarks provides valuable information for further analysis, including identifying abnormal structures or changes in anatomical features associated with ALL. Experimental results on a comprehensive dataset of medical images demonstrate the superiority of the MTA-CNN over other learning methods. The proposed framework achieved an accuracy of 0.978, precision of 0.979, recall of 0.967, F1-score of 0.973, specificity of 0.991, Cohen's kappa of 0.979, and negative predictive value (NPV) of 0.990.
These metrics significantly outperform baseline models, highlighting the MTA-CNN's ability to accurately identify and classify ALL cases. The MTA-CNN offers a promising approach for improving the efficiency and accuracy of ALL diagnosis.
A cluster-assisted differential evolution-based hybrid oversampling method for imbalanced datasets
Class imbalance remains a significant challenge in machine learning, leading to biased models that favor the majority class while failing to accurately classify minority instances. Traditional oversampling methods, such as Synthetic Minority Over-sampling Technique (SMOTE) and its variants, often struggle with class overlap, poor decision boundary representation, and noise accumulation. To address these limitations, this study introduces ClusterDEBO, a novel hybrid oversampling method that integrates K-Means clustering with differential evolution (DE) to generate synthetic samples in a more structured and adaptive manner. The proposed method first partitions the minority class into clusters using the silhouette score to determine the optimal number of clusters. Within each cluster, DE-based mutation and crossover operations are applied to generate diverse and well-distributed synthetic samples while preserving the underlying data distribution. Additionally, a selective sampling and noise reduction mechanism is employed to filter out low-impact synthetic samples based on their contribution to classification performance. The effectiveness of ClusterDEBO is evaluated on 44 benchmark datasets using k-Nearest Neighbors (kNN), decision tree (DT), and support vector machines (SVM) as classifiers. The results demonstrate that ClusterDEBO consistently outperforms existing oversampling techniques, leading to improved class separability and enhanced classifier robustness. Moreover, statistical validation using the Friedman test confirms the significance of the improvements, ensuring that the observed gains are not due to random variations. The findings highlight the potential of cluster-assisted differential evolution as a powerful strategy for handling imbalanced datasets.
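A minimal sketch of the cluster-assisted idea, under simplifying assumptions (scikit-learn's KMeans, classic DE/rand/1 mutation with binomial crossover, and no selective noise-reduction step), could look as follows; function and parameter names are illustrative, not ClusterDEBO's actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_de_oversample(X_min, n_new, F=0.5, cr=0.9, k_max=5, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: choose the number of minority clusters by silhouette score
    best_k, best_s = 2, -1.0
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_min)
        s = silhouette_score(X_min, labels)
        if s > best_s:
            best_k, best_s = k, s
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(X_min)
    # step 2: DE/rand/1 mutation + binomial crossover inside each cluster
    synthetic = []
    for _ in range(n_new):
        c = rng.integers(best_k)
        members = X_min[labels == c]
        if len(members) < 4:                  # DE needs four distinct vectors
            members = X_min
        a, b, d, target = members[rng.choice(len(members), 4, replace=False)]
        mutant = a + F * (b - d)              # mutation stays near the cluster
        mask = rng.random(X_min.shape[1]) < cr
        synthetic.append(np.where(mask, mutant, target))
    return np.vstack(synthetic)

# two-blob minority class: synthetic points stay inside the blobs
rng = np.random.default_rng(1)
X_min = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
                   rng.normal(5.0, 0.3, size=(30, 2))])
X_new = cluster_de_oversample(X_min, n_new=40)
```

Restricting the DE operands to one cluster is what keeps the synthetic samples from landing between clusters, the failure mode of plain SMOTE on multi-modal minority classes.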
A survey on surface reconstruction based on 3D Gaussian splatting
Surface reconstruction is a foundational topic in computer graphics and has gained substantial research interest in recent years. With the emergence of advanced neural radiance fields (NeRFs) and 3D Gaussian splatting (3D GS), numerous novel algorithms for 3D model surface reconstruction have been developed. The rapid expansion of this field presents challenges in tracking ongoing advancements. This survey aims to present core methodologies for the surface reconstruction of 3D models and establish a structured roadmap that encompasses 3D representations, reconstruction methods, datasets, and related applications. Specifically, we introduce 3D representations using 3D Gaussians as the central framework. Additionally, we provide a comprehensive overview of the rapidly evolving surface reconstruction methods based on 3D Gaussian splatting. We categorize the primary phases of surface reconstruction algorithms for 3D models into scene representation, Gaussian optimization, and surface structure extraction. Finally, we review the available datasets, applications, and challenges, and suggest potential future research directions in this domain. Through this survey, we aim to provide valuable resources that support and inspire researchers in the field, fostering advancements in 3D reconstruction technologies.
A social information sensitive model for conversational recommender systems
Conversational recommender systems (CRS) facilitate natural language interactions for more effective item suggestions. While these systems show promise, they face challenges in effectively utilizing and integrating informative data with conversation history through semantic fusion. In this study we present an innovative framework for extracting social information from conversational datasets by inferring ratings and constructing user-item interaction and user-user relationship graphs. We introduce a social information sensitive semantic fusion (SISSF) method that employs contrastive learning (CL) to bridge the semantic gap between generated social information and conversation history. We evaluated the framework on two public datasets (ReDial and INSPIRED) using both automatic and human evaluation metrics. Our SISSF framework demonstrated significant improvements over baseline models across all metrics. For the ReDial dataset, SISSF achieved superior performance in recommendation tasks (R@1: 0.062, R@50: 0.437) and conversational quality metrics (Distinct-2: 4.223, Distinct-3: 5.595, Distinct-4: 6.155). Human evaluation showed marked improvement in both fluency (1.81) and informativeness (1.63). We observed similar performance gains on the INSPIRED dataset, with notable improvements in recommendation accuracy (R@1: 0.046, R@10: 0.129, R@50: 0.269) and response diversity (Distinct-2: 2.061, Distinct-3: 4.293, Distinct-4: 6.242). The experimental results consistently validate the effectiveness of our approach in both recommendation and conversational tasks. These findings suggest that incorporating social context through CL can significantly improve the personalization and relevance of recommendations in conversational systems.
An evolutionary Bi-LSTM-DQN framework for enhanced recognition and classification in rural information management
As deep learning and reinforcement learning technologies advance, intelligent rural information management is transforming substantially. This article presents an innovative framework, the evolutionary bidirectional long short-term memory deep Q-network (EBLM-DQN), which integrates evolutionary algorithms, reinforcement learning, and bidirectional long short-term memory (Bi-LSTM) networks to significantly improve the accuracy and efficiency of rural information management, particularly for recognizing and classifying information relevant to farmers. The proposed framework begins with data preprocessing using disambiguation techniques and data complementation, followed by temporal feature extraction via a Bi-LSTM layer. It then employs a deep Q-network (DQN) to adjust and optimize weights dynamically. After feature extraction and weight optimization, evolutionary algorithms are used to select the optimal weights, enabling precise recognition and classification of the conditions encountered by farmers seeking assistance. Experimental results indicate that the EBLM-DQN framework outperforms existing frameworks on public datasets and in real-world applications, providing higher classification accuracy. This framework offers valuable technical support and a reference for the future optimization and development of rural information management systems.
Noise injection into Freeman chain codes
This article presents a novel method for direct noise injection into geometric shapes described by eight- or four-directional Freeman chain codes. Noise is applied to randomly selected segments of a chain code sequence using a set of predefined actions. The design of the alterations retains the topological characteristics of the shapes. The method is tested on various shapes, including open, self-intersecting, and simple shapes, among which the latter two may contain holes. Fractal dimension and mean distance from the original are utilised to analyse the amount of injected noise in chain code sequences. The proposed method enables efficient noise injection directly into Freeman chain codes for use in data augmentation and regularization during neural network training.
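As a toy illustration of topology-friendly noise in a Freeman chain code, the sketch below uses a single alteration action, transposing adjacent code elements, which perturbs the path locally while provably leaving its endpoint unchanged; the paper's actual action set is richer and is not reproduced here.

```python
import random

# unit moves for the eight Freeman directions 0..7, counter-clockwise from east
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def endpoint(code, start=(0, 0)):
    x, y = start
    for d in code:
        dx, dy = MOVES[d]
        x, y = x + dx, y + dy
    return (x, y)

def inject_noise(code, n_swaps, seed=0):
    """Transpose randomly chosen adjacent code pairs: since vector
    addition commutes, the path changes locally but its endpoint
    (and hence closedness) cannot move."""
    rng = random.Random(seed)
    noisy = list(code)
    for _ in range(n_swaps):
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy

code = [0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 6, 7]   # a small closed contour
noisy = inject_noise(code, n_swaps=6)
```

Because the chain code is edited directly, no rasterisation or re-tracing step is needed, which is the efficiency argument made above for augmentation pipelines.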
The DBCV index is more informative than DCSI, CDbw, and VIASCKDE indices for unsupervised clustering internal assessment of concave-shaped and density-based clusters
Clustering methods are unsupervised machine learning techniques that aggregate data points into specific groups, called clusters, according to criteria defined by the clustering algorithm employed. Since clustering methods are unsupervised, no ground truth or gold standard information is available to assess their results, making it challenging to know whether the results obtained are good or not. In this context, several internal clustering validation scores are available, such as the Silhouette coefficient, Calinski-Harabasz index, Davies-Bouldin index, Dunn index, Gap statistic, and Shannon entropy, just to mention a few. Although popular, these internal scores work well only when used to assess convex-shaped and well-separated clusters, but they fail when utilized to evaluate concave-shaped and nested clusters. In these concave-shaped and density-based cases, other coefficients can be informative: the Density-Based Clustering Validation index (DBCV), the Composing Density between and within clusters index (CDbw), the Density Cluster Separability Index (DCSI), and the Validity Index for Arbitrary-Shaped Clusters based on kernel density estimation (VIASCKDE). In this study, we describe the DBCV index precisely and compare its outcomes with those obtained by CDbw, DCSI, and VIASCKDE on several artificial datasets and on real-world medical datasets derived from electronic health records, with clusterings produced by density-based methods such as density-based spatial clustering of applications with noise (DBSCAN). To do so, we propose an innovative approach based on worsening or improving a clustering result, rather than searching for the "right" number of clusters as many studies do. Moreover, we also recommend open software packages in R and Python for its usage. Our results demonstrate the higher reliability of the DBCV index over CDbw, DCSI, and VIASCKDE when assessing concave-shaped and nested clustering results.
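The failure mode motivating density-aware indices can be reproduced in a few lines: on two interleaved half-moons, a density-based method recovers the concave clusters, yet the Silhouette coefficient rates even the ground-truth partition as mediocre. The dataset and parameter choices below are illustrative.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score, silhouette_score

# two interleaved half-moons: the true clusters are concave
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# a density-based method recovers both moons
labels = DBSCAN(eps=0.25, min_samples=5).fit_predict(X)
n_clusters = len(set(labels) - {-1})          # -1 marks noise points
ari = adjusted_rand_score(y_true, labels)

# yet the Silhouette coefficient scores even the ground-truth
# partition poorly -- the gap that DBCV-style indices close
sil_true = silhouette_score(X, y_true)
```

A convex-cluster index penalises the correct partition simply because points near a moon's tip sit closer to the other moon than to their own cluster's far end.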
Predicting sport event outcomes using deep learning
Predicting the outcomes of sports events is inherently difficult due to the unpredictable nature of gameplay and the complex interplay of numerous influencing factors. In this study, we present a deep learning framework that combines a one-dimensional convolutional neural network (1D CNN) with a Transformer architecture to improve prediction accuracy. The 1D CNN effectively captures local spatial patterns in structured match data, while the Transformer leverages self-attention mechanisms to model long-range dependencies. This hybrid design enables the model to uncover nuanced feature interactions critical to outcome prediction. We evaluate our approach on a benchmark sports dataset, where it outperforms traditional machine learning methods and standard deep learning models in both accuracy and robustness. Our results demonstrate the promise of integrating convolutional and attention-based mechanisms for enhanced performance in sports analytics and predictive modeling.
A review of deep learning methods in aquatic animal husbandry
Aquatic animal husbandry is crucial for global food security and supports millions of livelihoods around the world. With the growing demand for seafood, the industry has become economically significant for many regions, contributing to local and global economies. However, as the industry grows, it faces major challenges that are not encountered in small-scale setups. Traditional methods for classifying, detecting, and monitoring aquatic animals are often time-consuming, labor-intensive, and prone to inaccuracies. The labor-intensive nature of these operations has led many aquaculture operators to move towards automation systems. Yet, for an automation system to be effectively deployed, it needs an intelligent decision-making system, which is where deep learning techniques come into play. This article presents a concise methodological review of machine learning methods, primarily the deep learning methods used in aquatic animal husbandry. It focuses on the use of deep learning in three key areas: classification, localization, and segmentation. Generally, classification techniques are vital in distinguishing between different species of aquatic organisms, while localization methods are used to identify an animal's position within a video or an image. Segmentation techniques, on the other hand, enable the precise delineation of organism boundaries, which is essential for accurate monitoring systems. Among these key areas, segmentation techniques, particularly the U-Net model, have shown the best results, achieving a segmentation performance as high as 94.44%. This article also highlights the potential of deep learning to enhance the precision, productivity, and sustainability of automated operations in aquatic animal husbandry. Looking ahead, deep learning offers huge potential to transform the aquaculture industry in terms of cost and operations.
Future research should focus on refining existing models to better address real-world challenges such as sensor input quality and multi-modal data across various environments for better automation in the aquaculture industry.
Majority clustering for imbalanced image classification
Class imbalance is a prevalent challenge in image classification tasks, where certain classes are significantly underrepresented compared to others. This imbalance often leads to biased models that perform poorly in predicting minority classes, affecting the overall performance and reliability of image classification systems. In this article, an under-sampling approach that reduces the number of majority-class samples is combined with unsupervised clustering to partition the majority class into clusters within the dataset. The proposed technique, Majority Clustering for Imbalanced Image Classification (MCIIC), reformulates the traditional binary classification problem as a multi-class problem, thereby creating a more balanced classification solution for problems where rare samples in the dataset must be detected. Using the elbow method, we determine the optimal number of clusters for the majority class and assign each cluster a new class label. This process ensures a balanced and symmetrical class distribution, effectively addressing imbalances both between and within classes. The effectiveness of the proposed model is evaluated on various benchmark datasets, demonstrating its ability to improve predictive performance on imbalanced image datasets. Through empirical evaluation, we showcase the impact of the proposed technique on model accuracy, precision, recall, and F1-score, highlighting its value as a pre-processing step and as a practical approach to the challenges posed by imbalanced data distributions in machine learning tasks.
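A minimal sketch of the majority-clustering idea, using an assumed simple elbow rule on KMeans inertia (the names and the 20% drop threshold are illustrative, not MCIIC's exact procedure): the majority class is split into clusters and each cluster becomes its own class, turning the binary task multi-class.

```python
import numpy as np
from sklearn.cluster import KMeans

def majority_cluster_relabel(X, y, majority=0, k_max=8, seed=0):
    """Split the majority class into k clusters (k picked by a simple
    elbow rule on inertia) and give each cluster a fresh class label."""
    Xm = X[y == majority]
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xm).inertia_
                for k in range(1, k_max + 1)]
    k_opt = k_max
    for k in range(1, k_max):
        # elbow: stop at the first k whose relative inertia drop is small
        if (inertias[k - 1] - inertias[k]) / inertias[k - 1] < 0.2:
            k_opt = k
            break
    labels = KMeans(n_clusters=k_opt, n_init=10, random_state=seed).fit_predict(Xm)
    y_new = y.copy()
    y_new[y == majority] = labels + y.max() + 1   # one new label per cluster
    return y_new, k_opt

# majority class = three tight, well-separated blobs; minority = 30 samples
rng = np.random.default_rng(0)
majority_pts = np.vstack([rng.normal(c, 0.2, size=(100, 2))
                          for c in [(0, 0), (10, 0), (0, 10)]])
minority_pts = rng.normal((5, 5), 0.2, size=(30, 2))
X = np.vstack([majority_pts, minority_pts])
y = np.array([0] * 300 + [1] * 30)
y_new, k_opt = majority_cluster_relabel(X, y)
```

After relabeling, a standard multi-class classifier sees several moderately sized majority sub-classes instead of one dominant class, which is the balancing effect described above.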
Multimodal image fusion for enhanced vehicle identification in intelligent transport
Target detection in remote sensing is essential for applications such as law enforcement, military surveillance, and search-and-rescue. With advancements in computational power, deep learning methods have excelled in processing unimodal aerial imagery. The availability of diverse imaging modalities, including infrared, hyperspectral, multispectral, synthetic aperture radar, and Light Detection and Ranging (LiDAR), allows researchers to leverage complementary data sources. Integrating these multi-modal datasets has significantly enhanced detection performance, making these technologies more effective in real-world scenarios. In this work, we propose a novel approach that employs a deep learning-based attention mechanism to generate depth maps from aerial images. These depth maps are fused with RGB images to achieve enhanced feature representation. For image segmentation, we use Markov Random Fields (MRF), and for object detection, we adopt the You Only Look Once (YOLOv4) framework. Furthermore, we introduce a hybrid feature extraction technique that combines Histogram of Oriented Gradients (HOG) and Binary Robust Invariant Scalable Keypoints (BRISK) descriptors within the Vision Transformer (ViT) framework. Finally, a Residual Network with 18 layers (ResNet-18) is used for classification. Our model is evaluated on three benchmark datasets (Roundabout Aerial, AU-Air, and the Vehicle Aerial Imagery Dataset (VAID)), achieving precision scores of 98.4%, 96.2%, and 97.4%, respectively, for object detection. Experimental results demonstrate that our approach outperforms existing state-of-the-art methods in vehicle detection and classification for aerial imagery.
