Real-time thinning algorithms for 2D and 3D images using GPU processors
The skeletonization of binary images is a common task in many image processing and machine learning applications. Some of these applications require very fast image processing. We propose novel techniques for efficient 2D and 3D thinning of binary images using GPU processors. The algorithms use bit-encoded binary images to process multiple points simultaneously in each thread. The simpleness of a point is determined using Boolean algebra with only bitwise logical operators. This avoids computationally expensive decoding and encoding steps and allows for additional parallelization. The 2D algorithm was evaluated on a dataset of handwritten character images and required an average computation time of 3.53 ns for 32×32 pixels and 0.25 ms for 1024×1024 pixels, which is 52 to 18,380 times faster than a multi-threaded border-parallel algorithm. The 3D algorithm was evaluated on clinical images of the human vasculature and required computation times of 0.27 ms for 128×128×128 voxels and 20.32 ms for 512×512×512 voxels, which is 32 to 46 times faster than the compared border-sequential algorithm using the same GPU processor. The proposed techniques enable efficient real-time 2D and 3D skeletonization of binary images, which could improve the performance of many existing machine learning applications.
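To illustrate the bit-encoding idea, the following minimal Python sketch packs one image row per machine word (Python integers stand in for the GPU's words) so that a single bitwise expression evaluates a neighbourhood predicate for an entire row of pixels at once. The directional deletion rule shown is a deliberately simplified stand-in, not the paper's topology-preserving simpleness test.

```python
def thin_north_pass(rows, width):
    """One simplified directional pass: delete foreground pixels whose north
    neighbour is background and whose south neighbour is foreground.
    rows[y] is an int whose bit i encodes pixel (y, i)."""
    mask = (1 << width) - 1
    out = []
    for y, row in enumerate(rows):
        north = rows[y - 1] if y > 0 else 0
        south = rows[y + 1] if y + 1 < len(rows) else 0
        deletable = row & ~north & south       # evaluated for all bits at once
        out.append((row & ~deletable) & mask)  # clear deletable pixels
    return out

# 8x8 toy image: the top border row of a solid bar is removed in one pass
img = [0b00000000, 0b01111110, 0b01111110, 0b01111110, 0, 0, 0, 0]
print([format(r, "08b") for r in thin_north_pass(img, 8)])
```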
Semivariogram analysis of bone images implemented on FPGA architectures
Osteoporotic fractures are a major concern for the healthcare of elderly and female populations. Early diagnosis of patients at high risk of osteoporotic fractures can be enhanced by introducing second-order statistical analysis of bone image data using techniques such as variogram analysis. Such analysis is computationally intensive, creating an impediment to its introduction into the imaging machines found in common clinical settings. This paper investigates a fast implementation of the semivariogram algorithm, which has been proven effective in modeling bone strength, and should be of interest to readers in the areas of computer-aided diagnosis and quantitative image analysis. The semivariogram is a statistical measure of the spatial distribution of data, and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real-time implementations of the algorithm. The semivariance, γ(h), is defined as half of the expected squared difference of pixel values between any two data locations separated by a lag distance h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n²). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability, even though they tend to operate at relatively modest clock rates of a few hundred megahertz. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. A modular architecture approach is chosen to allow for replication of processing units, yielding high throughput through concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which utilizes the Virtex-5 FPGA. Medical image data from DXA scans are utilized for the experiments. Implementation results show that the architectures attain a significant advantage in computational speed over an implementation on a personal computer with an Intel i7 multi-core processor.
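For reference, the empirical semivariogram that the architectures accelerate can be prototyped in a few lines of Python. This sketch pools only horizontal and vertical pixel pairs; a full isotropic estimator would bin all pair orientations by their Euclidean lag.

```python
import numpy as np

def semivariogram(img, max_lag):
    """gamma(h): half the mean squared difference of pixel values over all
    horizontal and vertical pixel pairs separated by lag h."""
    img = img.astype(np.float64)
    gamma = []
    for h in range(1, max_lag + 1):
        dh = (img[:, h:] - img[:, :-h]) ** 2   # horizontal pairs
        dv = (img[h:, :] - img[:-h, :]) ** 2   # vertical pairs
        gamma.append((dh.sum() + dv.sum()) / (2 * (dh.size + dv.size)))
    return np.array(gamma)

rng = np.random.default_rng(0)
print(semivariogram(rng.random((64, 64)), 5))  # ~flat near 1/12 for white noise
```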
Advances in real-time object tracking: Extensions for robust object tracking with a Monte Carlo particle filter
The huge amount of literature on real-time object tracking continuously reports good results with respect to accuracy and robustness. However, when it comes to the applicability of these approaches to real-world problems, often no clear statements about the tracking situation can be made. This paper addresses this issue and relies on three novel extensions to Monte Carlo particle filtering: the first two lead to faster convergence and a more accurate pose estimation, while the third removes jitter and ensures convergence. These extensions significantly increase robustness and accuracy, and further provide the basis for an algorithm we found to be essential for tracking systems performing in the real world: tracking state detection. Relying on the extensions above, it reports qualitative states of tracking as follows: whether the pose has already been found, the confidence of the currently tracked pose, when the algorithm fails, and the degree of occlusion if only parts of the object are visible. Building on tracking state detection, a scheme is proposed as a measure of which views of the object have already been learned and which areas require further inspection. To the best of our knowledge, this is the first tracking system that explicitly addresses the issue of estimating the tracking state. Our open-source framework is available online, serving as an easy-access interface for usage in practice.
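Since the paper's extensions build on a standard Monte Carlo particle filter, a generic sampling-importance-resampling (SIR) step is sketched below for context; the 1-D "pose" and Gaussian observation model are placeholders, not the paper's 6-DoF tracking setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, likelihood, motion_noise=0.05):
    """One generic SIR iteration: diffuse particles with the motion model,
    re-weight by the observation likelihood, then resample."""
    particles = particles + rng.normal(0.0, motion_noise, particles.shape)
    weights = weights * likelihood(particles)
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    n = len(particles)
    return particles[idx], np.full(n, 1.0 / n)

# toy 1-D "pose" with true value 0.7 and a Gaussian observation model
true_pose = 0.7
obs = lambda p: np.exp(-((p[:, 0] - true_pose) ** 2) / (2 * 0.1 ** 2))
parts = rng.uniform(0.0, 1.0, (500, 1))
w = np.full(500, 1 / 500)
for _ in range(10):
    parts, w = particle_filter_step(parts, w, obs)
print(parts.mean(), parts.std())   # mean near 0.7; spread shrinks on convergence
```

Note how the particle spread shrinks as the filter converges; such statistics are one natural raw material for the kind of tracking-state and confidence reporting the paper proposes.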
Real-time field sports scene classification using colour and frequency space decompositions
This paper presents a novel approach to recognizing the scene presented in an image, with specific application to scene classification in field sports video. We propose different variants of the algorithm, ranging from bags of visual words to a simplified real-time implementation that takes only the most important areas of similar colour into account. All the variants feature similar accuracy, comparable to well-known image indexing techniques such as SIFT or HOG. For comparison purposes, we also developed a dedicated database, which is now available online. The algorithm is well suited to the scene recognition task thanks to its speed and robustness to image resolution, making it a good candidate for real-time video indexing systems. The procedure is also simple, as it is based on the well-known Fourier transform.
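As a rough illustration of combining colour and frequency cues (the paper's actual descriptor differs in detail), the sketch below concatenates a coarse joint colour histogram with a pooled log-magnitude Fourier signature:

```python
import numpy as np

def scene_descriptor(rgb, bins=8):
    """Illustrative colour + frequency descriptor: joint colour histogram
    plus an 8x8-pooled log-magnitude FFT signature of the grey image."""
    hist, _ = np.histogramdd(rgb.reshape(-1, 3).astype(float),
                             bins=bins, range=[(0, 256)] * 3)
    hist = hist.ravel() / hist.sum()
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(rgb.mean(axis=2)))))
    h, w = spectrum.shape
    pooled = spectrum[: h // 8 * 8, : w // 8 * 8]
    pooled = pooled.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return np.concatenate([hist, pooled.ravel() / pooled.sum()])

img = np.random.randint(0, 256, (240, 320, 3))
d = scene_descriptor(img)   # compare such descriptors with e.g. L2 distance
```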
Journal of Real-Time Image Processing: third issue of volume 17
Implementing a real-time, AI-based people detection and social distancing measuring system for COVID-19
COVID-19 is a disease caused by a severe respiratory syndrome coronavirus, identified in December 2019 in Wuhan, China. It has resulted in an ongoing pandemic with many infections and deaths. The coronavirus is primarily spread between people during close contact. Motivated by this, this research proposes an artificial intelligence system for social distancing classification of persons using thermal images. By exploiting the YOLOv2 (You Only Look Once) approach, a novel deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios. An algorithm is also implemented for measuring and classifying the distance between persons and automatically checking whether social distancing rules are respected. Hence, this work aims at minimizing the spread of the COVID-19 virus by evaluating if and how persons comply with social distancing rules. The proposed approach is applied to images acquired through thermal cameras, to establish a complete AI system for people tracking, social distancing classification, and body temperature monitoring. The training phase is performed with two datasets captured from different thermal cameras, and the Ground Truth Labeler app is used for labeling the persons in the images. The proposed technique has been deployed on a low-cost embedded system equipped with a fixed camera, and is implemented in a distributed surveillance video system to visualize people from several cameras in one centralized monitoring system. The achieved results show that the proposed method is suitable for setting up a surveillance system in smart cities for people detection, social distancing classification, and body temperature analysis.
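A minimal sketch of the distance-classification step might look as follows, assuming (x, y, w, h) person boxes from the detector and a pixel threshold calibrated from the camera geometry (both placeholders):

```python
import itertools
import numpy as np

def distancing_violations(boxes, min_dist_px):
    """Flag pairs of detected persons whose feet points are closer than a
    pixel threshold; boxes are (x, y, w, h) detections."""
    feet = [(x + w / 2, y + h) for x, y, w, h in boxes]   # bottom-centre point
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(feet), 2):
        if np.hypot(a[0] - b[0], a[1] - b[1]) < min_dist_px:
            pairs.append((i, j))
    return pairs

print(distancing_violations(
    [(10, 10, 40, 80), (60, 12, 40, 80), (300, 15, 40, 80)], 120))  # [(0, 1)]
```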
A new approach for the detection of pneumonia in children using CXR images based on a real-time IoT system
Pneumonia is responsible for high infant morbidity and mortality. This disease affects the small air sacs (alveoli) in the lung and requires prompt diagnosis and appropriate treatment. Chest X-rays are one of the most common tests used to detect pneumonia. In this work, we propose a real-time Internet of Things (IoT) system to detect pneumonia in chest X-ray images. The dataset used has 6000 chest X-ray images of children, and three medical specialists performed the validations. Twelve different Convolutional Neural Network (CNN) architectures trained on ImageNet were adapted to operate as feature extractors. The CNNs were then combined with well-established learning methods, such as k-Nearest Neighbors (kNN), Naive Bayes, Random Forest, Multilayer Perceptron (MLP), and Support Vector Machine (SVM). The results showed that the VGG19 architecture with the SVM classifier using the RBF kernel was the best model for detecting pneumonia in these chest radiographs, reaching an accuracy of 96.47%, an F1 score of 96.46%, and a precision of 96.46%. Compared to other works in the literature, the proposed approach yielded better results for the metrics used. These results show that this approach for the detection of pneumonia in children using a real-time IoT system is efficient and therefore a potential tool to aid medical diagnosis, allowing specialists to obtain faster and more accurate results and thus provide appropriate treatment.
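The feature-extractor-plus-classifier pipeline can be sketched as below with Keras and scikit-learn; VGG19 with an RBF SVM matches the paper's best combination, but the input size, preprocessing, and hyper-parameters here are illustrative defaults, not the paper's settings.

```python
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from sklearn.svm import SVC

# Pretrained VGG19 body with global average pooling as a fixed feature extractor
extractor = VGG19(weights="imagenet", include_top=False, pooling="avg")

def features(images):
    """images: (N, 224, 224, 3) RGB batch -> (N, 512) pooled CNN features."""
    return extractor.predict(preprocess_input(images.astype("float32")))

demo = np.random.randint(0, 256, (2, 224, 224, 3))
print(features(demo).shape)                          # (2, 512)

# With real data: clf = SVC(kernel="rbf").fit(features(X_train), y_train)
```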
Real-time FPGA-based implementation of the AKAZE algorithm with nonlinear scale space generation using image partitioning
The first step in a scale-invariant image matching system is scale space generation. Nonlinear scale space generation algorithms such as AKAZE reduce noise and distortion at different scales while retaining the borders and key-points of the image. An FPGA-based hardware architecture for AKAZE nonlinear scale space generation is proposed to speed up this algorithm for real-time applications. The three contributions of this work are (1) mapping the two passes of the AKAZE algorithm onto a hardware architecture that realizes parallel processing of multiple sections, (2) multi-scale line buffers which can be used for different scales, and (3) a time-sharing mechanism in the memory management unit to process multiple sections of the image in parallel. The time-sharing mechanism prevents the artifacts that would otherwise result from partitioning the image and processing the partitions separately. We also use approximations in the algorithm to make the hardware implementation more efficient while maintaining the repeatability of the detection. A frame rate of 304 frames per second is achieved at the evaluated image resolution, which compares favorably with other work.
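The core operation of a nonlinear scale space can be illustrated with one explicit Perona-Malik-style diffusion iteration; AKAZE itself uses fast explicit diffusion (FED) schemes, so this plain update is a conceptual sketch only:

```python
import numpy as np

def diffusion_step(L, k=0.02, tau=0.2):
    """One explicit nonlinear diffusion iteration: smoothing is damped where
    the local gradient (a likely edge) is large, via conductivity g."""
    gy, gx = np.gradient(L)
    g = 1.0 / (1.0 + (gx ** 2 + gy ** 2) / k ** 2)    # "g2" conductivity
    div = np.gradient(g * gy, axis=0) + np.gradient(g * gx, axis=1)
    return L + tau * div                               # forward Euler update

L = np.random.rand(64, 64)
for _ in range(10):                                    # a few evolution steps
    L = diffusion_step(L)
```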
Low-energy motion estimation memory system with dynamic management
The digital video coding process imposes severe pressure on memory traffic, leading to considerable power consumption related to frequent DRAM accesses. External off-chip memory demand needs to be minimized by clever architecture/algorithm co-design, thus saving energy and extending battery lifetime during video encoding. To exploit temporal redundancies among neighboring frames, the motion estimation (ME) algorithm searches for good matches between the current block and blocks within reference frames stored in external memory. To save energy during ME, this work analyzes the memory access distribution of the test zone search (TZS) ME algorithm and, based on this analysis, proposes both a multi-sector scratchpad memory design and dynamic management for TZS memory accesses. Our dynamic memory management, called neighbor management, reduces both static consumption (by employing sector-level power gating) and dynamic consumption (by reducing the number of accesses required for ME execution). Additionally, our dynamic management was integrated with two previously proposed solutions: a hardware reference frame compressor and the Level C data reuse scheme (using a scratchpad memory). This system achieves substantial memory energy savings and, when compared to the baseline solution composed of a reference frame compressor and data reuse scheme, reduces memory energy consumption further at the cost of only a marginal loss in coding efficiency, on average. When compared with related works, our system presents better memory bandwidth/energy savings and coding efficiency results.
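A hypothetical software model of sector-level power gating conveys the flavour of such management; the sector count, idle policy, and names below are illustrative only, not the paper's neighbor-management algorithm.

```python
class ScratchpadModel:
    """Toy model: sectors untouched for `idle_limit` steps are gated off to
    cut static energy; any access wakes the sector back up."""
    def __init__(self, sectors=8, idle_limit=4):
        self.idle = [0] * sectors
        self.on = [True] * sectors
        self.idle_limit = idle_limit

    def access(self, sector):
        self.on[sector] = True           # wake sector on demand
        self.idle[sector] = 0

    def tick(self):                      # called once per search step
        for s in range(len(self.idle)):
            self.idle[s] += 1
            if self.idle[s] > self.idle_limit:
                self.on[s] = False       # power-gate cold sector

sp = ScratchpadModel()
for _ in range(6):
    sp.access(2)
    sp.tick()
print(sp.on)                             # only sector 2 remains powered
```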
A high-performance two-dimensional transform architecture of variable block sizes for the VVC standard
The versatile video coding standard H.266/VVC has been accompanied by various new contributions that improve coding efficiency beyond high-efficiency video coding (HEVC), particularly in the transformation process. The adaptive multiple transform (AMT) is one of the new tools introduced in the transform module. It involves five transform types from the discrete cosine transform/discrete sine transform families with larger block sizes. The DCT-II has a fast computing algorithm, while the DST-VII relies on a complex matrix multiplication, leading to additional computational complexity. Approximation of the DST-VII can be used for transform optimization; at the hardware level, this method can provide gains in power consumption, logic resource usage, and speed. In this paper, a unified two-dimensional transform architecture that enables exact and approximate DST-VII computation for the supported block sizes is proposed. The exact transform computation can be processed using either multipliers or the MCM algorithm, while the approximate transform computation is based on additions and bit-shifting operations. All the designs are implemented on the Arria 10 FPGA device. The synthesis results show that the design implementing the approximate transform matrices is the most efficient, with only 4% area consumption. It reduces logic utilization by more than 65% compared to the multiplier-based exact transform design, while about 53% of hardware cost saving is obtained when compared to the MCM-based computation. Furthermore, the approximate 2D transform architecture can operate at 78 MHz, allowing real-time coding for 2K and 4K videos at 100 and 25 frames/s, respectively.
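For context, the DST-VII kernel targeted by these designs is T[k, n] = 2/sqrt(2N+1) · sin(π(2k+1)(n+1)/(2N+1)). The sketch below builds it, applies a separable 2D transform, and forms a rounded integer approximation; the scale of 128 is illustrative, not VVC's integer scaling.

```python
import numpy as np

def dst7(N):
    """Orthonormal DST-VII matrix of size N x N."""
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return 2 / np.sqrt(2 * N + 1) * np.sin(
        np.pi * (2 * k + 1) * (n + 1) / (2 * N + 1))

def transform_2d(block, T):
    """Separable 2D transform: columns then rows."""
    return T @ block @ T.T

T = dst7(4)
T_int = np.round(128 * T)                   # integer approximation, scale 128
X = np.arange(16.0).reshape(4, 4)
print(np.allclose(T @ T.T, np.eye(4)))      # exact kernel is orthonormal
print(transform_2d(X, T_int) / 128 ** 2)    # close to transform_2d(X, T)
```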
Developing a real-time social distancing detection system based on YOLOv4-tiny and bird's-eye view for COVID-19
COVID-19 is a virus transmitted through small droplets during speech, sneezing, and coughing, and mostly by inhalation between individuals in close contact. The pandemic is still ongoing and causes acute respiratory infections that have resulted in many deaths. The risks of COVID-19 spread can be reduced by avoiding physical contact among people. This research proposes a real-time AI platform for people detection and social distancing classification of individuals based on thermal camera imagery. YOLOv4-tiny is proposed for object detection: its simple neural network architecture makes it suitable for low-cost embedded devices and a better option than other approaches for real-time detection. An algorithm is also implemented to monitor social distancing using a bird's-eye perspective. The proposed approach is applied to videos acquired through thermal cameras for people detection and social distancing classification, while simultaneously measuring the skin temperature of the individuals. To tune the proposed model for individual detection, the training stage is carried out with thermal images from various indoor and outdoor environments. The final prototype has been deployed on low-cost Nvidia Jetson devices (Xavier and Jetson Nano) equipped with a fixed camera. The proposed approach is suitable for a surveillance system within sustainable smart cities for people detection, social distancing classification, and body temperature measurement, helping the authorities to visualize individuals' compliance with social distancing while simultaneously monitoring their skin temperature.
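The bird's-eye mapping typically rests on a one-off ground-plane homography; the sketch below uses OpenCV with placeholder calibration points and units:

```python
import numpy as np
import cv2

# Ground-plane points in the image (e.g. marked once at setup) and their
# bird's-eye coordinates in centimetres; all values here are placeholders.
src = np.float32([[100, 400], [540, 400], [620, 620], [20, 620]])
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])
M = cv2.getPerspectiveTransform(src, dst)

def ground_positions(feet_points):
    """Map detected feet points (image px) to bird's-eye cm coordinates, so
    inter-person distances can be compared against a real-world threshold."""
    pts = np.float32(feet_points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, M).reshape(-1, 2)

p = ground_positions([(320, 500), (360, 510)])
print(np.linalg.norm(p[0] - p[1]))      # distance in cm on the ground plane
```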
FAM: focal attention module for lesion segmentation of COVID-19 CT images
The novel coronavirus pneumonia (COVID-19) is among the world's most serious public health crises. In clinical practice, automatic segmentation of the lesion from computed tomography (CT) images using deep learning methods provides a promising tool for identifying and diagnosing COVID-19. To improve the accuracy of image segmentation, an attention mechanism can be adopted to highlight important features. However, existing attention methods perform poorly or even harm the accuracy of convolutional neural networks (CNNs) for various reasons (e.g., low contrast of the boundary between the lesion and its surroundings, image noise). To address this issue, we propose a novel focal attention module (FAM) for lesion segmentation of CT images. FAM contains a channel attention module and a spatial attention module. The spatial attention module first generates a rough spatial attention map, a shape prior of the lesion region obtained from the CT image using median filtering and distance transformation. The rough spatial attention is then fed into two 7 × 7 convolution layers for correction, yielding refined spatial attention on the lesion region. FAM was individually integrated with six state-of-the-art segmentation networks (e.g., UNet, DeepLabV3+), and the six combinations were validated on a public dataset of COVID-19 CT images. The results show that FAM improves the Dice Similarity Coefficient (DSC) of the CNNs by 2% and reduces the numbers of false negatives (FN) and false positives (FP) by up to 17.6%, significantly outperforming other attention modules such as CBAM and SENet. Furthermore, FAM significantly improves the convergence speed of model training and achieves better real-time performance. The code is available at GitHub (https://github.com/RobotvisionLab/FAM.git).
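The rough spatial attention step described above can be prototyped directly; the filter size and threshold here are illustrative, not the paper's settings:

```python
import numpy as np
from scipy import ndimage

def rough_spatial_attention(ct_slice, thresh=0.5):
    """Shape-prior sketch: median-filter the CT slice to suppress noise,
    threshold it, then apply a distance transform so attention is highest
    deep inside the candidate lesion region."""
    smoothed = ndimage.median_filter(ct_slice, size=5)
    mask = smoothed > thresh
    dist = ndimage.distance_transform_edt(mask)
    return dist / dist.max() if dist.max() > 0 else dist

att = rough_spatial_attention(np.random.rand(128, 128))
# In FAM, this rough map is then refined by two 7x7 convolution layers.
```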
A new YOLO-based method for real-time crowd detection from video and performance analysis of YOLO models
As seen in the COVID-19 pandemic, physical distancing is one of the most important measures against viruses transmitted from person to person. According to the World Health Organization (WHO), it is mandatory to limit the number of people in indoor spaces; the number of persons that can fit in an area depends on its size, so the size of the indoor area should be measured and the maximum number of people calculated accordingly. Computers can be used to ensure the correct application of this capacity rule in indoor spaces monitored by cameras. In this study, a method is proposed to measure the size of a prespecified region in the video and count the people there in real time. The method (1) predetermines the borders of a region on the video, (2) identifies and counts the people in this specified region, and (3) estimates the size of the specified area to find the maximum number of people it can hold. For this purpose, the You Only Look Once (YOLO) object detection model was used, with Microsoft COCO pre-trained weights to identify and label persons. The YOLO models were tested separately in the proposed method and their performances were analyzed. Mean average precision (mAP), frames per second (fps), and accuracy rate metrics were computed for the detection of persons in the specified region. While the YOLO v3 model achieved the highest accuracy rate and mAP (at both 0.50 and 0.75), the YOLO v5s model achieved the highest fps rate among non-Tiny models.
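Counting people inside the prespecified region reduces to a point-in-polygon test on each detection; a minimal sketch with OpenCV, assuming (x, y, w, h) boxes and a placeholder region, is:

```python
import numpy as np
import cv2

# Predetermined region borders (placeholder rectangle, in image pixels)
region = np.array([[100, 100], [500, 100], [500, 400], [100, 400]], np.int32)

def count_in_region(detections):
    """Count person detections whose foot point lies inside the region."""
    inside = 0
    for x, y, w, h in detections:
        foot = (float(x + w / 2), float(y + h))      # bottom-centre of the box
        if cv2.pointPolygonTest(region.reshape(-1, 1, 2), foot, False) >= 0:
            inside += 1
    return inside

print(count_in_region([(200, 150, 40, 90), (600, 150, 40, 90)]))  # -> 1
```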
Butterfly network: a convolutional neural network with a new architecture for multi-scale semantic segmentation of pedestrians
The detection of multi-scale pedestrians is one of the challenging tasks in pedestrian detection applications. Moreover, accurate localization of small-scale pedestrians as low-scale target objects can also help solve the problem of occluded pedestrian detection. In this paper, we present a fully convolutional neural network with a new architecture and fully detailed supervision for semantic segmentation of pedestrians. The proposed network is named butterfly network (BF-Net) because its architecture is analogous to a butterfly. BF-Net retains a simple design, so it can process static images at a real-time image processing rate. The sub-path blocks embedded in the architecture provide higher accuracy for detecting multi-scale targets, including small ones. Another advantage of the proposed architecture is replacing common batch normalization with conditional batch normalization. The experimental results demonstrate that the proposed network outperforms other state-of-the-art networks such as UNet++, UNet 3+, Mask-RCNN, and DeepLabV3+ for semantic segmentation of pedestrians.
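Conditional batch normalization, mentioned above, normalizes activations as usual but draws the scale and shift from a per-condition embedding; the generic PyTorch module below shows the mechanism (BF-Net's exact conditioning scheme may differ):

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BatchNorm without affine parameters; gamma/beta come from embeddings
    indexed by an integer condition per sample."""
    def __init__(self, num_features, num_conditions):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Embedding(num_conditions, num_features)
        self.beta = nn.Embedding(num_conditions, num_features)
        nn.init.ones_(self.gamma.weight)   # start as identity transform
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, cond):
        g = self.gamma(cond).unsqueeze(-1).unsqueeze(-1)   # (N, C, 1, 1)
        b = self.beta(cond).unsqueeze(-1).unsqueeze(-1)
        return g * self.bn(x) + b

cbn = ConditionalBatchNorm2d(16, num_conditions=3)
y = cbn(torch.randn(4, 16, 32, 32), torch.tensor([0, 1, 2, 0]))
```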
Complexity and compression efficiency analysis of AV1 video codec
The recent research effort aiming to provide a royalty-free video format resulted in AOMedia Video 1 (AV1), which was launched in 2018. AV1 was developed by the Alliance for Open Media (AOMedia), which groups several major technology companies such as Google, Netflix, Apple, Samsung, Intel, and many others. AV1 is currently one of the most prominent video formats and has introduced several complex coding tools and partitioning structures in comparison to its predecessors. A study of the computational effort required by the different AV1 coding steps and partition structures is essential for understanding its complexity distribution when implementing fast and efficient codecs compatible with this format. Thus, this paper presents two main contributions: first, a profiling analysis aiming at understanding the computational effort required by each individual coding step of AV1; and second, a computational cost and coding efficiency analysis of the AV1 superblock partitioning process. Experimental results show that the two most complex coding steps of the reference software implementation are inter-frame prediction and transform, which represent 76.98% and 20.57% of the total encoding time, respectively. The experiments also show that disabling ternary and asymmetric quaternary partitions provides the best trade-off between coding efficiency and computational cost, increasing the bitrate by only 0.25% and 0.22%, respectively, while disabling all rectangular partitions provides an average time reduction of about 35%. The analyses presented in this paper provide insightful recommendations for the development of fast and efficient AV1-compatible codecs, with a methodology that can be easily replicated.
Explaining decisions of a light-weight deep neural network for real-time coronary artery disease classification in magnetic resonance imaging
In certain healthcare settings, such as emergency or critical care units, where quick and accurate real-time analysis and decision-making are required, the healthcare system can leverage the power of artificial intelligence (AI) models to support decision-making and prevent complications. This paper investigates the optimization of healthcare AI models with respect to time complexity, hyper-parameter tuning, and explainable AI (XAI) for a classification task. The paper highlights the significance of a lightweight convolutional neural network (CNN) for analysing and classifying Magnetic Resonance Imaging (MRI) in real time, compared against a CNN-RandomForest (CNN-RF) ensemble. The role of hyper-parameter tuning is also examined in finding optimal configurations that enhance the model's performance while efficiently utilizing the limited computational resources. Finally, the benefits of incorporating XAI techniques (e.g., Grad-CAM and Layer-wise Relevance Propagation) in providing transparency and interpretable explanations of AI model predictions, fostering trust, and enabling error/bias detection are explored. Our inference time on a MacBook laptop for 323 test images of size 100×100 is only 2.6 s, merely 8 milliseconds per image, while providing classification accuracy comparable to the ensemble CNN-RF model. Using the proposed model, clinicians/cardiologists can achieve accurate and reliable results while ensuring patients' safety and answering questions imposed by the General Data Protection Regulation (GDPR). The proposed investigative study will advance the understanding and acceptance of AI systems in connected healthcare settings.
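Of the XAI techniques mentioned, Grad-CAM is the most compact to sketch; the PyTorch function below assumes a generic classifier `model` and a chosen convolutional layer `conv_layer`, both placeholders:

```python
import torch

def grad_cam(model, conv_layer, x, class_idx):
    """Minimal Grad-CAM: weight the chosen conv layer's activations by the
    spatial mean of their gradients w.r.t. the target class score."""
    acts, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    score = model(x)[0, class_idx]          # assumes (N, num_classes) output
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)    # per-channel weights
    cam = torch.relu((w * acts[0]).sum(dim=1))     # (N, H, W) heat map
    return cam / (cam.max() + 1e-8)
```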
