SOFTWARE QUALITY JOURNAL

Machine learning for mHealth apps quality evaluation: An approach based on user feedback analysis
Haoues M, Mokni R and Sellami A
Mobile apps for healthcare (mHealth apps for short) have been increasingly adopted to help users manage their health or to access healthcare services. User feedback analysis is a pertinent method that can be used to improve the quality of mHealth apps. The objective of this paper is to use supervised machine learning algorithms to evaluate the quality of mHealth apps according to the ISO/IEC 25010 quality model based on user feedback. For this purpose, a total of 1682 user reviews were collected from 86 mHealth apps provided by the Google Play Store. These reviews were first classified into the eight ISO/IEC 25010 quality characteristics, and then into Negative, Positive, and Neutral opinions. This analysis was performed using machine learning and natural language processing techniques. The best performance was achieved by the Stochastic Gradient Descent (SGD) classifier, with an accuracy of 82.00% in classifying user reviews according to the ISO/IEC 25010 quality characteristics. Moreover, a Support Vector Machine (SVM) classified the collected user reviews into Negative, Positive, and Neutral with an accuracy of 90.50%. Finally, for each quality characteristic, we classified the collected reviews according to sentiment polarity. The best results were obtained for the Usability, Security, and Compatibility quality characteristics using the SGD classifier, with accuracies of 98.00%, 97.50%, and 96.00%, respectively. The results of this paper can assist developers in improving the quality of mHealth apps.
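The abstract describes the classification pipeline only at a high level. As a rough illustration of the kind of setup it implies, the sketch below classifies app reviews into ISO/IEC 25010 quality characteristics with an SGD classifier and into sentiment polarity with a linear SVM, using scikit-learn and TF-IDF features; the reviews, labels, and preprocessing are invented placeholders, not the authors' dataset or code.

```python
# Illustrative sketch only (not the paper's pipeline): review classification
# into ISO/IEC 25010 quality characteristics and into sentiment polarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented example reviews and labels.
reviews = [
    "The app crashes every time I open my medication list",
    "Very easy to navigate, even for my grandmother",
    "My health data should not be shared without consent",
    "Works fine on my old Android tablet",
]
quality_labels = ["Reliability", "Usability", "Security", "Compatibility"]
sentiment_labels = ["Negative", "Positive", "Negative", "Positive"]

# TF-IDF features + SGD classifier for the quality characteristics.
quality_clf = make_pipeline(TfidfVectorizer(), SGDClassifier(random_state=42))
quality_clf.fit(reviews, quality_labels)

# TF-IDF features + linear SVM for sentiment polarity.
sentiment_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
sentiment_clf.fit(reviews, sentiment_labels)

new_review = ["Logging in takes forever and the screen freezes"]
print(quality_clf.predict(new_review), sentiment_clf.predict(new_review))
```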
Ergo, SMIRK is safe: a safety case for a machine learning component in a pedestrian automatic emergency brake system
Borg M, Henriksson J, Socha K, Lennartsson O, Sonnsjö Lönegren E, Bui T, Tomaszewski P, Sathyamoorthy SR, Brink S and Helali Moghadam M
Integration of machine learning (ML) components in critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance, but the details must be chiseled out for each specific case. We initiated a research project with the goal of demonstrating a complete safety case for an ML component in an open automotive system. This paper reports results from an industry-academia collaboration on safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We demonstrate an application of AMLAS on SMIRK for a minimalistic operational design domain, i.e., we share a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source license for the research community to reuse.
An empirical investigation on the challenges of creating custom static analysis rules for defect localization
Mendonça DS and Kalinowski M
Custom static analysis rules, i.e., rules specific to one or more applications, have been successfully applied to perform corrective and preventive software maintenance. Pattern-driven maintenance (PDM) is a method designed to support the creation of such rules during software maintenance. However, as PDM was only recently proposed, few maintainers have reported on its usage. Hence, the challenges and skills needed to apply PDM properly are unknown. In this paper, we investigate the challenges faced by maintainers when applying PDM to create custom static analysis rules for defect localization. We conducted an observational study on novice maintainers creating custom static analysis rules by applying PDM. The study was divided into three tasks: (i) identifying a defect pattern, (ii) programming a static analysis rule to locate instances of the pattern, and (iii) verifying the located instances. We analyzed the efficiency of maintainers and their acceptance of PDM, as well as their comments on task challenges. We observed that prior knowledge of debugging, the subject software, and related technologies influenced maintainers' performance, as well as the time needed to learn the technology involved in rule programming. The results strengthen our confidence that PDM can help maintainers produce custom static analysis rules for locating defects. However, proper selection and training of maintainers are needed to apply PDM effectively. Also, using a higher level of abstraction can ease static analysis rule programming for novice maintainers.
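The abstract does not include a concrete rule, so as a rough sketch of what "programming a static analysis rule to locate instances of a defect pattern" can look like, the fragment below walks a Python AST and reports every occurrence of a hypothetical, invented defect pattern (SQL built by string concatenation passed to execute()); it is an illustration of the general idea, not the PDM method or the study's tooling.

```python
# Minimal sketch of a custom static analysis rule: given a previously
# identified defect pattern, walk the AST and report matching locations.
# The pattern and the code under analysis are invented for illustration.
import ast

SOURCE = """
def load_user(db, name):
    db.execute("SELECT * FROM users WHERE name = '" + name + "'")
"""

class ConcatInExecuteRule(ast.NodeVisitor):
    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # Match calls like <anything>.execute(<left> + <right>).
        is_execute = isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
        if is_execute and any(isinstance(arg, ast.BinOp) for arg in node.args):
            self.findings.append(node.lineno)
        self.generic_visit(node)

rule = ConcatInExecuteRule()
rule.visit(ast.parse(SOURCE))
print("Pattern instances at lines:", rule.findings)  # -> [3]
```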
Editorial
Gaston C, Kosmatov N and Le Gall P
Building an open-source system test generation tool: lessons learned and empirical analyses with EvoMaster
Arcuri A, Zhang M, Belhadi A, Marculescu B, Golmohammadi A, Galeotti JP and Seran S
Research in software testing often involves the development of software prototypes. Like any piece of software, such tools pose challenges in their development, use, and verification. However, some challenges are rather specific to this problem domain. For example, these tools are often developed by PhD students fresh out of bachelor's/master's degrees, possibly lacking any industrial experience in software development. Prototype tools are used to carry out empirical studies, possibly studying different parameters of newly designed algorithms. Software scaffolding is needed to run large sets of experiments efficiently. Furthermore, when using AI-based techniques like evolutionary algorithms, care needs to be taken to deal with their randomness, which further complicates their verification. These represent some of the challenges we have identified for this domain. In this paper, we report on our experience in building the open-source EvoMaster tool, which aims at system-level test case generation for enterprise applications. Many of the challenges we faced would be common to any researcher needing to build software testing tool prototypes. Therefore, one goal is that the experience shared here will benefit the research community by providing concrete solutions to many development challenges in building such research prototypes. Ultimately, this will help increase the impact of scientific research on industrial practice.
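One of the scaffolding concerns the abstract raises is the randomness of evolutionary algorithms: experiments must be repeated over many fixed seeds to be reproducible and to allow variance to be reported. The toy sketch below illustrates that idea in general terms; the run_search function and the choice of 30 seeds are invented placeholders, not EvoMaster's code or experimental setup.

```python
# Illustrative sketch of seed-controlled repetition of a randomized algorithm,
# so that experiment results are reproducible and their variance measurable.
import random
import statistics

def run_search(seed, budget=100):
    """Toy randomized search: returns the best 'fitness' found under a seed."""
    rng = random.Random(seed)
    return max(rng.random() for _ in range(budget))

SEEDS = range(30)  # repeat each configuration with several seeds (30 here)
results = [run_search(seed) for seed in SEEDS]

print(f"mean={statistics.mean(results):.3f} stdev={statistics.stdev(results):.3f}")
# Re-running with the same seeds reproduces exactly the same numbers,
# which is what makes verification of randomized tools tractable.
```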
A family of experiments about how developers perceive delayed system response time
Cornejo O, Briola D, Micucci D, Ginelli D, Mariani L, Santos Parrilla A and Juristo N
Collecting and analyzing data about developers working on their development tasks can help improve development practices, finally increasing the productivity of teams. Indeed, monitoring and analysis tools have already been used to collect data from productivity tools. Monitoring inevitably consumes resources and, depending on their extensiveness, may significantly slow down software systems, interfering with developers' activity. There is thus a challenging trade-off between monitoring and validating applications in their operational environment and preventing the degradation of the user experience. The lack of studies about developers perceive an overhead introduced in an application makes it extremely difficult to fine-tune techniques working in the field. In this paper, we address this challenge by presenting an empirical study that quantifies how developers perceive overhead. The study consists of three replications of an experiment that involved 99 computer science students in total, followed by a small-scale experimental assessment of the key findings with 12 professional developers. Results show that non-negligible overhead can be introduced for a short period into applications without developers perceiving it and that the sequence in which complex operations are executed influences the perception of the system response time. This information can be exploited to design better monitoring techniques.