Big Data

Analysis on Research Situation of Soybean Quality Evaluation Based on Bibliometrics
Gao Y, Tang P, Tang X, Wang D, Luo J and Wu J
Soybeans are a high-quality vegetable protein resource and a fundamental strategic material integral to the national economy and public livelihood. To investigate the research status of soybean quality evaluation, this study analyzes relevant literature from Web of Science and China Knowledge Network (2000-2024). Using bibliometric methods with Excel and VOSviewer, we examined publication years, keywords, authors, sources, countries/regions, and institutions, generating visualizations to intuitively illustrate the field's developmental status. Results indicate that over the past 25 years, soybean quality evaluation research has emerged as a focal point in crop science, with institutions predominantly located in China and the United States. The analysis also identifies the key journals publishing in this domain. Research primarily focuses on soybean physical characteristics and the component-quality relationship. Interdisciplinary advancements have positioned spectral analysis, intelligent systems, and multitechnology fusion as innovative frontiers in this field. These findings enhance researchers' understanding of current trends and support evidence-based decision-making in soybean quality evaluation.
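As a rough illustration of the bibliometric workflow described in this abstract, the sketch below counts publications per year and keyword frequencies from a hypothetical Web of Science export; the file name and the use of the standard PY/DE field tags are assumptions, and the actual study used Excel and VOSviewer rather than this script.

```python
# Minimal bibliometric sketch over a hypothetical WoS export "wos_export.csv"
# with the standard PY (publication year) and DE (author keywords) field tags.
from collections import Counter

import pandas as pd

records = pd.read_csv("wos_export.csv")

# Publications per year, restricted to the 2000-2024 window studied here.
per_year = (
    records.loc[records["PY"].between(2000, 2024), "PY"]
    .value_counts()
    .sort_index()
)
print(per_year)

# Keyword frequencies; WoS separates author keywords with semicolons.
keywords = Counter(
    kw.strip().lower()
    for cell in records["DE"].dropna()
    for kw in cell.split(";")
)
print(keywords.most_common(20))
```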
Monitoring Carbon Emission from Key Industries Based on VF-LSTM Model
Wang Y, Xiang T, Luo S, Gao Y and Kong X
Human activities that generate greenhouse gas emissions pose a significant threat to urban green and sustainable development. Production activities in key industrial sectors are a primary contributor to high urban carbon emissions. Therefore, effectively reducing carbon emissions in these sectors is crucial for achieving urban carbon peak and neutrality goals. Carbon emission monitoring is a critical approach that aids governmental bodies in understanding changes in industrial carbon emissions, thereby supporting decision-making and carbon reduction efforts. However, current industry-oriented carbon monitoring methods suffer from issues such as low frequency, poor accuracy, and inadequate privacy security. To address these challenges, this article proposes a novel privacy-protected "electricity-carbon'' nexus model, long short-term memory with the vertical federated framework (VF-LSTM), to monitor carbon emissions in key urban industries. The vertical federated framework ensures "usable but invisible" privacy protection for multisource data from various participants. The embedded long short-term memory model accurately captures industry-specific carbon emissions. Using data from key industries (steel, petrochemical, chemical, and nonferrous industries), this article constructs and validates the performance of the proposed industry-level carbon emission monitoring model. The results demonstrate that the model has high accuracy and robustness, effectively monitoring industry carbon emissions while protecting data privacy.
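The following is a minimal, non-federated sketch of the "electricity-carbon" mapping idea: an LSTM that regresses a carbon-emission value from a window of electricity-use features. It is written in PyTorch with illustrative tensor shapes and omits the vertical federated protocol that is central to VF-LSTM.

```python
# Simplified, non-federated sketch: an LSTM maps a window of industry
# electricity readings to a carbon-emission estimate. Feature names and
# tensor shapes are illustrative assumptions.
import torch
import torch.nn as nn

class ElectricityCarbonLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # carbon emission estimate

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # use the last time step

model = ElectricityCarbonLSTM(n_features=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch: 32 windows of 24 hourly readings for 4 electricity features.
x = torch.randn(32, 24, 4)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```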
DMHANT: DropMessage Hypergraph Attention Network for Information Propagation Prediction
Ouyang Q, Chen H, Liu S, Pu L, Ge D and Fan K
Predicting propagation cascades is crucial for understanding information propagation in social networks. Existing methods tend to focus on the structure or order of infected users within a single cascade sequence, ignoring the global dependencies among cascades and users, which is insufficient to characterize their dynamic interaction preferences. Moreover, existing methods handle model robustness poorly. To address these issues, we propose a prediction model named DropMessage Hypergraph Attention Network, which constructs a hypergraph based on the cascade sequence. Specifically, to dynamically obtain user preferences, we divide the diffusion hypergraph into multiple subgraphs according to timestamps, develop hypergraph attention networks to explicitly learn complete interactions, and adopt a gated fusion strategy to connect them for user cascade prediction. In addition, a message-dropping method, DropMessage, is incorporated to increase the robustness of the model. Experimental results on three real-world datasets indicate that the proposed model significantly outperforms state-of-the-art information propagation prediction models in both MAP@K and Hits@K metrics, and the experiments also show that the model achieves more significant prediction performance than existing models under data perturbation.
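A minimal sketch of the DropMessage idea follows: rather than dropping nodes or edges, individual entries of the propagated message matrix are randomly zeroed during training. The plain mean-aggregation message passing and the shapes are assumptions for illustration; the hypergraph attention itself is not reproduced.

```python
# DropMessage sketch: individual entries of the message matrix are zeroed
# during training, instead of dropping whole nodes or edges.
import torch
import torch.nn.functional as F

def propagate_with_dropmessage(x, edge_index, drop_p=0.2, training=True):
    """x: (num_nodes, dim) features; edge_index: (2, num_edges) src->dst."""
    src, dst = edge_index
    messages = x[src]                               # one message per edge
    if training:
        messages = F.dropout(messages, p=drop_p)    # DropMessage step
    out = torch.zeros_like(x)
    out.index_add_(0, dst, messages)                # sum messages per node
    deg = torch.bincount(dst, minlength=x.size(0)).clamp(min=1).unsqueeze(1)
    return out / deg                                # mean aggregation

x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
h = propagate_with_dropmessage(x, edge_index)
```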
Evolutionary Trends in Decision Sciences Education Research from Simulation and Games to Big Data Analytics and Generative Artificial Intelligence
Akpan IJ, Razavi R and Akpan AA
Decision sciences (DSC) involves studying complex dynamic systems and processes to aid informed choices subject to constraints in uncertain conditions. It integrates multidisciplinary methods and strategies to evaluate decision engineering processes, identifying alternatives and providing insights toward enhancing prudent decision-making. This study analyzes the evolutionary trends and innovation in DSC education and research over the past 25 years. Using metadata from bibliographic records and employing the science mapping method and text analytics, we map and evaluate the thematic, intellectual, and social structures of DSC research. The results identify "knowledge management," "decision support systems," "data envelopment analysis," "simulation," and "artificial intelligence" (AI) as some of the prominent critical skills and knowledge requirements for problem-solving in DSC over the period studied (2000-2024). However, these technologies are evolving significantly in the recent wave of digital transformation, with data analytics frameworks (including techniques such as big data analytics, machine learning, business intelligence, data mining, and information visualization) becoming crucial. DSC education and research continue to mirror developments in practice, with sustainable education through virtual/online learning becoming prominent. Innovative pedagogical approaches and strategies also include computer simulation and games ("play and learn" or "role-playing"). The current era is witnessing AI adoption in different forms, such as conversational chatbot agents and generative AI (GenAI), including the Chat Generative Pretrained Transformer (ChatGPT), in teaching, learning, and scholarly activities, amidst challenges around academic integrity, plagiarism, intellectual property violations, and other ethical and legal issues. Future efforts must innovatively integrate GenAI into DSC education and address the resulting challenges.
Maximizing Influence in Social Networks Using Combined Local Features and Deep Learning-Based Node Embedding
Bouyer A, Beni HA, Oskouei AG, Rouhi A, Arasteh B and Liu X
The influence maximization problem has several issues, including low infection rates and high time complexity. Many proposed methods are not suitable for large-scale networks due to their time complexity or reliance on free parameters. To address these challenges, this article proposes a local heuristic called Embedding Technique for Influence Maximization (ETIM) that uses shell decomposition, graph embedding, and reduction, as well as combined local structural features. The algorithm selects candidate nodes based on their connections among network shells and topological features, reducing the search space and computational overhead. It uses a deep learning-based node embedding technique to create a multidimensional vector for each candidate node and calculates each node's spreading dependency from local topological features. Finally, influential nodes are identified using the results of the previous phases and newly defined local features. The proposed algorithm is evaluated using the independent cascade model, showing its competitiveness and ability to achieve the best performance in terms of solution quality. Compared with the collective influence global algorithm, ETIM is significantly faster and improves the infection rate by an average of 12%.
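The abstract evaluates seed sets with the independent cascade model; the sketch below shows one common way such a simulation is written, using networkx, a constant activation probability, and a simple degree-based seed set as stand-ins rather than the ETIM heuristic itself.

```python
# Illustrative independent cascade (IC) simulation that scores a seed set's
# expected spread; the activation probability p is an assumed constant.
import random
import networkx as nx

def independent_cascade(graph, seeds, p=0.05, runs=200):
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.neighbors(u):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs  # expected infection size

G = nx.erdos_renyi_graph(1000, 0.01, seed=42)
seeds = sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:10]
print(independent_cascade(G, [n for n, _ in seeds]))
```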
Optimizing Multilayer Networks Through Time-Dependent Decision-Making: A Comparative Study
Menguc K and Yilmaz A
This research highlights the importance of accurately analyzing real-world multilayer network problems and introduces effective solutions. Whether simulating a protein-protein interaction network, a transportation network, or a social network, representation and analysis of these networks are crucial. Multilayer networks, which contain additional layers, may undergo dynamic transformations over time, just as single-layer networks do. These dynamic networks, which expand and contract, can be optimized with guidance from human operators if the transient changes are known and can be controlled. For the expansion and contraction of networks, this study introduces two distinct algorithms designed to make optimal decisions across the dynamic changes of a multilayer network. The main strategy is to minimize the standard deviation of the betweenness centrality of the edges in a complex network. The approaches we introduce incorporate diverse constraints into a multilayer weighted network, probing the network's expansion or contraction under various conditions represented as objective functions. Allowing the objective function to change enhances the model's adaptability to a wide array of problem types. In this way, complex network structures representing real-world problems can be mathematically modeled, which makes it easier to make informed decisions.
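The core objective, minimizing the standard deviation of edge betweenness centrality, can be illustrated with a small single-layer networkx sketch that scores candidate edges by how much their addition reduces that standard deviation; the multilayer structure and constraints of the article are not modeled here.

```python
# Pick the candidate edge whose addition minimizes the standard deviation of
# edge betweenness centrality (single-layer, brute-force, for clarity only).
import statistics
import networkx as nx

def betweenness_std(graph):
    return statistics.pstdev(nx.edge_betweenness_centrality(graph).values())

def best_edge_to_add(graph, candidates):
    best_edge, best_score = None, float("inf")
    for u, v in candidates:
        graph.add_edge(u, v)
        score = betweenness_std(graph)
        graph.remove_edge(u, v)
        if score < best_score:
            best_edge, best_score = (u, v), score
    return best_edge, best_score

G = nx.karate_club_graph()
candidates = [(u, v) for u in G for v in G if u < v and not G.has_edge(u, v)]
print(best_edge_to_add(G, candidates[:50]))
```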
The Impact of Cloaking Digital Footprints on User Privacy and Personalization
Goethals S, Matz S, Provost F, Martens D and Ramon Y
Our online lives generate a wealth of behavioral records, which are stored and leveraged by technology platforms. These data can be used to create value for users by personalizing services. At the same time, however, they also pose a threat to people's privacy by offering a highly intimate window into their private traits (e.g., their personality, political ideology, sexual orientation). We explore the concept of cloaking: allowing users to hide parts of their digital footprints from predictive algorithms, to prevent unwanted inferences. This article addresses two open questions: (i) can cloaking be effective in the longer term, as users continue to generate new digital footprints? And (ii) what is the potential impact of cloaking on the accuracy of inferences? We introduce a novel strategy focused on cloaking "metafeatures" and compare its efficacy against cloaking only the raw footprints. The main findings are that (i) while cloaking effectiveness does indeed diminish over time, using metafeatures slows the degradation; and (ii) there is a tradeoff between privacy and personalization: cloaking undesired inferences can also inhibit desirable inferences. Furthermore, the metafeature strategy, which yields more stable cloaking, also incurs a larger reduction in desirable inferences.
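A toy version of cloaking on raw footprints might look like the sketch below, assuming a linear model over binary footprints and synthetic data: the user's most revealing footprints are hidden until the unwanted inference drops below a threshold. The metafeature strategy studied in the article is not implemented.

```python
# Simplified cloaking sketch on synthetic data: hide the footprints with the
# largest positive evidence for a sensitive inference until the predicted
# probability falls below a threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 40))           # users x binary footprints
y = (X[:, :5].sum(axis=1) > 2).astype(int)       # synthetic sensitive trait
clf = LogisticRegression(max_iter=1000).fit(X, y)

def cloak(user, model, threshold=0.5):
    user = user.copy()
    evidence = model.coef_[0] * user              # per-footprint contribution
    for idx in np.argsort(evidence)[::-1]:        # most revealing first
        if model.predict_proba(user.reshape(1, -1))[0, 1] < threshold:
            break
        if user[idx] == 1 and evidence[idx] > 0:
            user[idx] = 0                         # hide this footprint
    return user

cloaked = cloak(X[0], clf)
print(clf.predict_proba(X[0].reshape(1, -1))[0, 1],
      clf.predict_proba(cloaked.reshape(1, -1))[0, 1])
```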
A Study of Public Opinion Reversal Recognition of Emergency Based on Hypernetwork
Wang X
With the rapid development of social media and online platforms, the speed and influence of emergency dissemination in cyberspace have significantly increased. The swift changes in public opinion, especially the phenomenon of opinion reversals, exert profound impacts on social stability and government credibility. The hypernetwork structure, characterized by its multilayered and multidimensional complexity, offers a new theoretical framework for analyzing multiple agents and their interactions in the evolution of public opinion. Based on hypernetwork theory, this study constructs a four-layer subnet model encompassing a user interaction network, an event evolution network, a semantic association network, and an emotional conduction network. By extracting network structural features and conducting cross-layer linkage analysis, an identification system for public opinion reversals in emergencies is established. Taking the donation incident involving Hongxing Erke during the Henan rainstorm in 2021 as a case study, an empirical analysis of the public opinion reversal process is conducted. The research results indicate that the proposed hypernetwork model can effectively identify key nodes in public opinion reversals. The multi-indicator collaborative identification system for public opinion reversals aids in rapidly and effectively detecting signals of such reversals. This study not only provides new methodological support for the dynamic identification of public opinion reversals but also offers theoretical references and practical guidance for public opinion monitoring and emergency response decision-making in emergencies.
Dual-Path Graph Neural Network with Adaptive Auxiliary Module for Link Prediction
Yang Z, Lin Z, Yang Y and Li J
Link prediction, which has important applications in many fields, predicts the possibility of a link between two nodes in a graph. Link prediction based on Graph Neural Networks (GNNs) obtains node representations and graph structure through a GNN, and has attracted a growing amount of attention recently. However, existing GNN-based link prediction approaches have some shortcomings. On the one hand, because a graph contains different types of nodes, aggregating information and learning node representations from neighbor nodes is a great challenge. On the other hand, the attention mechanism has been an effective instrument for enhancing link prediction performance, but the traditional attention mechanism is always monotonic for query nodes, which limits its influence on link prediction. To address these two problems, a Dual-Path Graph Neural Network (DPGNN) for link prediction is proposed in this study. First, we propose a novel Local Random Features Augmentation for Graph Convolution Network as the baseline of one path, while Graph Attention Network version 2, based on a dynamic attention mechanism, is adopted as the baseline of the other path. We then capture more meaningful node representations and more accurate link features by concatenating the information of these two paths. In addition, we propose an adaptive auxiliary module for better balancing the weight of auxiliary tasks, which brings more benefit to link prediction. Finally, extensive experiments verify the effectiveness and superiority of our proposed DPGNN for link prediction.
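A stripped-down dual-path sketch is shown below, assuming torch_geometric is available: a GCN path and a GATv2 (dynamic attention) path are concatenated, and the pair embedding is scored for a candidate link. The local random-feature augmentation and the adaptive auxiliary module are omitted.

```python
# Dual-path link scoring sketch: GCN path + GATv2 path, concatenated.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, GATv2Conv

class DualPathLinkPredictor(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gcn = GCNConv(in_dim, hid_dim)
        self.gat = GATv2Conv(in_dim, hid_dim, heads=1)
        self.scorer = nn.Linear(4 * hid_dim, 1)   # two nodes x two paths

    def forward(self, x, edge_index, pairs):
        h = torch.cat([self.gcn(x, edge_index).relu(),
                       self.gat(x, edge_index).relu()], dim=-1)
        u, v = pairs
        return torch.sigmoid(self.scorer(torch.cat([h[u], h[v]], dim=-1)))

x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 400))
pairs = torch.randint(0, 100, (2, 10))            # candidate links to score
model = DualPathLinkPredictor(16, 32)
print(model(x, edge_index, pairs).shape)          # torch.Size([10, 1])
```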
A Basketball Big Data Platform for Box Score and Play-by-Play Data
Vinué G
This is the second part of a research diptych devoted to improving basketball data management in Spain. The Spanish ACB (Association of Basketball Clubs, acronym in Spanish) is the top European national competition. It attracts most of the best foreign players outside the NBA (National Basketball Association, in North America) and also accelerates the development of Spanish players who ultimately contribute to the success of the Spanish national team. However, this sporting excellence is not reciprocated by an advanced treatment of the data generated by teams and players, the so-called statistics. On the contrary, their use is still very rudimentary. An earlier article published in this journal in 2020 introduced the first open web application for interactive visualization of the box score data from three European competitions, including the ACB. Box score data refer to the data provided once the game is finished. Following the same inspiration, this new research aims to present the work carried out with more advanced data, namely, play-by-play data, which are provided as the game runs. This type of data allows us to gain greater insight into basketball performance, providing information that cannot be revealed with box score data. A new dashboard is developed to analyze play-by-play data from a number of different and novel perspectives. Furthermore, a comprehensive data platform encompassing the visualization of the ACB box score and play-by-play data is presented.
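One small example of the kind of derived metric play-by-play data enable (and box scores cannot) is the running score margin over game time; the pandas sketch below uses hypothetical column names, not the actual ACB feed schema.

```python
# Illustrative play-by-play metric: running score margin over game time.
# Column names ("clock_s", "home_pts", "away_pts") are hypothetical.
import pandas as pd

pbp = pd.DataFrame({
    "clock_s":  [30, 75, 120, 180, 240],   # seconds elapsed
    "home_pts": [2, 0, 3, 2, 0],           # points scored on each event
    "away_pts": [0, 2, 0, 0, 3],
})

pbp["home_total"] = pbp["home_pts"].cumsum()
pbp["away_total"] = pbp["away_pts"].cumsum()
pbp["margin"] = pbp["home_total"] - pbp["away_total"]
print(pbp[["clock_s", "margin"]])
```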
A Fast Survival Support Vector Regression Approach to Large Scale Credit Scoring via Safe Screening
Wang H and Hong L
Survival models have found increasingly wide applications in credit scoring recently due to their ability to estimate the dynamics of risk over time. In this research, we propose a Buckley-James safe sample screening support vector regression (BJS4VR) algorithm to model large-scale survival data by combining the Buckley-James transformation and support vector regression. Different from previous support vector regression survival models, censored samples here are imputed using a censoring-unbiased Buckley-James estimator. Safe sample screening is then applied to discard samples from the original data that are guaranteed to be nonactive at the final optimal solution, improving efficiency. Experimental results on large-scale real Lending Club loan data show that the proposed BJS4VR model outperforms existing popular survival models such as RSFM, CoxRidge, and CoxBoost in terms of both prediction accuracy and time efficiency. Important variables highly correlated with credit risk are also identified with the proposed method.
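The sketch below is a heavily simplified stand-in for the BJS4VR idea on synthetic data: censored targets are iteratively imputed (with a crude rule rather than the full Buckley-James estimator) and an SVR is refit on the imputed log-times; the safe sample screening step is not shown.

```python
# Simplified censored-regression sketch: impute censored survival times and
# refit an SVR on the imputed targets (not the authors' full algorithm).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
true_t = np.exp(X @ rng.normal(size=5) * 0.3 + rng.normal(scale=0.2, size=300))
censor = rng.exponential(scale=np.median(true_t), size=300)
observed = np.minimum(true_t, censor)          # right-censored times
delta = (true_t <= censor).astype(int)         # 1 = event observed

y = np.log(observed)
for _ in range(5):                             # simple imputation loop
    model = SVR().fit(X, y)
    pred = model.predict(X)
    y = np.where(delta == 1, np.log(observed),
                 np.maximum(np.log(observed), pred))

print("final training R^2 on imputed targets:", model.score(X, y))
```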
Content-Aware Human Mobility Pattern Extraction
Li S, Fan C, Li T, Chen R, Liu Q and Gong J
Extracting meaningful patterns of human mobility from accumulating trajectories is essential for understanding human behavior. However, previous works identify human mobility patterns based on the spatial co-occurrence of trajectories, which ignores the effect of activity content, leaving challenges in effectively extracting and understanding patterns. To bridge this gap, this study incorporates the activity content of trajectories to extract human mobility patterns, and proposes a content-aware mobility pattern model. The model first embeds the activity content in a distributed continuous vector space by taking point-of-interest as an agent and then extracts representative and interpretable mobility patterns from human trajectory sets using a derived topic model. To investigate the performance of the proposed model, several evaluation metrics are developed, including pattern coherence, pattern similarity, and manual scoring. A real-world case study is conducted, and its experimental results show that the proposed model improves interpretability and helps to understand mobility patterns. This study provides not only a novel solution and several evaluation metrics for human mobility patterns but also a method reference for fusing content semantics of human activities for trajectory analysis and mining.
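The two ingredients described here can be sketched as follows, assuming gensim is available and using toy POI categories: a skip-gram model embeds activity content by treating each trajectory as a sequence of POIs, and a topic model extracts interpretable patterns over the same trajectory "documents".

```python
# (1) Embed activity content with a skip-gram model over POI sequences.
# (2) Extract topic-model style mobility patterns over trajectory documents.
# The POI categories below are toy examples, not the study's data.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

trajectories = [
    ["home", "subway", "office", "restaurant", "office", "home"],
    ["home", "gym", "restaurant", "home"],
    ["dorm", "campus", "library", "campus", "dorm"],
    ["home", "subway", "office", "home"],
]

poi2vec = Word2Vec(trajectories, vector_size=32, window=3, min_count=1, sg=1)
print(poi2vec.wv.most_similar("office", topn=2))

dictionary = Dictionary(trajectories)
corpus = [dictionary.doc2bow(t) for t in trajectories]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, terms in lda.print_topics(num_words=4):
    print(topic_id, terms)
```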
Research on the Influence of Information Iterative Propagation on Complex Network Structure
Qian Y, Nian F, Wang Z and Yao Y
Dynamic propagation affects changes in network structure, and different networks are affected by the iterative propagation of information to different degrees. The iterative propagation of information in a network changes the connection strength of the chain edges between nodes. Most studies on temporal networks build networks based on time characteristics, and the iterative propagation of information in a network can also reflect the time characteristics of network evolution. The change of network structure is a macroscopic manifestation of these time characteristics, whereas the dynamics in the network are a microscopic manifestation. How to concretely visualize the changes in network structure driven by the characteristics of propagation dynamics is the focus of this article. The appearance of chain edges is a microscopic change of network structure, and the division of communities is a macroscopic change. Based on this, node participation is proposed to quantify the influence of different users on information propagation in the network, and it is simulated in different types of networks. By analyzing the iterative propagation of information, weighted versions of different networks are constructed based on this propagation. Finally, the chain edges and community division in the network are analyzed to quantify the influence of network propagation on complex network structure.
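A minimal sketch of the weighted-network construction is shown below: repeated information passes over a toy network increase the weight of the edges they traverse, and the resulting weighted graph is then checked for community structure. The simple random forwarding rule is an assumption, not the article's propagation model.

```python
# Build a weighted network from repeated information passes, then examine
# its community structure. The forwarding rule is an illustrative assumption.
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

random.seed(0)
G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1, "weight")     # start every edge at unit strength

for _ in range(200):                       # iterative propagation rounds
    node = random.choice(list(G.nodes))
    for _ in range(5):                     # a short random forwarding chain
        nxt = random.choice(list(G.neighbors(node)))
        G[node][nxt]["weight"] += 1        # strengthen the used chain edge
        node = nxt

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])
```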
Deep Learning-Based Decision Support System for Nurse Staff in Hospitals
Chen J, He F, Tang L and Gu L
This study aims to promote the informatized management of hospital human resources and advance the application of hospital information technology. The application of deep learning (DL) technologies in health care, particularly in hospital settings, has shown significant promise in enhancing decision-making processes for nursing staff. A hospital management decision support system based on data warehouse theory and business intelligence technology is used to achieve multidimensional analysis and display of data. This research explores the development and implementation of a DL-Based Clinical Decision Support System (DL-CDSS) tailored for nurses in hospitals. DL-CDSS utilizes advanced neural network architectures to analyze complex clinical data, including patient records, vital signs, and diagnostic reports, aiming to assist nurses in making informed decisions regarding patient care. By leveraging large-scale datasets from Hospital Information Systems, DL-CDSS provides real-time recommendations for treatment plans, medication administration, and patient monitoring. The system's effectiveness is demonstrated through improved accuracy in clinical decision-making, a reduction in medication errors, and optimized workflow efficiency. The system analyzes and displays hospital nursing data in terms of quantity, distribution, structure, forecasting, analysis reports, and peer comparisons, providing head nurses with multilevel, multiperspective data mining results. Challenges such as data integration, model interpretability, and user interface design are addressed to ensure seamless integration into nursing practice. The study concludes with insights into the potential benefits of DL-CDSS in promoting patient safety, enhancing health care quality, and supporting nursing professionals in delivering optimal care.
Research on Sports Injury Rehabilitation Detection Based on IoT Models for Digital Health Care
Wu Z, Huang Z, Tang N, Wang K, Bian C, Li D, Kuraki V and Schmid F
Physical therapists specializing in sports rehabilitation help injured athletes recover from their injuries and avoid further harm. Sports rehabilitators treat not just commonplace sports injuries but also work-related musculoskeletal injuries, discomfort, and disorders. Sensor-equipped Internet of Things (IoT) devices monitor the real-time location of medical equipment such as scooters, cardioverters, nebulizers, oxygenation pumps, and other monitoring gear, and the deployment of medicines across sites can be analyzed in real time. Health care delivery based on digital technology to improve the access, affordability, and sustainability of medical treatment is known as digital health care. The challenging characteristics of sports injury rehabilitation for digital health care include playing position, game strategies, and cybersecurity. Hence, in this research, a model has been designed to improve sports injury rehabilitation detection for digital health care. The health care sector may benefit significantly from IoT adoption since it allows for enhanced patient safety; health care investment management includes controlling the hospital's pharmaceutical stock and monitoring heat and humidity levels. Digital health describes a group of programs made to aid health care delivery, whether by assisting with clinical decision-making or streamlining back-end operations in health care institutions. The proposed model effectively predicts improvements in sports injury rehabilitation detection with faster digital health care based on IoT. The research concludes that the proposed model effectively supports sports injury rehabilitation detection for digital health care, and the experimental analysis shows that it outperforms the baseline IoT method in terms of performance, accuracy, prediction ratio, and mean square error rate.
Balancing Protection and Quality in Big Data Analytics Pipelines
Polimeno A, Mignone P, Braghin C, Anisetti M, Ceci M, Malerba D and Ardagna CA
Existing data engine implementations do not properly manage the conflict between the need to protect data and the need to share them, which is hampering the spread of big data applications and limiting their impact. These two requirements have often been studied and defined independently, leading to a conceptual and technological misalignment. This article presents the architecture and technical implementation of a data engine that addresses this conflict by integrating a new governance solution based on access control within a big data analytics pipeline. Our data engine enriches traditional components for data governance with an access control system that enforces access to data in a big data environment based on data transformations. Data are then used along the pipeline only after sanitization, protecting sensitive attributes before their usage, in an effort to facilitate the balance between protection and quality. The solution was tested in a real-world smart city scenario using data from the Oslo city transportation system. Specifically, we compared the different predictive models trained with the data views obtained by applying the secure transformations required by different user roles to the same dataset. The results show that the predictive models, built on data manipulated according to access control policies, are still effective.
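A toy sketch of the policy-driven sanitization step follows: columns a given role may not see are dropped or generalized before data enter the analytics pipeline, and models are trained only on the resulting view. The roles, columns, and policies are illustrative, not those of the article.

```python
# Role-based sanitization sketch: suppress or generalize sensitive columns
# before analytics. Roles, columns, and policies are illustrative only.
import pandas as pd

POLICIES = {
    "city_analyst": {"rider_id": "drop", "birth_year": "generalize"},
    "data_officer": {},                      # full view
}

def sanitize(df: pd.DataFrame, role: str) -> pd.DataFrame:
    view = df.copy()
    for column, action in POLICIES.get(role, {}).items():
        if action == "drop":
            view = view.drop(columns=[column])
        elif action == "generalize":
            view[column] = (view[column] // 10) * 10   # decade buckets
    return view

trips = pd.DataFrame({
    "rider_id": [101, 102, 103],
    "birth_year": [1984, 1991, 2002],
    "trip_minutes": [12, 34, 8],
})
print(sanitize(trips, "city_analyst"))
```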
Cloud Resource Scheduling Using Multi-Strategy Fused Honey Badger Algorithm
Xie H, Li C, Ye Z, Zhao T, Xu H, Du J and Bai W
Cloud resource scheduling is one of the most significant tasks in the field of big data and is, in essence, a combinatorial optimization problem. Scheduling strategies based on meta-heuristic algorithms (MAs) are often chosen to deal with this topic. However, MAs are prone to falling into local optima, which degrades the quality of the allocation scheme, so algorithms with good global search ability are needed to map available cloud resources to task requirements. The Honey Badger Algorithm (HBA) is a newly proposed algorithm with strong search ability. To further improve scheduling performance, an Improved Honey Badger Algorithm (IHBA), which combines two local search strategies and a new fitness function, is proposed in this article. IHBA is compared with six MAs on load tasks at four scales. The comparative simulation results reveal that the proposed algorithm performs better than the other algorithms considered. IHBA enhances the diversity of algorithm populations, expands each individual's random search range, and prevents the algorithm from falling into local optima while effectively achieving resource load balancing.
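As a rough illustration of the scheduling objective, the sketch below maps task lengths to virtual machine capacities so as to minimize the makespan; the search loop is a generic random-mutation population search rather than the Honey Badger update equations, and the sizes are arbitrary.

```python
# Generic population search for task-to-VM mapping that minimizes makespan.
# This is not the Honey Badger update rule; sizes are arbitrary.
import random

random.seed(0)
TASKS = [random.randint(100, 1000) for _ in range(50)]   # task lengths (MI)
VM_SPEED = [500, 750, 1000, 1250]                         # VM capacities (MIPS)

def makespan(assignment):
    load = [0.0] * len(VM_SPEED)
    for task, vm in zip(TASKS, assignment):
        load[vm] += task / VM_SPEED[vm]
    return max(load)

population = [[random.randrange(len(VM_SPEED)) for _ in TASKS]
              for _ in range(30)]
for _ in range(500):
    parent = min(population, key=makespan)
    child = parent[:]
    child[random.randrange(len(TASKS))] = random.randrange(len(VM_SPEED))
    worst = max(range(len(population)), key=lambda i: makespan(population[i]))
    if makespan(child) < makespan(population[worst]):
        population[worst] = child

print("best makespan:", round(makespan(min(population, key=makespan)), 2))
```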
Introduction to the Special Issue on Big Data and the Internet of Things in Complex Information Systems
Chang V, Kacsuk P, Wills G and Behringer R
Enhancing Real-Time Patient Monitoring in Intensive Care Units with Deep Learning and the Internet of Things
Bai Y, Gu B and Tang C
The demand for intensive care units (ICUs) is steadily increasing, yet there is a relative shortage of medical staff to meet this need. Intensive care work is inherently heavy and stressful, highlighting the importance of optimizing these units' working conditions and processes. Such optimization is crucial for enhancing work efficiency and elevating the level of diagnosis and treatment provided in ICUs. The intelligent ICU concept represents a novel ward management model that has emerged through advancements in modern science and technology. This includes communication technology, the Internet of Things (IoT), artificial intelligence (AI), robotics, and big data analytics. By leveraging these technologies, the intelligent ICU aims to significantly reduce potential risks associated with human error and improve patient monitoring and treatment outcomes. Deep learning (DL) and IoT technologies have huge potential to revolutionize the surveillance of patients in the ICU due to the critical and complex nature of their conditions. This article provides an overview of the most recent research and applications of clinical data for critically ill patients, with a focus on the application of AI. In the ICU, seamless and continuous monitoring is critical, as even small delays in patient care decision-making can result in irreparable repercussions or death. This article looks at how modern technologies like DL and the IoT can improve patient monitoring, clinical results, and ICU processes. Furthermore, it investigates the function of wearable and advanced health sensors coupled with IoT networking systems, which enable the secure connection and analysis of various forms of patient data for predictive and remote analysis by medical professionals. By assessing existing patient monitoring systems, outlining the roles of DL and IoT, and analyzing the benefits and limitations of their integration, this study hopes to shed light on the future of ICU patient care and identify opportunities for further research.
Prognostic Modeling for Liver Cirrhosis Mortality Prediction and Real-Time Health Monitoring from Electronic Health Data
Zhang C, Iqbal MFB, Iqbal I, Cheng M, Sarhan N, Awwad EM and Ghadi YY
Liver cirrhosis stands as a prominent contributor to mortality, impacting millions across the United States. Enabling health care providers to predict early mortality among patients with cirrhosis holds the potential to enhance treatment efficacy significantly. Our hypothesis centers on the correlation between mortality and laboratory test results along with relevant diagnoses in this patient cohort. Additionally, we posit that a deep learning model could surpass the predictive capabilities of the existing Model for End-Stage Liver Disease score. This research seeks to advance prognostic accuracy and refine approaches to address the critical challenges posed by cirrhosis-related mortality. This study evaluates the performance of an artificial neural network model for liver disease classification using various training dataset sizes. Through meticulous experimentation, three distinct training proportions were analyzed: 70%, 80%, and 90%. The model's efficacy was assessed using precision, recall, F1-score, accuracy, and support metrics, alongside receiver operating characteristic (ROC) and precision-recall (PR) curves. The ROC curves were quantified using the area under the curve (AUC) metric. Results indicated that the model's performance improved with an increased size of the training dataset. Specifically, the 80% training data model achieved the highest AUC, suggesting superior classification ability over the models trained with 70% and 90% data. PR analysis revealed a steep trade-off between precision and recall across all datasets, with 80% training data again demonstrating a slightly better balance. This is indicative of the challenges faced in achieving high precision with a concurrently high recall, a common issue in imbalanced datasets such as those found in medical diagnostics.
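The evaluation protocol described here can be sketched as follows, using synthetic, imbalanced stand-in data rather than the clinical records: the same neural network classifier is trained at 70%, 80%, and 90% training proportions and compared by test AUC.

```python
# Train the same neural network at 70/80/90% training proportions and compare
# test AUC. Data are synthetic, imbalanced stand-ins for the clinical records.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=30,
                           weights=[0.85, 0.15], random_state=0)

for train_frac in (0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, stratify=y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                        random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"train={int(train_frac * 100)}%  AUC={auc:.3f}")
```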
Special Issue: Big Scientific Data and Machine Learning in Science and Engineering
Pourkamali-Anaraki F