IEEE Transactions on Computational Social Systems

Adaptive 3DCNN-based Interpretable Ensemble Model for Early Diagnosis of Alzheimer's Disease
Pan D, Luo G, Zeng A, Zou C, Liang H, Wang J, Zhang T, Yang B and
An adaptive interpretable ensemble model based on a three-dimensional Convolutional Neural Network (3DCNN) and a Genetic Algorithm (GA), i.e., 3DCNN+EL+GA, was proposed to differentiate subjects with Alzheimer's Disease (AD) or Mild Cognitive Impairment (MCI) and to identify, in a data-driven way, the discriminative brain regions that contribute significantly to the classifications. In addition, the discriminative brain sub-regions were located at the voxel level within these brain regions, using a gradient-based attribution method designed for CNNs. Besides disclosing the discriminative brain sub-regions, the testing results on the datasets from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) indicated that 3DCNN+EL+GA outperformed other state-of-the-art deep learning algorithms and that the discriminative brain regions obtained (e.g., the rostral hippocampus, caudal hippocampus, and medial amygdala) were linked to emotion, memory, language, and other essential brain functions impaired early in the AD process. Future research is needed to examine whether the proposed method and ideas generalize to discerning discriminative brain regions for other brain disorders, such as severe depression, schizophrenia, autism, and cerebrovascular diseases, using neuroimaging.
Correlation studies of Hippocampal Morphometry and Plasma NFL Levels in Cognitively Unimpaired Subjects
Dong Q, Li Z, Liu W, Chen K, Su Y, Wu J, Caselli RJ, Reiman EM, Wang Y and Shen J
Alzheimer's disease (AD) is a growing burden on society and families. Applying computer-aided strategies to reveal its pathology is an active research focus. Plasma neurofilament light (NFL) is an emerging noninvasive and economical biomarker for AD molecular pathology. It is valuable to reveal the correlations between plasma NFL levels and neurodegeneration, especially hippocampal deformations at the preclinical stage. The negative correlation between plasma NFL levels and hippocampal volumes has been documented. However, the relationship between plasma NFL levels and the details of hippocampal morphometry at the preclinical stage remains elusive. This study seeks to demonstrate the capacity of our proposed surface-based hippocampal morphometry system to discern plasma NFL positive (NFL+ > 41.9 pg/L) and plasma NFL negative (NFL- < 41.9 pg/L) levels, and to illustrate its superiority to hippocampal volume measurement, using a cohort of 154 cognitively unimpaired (CU) middle-aged and elderly adults. We also apply this morphometry measure and a proposed sparse-coding-based classification algorithm to classify CU individuals with NFL+ and NFL- levels. Experimental results show that the proposed hippocampal morphometry system offers stronger statistical power to discriminate CU subjects with NFL+ and NFL- levels than the hippocampal volume measure. Furthermore, this system can discriminate plasma NFL levels in CU individuals (Accuracy=0.86). Both the group-level and individual-level analysis results indicate that the association between plasma NFL levels and hippocampal shapes can be mapped at the preclinical stage.
Mitigating COVID-19 Transmission in Schools With Digital Contact Tracing
Sun HC, Liu XF, Du ZW, Xu XK and Wu Y
Precision mitigation of COVID-19 is urgently needed in the postpandemic period in the absence of pharmaceutical interventions. In this study, the effectiveness and cost of an on-campus mitigation strategy based on digital contact tracing (DCT) technology are studied through epidemic simulations using high-resolution empirical contact networks of teachers and students. Compared with traditional class, grade, and school closure strategies, the DCT-based strategy offers a practical yet much more efficient way of mitigating the spread of COVID-19 on a crowded campus. Specifically, the strategy based on DCT can achieve the same level of disease control as rigid school suspensions but with significantly fewer students quarantined. We further explore the conditions necessary to ensure the effectiveness of the DCT-based strategy, as well as auxiliary strategies to enhance mitigation effectiveness, and make the following recommendations: social distancing should be implemented along with DCT, the adoption rate of DCT devices should be assured, and swift virus tests should be carried out to discover asymptomatic infections and stop their subsequent transmission. We also argue that primary schools have higher disease transmission risks than high schools and, therefore, warrant particular caution when reopening is considered.
COVID-19 Patient Count Prediction Using LSTM
Iqbal M, Al-Obeidat F, Maqbool F, Razzaq S, Anwar S, Tubaishat A, Khan MS and Shah B
In December 2019, a pandemic named COVID-19 broke out in Wuhan, China, and within a few weeks it spread to more than 200 countries worldwide. Every affected country began taking measures to stop the spread and to provide the best possible medical facilities to infected patients. As the infection spread was exponential, there arose a need to model infection spread patterns to estimate the patient volume computationally. Such estimates are key to the actions that local governments may take to counter the spread and to manage hospital load and resource allocation. This article uses long short-term memory (LSTM) to predict the volume of COVID-19 patients in Pakistan. LSTM is a particular type of recurrent neural network (RNN) used for classification, prediction, and regression tasks. We trained the RNN model on COVID-19 data for Pakistan (March 2020 to May 2020) and predicted the percentage of positive COVID-19 patients for June 2020. Finally, we calculated the mean absolute percentage error (MAPE) to assess the model's prediction effectiveness for different numbers of LSTM units, batch sizes, and epochs. The predicted patient counts are also compared with those of another prediction model over the same period, and the results reveal that the patient count predicted by the proposed model is much closer to the actual patient count.
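The error metric named in this abstract can be sketched as follows. This is a minimal illustration of the standard MAPE definition, not the authors' code; the function name `mape` and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have equal length and that no actual
    value is zero (MAPE is undefined in that case).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)


# Hypothetical daily values, for illustration only (not data from the paper).
example_actual = [10.0, 20.0, 40.0]
example_predicted = [11.0, 18.0, 40.0]
example_error = mape(example_actual, example_predicted)
```

A lower MAPE indicates predictions closer to the observed counts, which is how different LSTM configurations could be compared under this metric.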
A Short-Term Prediction Model at the Early Stage of the COVID-19 Pandemic Based on Multisource Urban Data
Wang R, Ji C, Jiang Z, Wu Y, Yin L and Li Y
The ongoing coronavirus disease 2019 (COVID-19) pandemic has spread throughout China and worldwide since it was first reported in Wuhan, China, in December 2019. By May 18, 2020, the pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had caused 4 589 526 confirmed cases. At the early stage of the pandemic, large-scale human mobility accelerated its spread. Rapidly and accurately tracking the population inflow from Wuhan and other cities in Hubei province is especially critical for assessing the potential for sustained pandemic transmission in new areas. In this study, we first analyze the impact of related multisource urban data (such as local temperature, relative humidity, air quality, and inflow rate from Hubei province) on daily new confirmed cases at the early stage of local pandemic transmission. The results show that the early trend of COVID-19 can be explained well by human mobility from Hubei province around the Chinese Lunar New Year. Unlike the commonly used pandemic models based on transmission dynamics, we propose a simple but effective short-term prediction model for COVID-19 cases that considers human mobility from Hubei province to the target cities. The performance of the proposed model is validated on several major cities in Guangdong province. For cities such as Shenzhen and Guangzhou, which have frequent daily population flow, the values of [Formula: see text] of daily prediction reach 0.988 and 0.985, respectively. The proposed model has provided a reference for decision support of pandemic prevention and control in Shenzhen.
Ranking of Importance Measures of Tweet Communities: Application to Keyword Extraction From COVID-19 Tweets in Japan
Harakawa R and Iwahashi M
This article presents a method that detects tweet communities with similar topics and ranks the communities by importance measures. By identifying the tweet communities that have high importance measures, it is possible for users to easily find important information about the coronavirus disease (COVID-19). Specifically, we first construct a community network, whose nodes are tweet communities obtained by applying a community detection method to a tweet network. The community network is constructed based on textual similarities between tweet communities and the sizes of the tweet communities. Second, we apply algorithms for calculating centrality to the community network. Because the obtained centrality is also based on tweet community sizes, we call it the importance measure to distinguish it from conventional centrality. The importance measure can simultaneously evaluate the importance of topics in the entire data set and the occupancy (or dominance) of tweet communities in the network structure. We conducted experiments by collecting Japanese tweets about COVID-19 from March 1, 2020 to May 15, 2020. The results show that the proposed method is able to extract keywords that have a high correlation with the number of people infected with COVID-19 in Japan. Because users can browse the keywords from a small number of central tweet communities, quick and easy understanding of important information becomes feasible.
Analysis, Modeling, and Representation of COVID-19 Spread: A Case Study on India
Mishra R, Gupta HP and Dutta T
The coronavirus outbreak is one of the most challenging pandemics for the entire human population. Techniques such as isolating infected people and maintaining social distancing are the only preventive measures against the pandemic. Accurately estimating the number of infected people from limited data is an indeterminate problem faced by data scientists. Several techniques exist in the literature, including the reproduction number and the case fatality rate, for predicting the duration of a pandemic and the size of the infectious population. This article presents a case study of different techniques for analyzing, modeling, and representing the data associated with a pandemic such as COVID-19. We further propose an algorithm for estimating infection transmission states in a particular area. This work also presents an algorithm for estimating the end time of a pandemic from the susceptible-infectious-recovered (SIR) model. Finally, this article presents empirical and data analyses to study the impact of transmission probability, rate of contact, and the infectious and susceptible populations on the pandemic spread.
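The SIR model mentioned in this abstract can be illustrated with a short discrete-time simulation. This is only a sketch under stated assumptions: the one-day Euler step, the parameter names `beta` and `gamma`, and the threshold rule in `estimated_end_day` are illustrative choices, not the end-time algorithm proposed in the article.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time (one-day Euler step) SIR model.

    beta  : transmission rate per day (contact rate x transmission probability)
    gamma : recovery rate per day, assumed to lie in [0, 1]
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections are capped by the remaining susceptible population.
        new_infections = min(beta * s * i / population, s)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def estimated_end_day(history, threshold=1.0):
    """First day at or after the peak on which infections drop below `threshold`.

    Returns None if that never happens within the simulated horizon.
    The threshold rule is an assumption made for this sketch.
    """
    infected = [i for _, i, _ in history]
    peak_day = infected.index(max(infected))
    for day in range(peak_day, len(infected)):
        if infected[day] < threshold:
            return day
    return None


# Hypothetical parameters, for illustration only.
run = simulate_sir(population=1000, initial_infected=1, beta=0.3, gamma=0.1, days=400)
end_day = estimated_end_day(run)
```

Varying `beta` (which combines contact rate and transmission probability) while holding `gamma` fixed gives a simple way to explore how those factors change the peak and the estimated end day.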
A Mass-Conservation Model for Stability Analysis and Finite-Time Estimation of Spread of COVID-19
Rastgoftar H and Atkins E
The COVID-19 global pandemic has significantly impacted people throughout the United States and the world. While it was initially believed that the virus was transmitted from animal to human, person-to-person transmission is now recognized as the main source of community spread. This article integrates data into physics-based models to analyze the stability of the rapid COVID-19 growth and to obtain a data-driven model of spread dynamics among the human population. The proposed mass-conservation model is used to learn the parameters of pandemic growth and to predict the growth of total cases, deaths, and recoveries over a finite future time horizon. The proposed finite-time prediction model is validated by finite-time estimation of the total numbers of infected cases, deaths, and recoveries in the United States from March 12, 2020 to December 9, 2020.
COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis
Naseem U, Razzak I, Khushi M, Eklund PW and Kim J
Social media (and the world at large) have been awash with news of the COVID-19 pandemic. With the passage of time, news and awareness about COVID-19 spread like the pandemic itself, with an explosion of messages, updates, videos, and posts. Mass hysteria manifested as another concern in addition to the health risk that COVID-19 presented. Predictably, public panic soon followed, mostly due to misconceptions, a lack of information, or sometimes outright misinformation about COVID-19 and its impacts. It is thus timely and important to conduct an assessment of the early information flows during the pandemic on social media, as well as a case study of evolving public opinion on social media, which is of general interest. This study aims to inform policy that can be applied to social media platforms, for example, by determining what degree of moderation is necessary to curtail misinformation on social media. This study also analyzes views concerning COVID-19 by focusing on people who interact and share social media on Twitter. As a platform for our experiments, we present a new large-scale sentiment data set, COVIDSENTI, which consists of 90 000 COVID-19-related tweets collected in the early stages of the pandemic, from February to March 2020. The tweets have been labeled into positive, negative, and neutral sentiment classes. We analyzed the collected tweets for sentiment classification using different sets of features and classifiers. Negative opinion played an important role in conditioning public sentiment; for instance, we observed that people favored lockdown earlier in the pandemic, but, as expected, sentiment shifted by mid-March. Our study supports the view that there is a need to develop a proactive and agile public health presence to combat the spread of negative sentiment on social media following a pandemic.
Sentiment Analysis of Lockdown in India During COVID-19: A Case Study on Twitter
Gupta P, Kumar S, Suman RR and Kumar V
With the rapid increase in the use of the Internet, sentiment analysis has become one of the most popular fields of natural language processing (NLP). Using sentiment analysis, the implied emotion in text can be mined effectively for different occasions. During the COVID-19 outbreak, people have been using social media to receive and communicate different types of information on a massive scale. Mining such content to evaluate people's sentiments can play a critical role in making decisions to keep the situation under control. The objective of this study is to mine the sentiments of Indian citizens regarding the nationwide lockdown enforced by the Indian government to reduce the rate of spread of the coronavirus. In this work, sentiment analysis of tweets posted by Indian citizens has been performed using NLP and machine learning classifiers. A total of 12 741 tweets containing the keyword "Indialockdown" and posted from April 5, 2020 to April 17, 2020 were extracted. The data were extracted from Twitter using the Tweepy API, annotated using the TextBlob and VADER lexicons, and preprocessed using the Natural Language Toolkit (NLTK) for Python. Eight different classifiers were used to classify the data. The experiment achieved its highest accuracy, 84.4%, with the LinearSVC classifier and unigrams. This study concludes that the majority of Indian citizens support the lockdown decision implemented by the Indian government during the coronavirus outbreak.
Dynamical SEIR Model With Information Entropy Using COVID-19 as a Case Study
Nie Q, Liu Y, Zhang D and Jiang H
Social network information is a measure of the number of infections. Understanding the effect of social network information on disease spread can help improve epidemic forecasting and uncover preventive measures. Many driving factors for the transmission mechanism of infectious diseases remain unclear. Some experts believe that redundant information on social media may increase people's panic to evade the restrictions or refuse to report their symptoms, which increases the actual infection rate. We analyze the engagement in the COVID-19 topics on the Internet and find that the infection rate is not only related to the total amount of information. In our research, information entropy is introduced into the quantification of the impact of social network information. We find that the amount of information with different distributions has different effects on disease transmission. Furthermore, we build a new dynamic susceptible-exposed-infected-recovered (SEIR) model with information entropy to simulate the epidemic situation in China. Simulation results show that our modified model is effective in predicting the COVID-19 epidemic peaks and sizes.
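The claim that the same amount of information can have different effects depending on its distribution can be made concrete with Shannon entropy. The sketch below is illustrative only: the function name and the two topic-count vectors are hypothetical, and how the authors couple entropy to the SEIR dynamics is not specified in this abstract.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Two hypothetical days with the same total volume (100 posts)
# but different distributions over four topics.
concentrated = [97, 1, 1, 1]
spread = [25, 25, 25, 25]

entropy_concentrated = shannon_entropy(concentrated)
entropy_spread = shannon_entropy(spread)
```

Both days carry the same total volume, yet the evenly spread day has the maximum entropy for four topics (2 bits), while the concentrated day is close to zero; a volume-only measure would not distinguish them.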
Detecting Community Depression Dynamics Due to COVID-19 Pandemic in Australia
Zhou J, Zogan H, Yang S, Jameel S, Xu G and Chen F
The recent Coronavirus Infectious Disease 2019 (COVID-19) pandemic has caused an unprecedented impact across the globe. We have also witnessed millions of people with increased mental health issues, such as depression, stress, worry, fear, disgust, sadness, and anxiety, which have become one of the major public health concerns during this severe health crisis. Depression can cause serious emotional, behavioral, and physical health problems with significant consequences, both personal and social costs included. This article studies community depression dynamics due to the COVID-19 pandemic through user-generated content on Twitter. A new approach based on multimodal features from tweets and term frequency-inverse document frequency (TF-IDF) is proposed to build depression classification models. Multimodal features capture depression cues from emotion, topic, and domain-specific perspectives. We study the problem using recently scraped tweets from Twitter users emanating from the state of New South Wales in Australia. Our novel classification model is capable of extracting depression polarities that may be affected by COVID-19 and related events during the COVID-19 period. The results found that people became more depressed after the outbreak of COVID-19. The measures implemented by the government, such as the state lockdown, also increased depression levels.
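The TF-IDF weighting named in this abstract can be sketched in a few lines. This is one common unsmoothed variant, shown for orientation only; the function name, the tokenized example posts, and the exact weighting formula are assumptions, and the article's multimodal features are not reproduced here.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / document frequency)   (no smoothing; one common variant)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized posts, for illustration only.
posts = [
    ["feel", "sad", "today"],
    ["feel", "fine", "today"],
    ["lockdown", "again"],
]
post_weights = tf_idf(posts)
```

Terms that appear in fewer posts (here "sad") receive a higher weight than terms shared across posts (here "feel"), which is why TF-IDF features can help a classifier pick up distinctive wording.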
The Pandemic Holiday Blip in New York City
Vierlboeck M, Nilchiani RR and Edwards CM
When it comes to pandemics, such as the one caused by the coronavirus disease COVID-19, various issues and problems have arisen for healthcare infrastructure and institutions. With an increasing number of patients in need of urgent medical care and hospitalization, healthcare systems and regional hospitals may approach their maximum service capacity and may face shortages of supplies such as PPE, medications, therapeutic devices, ventilators, beds, and many more. The article at hand describes the development and framework of a simulation model that enables the modeling and evaluation of the progress of the COVID-19 pandemic. To achieve this, the model dynamically simulates the development and time-dependent behavior of various crucial parameters of the pandemic, among others, the daily infection numbers and the death rate. In addition, the model enables the simulation of single events and scenarios that occur outside of the regular pandemic developments as anomalies, such as holidays. Unlike traditional models, the proposed framework is based on factors and parameters closely derived from reality, such as the contact rate of individuals, which allows for a much more realistic representation. In addition, this connection to reality enables the assessment of the effects of various influences on the development and progress of the pandemic, such as hospitalization numbers over time. All the aforementioned points are possible within the simulation framework and do not require waiting for the effects to unfold in reality. Thus, the model is capable of dynamically predicting how different scenarios turn out. The abilities of the model are demonstrated and illustrated in a specific case study that shows the impact of holidays, such as Passover and Easter in New York City, when quarantine measures might have been ignored and a temporary increase in extended family gatherings occurred.
As a result, the simulation showed significant impacts and a disproportionate number of patients in need of medical care, which could be detrimental in reality. For example, compared to the previous trajectory of the pandemic, for a temporary increase of 50% in the contact rate of individuals, the model showed that the total number of cases would increase by 461 090, the maximum number of required hospitalizations would rise to 79 733, and the total number of fatalities would climb by 19 125 over 90 days. In addition to its function and demonstrated capabilities, the model can be, and is planned to be, adapted to other areas, not necessarily only metropolitan regions, in order to expand the utilization of its predictive power. Such predictions could be used to derive regulatory measures and to test various policies for COVID-19 containment.
Thread Structure Learning on Online Health Forums with Partially Labeled Data
Liu Y, Shi J and Chen Y
Thread structures, the reply relationships between posts, in online forums are very important for readers to understand the thread content, as well as for improving the effectiveness of automated forum information retrieval, expert findings, etc. However, most online forums only have partially labeled structures, which means that some reply relationships are known while the others are unknown. To address this problem, studies have been performed to learn and predict thread structures. However, existing work does not leverage the partially available thread structures to learn the complete thread structure. We have also observed that many online health forums are a type of person-centric forums, where persons are mentioned across posts, providing hints about the reply relationships between posts. In this paper, we first proposed to learn the complete thread structures by leveraging the partially known structures based on a statistical machine learning model: thread conditional random fields (threadCRF). Then we proposed to use person resolution, the process of identifying the same person mentioned in different contexts, together with threadCRF for thread structure learning. We have empirically verified the effectiveness of the proposed approaches.
Modeling Behavioral Response to Vaccination Using Public Goods Game
Soltanolkottabi M, Ben-Arieh D and Wu CH
Epidemics of infectious disease can be traced back to the early days of mankind. Only in the last two centuries has vaccination become a viable strategy to prevent such epidemics. In addition to the clinical efficacy of this strategy, behavior and public attitudes affect the success of vaccines. This paper describes modeling the efficacy of vaccination considering the cost and benefit of vaccination to individual players. The model is based on the public goods game and is presented as a spatial game on a lattice. Using this model, individuals can contribute to public health by paying the cost of vaccination, or they can choose to be protected by those who are vaccinated rather than pay the cost and share the risk of vaccination. Thus, in this model, individuals can choose to stay susceptible, can become infected, or can choose to vaccinate once in each episode. This paper presents the behavioral changes of the population and the cost to society as a function of the cost of vaccines, the cost of being infected, and the "fear factor" created by the public media.
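The free-rider incentive described in this abstract is the core of the standard public goods game, whose single-round payoff can be sketched as below. This shows only the textbook game; the function name and numbers are hypothetical, and the article's spatial lattice, infection dynamics, and "fear factor" are not modeled here.

```python
def public_goods_payoffs(contributes, cost, multiplier):
    """Payoffs in one round of a standard public goods game.

    contributes : list of booleans, True if that player pays `cost`
    multiplier  : factor applied to the pooled contributions before the
                  pool is shared equally among all players
    """
    n = len(contributes)
    if n == 0:
        raise ValueError("at least one player is required")
    pool = cost * sum(contributes) * multiplier
    share = pool / n
    # Contributors receive the share minus what they paid in;
    # non-contributors receive the share for free.
    return [share - cost if c else share for c in contributes]


# Hypothetical round: three players contribute (e.g., vaccinate), one does not.
payoffs = public_goods_payoffs([True, True, True, False], cost=1.0, multiplier=2.0)
```

In this round the non-contributor earns more than each contributor, which is the individual incentive to rely on others' vaccination that such a model has to account for.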
Extreme-scale Dynamic Exploration of a Distributed Agent-based Model with the EMEWS Framework
Ozik J, Collier NT, Wozniak JM, Macal C and An G
Agent-based models (ABMs) integrate multiple scales of behavior and data to produce higher-order dynamic phenomena and are increasingly used in the study of important social complex systems in biomedicine, socio-economics and ecology/resource management. However, the development, validation and use of ABMs is hampered by the need to execute very large numbers of simulations in order to identify their behavioral properties, a challenge accentuated by the computational cost of running realistic, large-scale, potentially distributed ABM simulations. In this paper we describe the Extreme-scale Model Exploration with Swift (EMEWS) framework, which is capable of efficiently composing and executing large ensembles of simulations and other "black box" scientific applications while integrating model exploration (ME) algorithms developed with the use of widely available 3rd-party libraries written in popular languages such as R and Python. EMEWS combines novel stateful tasks with traditional run-to-completion many task computing (MTC) and solves many problems relevant to high-performance workflows, including scaling to very large numbers (millions) of tasks, maintaining state and locality information, and enabling effective multiple-language problem solving. We present the high-level programming model of the EMEWS framework and demonstrate how it is used to integrate an active learning ME algorithm to dynamically and efficiently characterize the parameter space of a large and complex, distributed Message Passing Interface (MPI) agent-based infectious disease model.
Information Diffusion on Social Media During Natural Disasters
Dong R, Li L, Zhang Q and Cai G
Social media analytics has yielded new quantitative insights into human activity patterns. Many applications of social media analytics, from pandemic prediction to earthquake response, require an in-depth understanding of how these patterns change when humans encounter unfamiliar conditions. In this paper, we select two earthquakes in China as the social context on Sina-Weibo (or Weibo for short), the largest Chinese microblog site. After proposing a formalized Weibo information flow model to represent the information spread on Weibo, we study the information spread from three main perspectives: individual characteristics, the types of social relationships between interacting participants, and the topology of real interaction networks. The quantitative analyses lead to the following conclusions. First, the shadow of Dunbar's number is evident in the "declared friends/followers" distributions, and the number of each participant's friends/followers who also participated in the earthquake information dissemination shows the typical power-law distribution, indicating a rich-gets-richer phenomenon. Second, an individual's number of followers is the most critical factor in user influence. Strangers are very important forces for disseminating real-time news after an earthquake. Third, the two types of real interaction networks share the scale-free and small-world properties, but with a looser organizational structure. In addition, correlations between different influence groups indicate that, compared with other online social media, the discussion on Weibo is mainly dominated and influenced by verified users.
Breast Cancer Symptom Clusters Derived from Social Media and Research Study Data Using Improved K-Medoid Clustering
Ping Q, Yang CC, Marshall SA, Avis NE and Ip EH
Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients' functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a bevy of user-generated content in the form of threads and posts, and could be used as a data source to identify and characterize symptom clusters among cancer patients. The present study seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved K-Medoid clustering. A total of 50,426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared to that of the research study data, making the social media data easier to partition. The proposed revised K-Medoid clustering helps to improve the clustering performance by re-assigning some of the negative-ASW (average silhouette width) symptoms to other clusters after initial K-Medoid clustering. This retains an overall non-decreasing ASW and avoids the problem of trapping in local optima. The overall ASW, individual ASW, and improved interpretation of the final clustering solution suggest improvement. 
The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal (GI) related symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment and pain-related symptoms. We recommend an integrative approach taking advantage of both data sources. Social media data could provide context for the interpretation of clustering results derived from research study data, while research study data could compensate for the risk of lower precision and recall found using social media data.
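The average silhouette width (ASW) used in this abstract to judge clustering quality can be computed directly from a distance matrix and cluster labels. The sketch below implements the standard silhouette definition only; the function names and the toy data are hypothetical, and the article's revised K-Medoid re-assignment procedure is not reproduced.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of every item, given a symmetric distance matrix."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i in range(len(labels)):
        own = clusters[labels[i]]
        if len(own) == 1 or len(clusters) < 2:
            widths.append(0.0)  # conventional value for singletons
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


def average_silhouette_width(dist, labels):
    widths = silhouette_widths(dist, labels)
    return sum(widths) / len(widths)


# Toy example: four points on a line at positions 0, 1, 10, 11.
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in points] for p in points]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]
```

Items with a negative silhouette width, such as every point under `bad_labels`, sit closer on average to another cluster than to their own; these are the kind of items the abstract describes re-assigning after the initial clustering.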
Temporal Causality Analysis of Sentiment Change in a Cancer Survivor Network
Bui N, Yen J and Honavar V
Online health communities constitute a useful source of information and social support for patients. American Cancer Society's Cancer Survivor Network (CSN), a 173,000-member community, is the largest online network for cancer patients, survivors, and caregivers. A discussion thread in CSN is often initiated by a cancer survivor seeking support from other members of CSN. Discussion threads are multi-party conversations that often provide a source of social support, e.g., by bringing about a change of sentiment from negative to positive on the part of the thread originator. While previous studies regarding cancer survivors have shown that members of an online health community derive benefits from their participation in such communities, causal accounts of the factors that contribute to the observed benefits have been lacking. We introduce a novel framework to examine the temporal causality of sentiment dynamics in the CSN. We construct a Probabilistic Computation Tree Logic representation and a corresponding probabilistic Kripke structure to represent and reason about the changes in sentiments of posts in a thread over time. We use a sentiment classifier trained using machine learning on a set of posts manually tagged with sentiment labels to classify posts as expressing either positive or negative sentiment. We analyze the probabilistic Kripke structure to identify the prima facie causes of sentiment change on the part of the thread originators in the CSN forum and their significance. We find that the sentiment of replies appears to causally influence the sentiment of the thread originator. Our experiments also show that the conclusions are robust with respect to (i) the classification threshold of the sentiment classifier and (ii) the choice of the specific sentiment classifier used.
We also extend the basic framework for temporal causality analysis to incorporate the uncertainty in the states of the probabilistic Kripke structure resulting from the use of an imperfect state transducer (in our case, the sentiment classifier). Our analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers and moderators of an online community such as CSN can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants. The proposed methodology for analysis of temporal causality has broad applicability in a variety of settings where the dynamics of the underlying system can be modeled in terms of state variables that change in response to internal or external inputs.