Comparison of Soft Indicator and Poisson Kriging for the Noise-Filtering and Downscaling of Areal Data: Application to Daily COVID-19 Incidence Rates
This paper addresses two common challenges in analyzing spatial epidemiological data, specifically disease incidence rates recorded over small areas: filtering noise caused by small local population sizes and deriving estimates at different spatial scales. Geostatistical techniques, including Poisson kriging (PK), have been used to address these issues by accounting for spatial correlation patterns and neighboring observations in smoothing and changing spatial support. However, PK has a limitation in that it can generate unrealistic rates that are either negative or greater than 100%. To overcome this limitation, an alternative method that relies on soft indicator kriging (IK) is presented. The performance of this method is compared to PK using daily COVID-19 incidence rates recorded in 2020-2021 for each of the 581 municipalities in Belgium. Both approaches are used to derive noise-filtered incidence rates for four different dates of the pandemic at the municipality level and at the nodes of a 1 km spacing grid covering the country. The IK approach has several attractive features: (1) the lack of negative kriging estimates, (2) the smaller smoothing effect, and (3) the better agreement with observed municipality-level rates after aggregation, in particular when the original rate was zero.
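The small-population noise that motivates both kriging approaches can be illustrated with a minimal sketch. It assumes, for illustration only, that case counts follow a Poisson distribution with mean equal to risk times population; the function names are hypothetical and this is not an implementation of PK or IK.

```python
import math

def rate_standard_error(risk, population):
    """Standard deviation of an observed rate (cases / population),
    assuming cases ~ Poisson(risk * population)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Var(cases / population) = risk * population / population**2
    return math.sqrt(risk / population)

def prob_zero_cases(risk, population):
    """Probability of observing no cases at all under the same assumption."""
    return math.exp(-risk * population)

# Same underlying risk, two hypothetical municipality sizes
small = rate_standard_error(0.001, 1_000)
large = rate_standard_error(0.001, 100_000)
```

Under this assumption the rate observed in the smaller municipality is ten times noisier, and a zero rate there is quite likely even though the underlying risk is not zero.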
Spatiotemporal Optimization for the Placement of Automated External Defibrillators Using Mobile Phone Data
With over 350,000 cases occurring each year, out-of-hospital cardiac arrest (OHCA) remains a severe public health concern in the United States. The correct and timely use of automated external defibrillators (AEDs) has been widely acknowledged as an effective measure to improve the survival rate of OHCA. While general guidelines have been provided by the American Heart Association (AHA) for AED deployment, the lack of detailed instructions has hindered the adoption of such guidelines under dynamic scenarios with various time and space distributions. Formulating the AED deployment as a location optimization problem under budget and resource constraints, we proposed an overlayed spatio-temporal optimization (OSTO) method, which accounted for the spatiotemporal heterogeneity of potential OHCAs. To highlight the effectiveness of the proposed model, we applied it to Washington DC using user-generated anonymized mobile device location data. The results demonstrated that optimization-based planning provided an improved AED coverage level. We further evaluated the effectiveness of adding AEDs by analyzing the cost-coverage increment curve. In general, our framework provides a systematic approach for municipalities to integrate inclusive planning and budget-limited efficiency into their final decision-making. Given the high practicality and adaptability of the framework, the OSTO is highly amenable to different healthcare facilities' deployment tasks with flexible demand and resource constraints.
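Budget-constrained placement of this kind can be sketched as a maximal coverage problem solved with a greedy heuristic. The sketch below is not the OSTO method itself: the data structures, the function name, and the idea of treating each demand unit as a (place, time slot) pair are illustrative assumptions.

```python
def greedy_max_coverage(candidate_cover, demand, budget):
    """Pick up to `budget` sites, each time taking the site that adds
    the most not-yet-covered demand.

    candidate_cover: dict mapping site -> set of demand units it covers
    demand:          dict mapping demand unit -> weight (e.g. expected OHCAs)
    budget:          maximum number of AEDs to place
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best_site, best_gain = None, 0.0
        for site, units in candidate_cover.items():
            if site in chosen:
                continue
            gain = sum(demand[u] for u in units - covered)
            if gain > best_gain:
                best_site, best_gain = site, gain
        if best_site is None:  # no remaining site adds coverage
            break
        chosen.append(best_site)
        covered |= candidate_cover[best_site]
    total = sum(demand.values())
    coverage = sum(demand[u] for u in covered) / total if total else 0.0
    return chosen, coverage

# Hypothetical demand units: (place, time slot) pairs with weights
demand = {("mall", "day"): 5, ("mall", "night"): 1,
          ("park", "day"): 4, ("dock", "night"): 2}
sites = {"A": {("mall", "day"), ("mall", "night")},
         "B": {("mall", "night"), ("park", "day")},
         "C": {("dock", "night")}}

# Coverage achieved for each budget level, a simple cost-coverage curve
curve = [greedy_max_coverage(sites, demand, k)[1] for k in (1, 2, 3)]
```

Evaluating the coverage at successive budget levels gives a curve of the kind used to judge whether an extra device is worth its cost. A greedy heuristic does not guarantee the optimal placement; an exact formulation would use integer programming.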
Spatial prediction of COVID-19 pandemic dynamics in the United States
The impact of COVID-19 across the United States has been heterogeneous, with rapid spread and greater mortality in some areas compared with others. We used geographically-linked data to test the hypothesis that the risk for COVID-19 is defined by location and sought to define which demographic features are most closely associated with elevated COVID-19 spread and mortality. We leveraged geographically-restricted social, economic, political, and demographic information from US counties, to develop a computational framework using structured Gaussian processing to predict county-level case and death counts during the pandemic's initial and nationwide phases. After identifying the most predictive information sources by location, we applied an unsupervised clustering algorithm and topic modeling to identify groups of features most closely associated with COVID-19 spread. Our model successfully predicted COVID-19 case counts of unseen locations, after examining case counts and demographic information of neighboring locations, with an overall Pearson's correlation coefficient and proportion of variance explained of 0.96 and 0.84, respectively, during the initial phase and 0.95 and 0.87 during the nationwide phase. Aside from population metrics, presidential vote margin was the most consistently selected spatial feature in our COVID-19 prediction models. Urbanicity and 2020 presidential vote margins were more predictive than other demographic features. Models trained using death counts showed similar performance metrics. Topic modeling showed that counties with similar socioeconomic and demographic features tended to group together, and some of these grouped feature sets were associated with COVID-19 dynamics. Clustering of counties based on these feature groups found by topic modeling revealed groups of counties that experienced markedly different COVID-19 spread. We conclude that topic modeling can be used to group similar features and identify counties with similar features in epidemiologic research.
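The two reported performance metrics measure different things, which a short sketch can make concrete. It assumes the proportion of variance explained is computed as one minus the ratio of residual to total sum of squares; the abstract does not state the exact formula, and the function names and data are hypothetical.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)

def variance_explained(y_true, y_pred):
    """1 - SS_res / SS_tot (an assumed definition, see lead-in)."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mt) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # perfectly correlated but biased upward
r = pearson_r(observed, predicted)
r2 = variance_explained(observed, predicted)
```

Here the correlation is perfect while the variance explained is lower, because correlation ignores systematic offsets in the predictions. Under this definition, a gap between the two metrics is therefore not a contradiction.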
Rurality and Origin-Destination Trajectories of Medical School Application and Matriculation in the United States
Physician shortages are more pronounced in rural than in urban areas. The geography of medical school application and matriculation could provide insights into geographic differences in physician availability. Using data from the Association of American Medical Colleges (AAMC), we conducted geospatial analyses, and developed origin-destination (O-D) trajectories and conceptual graphs to understand the root cause of rural physician shortages. Geographic disparities exist at a significant level in medical school applications in the US. The total number of medical school applications increased by 38% from 2001 to 2015, but decreased by 2% in completely rural counties. Most counties with no medical school applicants were in rural areas (88%). Rurality had a significant negative association with the application rate and explained 15.3% of the variation at the county level. The number of medical school applications in a county was disproportionate to the population by rurality. Applicants from completely rural counties (2% of the US population) represented less than 1% of the total medical school applications. Our results can inform recruitment strategies for new medical school students, elucidate location decisions of new medical schools, provide recommendations to close the rural-urban gap in medical school applications, and reduce physician shortages in rural areas.
LionVu 2.0 Usability Assessment for Pennsylvania, United States
The Penn State Cancer Initiative implemented LionVu 1.0 (Penn State University, United States) in 2017 as a web-based mapping tool to educate and inform public health professionals about the cancer burden in Pennsylvania and 28 counties in central Pennsylvania, locally known as the catchment area. The purpose of its improvement, LionVu 2.0, was to help investigators answer person-place-time questions related to cancer and its risk factors by examining several data variables simultaneously. The primary objective of this study was to conduct a usability assessment of a prototype of LionVu 2.0 which included area- and point-based data. The assessment was conducted through an online survey; 10 individuals, most of whom had a master's or doctoral degree, completed the survey. Although most participants had a favorable view of LionVu 2.0, many had little to no experience with web mapping. Therefore, it was not surprising to learn that participants wanted short 10-15-minute training videos to be available with future releases, and a simplified user interface that removes advanced functionality. One unexpected finding was the suggestion of using LionVu 2.0 for teaching and grant proposals. The usability study of the prototype of LionVu 2.0 provided important feedback for its future development.
Assessing the Reliability of Relevant Tweets and Validation Using Manual and Automatic Approaches for Flood Risk Communication
While Twitter has been touted as a preeminent source of up-to-date information on hazard events, the reliability of tweets is still a concern. Our previous publication extracted relevant tweets containing information about the 2013 Colorado flood event and its impacts. Using the relevant tweets, this research further examined the reliability (accuracy and trueness) of the tweets by examining the text and image content and comparing them to other publicly available data sources. Both manual identification of text information and automated extraction of image content (using the Google Cloud Vision application programming interface (API)) were implemented to balance accurate information verification and efficient processing time. The results showed that both the text and images contained useful information about damaged/flooded roads/streets. This information will help emergency response coordination efforts and informed allocation of resources when enough tweets contain geocoordinates or location/venue names. This research will identify reliable crowdsourced risk information to facilitate near real-time emergency response through better use of crowdsourced risk communication platforms.
Using Open Source Data to Identify Transit Deserts in Four Major Chinese Cities
The concept of transit deserts stems from the concept of food deserts. There is substantial research on transit deserts in developed countries. However, there is no known research that has studied this subject in Chinese cities. Using open-source data, this paper identified transit desert areas in four major Chinese cities (Beijing, Shanghai, Wuhan, Chengdu). The results show that: (1) In these four cities, the transit desert areas are mainly concentrated in city centers and hardly occur in any suburban areas, which is very different from the cases in the US. (2) Shanghai has the largest transit-dependent population living in transit deserts, followed by Beijing, Chengdu, and Wuhan. Chengdu has the smallest transit desert areas, followed by Shanghai, Wuhan, and Beijing. (3) An oversized transit-dependent population and incomplete transit systems in these cities might contribute to the transit deserts' occurrences. (4) Differences in the distribution of population density, travel preferences, and transportation investment policies between Chinese and American cities might contribute to the different findings. By examining transit desert problems in major Chinese cities, this study brought people's attention to the gap between transit demand and supply in China.
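The gap between transit demand and supply can be sketched as a difference of standardized scores per analysis zone. The abstract does not specify the paper's measure, so the z-score approach, the function names, and the numbers below are illustrative assumptions.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def transit_gap(demand, supply):
    """Standardized demand minus standardized supply for each zone.
    Positive values mean demand outstrips supply (a candidate transit desert)."""
    return [d - s for d, s in zip(z_scores(demand), z_scores(supply))]

# Hypothetical zones: transit-dependent population and a supply index
demand = [10, 20, 30]
supply = [30, 20, 10]
gaps = transit_gap(demand, supply)
```

In this toy example the third zone has the highest demand and the lowest supply, so it receives the largest positive gap.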
Understanding the Shared E-scooter Travels in Austin, TX
This paper investigated the travel patterns of 1.7 million shared E-scooter trips from April 2018 to February 2019 in Austin, TX. There were more than 6000 active E-scooters in operation each month, generating over 150,000 trips and covering approximately 117,000 miles. During this period, the average travel distance and operation time of E-scooter trips were 0.77 miles and 7.55 min, respectively. We further identified two E-scooter usage hotspots in the city (Downtown Austin and the University of Texas campus). The spatial analysis showed that more trips originated from Downtown Austin than ended there, while the opposite was true for the UT campus. We also investigated the relationship between the number of E-scooter trips and the surrounding environments. The results show that areas with higher population density and more residents with higher education were correlated with more E-scooter trips. A shorter distance to the city center, the presence of transit stations, better street connectivity, and more compact land use were also associated with increased E-scooter usage in Austin, TX. Surprisingly, the proportion of young residents within a neighbourhood was negatively correlated with E-scooter usage.
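The comparison of trip origins and destinations per area can be sketched as a net outflow count. The zone labels and trip records below are hypothetical, and the paper's actual spatial units are not described in the abstract.

```python
from collections import Counter

def net_outflow(trips):
    """Return {zone: trips starting there minus trips ending there}.

    trips: iterable of (origin_zone, destination_zone) pairs
    """
    trips = list(trips)
    origins = Counter(o for o, _ in trips)
    destinations = Counter(d for _, d in trips)
    zones = set(origins) | set(destinations)
    # Counter returns 0 for zones that never appear on one side
    return {z: origins[z] - destinations[z] for z in zones}

# Hypothetical trip records
trips = [("downtown", "campus"), ("downtown", "campus"),
         ("campus", "downtown"), ("downtown", "east")]
flows = net_outflow(trips)
```

A positive value marks a zone that sends out more trips than it receives, the pattern reported for Downtown Austin; a negative value marks the reverse, as reported for the UT campus.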
Map Archive Mining: Visual-Analytical Approaches to Explore Large Historical Map Collections
Historical maps are unique sources of retrospective geographical information. Recently, several map archives containing map series covering large spatial and temporal extents have been systematically scanned and made available to the public. The geographical information contained in such data archives makes it possible to extend geospatial analysis retrospectively beyond the era of digital cartography. However, given the large data volumes of such archives (e.g., more than 200,000 map sheets in the United States Geological Survey topographic map archive) and the low graphical quality of older, manually-produced map sheets, the process to extract geographical information from these map archives needs to be automated to the highest degree possible. To understand the potential challenges (e.g., salient map characteristics and data quality variations) in automating large-scale information extraction tasks for map archives, it is useful to efficiently assess spatio-temporal coverage, approximate map content, and spatial accuracy of georeferenced map sheets at different map scales. Such preliminary analytical steps are often neglected or ignored in the map processing literature but represent critical phases that lay the foundation for any subsequent computational processes including recognition. Exemplified for the United States Geological Survey topographic map and the Sanborn fire insurance map archives, we demonstrate how such preliminary analyses can be systematically conducted using traditional analytical and cartographic techniques, as well as visual-analytical data mining tools originating from machine learning and data science.
Canadian Forest Fires and the Effects of Long-Range Transboundary Air Pollution on Hospitalizations among the Elderly
In July 2002, lightning strikes ignited over 250 fires in Quebec, Canada, destroying over one million hectares of forest. The smoke plume generated from the fires had a major impact on air quality across the east coast of the U.S. Using data from the Medicare National Claims History File and the U.S. Environmental Protection Agency (EPA) National air pollution monitoring network, we evaluated the health impact of smoke exposure on 5.9 million elderly people (ages 65+) in the Medicare population in 81 counties in 11 northeastern and Mid-Atlantic States of the US. We estimated differences in the exposure to ambient PM2.5 (airborne particulate matter with an aerodynamic diameter of ≤2.5 μm) concentrations and in hospitalizations for cardiovascular, pulmonary and injury outcomes, before and during the smoke episode. We found that there was an associated 49.6% (95% confidence interval (CI), 29.8-72.3) and 64.9% (95% CI, 44.3-88.5) increase in the rate of hospitalization for respiratory and cardiovascular diagnoses, respectively, when the smoke plume was present compared to before the smoke plume had arrived. Our study suggests that rapid increases in PM2.5 concentrations resulting from wildfire smoke can impact the health of elderly populations thousands of kilometers removed from the fires.
Spatially-Explicit Simulation Modeling of Ecological Response to Climate Change: Methodological Considerations in Predicting Shifting Population Dynamics of Infectious Disease Vectors
Poikilothermic disease vectors can respond to altered climates through spatial changes in both population size and phenology. Quantitative descriptors to characterize, analyze and visualize these dynamic responses are lacking, particularly across large spatial domains. In order to demonstrate the value of a spatially explicit, dynamic modeling approach, we assessed spatial changes in the population dynamics of Ixodes scapularis, the Lyme disease vector, using a temperature-forced population model simulated across a grid of 4 × 4 km cells covering the eastern United States, using both modeled (Weather Research and Forecasting (WRF) 3.2.1) baseline/current (2001-2004) and projected (Representative Concentration Pathway (RCP) 4.5 and RCP 8.5; 2057-2059) climate data. Ten dynamic population features (DPFs) were derived from simulated populations and analyzed spatially to characterize the regional population response to current and future climate across the domain. Each DPF under the current climate was assessed for its ability to discriminate observed Lyme disease risk and known vector presence/absence, using data from the US Centers for Disease Control and Prevention. Peak vector population and month of peak vector population were the DPFs that performed best as predictors of current Lyme disease risk. When examined under baseline and projected climate scenarios, the spatial and temporal distributions of DPFs shift and the seasonal cycle of key questing life stages is compressed under some scenarios. Our results demonstrate the utility of spatial characterization, analysis and visualization of dynamic population responses (including altered phenology) of disease vectors to altered climate.
