Spatial Statistics

A Hypothesis Test for Detecting Spatial Patterns in Categorical Areal Data
Self S, Zhao X, Zgodic A, Overby A, White D, McLain AC and Dyckman C
The vast growth of spatial datasets in recent decades has fueled the development of many statistical methods for detecting spatial patterns. Two of the most commonly studied spatial patterns are clustering, loosely defined as datapoints with similar attributes existing close together, and dispersion, loosely defined as the semi-regular placement of datapoints with similar attributes. In this work, we develop a hypothesis test to detect spatial clustering or dispersion at specific distances in categorical areal data. Such data consist of a set of spatial regions whose boundaries are fixed and known (e.g., counties), each associated with a categorical random variable (e.g., whether the county is rural, micropolitan, or metropolitan). We propose a method to extend the positive area proportion function (developed for detecting spatial clustering in binary areal data) to the categorical case. This proposal, referred to as the categorical positive area proportion function test, can detect various spatial patterns, including homogeneous clusters, heterogeneous clusters, and dispersion. Our approach is the first method capable of distinguishing between different types of clustering in categorical areal data. After validating our method using an extensive simulation study, we use the categorical positive area proportion function test to detect spatial patterns in biological, agricultural, built, and open conservation easements in Boulder County, Colorado, USA.
Exploring heterogeneity and dynamics of meteorological influences on US PM: A distributed learning approach with spatiotemporal varying coefficient models
Wang L, Wang G and Gao AS
Particulate matter (PM) has emerged as a primary air quality concern due to its substantial impact on human health. Many recent research works suggest that PM concentrations depend on meteorological conditions. Enhancing current pollution control strategies necessitates a more holistic comprehension of PM dynamics and the precise quantification of spatiotemporal heterogeneity in the relationship between meteorological factors and PM levels. The spatiotemporal varying coefficient model stands as a prominent spatial regression technique adept at addressing this heterogeneity. Amidst the challenges posed by the substantial scale of modern spatiotemporal datasets, we propose a pioneering distributed estimation method (DEM) founded on multivariate spline smoothing across a domain's triangulation. This DEM algorithm ensures an easily implementable, highly scalable, and communication-efficient strategy, demonstrating almost linear speedup potential. We validate the effectiveness of our proposed DEM through extensive simulation studies, demonstrating that it achieves coefficient estimations akin to those of global estimators derived from complete datasets. Applying the proposed model and method to the US daily PM and meteorological data, we investigate the influence of meteorological variables on PM concentrations, revealing both spatial and seasonal variations in this relationship.
Modeling lake conductivity in the contiguous United States using spatial indexing for big spatial data
Dumelle M, Ver Hoef JM, Handler A, Hill RA, Higham M and Olsen AR
Conductivity is an important indicator of the health of aquatic ecosystems. We model large amounts of lake conductivity data collected as part of the United States Environmental Protection Agency's National Lakes Assessment using spatial indexing, a flexible and efficient approach to fitting spatial statistical models to big data sets. Spatial indexing is capable of accommodating various spatial covariance structures as well as features like random effects, geometric anisotropy, partition factors, and non-Euclidean topologies. We use spatial indexing to compare lake conductivity models and show that calcium oxide rock content, crop production, human development, precipitation, and temperature are strongly related to lake conductivity. We use this model to predict lake conductivity at hundreds of thousands of lakes distributed throughout the contiguous United States. We find that lake conductivity models fit using spatial indexing are nearly identical to lake conductivity models fit using traditional methods but are nearly 50 times faster (sample size 3,311). Spatial indexing is readily available in the spmodel package.
Analyzing COVID-19 data in the Canadian province of Manitoba: A new approach
Amiri L, Torabi M and Deardon R
The basic homogeneous SEIR (susceptible-exposed-infected-removed) model is a commonly used compartmental model for analysing infectious diseases such as influenza and COVID-19. However, the homogeneous SEIR model assumes that the study population is homogeneous, and one cannot incorporate individual-level information (e.g., location of infected people, distance between susceptible and infected individuals, vaccination status) which may be important in predicting new disease cases. Recently, a geographically-dependent individual-level model (GD-ILM) within an SEIR framework was developed for when both regional and individual-level spatial data are available. In this paper, we propose to use an SEIR GD-ILM for the population of each health region of Manitoba (a central Canadian province) to analyse the COVID-19 data. As different health regions of the population under study may act differently, we assume that each health region has its own corresponding parameters determined by a homogeneous SEIR model (such as contact rate, latent period, infectious period). A Monte Carlo Expectation Conditional Maximization (MCECM) algorithm is used for inference. Using the estimated parameters, we predict the infection rate in each health region of Manitoba over time to identify high-risk local geographical areas. Performance of the proposed approach is also evaluated through simulation studies.
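For readers unfamiliar with the homogeneous SEIR model that the abstract above builds on, the following is a minimal illustrative sketch, not the authors' GD-ILM or their MCECM inference. It assumes a deterministic, discrete-time approximation with a contact rate `beta`, a mean latent period, and a mean infectious period; the function name `simulate_seir` and all parameter values are hypothetical.

```python
def simulate_seir(s0, e0, i0, r0, beta, latent_period, infectious_period,
                  days, dt=0.1):
    """Deterministic homogeneous SEIR model, Euler steps of size dt (in days).

    Illustrative assumption: a closed, well-mixed population. This is the
    baseline model only, not the individual-level spatial extension.
    """
    n = s0 + e0 + i0 + r0
    sigma = 1.0 / latent_period      # rate of leaving the exposed state
    gamma = 1.0 / infectious_period  # rate of leaving the infectious state
    steps_per_day = max(1, round(1.0 / dt))
    dt = 1.0 / steps_per_day         # make the steps tile one day exactly
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]         # one record per day, starting at day 0
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical parameter values, chosen only to show the call pattern.
trajectory = simulate_seir(990, 0, 10, 0, beta=0.5, latent_period=5.0,
                           infectious_period=7.0, days=30)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("day with most infectious individuals in the window:", peak_day)
```

Fitting one such parameter set per health region, as the abstract describes, would amount to calling a model of this kind with region-specific values of `beta`, `latent_period`, and `infectious_period`.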
A Hypothesis Test for Detecting Distance-Specific Clustering and Dispersion in Areal Data
Self S, Overby A, Zgodic A, White D, McLain A and Dyckman C
Spatial clustering detection has a variety of applications in diverse fields, including identifying infectious disease outbreaks, pinpointing crime hotspots, and identifying clusters of neurons in brain imaging applications. Ripley's K-function is a popular method for detecting clustering (or dispersion) in point process data at specific distances. Ripley's K-function measures the expected number of points within a given distance of any observed point. Clustering can be assessed by comparing the observed value of Ripley's K-function to the expected value under complete spatial randomness. While performing spatial clustering analysis on point process data is common, applications to areal data commonly arise and need to be accurately assessed. Inspired by Ripley's K-function, we develop the positive area proportion function (PAPF) and use it to develop a hypothesis testing procedure for the detection of spatial clustering and dispersion at specific distances in areal data. We compare the performance of the proposed PAPF hypothesis test to that of the global Moran's I statistic, the Getis-Ord general G statistic, and the spatial scan statistic with extensive simulation studies. We then evaluate the real-world performance of our method by using it to detect spatial clustering in land parcels containing conservation easements and US counties with high pediatric overweight/obesity rates.
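The comparison described above, an observed K value against its expectation under complete spatial randomness (CSR), can be sketched as follows. This is a naive point-process estimator without edge correction, written only to illustrate the idea behind Ripley's K-function; it is not the PAPF test proposed in the abstract, and the function names `ripley_k` and `csr_k` are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at distance r (no edge correction).

    K(r) is estimated as area / (n * (n - 1)) times the number of ordered
    pairs of distinct points that lie within distance r of each other.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


def csr_k(r):
    """Expected K(r) for a planar process under complete spatial randomness."""
    return math.pi * r ** 2


# Four tightly packed points in a unit square: every pair is within r = 0.1.
clustered = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]
# Four points at the corners of the unit square: no pair is within r = 0.1.
dispersed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

r = 0.1
print("clustered K:", ripley_k(clustered, r, area=1.0), "CSR K:", csr_k(r))
print("dispersed K:", ripley_k(dispersed, r, area=1.0), "CSR K:", csr_k(r))
```

An estimate well above `csr_k(r)` suggests clustering at distance `r`, and one well below suggests dispersion. In practice a formal test would also need edge correction and a reference distribution (for example from simulation), which this sketch omits.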
Modeling geostatistical incomplete spatially correlated survival data with applications to COVID-19 mortality in Ghana
Allotey PA and Harel O
Survival models which incorporate frailties are common in time-to-event data collected over distinct spatial regions. While incomplete data are unavoidable and a common complication in statistical analysis of spatial survival research, most researchers still ignore the missing data problem. In this paper, we propose a geostatistical modeling approach for incomplete spatially correlated survival data. We achieve this by exploring missingness in outcome, covariates, and spatial locations. In the process, we analyze incomplete spatially-referenced survival data using a Weibull model for the baseline hazard function and correlated log-Gaussian frailties to model spatial correlation. We illustrate the proposed method with simulated data and an application to geo-referenced COVID-19 data from Ghana. There are several disagreements between the parameter estimates and credible interval widths obtained using our proposed approach and those from complete case analysis. Based on these findings, we argue that our approach provides more reliable parameter estimates and has higher predictive accuracy.
Adaptive Gaussian Markov random field spatiotemporal models for infectious disease mapping and forecasting
MacNab YC
Recent disease mapping literature presents adaptively parameterized spatiotemporal (ST) autoregressive (AR) or conditional autoregressive (CAR) models for Bayesian prediction of COVID-19 infection risks. These models were motivated to capture complex spatiotemporal dynamics and heterogeneities of infection risks. In the present paper, we synthesize, generalize, and unify the ST AR and CAR model constructions for models augmented by adaptive Gaussian Markov random fields, with an emphasis on disease forecasting. A general convolution construction is presented, with illustrative models motivated to (i) characterize local risk dependencies and influences over both spatial and temporal dimensions, (ii) model risk heterogeneities and discontinuities, and (iii) predict and forecast areal-level disease risks and occurrences. The broadened constructions allow rich options of intuitive parameterization for disease mapping and spatial regression. Illustrative parameterizations are presented for Bayesian hierarchical models of Poisson, zero-inflated Poisson, and Bernoulli data models, respectively. They are also discussed in the context of quantifying time-varying or time-invariant effects of (omitted) covariates, with application to prediction and forecasting areal-level COVID-19 infection occurrences and probabilities of zero-infection. The model constructions presented herein have much wider scope in offering a flexible framework for modelling complex spatiotemporal data and for estimation, learning, and forecasting purposes.
Spatial aggregation with respect to a population distribution: Impact on inference
Paige J, Fuglstad GA, Riebler A and Wakefield J
Spatial aggregation with respect to a population distribution involves estimating aggregate population quantities based on observations from individuals. In this context, a geostatistical workflow must account for three major sources of aggregation error: aggregation weights, fine scale variation, and finite population variation. However, these sources of aggregation error are commonly ignored, and the population is instead treated as a fixed population density surface. We improve common practice by introducing an approach that allows aggregation models to account for aggregation error simply and transparently. This preserves aggregate point estimates while increasing their uncertainties. We compare the proposed and the traditional approaches using two simulation studies mimicking neonatal mortality rate (NMR) data from the 2014 Kenya Demographic and Health Survey. In the traditional approach, undercoverage/overcoverage of interval estimates depends arbitrarily on the aggregation grid resolution, while the new approach is resolution robust. Differences between the aggregation approaches increase as an area's population decreases, and are particularly large at the second administrative level and finer, but also at the first administrative level for some population quantities. These findings are consistent with those of an application to the true NMR data. We demonstrate in a sensitivity analysis that burden estimates and their uncertainties are not robust to changes in population density and census information, while prevalence estimates and uncertainties seem stable.
Estimating intervention effects on infectious disease control: The effect of community mobility reduction on Coronavirus spread
Giffin A, Gong W, Majumder S, Rappold AG, Reich BJ and Yang S
Understanding the effects of interventions, such as restrictions on community and large group gatherings, is critical to controlling the spread of COVID-19. Susceptible-Infectious-Recovered (SIR) models are traditionally used to forecast the infection rates but do not provide insights into the causal effects of interventions. We propose a spatiotemporal model that estimates the causal effect of changes in community mobility (intervention) on infection rates. Using an approximation to the SIR model and incorporating spatiotemporal dependence, the proposed model estimates a direct and indirect (spillover) effect of intervention. Under an interference and treatment ignorability assumption, this model is able to estimate causal intervention effects, and additionally allows for spatial interference between locations. Reductions in community mobility were measured by cell phone movement data. The results suggest that the reductions in mobility decrease Coronavirus cases 4 to 7 weeks after the intervention.
Bayesian negative binomial regression with spatially varying dispersion: Modeling COVID-19 incidence in Georgia
Mutiso F, Pearce JL, Benjamin-Neelon SE, Mueller NT, Li H and Neelon B
Overdispersed count data arise commonly in disease mapping and infectious disease studies. Typically, the level of overdispersion is assumed to be constant over time and space. In some applications, however, this assumption is violated, and in such cases, it is necessary to model the dispersion as a function of time and space in order to obtain valid inferences. Motivated by a study examining spatiotemporal patterns in COVID-19 incidence, we develop a Bayesian negative binomial model that accounts for heterogeneity in both the incidence rate and degree of overdispersion. To fully capture the heterogeneity in the data, we introduce region-level covariates, smooth temporal effects, and spatially correlated random effects in both the mean and dispersion components of the model. The random effects are assigned bivariate intrinsic conditionally autoregressive priors that promote spatial smoothing and permit the model components to borrow information, which is appealing when the mean and dispersion are spatially correlated. Through simulation studies, we show that ignoring heterogeneity in the dispersion can lead to biased and imprecise estimates. For estimation, we adopt a Bayesian approach that combines full-conditional Gibbs sampling and Metropolis-Hastings steps. We apply the model to a study of COVID-19 incidence in the state of Georgia, USA from March 15 to December 31, 2020.
Combining school-catchment area models with geostatistical models for analysing school survey data from low-resource settings: Inferential benefits and limitations
Macharia PM, Ray N, Gitonga CW, Snow RW and Giorgi E
School-based sampling has been used to inform targeted responses for malaria and neglected tropical diseases. Standard geostatistical methods for mapping disease prevalence use the school location to model spatial correlation, which is questionable since exposure to the disease is more likely to occur in the residential location. In this paper, we propose to overcome the limitations of standard geostatistical methods by introducing a modelling framework that accounts for the uncertainty in the location of the residence of the students. By using cost distance and cost allocation models to define spatial accessibility, and in the absence of any information on the travel mode of students to school, we consider three school catchment area models that assume walking only, walking and bicycling, and walking and motorized transport. We illustrate the use of this approach using two case studies of malaria in Kenya and compare it with the standard approach that uses the school locations to build geostatistical models. We argue that the proposed modelling framework presents several inferential benefits, such as the ability to combine data from multiple surveys, some of which may also record the residence location, and to deal with ecological bias when estimating the effects of malaria risk factors. However, our results show that invalid assumptions on the modes of travel to school can worsen the predictive performance of geostatistical models. Future research in this area should focus on collecting information on the modes of transportation to school, which can then be used to better parametrize the catchment area models.
Modeling infectious disease dynamics: Integrating contact tracing-based stochastic compartment and spatio-temporal risk models
Mahmood M, Amaral AVR, Mateu J and Moraga P
Major infectious diseases such as COVID-19 have a significant impact on people's lives and put enormous pressure on healthcare systems globally. Strong interventions, such as lockdowns and social distancing measures, imposed to prevent these diseases from spreading, may also negatively impact society, leading to job losses, mental health problems, and increased inequalities, which makes it crucial to prioritize riskier areas when applying these protocols. The modeling of mobility data derived from contact-tracing data can be used to forecast infectious trajectories and help design strategies for prevention and control. In this work, we propose a new spatial-stochastic model that allows us to characterize the temporally varying spatial risk better than existing methods. We demonstrate the use of the proposed model by simulating an epidemic in the city of Valencia, Spain, and comparing it with a contact tracing-based stochastic compartment reference model. The results show that, by accounting for the spatial risk values in the model, the peak of infected individuals, as well as the overall number of infected cases, are reduced. Therefore, adding a spatial risk component into compartment models may give finer control over the epidemic dynamics, which might help the people in charge to make better decisions.
Variograms for kriging and clustering of spatial functional data with phase variation
Guo X, Kurtek S and Bharath K
Spatial, amplitude and phase variations in spatial functional data are confounded. Conclusions from the popular functional trace-variogram, which quantifies spatial variation, can be misleading when analyzing misaligned functional data with phase variation. To remedy this, we describe a framework that extends amplitude-phase separation methods in functional data to the spatial setting, with a view towards performing clustering and spatial prediction. We propose a decomposition of the trace-variogram into amplitude and phase components, and quantify how spatial correlations between functional observations manifest in their respective amplitude and phase. This enables us to generate separate amplitude and phase clustering methods for spatial functional data, and develop a novel spatial functional interpolant at unobserved locations based on combining separate amplitude and phase predictions. Through simulations and real data analyses, we demonstrate advantages of our approach when compared to standard ones that ignore phase variation, through more accurate predictions and more interpretable clustering results.
Application of Bayesian spatial-temporal models for estimating unrecognized COVID-19 deaths in the United States
Zhang Y, Chang HH, Iuliano AD and Reed C
In the United States, COVID-19 has become a leading cause of death since 2020. However, the number of COVID-19 deaths reported from death certificates is likely to represent an underestimate of the total deaths related to SARS-CoV-2 infections. Estimating those deaths not captured through death certificates is important to understanding the full burden of COVID-19 on mortality. In this work, we explored enhancements to an existing approach by employing Bayesian hierarchical models to estimate unrecognized deaths attributed to COVID-19 using weekly state-level COVID-19 viral surveillance and mortality data in the United States from March 2020 to April 2021. We demonstrated our model using those aged years who died. First, we used a spatial-temporal binomial regression model to estimate the percent of positive SARS-CoV-2 test results. A spatial-temporal negative-binomial model was then used to estimate unrecognized COVID-19 deaths by exploiting the spatial-temporal association between SARS-CoV-2 percent positive and all-cause mortality counts using an excess mortality approach. Computationally efficient Bayesian inference was accomplished via the Polya-Gamma representation of the binomial and negative-binomial models. Among those aged years, we estimated 58,200 (95% CI: 51,300, 64,900) unrecognized COVID-19 deaths, which accounts for 26% (95% CI: 24%, 29%) of total COVID-19 deaths in this age group. Our modeling results suggest that COVID-19 mortality and the proportion of unrecognized deaths among deaths attributed to COVID-19 vary by time and across states.
Bayesian disease mapping: Past, present, and future
MacNab YC
On the occasion of the Spatial Statistics' 10th Anniversary, I reflect on the past and present of Bayesian disease mapping and look into its future. I focus on some key developments of models, and on recent evolution of multivariate and adaptive Gaussian Markov random fields and their impact and importance in disease mapping. I reflect on Bayesian disease mapping as a subject of spatial statistics that has advanced to date, and continues to grow, in scope and complexity alongside increasing needs of analytic tools for contemporary health science research, such as spatial epidemiology, population and public health, and medicine. I illustrate (potential) utility and impact of some of the disease mapping models and methods for analysing and monitoring communicable disease such as the COVID-19 infection risks during an ongoing pandemic.
Spatial clustering behaviour of Covid-19 conditioned by the development level: Case study for the administrative units in Romania
Cioban S and Mare C
Spatial analyses related to Covid-19 have so far been conducted at county, regional or national level, without a thorough assessment at the continuous local level of administrative-territorial units like cities, towns, or communes. To address this gap, we employ daily data on the infection rate provided for Romanian administrative units from March to May 2021. Using the global and local Moran's I spatial autocorrelation coefficients, we identify significant clustering processes in the Covid-19 infection rate. Additional analysis based on spatially smoothed rate maps and spatial regressions proves that this clustering pattern is influenced by the development level of localities, proxied by unemployment rate and Local Human Development Index. Results show the features of the 3rd wave in Romania, characterized by a quadratic trend.
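As a brief illustration of the global Moran's I coefficient mentioned above, the sketch below computes it for a toy set of four areal units arranged in a chain. It shows the standard textbook formula only, not the authors' analysis pipeline; the function name `morans_i`, the binary adjacency weights, and the example values are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for areal values and a spatial weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


# Hypothetical example: four units in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
similar_neighbours = [1.0, 1.0, 0.0, 0.0]   # like values sit side by side
alternating = [1.0, 0.0, 1.0, 0.0]          # like values avoid each other

print("similar neighbours:", morans_i(similar_neighbours, chain))
print("alternating:", morans_i(alternating, chain))
```

Positive values indicate that similar rates tend to be adjacent (clustering), while negative values indicate a checkerboard-like pattern. Assessing significance would additionally require a permutation or analytical reference distribution, which is not shown here.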
Editorial: Spatio-temporal dynamics of Covid
D'Urso P, Sahu S and Stein A
Community mobility in the European regions during COVID-19 pandemic: A partitioning around medoids with noise cluster based on space-time autoregressive models
D'Urso P, Mucciardi M, Otranto E and Vitale V
In this paper we propose a robust fuzzy clustering model, the STAR-based Fuzzy C-Medoids Clustering model with Noise Cluster, to define territorial partitions of the European regions (NUTS2) according to the mobility trends for places of work provided by Google over the whole COVID-19 pandemic period. The clustering model takes into account both temporal and spatial information by means of the autoregressive temporal and spatial coefficients of the STAR model. Through the noise cluster, the proposed clustering model is capable of neutralizing the negative effects of noisy data. The main empirical results concern the expected direct relationship between the community mobility trend and the lockdown periods, and a clear spatial interaction effect among neighboring regions.
Spatio-temporal modelling of COVID-19 incident cases using Richards' curve: An application to the Italian regions
Mingione M, Alaimo Di Loro P, Farcomeni A, Divino F, Lovison G, Maruotti A and Lasinio GJ
We introduce an extended generalised logistic growth model for discrete outcomes, in which spatial and temporal dependence are dealt with through the specification of a network structure within an Auto-Regressive approach. A major challenge concerns the specification of the network structure, crucial to consistently estimate the canonical parameters of the generalised logistic curve, e.g. peak time and height. We compared a network based on geographic proximity and one built on historical data of transport exchanges between regions. Parameters are estimated under the Bayesian framework, using the Stan probabilistic programming language. The proposed approach is motivated by the analysis of both the first and the second wave of COVID-19 in Italy, i.e. from February 2020 to July 2020 and from July 2020 to December 2020, respectively. We analyse data at the regional level and, interestingly enough, prove that substantial spatial and temporal dependence occurred in both waves, although strong restrictive measures were implemented during the first wave. Accurate predictions are obtained, improving those of the model where independence across regions is assumed.
A D-vine copula-based quantile regression model with spatial dependence for COVID-19 infection rate in Italy
D'Urso P, De Giovanni L and Vitale V
The main determinants of COVID-19 spread in Italy are investigated, in this work, by means of a D-vine copula based quantile regression. The outcome is the COVID-19 cumulative infection rate registered on October 30th 2020, with reference to the 107 Italian provinces, and it is regressed on some covariates of interest accounting for medical, environmental and demographic factors. To deal with the issue of spatial autocorrelation, the D-vine copula based quantile regression also embeds a spatial autoregressive component that controls for the extent of spatial dependence. The use of vine copula enhances model flexibility accounting for non-linear relationships and tail dependencies. Moreover, the model selection procedure leads to parsimonious models providing a rank of covariates based on their explanatory power with respect to the outcome.
Editorial: Spatio-temporal dynamics of Covid
Stein A and Gelfand A
The Publisher regrets that this article is an accidental duplication of an article that has already been published, http://dx.doi.org/10.1016/j.spasta.2021.100588. The duplicate article has therefore been withdrawn. The full Elsevier Policy on Article Withdrawal can be found at https://www.elsevier.com/about/our-business/policies/article-withdrawal.