Assessing Tuning Parameter Selection Variability in Penalized Regression
Penalized regression methods that perform simultaneous model selection and estimation are ubiquitous in statistical modeling. The use of such methods is often unavoidable as manual inspection of all possible models quickly becomes intractable when there are more than a handful of predictors. However, automated methods usually fail to incorporate domain knowledge, exploratory analyses, or other factors that might guide a more interactive model-building approach. A hybrid approach is to use penalized regression to identify a set of candidate models and then to use interactive model-building to examine this candidate set more closely. To identify a set of candidate models, we derive point and interval estimators of the probability that each model along a solution path will minimize a given model selection criterion, for example, the Akaike or Bayesian information criterion (AIC or BIC), conditional on the observed solution path. Models with a high probability of selection are then considered for further examination. Thus, the proposed methodology attempts to strike a balance between algorithmic modeling approaches that are computationally efficient but fail to incorporate expert knowledge, and interactive modeling approaches that are labor intensive but informed by experience, intuition, and domain knowledge. Supplementary materials for this article are available online.
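The authors' estimators are conditional on the observed solution path; as a rough, purely illustrative stand-in for what a "selection probability" along a path means, one can instead bootstrap the data, refit the lasso path on a fixed penalty grid, and tally how often each selected support minimizes BIC. This is a different (unconditional, resampling-based) construction sketched only for intuition; the use of scikit-learn's `lasso_path`, the BIC formula, and all names below are assumptions of the sketch, not part of the article.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def bic_selection_frequencies(X, y, alphas, n_boot=200, seed=0):
    """Bootstrap approximation to model-selection frequencies along a lasso
    path: for each resample, refit the path on a fixed penalty grid, score
    every support by BIC, and record which support wins.  Assumes X and y
    are centered (no intercept is fit)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = {}
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        Xb, yb = X[idx], y[idx]
        _, coefs, _ = lasso_path(Xb, yb, alphas=alphas)   # coefs: (p, n_alphas)
        best_bic, best_support = np.inf, None
        for j in range(coefs.shape[1]):
            support = tuple(np.flatnonzero(coefs[:, j]))
            resid = yb - Xb @ coefs[:, j]
            bic = n * np.log(resid @ resid / n) + len(support) * np.log(n)
            if bic < best_bic:
                best_bic, best_support = bic, support
        counts[best_support] = counts.get(best_support, 0) + 1
    return {s: c / n_boot for s, c in counts.items()}
```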
Convex Bidirectional Large Margin Classifiers
Classification problems are commonly seen in practice. In this paper, we aim to develop classifiers that enjoy the interpretability of linear classifiers while retaining the flexibility of nonlinear classifiers. We propose convex bidirectional large margin classifiers to fill the gap between linear and general nonlinear classifiers for high-dimensional data. Our method provides a new data visualization tool for classification of high-dimensional data. The obtained bilinear projection structure makes the proposed classifier highly interpretable. Additional shrinkage to approximate variable selection is also considered. Through analysis of simulated and real data in high-dimensional settings, our method is shown to have superior prediction performance and interpretability when there are potential subpopulations in the data. The computer code of the proposed method is available as supplemental materials.
Model-Based Clustering of Nonparametric Weighted Networks with Application to Water Pollution Analysis
Water pollution is a major global environmental problem, and it poses a great environmental risk to public health and biological diversity. This work is motivated by assessing the potential environmental threat of coal mining through increased sulfate concentrations in river networks, concentrations that do not follow any simple parametric distribution. However, existing network models mainly focus on binary or discrete networks and on weighted networks with known parametric weight distributions. We propose a principled nonparametric weighted network model based on exponential-family random graph models and local likelihood estimation, and study its model-based clustering with application to large-scale water pollution network analysis. We do not require any parametric distribution assumption on network weights. The proposed method greatly extends the methodology and applicability of statistical network models. Furthermore, it is scalable to large and complex networks in large-scale environmental studies. The power of our proposed methods is demonstrated in simulation studies and a real application to sulfate pollution network analysis in the Ohio watershed located in Pennsylvania, United States.
Spatial Signal Detection Using Continuous Shrinkage Priors
Motivated by the problem of detecting changes in two-dimensional X-ray diffraction data, we propose a Bayesian spatial model for sparse signal detection in image data. Our model places considerable mass near zero and has heavy tails to reflect the prior belief that the image signal is zero for most pixels and large for an important subset. We show that the spatial prior places mass on nearby locations simultaneously being zero, and also allows for nearby locations to simultaneously be large signals. The form of the prior also facilitates efficient computing for large images. We conduct a simulation study to evaluate the properties of the proposed prior and show that it outperforms other spatial models. We apply our method in the analysis of X-ray diffraction data from a two-dimensional area detector to detect changes in the pattern when the material is exposed to an electric field.
Online Updating of Statistical Inference in the Big Data Setting
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
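The flavor of online updating for a linear model can be conveyed with a minimal sketch that accumulates the cross-product matrices X'X and X'y as data blocks arrive, so the fit can be refreshed without storing history. This shows only the basic recursion, not the authors' full framework (which also handles rank-deficient subsets, estimating equations, and formal inference); all names are illustrative.

```python
import numpy as np

class OnlineOLS:
    """Accumulate the sufficient statistics X'X and X'y so the OLS fit can
    be refreshed as new data blocks stream in, without storing past data."""

    def __init__(self, p):
        self.XtX = np.zeros((p, p))
        self.Xty = np.zeros(p)
        self.n = 0

    def update(self, X_new, y_new):
        # Fold a new block of data into the running cross-products.
        self.XtX += X_new.T @ X_new
        self.Xty += X_new.T @ y_new
        self.n += X_new.shape[0]

    def coef(self, ridge=0.0):
        # Solve the normal equations; a small ridge term can guard against
        # rank deficiency in the accumulated design (e.g., rare covariates).
        p = self.XtX.shape[0]
        return np.linalg.solve(self.XtX + ridge * np.eye(p), self.Xty)

# Toy usage: stream three blocks and recover the full-data OLS fit.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
model = OnlineOLS(p=3)
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=100)
    model.update(X, y)
print(model.coef())   # close to beta_true
```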
Robust Low-Rank Tensor Decomposition with the L2 Criterion
The growing prevalence of tensor data, or multiway arrays, in science and engineering applications motivates the need for tensor decompositions that are robust against outliers. In this paper, we present a robust Tucker decomposition estimator based on the L2 criterion, called Tucker-L2E. Our numerical experiments demonstrate that Tucker-L2E has empirically stronger recovery performance in more challenging high-rank scenarios compared with existing alternatives. The appropriate Tucker rank can be selected in a data-driven manner with cross-validation or hold-out validation. The practical effectiveness of Tucker-L2E is validated on real data applications in fMRI tensor denoising, PARAFAC analysis of fluorescence data, and feature extraction for classification of corrupted images.
Comment: Feature Screening and Variable Selection via Iterative Ridge Regression
Meta-Kriging: Scalable Bayesian Modeling and Inference for Massive Spatial Datasets
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. There is a burgeoning literature on approaches for analyzing large spatial datasets. In this article, we propose a divide-and-conquer strategy within the Bayesian paradigm. We partition the data into subsets, analyze each subset using a Bayesian spatial process model and then obtain approximate posterior inference for the entire dataset by combining the individual posterior distributions from each subset. Importantly, as often desired in spatial analysis, we offer full posterior predictive inference at arbitrary locations for the outcome as well as the residual spatial surface after accounting for spatially oriented predictors. We call this approach "Spatial Meta-Kriging" (SMK). We do not need to store the entire data in one processor, and this leads to superior scalability. We demonstrate SMK with various spatial regression models including Gaussian processes and tapered Gaussian processes. The approach is intuitive, easy to implement, and is supported by theoretical results presented in the supplementary material available online. Empirical illustrations are provided using different simulation experiments and a geostatistical analysis of Pacific Ocean sea surface temperature data.
Anomaly Detection in Large-Scale Networks With Latent Space Models
We develop a real-time anomaly detection method for directed activity on large, sparse networks. We model the propensity for future activity using a dynamic logistic model with interaction terms for sender- and receiver-specific latent factors in addition to sender- and receiver-specific popularity scores; deviations from this underlying model constitute potential anomalies. Latent nodal attributes are estimated via a variational Bayesian approach and may change over time, representing natural shifts in network activity. Estimation is augmented with a case-control approximation to take advantage of the sparsity of the network, reducing computational complexity from O(N^2) to O(E), where N is the number of nodes and E is the number of observed edges. We run our algorithm on network event records collected from an enterprise network of over 25,000 computers and are able to identify a red team attack with half the detection rate required of the model without latent interaction terms.
Adaptive Partially Observed Sequential Change Detection and Isolation
High-dimensional data have become common owing to the ready availability of sensors in modern industrial applications. However, one specific challenge is that complete measurements are often difficult to obtain due to limited sensing power and resource constraints. Furthermore, distinct failure patterns may exist in the systems, and it is necessary to identify the true failure pattern. This work focuses on the online adaptive monitoring of high-dimensional data in resource-constrained environments with multiple potential failure modes. To achieve this, we propose to apply the Shiryaev-Roberts procedure at the failure-mode level and to use a multi-armed bandit formulation to balance exploration and exploitation. We further study the theoretical properties of the proposed algorithm and show that it can correctly isolate the failure mode. Finally, extensive simulations and two case studies demonstrate that both the change-point detection performance and the failure-mode isolation accuracy can be greatly improved.
High-dimensional Cost-constrained Regression via Nonconvex Optimization
Budget constraints become an important consideration in modern predictive modeling due to the high cost of collecting certain predictors. This motivates us to develop cost-constrained predictive modeling methods. In this paper, we study a new high-dimensional cost-constrained linear regression problem: we aim to find the cost-constrained regression model with the smallest expected prediction error among all models satisfying a budget constraint. The nonconvex budget constraint makes this problem NP-hard. To estimate the regression coefficient vector of the cost-constrained regression model, we propose a new discrete first-order optimization method. In particular, our method delivers a series of estimates of the regression coefficient vector by solving a sequence of 0-1 knapsack problems. Theoretically, we prove that the series of estimates generated by our iterative algorithm converges to a first-order stationary point, which can be a globally optimal solution under some conditions. Furthermore, we study extensions of our method that apply to general statistical learning problems and to problems with groups of variables. Numerical studies using simulated datasets and a real dataset from a diabetes study indicate that our proposed method can solve problems of fairly high dimension with promising performance. Supplementary materials for this article are available online.
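To make the "sequence of 0-1 knapsack problems" concrete, here is a schematic projected-gradient reading of the idea, not the authors' algorithm: take a gradient step on the squared-error loss, then project onto the budget-feasible supports by keeping the coordinates whose retention maximizes total squared magnitude subject to the budget, which is a 0-1 knapsack problem. Integer costs are assumed so the knapsack can be solved by dynamic programming; the step size, iteration count, and names are illustrative.

```python
import numpy as np

def knapsack_select(values, costs, budget):
    """0-1 knapsack by dynamic programming: choose items maximizing total
    value subject to integer costs summing to at most `budget`."""
    costs = np.asarray(costs, dtype=int)
    p = len(values)
    table = np.zeros((p + 1, budget + 1))
    for j in range(1, p + 1):
        for b in range(budget + 1):
            table[j, b] = table[j - 1, b]
            if costs[j - 1] <= b:
                take = table[j - 1, b - costs[j - 1]] + values[j - 1]
                if take > table[j, b]:
                    table[j, b] = take
    keep, b = [], budget
    for j in range(p, 0, -1):            # backtrack to recover the chosen items
        if table[j, b] != table[j - 1, b]:
            keep.append(j - 1)
            b -= costs[j - 1]
    return np.array(sorted(keep), dtype=int)

def cost_constrained_lstsq(X, y, costs, budget, iters=200):
    """Projected-gradient sketch: a gradient step on the squared-error loss,
    followed by projection onto budget-feasible supports via a knapsack."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(iters):
        z = beta - step * X.T @ (X @ beta - y)    # gradient step
        keep = knapsack_select(z ** 2, costs, budget)
        beta = np.zeros_like(z)
        beta[keep] = z[keep]                      # knapsack projection
    return beta
```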
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control
In many real-world problems involving real-time monitoring of high-dimensional streaming data, one wants to detect an undesired event or change quickly after it occurs, but under a sampling-control constraint: in resource-constrained environments, only data from selected components can be observed or used for decision-making at each time step. In this paper, we incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm based on the limiting Bayesian approach, which incorporates prior knowledge of potential changes. Our proposed algorithm, termed Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per time step: the adaptive sampling policy applies the Thompson Sampling algorithm to balance exploration for acquiring long-term knowledge against exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics through sum-shrinkage techniques to determine whether to raise a global alarm. Extensive numerical simulations and case studies demonstrate the statistical and computational efficiency of our proposed TSSRP algorithm.
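A toy sketch of the overall monitoring loop follows: at each time step, observe only q of the K streams, update local Shiryaev-Roberts statistics for the observed streams using a Gaussian likelihood ratio for an assumed mean shift, and alarm when the sum of the top-r local statistics exceeds a threshold. The randomized index policy below stands in for the Thompson Sampling rule, and the known shift size, unchanged statistics for unobserved streams, and all thresholds are simplifying assumptions of this sketch rather than features of TSSRP.

```python
import numpy as np

def bandit_monitoring_sketch(data, q=5, r=3, delta=1.0, threshold=50.0, seed=0):
    """Toy bandit change-point monitoring: sample q of K streams per step,
    update local Shiryaev-Roberts statistics, alarm on a top-r sum."""
    rng = np.random.default_rng(seed)
    T, K = data.shape
    R = np.zeros(K)                               # local SR statistics
    for t in range(T):
        # Exploration vs. exploitation: perturb each index and take the top q.
        scores = R + rng.exponential(scale=1.0, size=K)
        observed = np.argsort(scores)[-q:]
        # SR recursion with the Gaussian likelihood ratio for a mean shift of delta;
        # statistics of unobserved streams are left unchanged (a simplification).
        x = data[t, observed]
        lr = np.exp(delta * x - 0.5 * delta ** 2)
        R[observed] = (1.0 + R[observed]) * lr
        # Global decision: top-r sum-shrinkage statistic versus a threshold.
        if np.sort(R)[-r:].sum() > threshold:
            return t                              # alarm time
    return None                                   # no alarm raised

# Toy usage: 20 streams, streams 0-2 shift upward after time 100.
rng = np.random.default_rng(1)
Y = rng.normal(size=(300, 20))
Y[100:, :3] += 1.0
print(bandit_monitoring_sketch(Y))
```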
A Sharper Computational Tool for L2E Regression
Building on the previous research of Chi and Chi (2022), the current paper revisits estimation in robust structured regression under the L2E criterion. We adopt the majorization-minimization (MM) principle to design a new algorithm for updating the vector of regression coefficients. Our sharp majorization achieves faster convergence than the previous alternating proximal gradient descent algorithm (Chi and Chi, 2022). In addition, we reparameterize the model by substituting precision for scale and estimate precision via a modified Newton's method. This simplifies and accelerates overall estimation. We also introduce distance-to-set penalties to enable constrained estimation under nonconvex constraint sets. This tactic also improves performance in coefficient estimation and structure recovery. Finally, we demonstrate the merits of our improved tactics through a rich set of simulation examples and a real data application.
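For readers unfamiliar with the criterion, below is a minimal numpy sketch of the L2E objective for linear regression with Gaussian errors, written in the precision parameterization (precision tau = 1/sigma) used above. The objective follows the standard L2E (integrated squared error) construction; the generic optimizer and warm start are stand-ins for illustration, not the MM algorithm of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def l2e_objective(params, X, y):
    """L2E criterion for linear regression with Gaussian errors, in terms of
    the precision tau = 1/sigma (optimized on the log scale to keep tau > 0)."""
    beta, log_tau = params[:-1], params[-1]
    tau = np.exp(log_tau)
    r = y - X @ beta
    fitted_density = tau / np.sqrt(2 * np.pi) * np.exp(-0.5 * (tau * r) ** 2)
    return tau / (2 * np.sqrt(np.pi)) - 2 * fitted_density.mean()

# Toy usage: robust fit in the presence of gross outliers, warm-started at OLS.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=200)
y[:20] += 20.0                                   # contaminate 10% of responses
init = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0], [0.0]])
fit = minimize(l2e_objective, init, args=(X, y), method="Nelder-Mead")
print(fit.x[:2])   # intercept and slope estimates resist the contamination
```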
A Unified Analysis of Structured Sonar-terrain Data using Bayesian Functional Mixed Models
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low-cost, complementary sensing modality for small robotic platforms. While existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this paper, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and to study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis treats the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in the data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a fully Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design.
Matrix Linear Discriminant Analysis
We propose a novel linear discriminant analysis (LDA) approach for the classification of high-dimensional matrix-valued data that commonly arises from imaging studies. Motivated by the equivalence of the conventional LDA and the ordinary least squares, we consider an efficient nuclear norm penalized regression that encourages a low-rank structure. Theoretical properties including a nonasymptotic risk bound and a rank consistency result are established. Simulation studies and an application to electroencephalography data show the superior performance of the proposed method over the existing approaches.
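The main computational primitive behind nuclear-norm-penalized regression is singular value soft-thresholding, the proximal operator of the nuclear norm, which pulls the coefficient matrix toward low rank. A minimal sketch of that primitive is given below; it is not the full penalized LDA estimator of the article, and the toy example and names are illustrative.

```python
import numpy as np

def singular_value_soft_threshold(B, lam):
    """Proximal operator of the nuclear norm: soft-threshold the singular
    values of B by lam, shrinking the matrix toward low rank."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

# A noisy rank-2 matrix is pulled back toward (about) rank 2 by thresholding.
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
noisy = M + 0.05 * rng.normal(size=M.shape)
print(np.linalg.matrix_rank(singular_value_soft_threshold(noisy, 1.0)))
```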
Permutation and Grouping Methods for Sharpening Gaussian Process Approximations
Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the paper includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this paper is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.
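A minimal sketch of Vecchia's approximation helps make the role of the ordering concrete: each ordered observation is conditioned only on a small set of previously ordered nearest neighbors, so the joint Gaussian log-likelihood is replaced by a sum of cheap univariate conditionals. The exponential covariance, neighbor count, and orderings below are illustrative; the grouping methods and the specific ordering schemes studied in the article are omitted.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import norm

def exp_cov(locs1, locs2, variance=1.0, range_=0.3):
    # Exponential covariance; stands in for any valid covariance model.
    return variance * np.exp(-cdist(locs1, locs2) / range_)

def vecchia_loglik(y, locs, order, m=10):
    """Vecchia's approximate Gaussian log-likelihood: each ordered observation
    is conditioned only on its m nearest previously ordered neighbors."""
    y, locs = y[order], locs[order]
    ll = 0.0
    for i in range(len(y)):
        if i == 0:
            mu, var = 0.0, exp_cov(locs[:1], locs[:1])[0, 0]
        else:
            d = cdist(locs[i:i + 1], locs[:i]).ravel()
            nn = np.argsort(d)[:m]                  # conditioning set
            K_nn = exp_cov(locs[nn], locs[nn])
            k_in = exp_cov(locs[i:i + 1], locs[nn]).ravel()
            w = np.linalg.solve(K_nn, k_in)
            mu = w @ y[nn]
            var = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] - w @ k_in
        ll += norm.logpdf(y[i], loc=mu, scale=np.sqrt(var))
    return ll

# Toy comparison of a coordinate-based ordering with a random ordering.
rng = np.random.default_rng(0)
locs = rng.uniform(size=(500, 2))
L = np.linalg.cholesky(exp_cov(locs, locs) + 1e-8 * np.eye(500))
y = L @ rng.normal(size=500)
coord_order = np.argsort(locs[:, 0])
rand_order = rng.permutation(500)
print(vecchia_loglik(y, locs, coord_order), vecchia_loglik(y, locs, rand_order))
```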
Bayesian State Space Modeling of Physical Processes in Industrial Hygiene
Exposure assessment models are deterministic models derived from physical-chemical laws. In real workplace settings, however, chemical concentrations are measured with noise and often only indirectly. In addition, inference on important parameters such as generation and ventilation rates is usually of interest, since these quantities are difficult to measure directly. In this article, we outline a flexible Bayesian framework for parameter inference and exposure prediction. In particular, we devise Bayesian state space models by discretizing the differential equation models and incorporating information from observed measurements and expert prior knowledge. At each time point a new, noisy measurement becomes available; combining the physical model with the accumulated measurements to obtain a more accurate state estimate is known as filtering. We consider Monte Carlo sampling methods for parameter estimation and inference under nonlinear and non-Gaussian assumptions. The performance of the different methods is studied on computer-simulated and controlled laboratory-generated data. We consider some commonly used exposure models representing different physical hypotheses. Supplementary materials for this article are available online.
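As a concrete illustration of the filtering step, here is a minimal bootstrap particle filter for the standard one-box (well-mixed room) exposure model dC/dt = (G - Q*C)/V, with the generation rate G, ventilation rate Q, and room volume V treated as known. In the article these are unknown parameters to be inferred; the Euler discretization, noise levels, and names below are assumptions of this sketch, not the authors' sampler.

```python
import numpy as np

def particle_filter(y, G, Q, V, dt=1.0, n_particles=2000,
                    proc_sd=0.05, meas_sd=0.5, seed=0):
    """Bootstrap particle filter for the one-box model dC/dt = (G - Q*C)/V,
    discretized by Euler steps, with additive Gaussian measurement noise."""
    rng = np.random.default_rng(seed)
    particles = np.zeros(n_particles)             # initial concentration
    filtered_mean = np.empty(len(y))
    for t, obs in enumerate(y):
        # Propagate each particle through the discretized physical model.
        particles = particles + dt * (G - Q * particles) / V
        particles = particles + proc_sd * rng.normal(size=n_particles)
        # Weight by the measurement likelihood and resample.
        w = np.exp(-0.5 * ((obs - particles) / meas_sd) ** 2) + 1e-300
        w /= w.sum()
        particles = rng.choice(particles, size=n_particles, p=w)
        filtered_mean[t] = particles.mean()
    return filtered_mean
```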
A Geometric Approach to Archetypal Analysis and Nonnegative Matrix Factorization
Archetypal analysis and nonnegative matrix factorization (NMF) are staples in a statistician's toolbox for dimension reduction and exploratory data analysis. We describe a geometric approach to both NMF and archetypal analysis by interpreting both problems as finding extreme points of the data cloud. We also develop and analyze an efficient approach to finding extreme points in high dimensions. For modern massive datasets that are too large to fit on a single machine and must be stored in a distributed setting, our approach makes only a small number of passes over the data. In fact, it is possible to obtain the NMF or perform archetypal analysis with just two passes over the data.
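The geometric idea can be conveyed with a toy sketch: project the rows of the data matrix onto a batch of random directions and record which row maximizes each linear functional; every such maximizer is an extreme point of the data cloud, and the computation needs only one pass once the directions are fixed. This is a simplified illustration of the extreme-point view, not the authors' algorithm, and all names are illustrative.

```python
import numpy as np

def candidate_extreme_points(X, n_directions=200, seed=0):
    """Flag candidate extreme points (vertices of the convex hull of the rows
    of X) by maximizing many random linear functionals over the rows."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_directions, X.shape[1]))
    # For each direction, the maximizing row is an extreme point of the cloud.
    idx = np.argmax(X @ directions.T, axis=0)
    return np.unique(idx)

# Toy usage: strict convex combinations of archetypes are never flagged.
rng = np.random.default_rng(1)
archetypes = rng.normal(size=(5, 20))
weights = rng.dirichlet(np.ones(5), size=1000)
X = np.vstack([archetypes, weights @ archetypes])
print(candidate_extreme_points(X))   # indices among 0-4 (the archetypes)
```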
Ridge Regularization: An Essential Concept in Data Science
Ridge or, more formally, ℓ2 regularization shows up in many areas of statistics and machine learning. It is one of those essential devices that any good data scientist needs to master for their craft. In this brief article, I have collected together some of the magic and beauty of ridge that my colleagues and I have encountered over the past 40 years in applied statistics.
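As a small reminder of the mechanics, ridge has a closed form obtained by adding λ to the diagonal of the Gram matrix, which is equivalent to shrinking each singular-value direction of X by s_j^2/(s_j^2 + λ). The sketch below is illustrative only.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_svd(X, y, lam):
    """Equivalent SVD form: each principal direction is shrunk by the factor
    s_j^2 / (s_j^2 + lam), which makes the shrinkage explicit."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((s / (s ** 2 + lam)) * (U.T @ y))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 10)), rng.normal(size=50)
print(np.allclose(ridge(X, y, 1.0), ridge_svd(X, y, 1.0)))   # True
```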
Orthogonalizing EM: A Design-Based Least Squares Algorithm
We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then to solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when the sample size n is much larger than the number of predictors p. Supplementary materials for this article are available online.
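For intuition, here is the core OEM-style update for the lasso as it is commonly written (and, as the following note observes, it coincides with a proximal gradient step of size 1/d): with d at least the largest eigenvalue of X'X, form u = X'y + (dI - X'X)β and soft-threshold u/d. The numpy sketch below is minimal; variable names and the choice of d are illustrative.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def oem_lasso(X, y, lam, iters=500):
    """OEM-style iteration for the lasso loss (1/2)||y - X b||^2 + lam*||b||_1:
    with d >= largest eigenvalue of X'X, repeatedly form
    u = X'y + (d*I - X'X) b and soft-threshold u/d.  For tall data, X'X and
    X'y are computed once and each iteration costs only O(p^2); setting
    lam = 0 gives an iterative solver for ordinary least squares."""
    XtX, Xty = X.T @ X, X.T @ y
    d = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        u = Xty + d * beta - XtX @ beta
        beta = soft(u / d, lam / d)
    return beta
```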
Note on the Equivalence of Orthogonalizing EM and Proximal Gradient Descent
Xiong et al. (2016) develop a method called orthogonalizing EM (OEM) to solve penalized regression problems for tall data. While OEM is developed in the context of the EM algorithm, we show that it is, in fact, an instance of proximal gradient descent, a popular first-order convex optimization algorithm.
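Concretely, for penalized least squares with penalty \(P_\lambda\) and any \(d\) at least the largest eigenvalue of \(X^{\top}X\), the OEM coefficient update can be written, in the usual proximal-gradient notation, as
\[
\beta^{(t+1)}
= \operatorname{prox}_{P_\lambda/d}\!\Big(\tfrac{1}{d}\big(X^{\top}y + (dI - X^{\top}X)\beta^{(t)}\big)\Big)
= \operatorname{prox}_{P_\lambda/d}\!\Big(\beta^{(t)} - \tfrac{1}{d}\,X^{\top}\big(X\beta^{(t)} - y\big)\Big),
\]
where the first form is the update obtained from the row-augmented (orthogonalized) design and the second is a standard proximal gradient step with step size \(1/d\).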
