AMERICAN STATISTICIAN

Conceptualizing Experimental Controls Using the Potential Outcomes Framework
Hunter KB, Koenig K and Bind MA
The goal of a controlled experiment is to remove unwanted variation when estimating the causal effect of the intervention. Experiments conducted in the basic sciences frequently achieve this goal using experimental controls, such as "negative" and "positive" controls, which are additional experimental components designed to detect systematic sources of variation. We introduce a taxonomy of clear, mathematically-precise definitions of experimental controls using the potential outcomes framework. These definitions are intended for pedagogical purposes: they may be used by educators to ensure that their students adhere to good statistical practice, and may also be useful for communication with practitioners who are less familiar with statistical concepts. We define three types of experimental controls based on assumptions about potential outcomes: treatment, outcome, and contrast controls. After each type of control is introduced, we provide examples of its use. We also discuss experimental controls as tools for researchers to use in designing experiments and detecting potential design flaws, such as identifying unwanted variation. We believe that experimental controls are powerful yet underutilized tools for reproducible, replicable, rigorous, and transparent research.
A Cornucopia of Maximum Likelihood Algorithms
Lange K, Li XJ and Zhou H
Classroom expositions of maximum likelihood estimation (MLE) rely on traditional calculus methods to construct analytic solutions. This creates in students a false sense of the ease with which MLE problems can be attacked. In a nod to reality, some teachers mention and apply Newton's method, Fisher scoring, and the expectation-maximization (EM) algorithm. Although preferable to leaving students in a state of ignorance, such brief expositions ultimately fail to expose the full body of relevant techniques. Some of these techniques extend more readily to high-dimensional data problems than Newton's method and scoring. The current paper emphasizes block ascent and descent, profile likelihoods, the minorization-maximization (MM) principle, and their creative combination. These themes are put to work in readable Julia code to solve several MLE problems.
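Editor's note: as a small, hedged illustration of the MM principle the abstract emphasizes (our sketch, not code from the paper, which is in Julia), the familiar EM algorithm for a two-component normal mixture can be read as minorize-then-maximize: the E-step builds the minorizing surrogate and the M-step maximizes it.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Normal density, written out to keep the example dependency-free."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_component_mixture(x, n_iter=200):
    """EM (an instance of the MM principle) for a two-component normal mixture.

    The E-step builds a surrogate (the expected complete-data log-likelihood)
    that minorizes the observed log-likelihood at the current iterate; the
    M-step maximizes that surrogate, so the observed log-likelihood never
    decreases."""
    pi, mu1, mu2 = 0.5, np.quantile(x, 0.25), np.quantile(x, 0.75)
    s1 = s2 = np.std(x)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities)
        d1 = pi * norm_pdf(x, mu1, s1)
        d2 = (1 - pi) * norm_pdf(x, mu2, s2)
        w = d1 / (d1 + d2)
        # M-step: weighted maximum likelihood updates
        pi = w.mean()
        mu1, mu2 = np.average(x, weights=w), np.average(x, weights=1 - w)
        s1 = np.sqrt(np.average((x - mu1) ** 2, weights=w))
        s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - w))
    return pi, mu1, mu2, s1, s2

# simulated data: a 40/60 mixture of N(-2, 1) and N(3, 1.5^2)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1.0, 400), rng.normal(3, 1.5, 600)])
print(em_two_component_mixture(x))
```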
Laplace's law of succession estimator and M-statistics
Demidenko E
The classic formula for estimating the binomial probability as the proportion of successes contradicts common sense for extreme probabilities when the event never occurs or occurs every time. Laplace's law of succession estimator, one of the first applications of Bayesian statistics, has been around for over 250 years and resolves these paradoxes, although it is rarely discussed in modern statistics texts. This work aims to introduce a new theory for exact optimal statistical inference using Laplace's law of succession estimator as a motivating example. We prove that this estimator may be viewed from a different theoretical perspective as the limit point of the short confidence interval on the double-log scale when the confidence level approaches zero. This motivating example paves the way to the definition of an estimator as the inflection point of the cumulative distribution function viewed as a function of the parameter given the observed statistic. This estimator has the maximum infinitesimal probability of covering the unknown parameter and is therefore called the maximum concentration (MC) estimator, as part of a more general M-statistics theory. The new theory is illustrated with exact optimal confidence intervals for the normal standard deviation and the respective MC estimators.
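Editor's note: a quick numerical illustration (ours, not the paper's) of the paradox and its resolution. At x = 0 or x = n the sample proportion declares the event impossible or certain, while Laplace's estimator (x + 1)/(n + 2) stays strictly inside (0, 1).

```python
# Compare the sample proportion with Laplace's law of succession estimator
# (x + 1) / (n + 2) when the event never occurs or always occurs.
def sample_proportion(x, n):
    return x / n

def laplace_succession(x, n):
    return (x + 1) / (n + 2)

for n in (5, 20, 100):
    for x in (0, n):
        print(f"n={n:4d} x={x:4d}  x/n={sample_proportion(x, n):.3f}  "
              f"(x+1)/(n+2)={laplace_succession(x, n):.3f}")
# x/n returns exactly 0 or 1 (an "impossible"/"certain" event), while the
# Laplace estimator remains strictly between 0 and 1 and shrinks toward 1/2.
```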
A Multiple Imputation Approach for the Cumulative Incidence, with Implications for Variance Estimation
Chase EC, Boonstra PS and Taylor JMG
We present an alternative approach to estimating the cumulative incidence function that uses non-parametric multiple imputation to reduce the problem to that of estimating a binomial proportion. In the standard competing risks setting, we show mathematically and empirically that our imputation-based estimator is equivalent to the Aalen-Johansen estimator of the cumulative incidence given a sufficient number of imputations. However, our approach allows for the use of a wider variety of methods for the analysis of binary outcomes, including preferred options for uncertainty estimation. While we focus on the cumulative incidence function, the multiple imputation approach likely extends to more complex problems in competing risks.
Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors
Ge L, Zhang Y, Waller LA and Lyles RH
Epidemiologic screening programs often make use of tests with small but non-zero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval. We report simulation results demonstrating the enhanced performance of the proposed inferential methods.
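Editor's note: for orientation, the sketch below shows the standard Rogan-Gladen-type bias-corrected point estimate and a naive Wald interval that applies the usual FPC directly, which is the baseline the paper improves upon. The extra variance component and the adapted credible interval derived in the paper are not reproduced here, and the sample numbers are hypothetical.

```python
import math

def bias_corrected_prevalence(x, n, se, sp):
    """Rogan-Gladen-type correction: map the apparent (test-positive)
    proportion back to a prevalence estimate using known sensitivity (se)
    and specificity (sp)."""
    p_apparent = x / n
    pi_hat = (p_apparent + sp - 1) / (se + sp - 1)
    return min(max(pi_hat, 0.0), 1.0)  # truncate to [0, 1]

def naive_fpc_wald(x, n, N, se, sp, z=1.96):
    """Naive Wald interval that applies the usual finite population
    correction (FPC) directly; the paper shows this understates the
    variance and derives an extra misclassification component."""
    p_apparent = x / n
    pi_hat = bias_corrected_prevalence(x, n, se, sp)
    var_apparent = (1 - n / N) * p_apparent * (1 - p_apparent) / n
    var_pi = var_apparent / (se + sp - 1) ** 2  # delta-method scaling
    half = z * math.sqrt(var_pi)
    return pi_hat, (max(pi_hat - half, 0.0), min(pi_hat + half, 1.0))

# hypothetical screening sample: 120 positives among n = 1000 sampled
# from a finite population of N = 5000, with se = 0.90 and sp = 0.98
print(naive_fpc_wald(x=120, n=1000, N=5000, se=0.90, sp=0.98))
```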
Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics
Han L, Arfè A and Trippa L
The use of simulation-based sensitivity analyses is fundamental for evaluating and comparing candidate designs of future clinical trials. In this context, sensitivity analyses are especially useful for assessing how important design operating characteristics depend on various unknown parameters. Typical examples of operating characteristics include the likelihood of detecting treatment effects and the average study duration, which depend on parameters that are unknown until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios and (ii) the list of operating characteristics of interest. We propose a new approach for choosing the set of scenarios to be included in a sensitivity analysis. We maximize a utility criterion that formalizes whether a specific set of sensitivity scenarios is adequate to summarize how the operating characteristics of the trial design vary across plausible values of the unknown parameters. Then, we use optimization techniques to select the best set of simulation scenarios (according to the criteria specified by the investigator) to exemplify the operating characteristics of the trial design. We illustrate our proposal with three trial designs.
Understanding the implications of a complete case analysis for regression models with a right-censored covariate
Ashner MC and Garcia TP
Despite its drawbacks, the complete case analysis is commonly used in regression models with incomplete covariates. Understanding when the complete case analysis will lead to consistent parameter estimation is vital before use. Our aim here is to demonstrate when a complete case analysis is consistent for randomly right-censored covariates and to discuss the implications of its use even when consistent. Across the censored covariate literature, different assumptions are made to ensure a complete case analysis produces a consistent estimator, which leads to confusion in practice. We make several contributions to dispel this confusion. First, we summarize the language surrounding the assumptions that lead to a consistent complete case estimator. Then, we show a unidirectional hierarchical relationship between these assumptions, which leads us to one sufficient assumption to consider before using a complete case analysis. Lastly, we conduct a simulation study to illustrate the performance of a complete case analysis with a right-censored covariate under different censoring mechanism assumptions, and we demonstrate its use with a Huntington disease data example.
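Editor's note: a toy simulation (ours, not the authors' study or data) of the basic phenomenon. Ordinary least squares on the complete cases recovers the slope when censoring of the covariate is independent of the regression error, but not when censoring also depends on that error.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
beta0, beta1 = 1.0, 2.0

x = rng.exponential(1.0, n)          # covariate subject to right censoring
eps = rng.normal(0.0, 1.0, n)
y = beta0 + beta1 * x + eps          # outcome (always observed)

def cc_slope(observed):
    """Slope from OLS of y on x using complete cases only."""
    xs, ys = x[observed], y[observed]
    X = np.column_stack([np.ones(xs.size), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

# (a) censoring time independent of the error term:
c_indep = rng.exponential(1.5, n)
print("covariate-only censoring:  ", cc_slope(x <= c_indep))

# (b) censoring time depending on the error term (outcome-related censoring):
c_dep = rng.exponential(1.5, n) + np.where(eps > 0, 0.0, 2.0)
print("error-dependent censoring: ", cc_slope(x <= c_dep))
# The first estimate is close to beta1 = 2; the second is noticeably biased.
```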
Prioritizing Variables for Observational Study Design using the Joint Variable Importance Plot
Liao LD, Zhu Y, Ngo AL, Chehab RF and Pimentel SD
Observational studies of treatment effects require adjustment for confounding variables. However, causal inference methods typically cannot deliver perfect adjustment on all measured baseline variables, and there is often ambiguity about which variables should be prioritized. Standard prioritization methods based on treatment imbalance alone neglect variables' relationships with the outcome. We propose the joint variable importance plot to guide variable prioritization for observational studies. Since not all variables are equally relevant to the outcome, the plot adds outcome associations to quantify the potential confounding jointly with the standardized mean difference. To enhance comparisons on the plot between variables with different confounding relationships, we also derive and plot bias curves. Variable prioritization using the plot can produce recommended values for tuning parameters in many existing matching and weighting methods. We showcase the use of the joint variable importance plots in the design of a balance-constrained matched study to evaluate whether taking an antidiabetic medication, glyburide, increases the incidence of C-section delivery among pregnant individuals with gestational diabetes.
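Editor's note: a schematic sketch (not the authors' implementation, data, or bias curves) of the two axes the plot juxtaposes, namely treatment imbalance as an absolute standardized mean difference and an outcome association computed in the control group.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p = 2000, 8
X = rng.normal(size=(n, p))                                   # hypothetical covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.4 * X[:, 1]))))
y = 1.0 * X[:, 0] + 0.7 * X[:, 2] + rng.normal(size=n)        # hypothetical outcome

# Treatment imbalance: absolute standardized mean difference per covariate
m1, m0 = X[treat == 1].mean(axis=0), X[treat == 0].mean(axis=0)
s_pool = np.sqrt((X[treat == 1].var(axis=0) + X[treat == 0].var(axis=0)) / 2)
smd = np.abs(m1 - m0) / s_pool

# Outcome association: absolute correlation with the outcome among controls
# (computed in the control group to avoid using the treatment effect itself)
Xc, yc = X[treat == 0], y[treat == 0]
out_assoc = np.abs([np.corrcoef(Xc[:, j], yc)[0, 1] for j in range(p)])

plt.scatter(smd, out_assoc)
for j in range(p):
    plt.annotate(f"X{j+1}", (smd[j], out_assoc[j]))
plt.xlabel("|standardized mean difference| (treatment imbalance)")
plt.ylabel("|outcome association| (control group)")
plt.title("Joint variable importance plot (schematic)")
plt.show()
# Covariates in the upper-right corner (imbalanced AND outcome-related,
# here X1) are the ones to prioritize when matching or weighting.
```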
Assignment-Control Plots: A Visual Companion for Causal Inference Study Design
Aikens RC and Baiocchi M
An important step for any causal inference study design is understanding the distribution of the subjects in terms of measured baseline covariates. However, not all baseline variation is equally important. We propose a set of visualizations that reduce the space of measured covariates into two components of baseline variation important to the design of an observational causal inference study: a propensity score summarizing baseline variation associated with treatment assignment, and a prognostic score summarizing baseline variation associated with the untreated potential outcome. These plots and variations thereof visualize study design trade-offs and illustrate core methodological concepts in causal inference. As a practical demonstration, we apply assignment-control plots to a hypothetical study of cardiothoracic surgery. To demonstrate how these plots can be used to illustrate nuanced concepts, we use them to visualize unmeasured confounding and to consider the relationship between propensity scores and instrumental variables. While the family of visualization tools for studies of causality is relatively sparse, simple visual tools can be an asset to education, application, and methods development.
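Editor's note: a schematic sketch (simulated data, not the paper's surgery example) of the basic construction, with a fitted propensity score on one axis and a prognostic score, fitted on controls only, on the other.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(1)
n, p = 3000, 6
X = rng.normal(size=(n, p))                               # hypothetical covariates
z = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0]))))   # treatment assignment
y = 2.0 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(size=n)    # outcome

# Propensity score: baseline variation associated with treatment assignment
prop = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]

# Prognostic score: baseline variation associated with the untreated
# potential outcome, fitted on controls only and predicted for everyone
prog = LinearRegression().fit(X[z == 0], y[z == 0]).predict(X)

for grp, color, label in [(0, "tab:blue", "control"), (1, "tab:orange", "treated")]:
    idx = z == grp
    plt.scatter(prop[idx], prog[idx], s=5, alpha=0.3, c=color, label=label)
plt.xlabel("propensity score")
plt.ylabel("prognostic score")
plt.legend()
plt.title("Assignment-control plot (schematic)")
plt.show()
```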
Improved approximation and visualization of the correlation matrix
Graffelman J and de Leeuw J
The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, a comparison of the different procedures is presented using an example data set, and an improved representation with better fit is proposed. Principal component analysis is widely used for making pictures of correlation structure, but, as we show, a weighted alternating least squares approach that avoids fitting the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor to principal component analysis, in particular when the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix if the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, which is seen to lead to a further improved approximation of the correlation matrix.
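Editor's note: a hedged illustration of the core idea using a generic weighted alternating least squares scheme (not necessarily the authors' algorithm, which uses a symmetric factorization plus an additive adjustment). The sketch fits a low-rank approximation with zero weight on the diagonal and compares its off-diagonal fit with a truncated eigendecomposition.

```python
import numpy as np

def offdiag_rank_k(R, k, n_iter=200):
    """Rank-k approximation of a correlation matrix fitted by alternating
    least squares with the diagonal left out (weight zero on the diagonal)."""
    p = R.shape[0]
    rng = np.random.default_rng(0)
    A = rng.normal(size=(p, k))
    B = rng.normal(size=(p, k))
    for _ in range(n_iter):
        for i in range(p):                   # update rows of A given B
            mask = np.arange(p) != i
            A[i] = np.linalg.lstsq(B[mask], R[i, mask], rcond=None)[0]
        for j in range(p):                   # update rows of B given A
            mask = np.arange(p) != j
            B[j] = np.linalg.lstsq(A[mask], R[mask, j], rcond=None)[0]
    return A @ B.T

def offdiag_rss(R, R_hat):
    """Residual sum of squares over off-diagonal entries only."""
    mask = ~np.eye(R.shape[0], dtype=bool)
    return np.sum((R - R_hat)[mask] ** 2)

# hypothetical correlation matrix from simulated correlated data
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
R = np.corrcoef(X, rowvar=False)

# PCA-style (truncated eigendecomposition) rank-2 fit for comparison
vals, vecs = np.linalg.eigh(R)
R_pca = vecs[:, -2:] @ np.diag(vals[-2:]) @ vecs[:, -2:].T

R_wals = offdiag_rank_k(R, k=2)
print("off-diagonal RSS, PCA :", offdiag_rss(R, R_pca))
print("off-diagonal RSS, WALS:", offdiag_rss(R, R_wals))
```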
The R2D2 prior for generalized linear mixed models
Yanchenko E, Bondell HD and Reich BJ
In Bayesian analysis, the selection of a prior distribution is typically done by considering each parameter in the model. While this can be convenient, in many scenarios it may be desirable to place a prior on a summary measure of the model instead. In this work, we propose a prior on the model fit, as measured by a Bayesian coefficient of determination (R^2), which then induces a prior on the individual parameters. We achieve this by placing a beta prior on R^2 and then deriving the induced prior on the global variance parameter for generalized linear mixed models. We derive closed-form expressions in many scenarios and present several approximation strategies when an analytic form is not possible and/or to allow for easier computation. In these situations, we suggest approximating the prior by using a generalized beta prime distribution and provide a simple default prior construction scheme. This approach is quite flexible and can be easily implemented in standard Bayesian software. Lastly, we demonstrate the performance of the method on simulated and real-world data, where the method particularly shines in high-dimensional settings, as well as modeling random effects.
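Editor's note: a minimal sketch of the induced-prior idea in the Gaussian linear case (our simplification, not the paper's general GLMM derivation). With R^2 = W/(W + sigma^2) for the global variance W of the linear predictor, a Beta(a, b) prior on R^2 induces W = sigma^2 * R^2/(1 - R^2), a scaled beta-prime variable; the hyperparameters below are arbitrary.

```python
import numpy as np

def sample_induced_global_variance(a, b, sigma2, size=100_000, seed=0):
    """Sample the global variance W induced by R^2 ~ Beta(a, b) in the
    Gaussian linear model, where R^2 = W / (W + sigma^2).  Inverting the
    relationship gives W = sigma^2 * R^2 / (1 - R^2), i.e. sigma^2 times
    a beta-prime(a, b) random variable."""
    rng = np.random.default_rng(seed)
    r2 = rng.beta(a, b, size)
    return sigma2 * r2 / (1 - r2)

# arbitrary hyperparameters: Beta(1, 4) favors small R^2 (strong shrinkage)
w = sample_induced_global_variance(a=1.0, b=4.0, sigma2=1.0)
print("median induced W:", np.median(w))
print("P(W > 1):        ", np.mean(w > 1))
```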
Flexible Distributed Lag Models for Count Data Using mgcv
Economou T, Parliari D, Tobias A, Dawkins L, Steptoe H, Sarran C, Stoner O, Lowe R and Lelieveld J
In this tutorial we present the use of the R package mgcv to implement Distributed Lag Non-Linear Models (DLNMs) in a flexible way. Interpretation of smoothing splines as random quantities enables approximate Bayesian inference, which in turn allows uncertainty quantification and comprehensive model checking. We illustrate various modeling situations using open-access epidemiological data in conjunction with simulation experiments. We demonstrate the inclusion of temporal structures and the use of mixture distributions to allow for extreme outliers. Moreover, we demonstrate interactions of the temporal lagged structures with other covariates, allowing different lag periods for different covariates. Spatial structures are also demonstrated, including smooth spatial variability and Markov random fields, in addition to hierarchical formulations that allow for non-structured dependency. Posterior predictive simulation is used to check that the models verify well against the data.
Counternull sets in randomized experiments
Bind MC and Rubin DB
Consider a study whose primary results are "not statistically significant". How often does it lead to the published conclusion that "there is no effect of the treatment/exposure on the outcome"? We believe too often, and that a requirement to report counternull values could help to avoid this! In statistical parlance, the null value of an estimand is a value that is distinguished in some way from other possible values, for example a value that indicates no difference between the general health status of those treated with a new drug versus a traditional drug. A counternull value is a nonnull value of that estimand that is supported by the same amount of evidence as the null value. Of course, such a definition depends critically on how "evidence" is defined. Here, we consider the context of a randomized experiment where evidence is summarized by the randomization-based p-value associated with a specified sharp null hypothesis. Consequently, a counternull value has the same p-value from the randomization test as does the null value; the counternull value is rarely unique, but rather comprises a set of values. We explore advantages of reporting a counternull set in addition to the p-value associated with a null value: a first advantage is pedagogical, in that reporting it avoids the mistake of implicitly accepting a not-rejected null hypothesis; a second advantage is that the effort to construct a counternull set can be scientifically helpful by encouraging thought about nonnull values of estimands. Two examples are used to illustrate these ideas.
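Editor's note: a hedged sketch (hypothetical data, not one of the paper's examples) of how a counternull set can be traced out in a two-arm completely randomized experiment. Compute the randomization p-value under the sharp additive null Y_i(1) = Y_i(0) + tau over a grid of tau, then collect the nonnull values whose p-value matches that of tau = 0.

```python
import numpy as np

rng = np.random.default_rng(7)
# hypothetical completely randomized experiment with 12 treated, 12 controls
z = np.array([1] * 12 + [0] * 12)
y = np.concatenate([rng.normal(0.8, 1.0, 12), rng.normal(0.0, 1.0, 12)])

def randomization_p_value(tau, n_draws=5000):
    """Fisher randomization p-value for the sharp null Y_i(1) = Y_i(0) + tau,
    using |difference in means - tau| as the test statistic."""
    y0 = y - tau * z                       # imputed control potential outcomes
    obs = abs(y[z == 1].mean() - y[z == 0].mean() - tau)
    count = 0
    for _ in range(n_draws):
        z_star = rng.permutation(z)
        y_star = y0 + tau * z_star         # outcomes that would be observed
        stat = abs(y_star[z_star == 1].mean() - y_star[z_star == 0].mean() - tau)
        count += stat >= obs
    return count / n_draws

p_null = randomization_p_value(0.0)
grid = np.linspace(-1.0, 3.0, 81)
p_grid = np.array([randomization_p_value(t) for t in grid])
# counternull set: nonnull effect sizes supported by (about) the same
# evidence, i.e. the same randomization p-value, as tau = 0
counternull = grid[np.isclose(p_grid, p_null, atol=0.02) & (np.abs(grid) > 1e-9)]
print("p-value at the null:        ", p_null)
print("approximate counternull set:", counternull)
```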
An Example to Illustrate Randomized Trial Estimands and Estimators
Harrison LJ and Brummel SS
Recently, the International Conference on Harmonisation finalized an estimand framework for randomized trials that was adopted by regulatory bodies worldwide. The framework introduced five strategies for handling post-randomization events; namely the treatment policy, composite variable, while on treatment, hypothetical and principal stratum estimands. We describe an illustrative example to elucidate the difference between these five strategies for handling intercurrent events and provide an estimation technique for each. Specifically, we consider the intercurrent event of treatment discontinuation and introduce potential outcome notation to describe five estimands and corresponding estimators: 1) an intention-to-treat estimator of the total effect of a treatment policy; 2) an intention-to-treat estimator of a composite of the outcome and remaining on treatment; 3) a per-protocol estimator of the outcome in individuals observed to remain on treatment; 4) a g-computation estimator of a hypothetical scenario that all individuals remain on treatment; and 5) a principal stratum estimator of the treatment effect in individuals who would remain on treatment under the experimental condition. Additional insight is provided by defining situations where certain estimands are equal, and by studying the while on treatment strategy under repeated outcome measures. We highlight relevant causal inference literature to enable adoption in practice.
A Characterization of Most (More) Powerful Test Statistics with Simple Nonparametric Applications
Vexler A and Hutson AD
Data-driven most powerful tests are statistical hypothesis decision-making tools that deliver the greatest power against a fixed null hypothesis among all corresponding data-based tests of a given size. When the underlying data distributions are known, the likelihood ratio principle can be applied to conduct most powerful tests. Reversing this notion, we consider the following questions. (a) Assuming a test statistic, say T, is given, how can we transform T to improve the power of the test? (b) Can T be used to generate the most powerful test? (c) How does one compare test statistics with respect to an attribute of the desired most powerful decision-making procedure? To examine these questions, we propose a one-to-one mapping of the term "most powerful" to the distribution properties of a given test statistic via matching characterization. This form of characterization has practical applicability and aligns well with the general principle of sufficiency. Findings indicate that to improve a given test, we can employ relevant ancillary statistics whose distributions do not change under the tested hypotheses. As an example, the present method is illustrated by modifying the usual t-test under nonparametric settings. Numerical studies based on generated data and a real-data set confirm that the proposed approach can be useful in practice.
MOVER-R and Penalized MOVER-R Confidence Intervals for the Ratio of Two Quantities
Wang P, Ma Y, Xu S, Wang YX, Zhang Y, Lou X, Li M, Wu B, Gao G, Yin P and Liu N
Developing a confidence interval for the ratio of two quantities is an important task in statistics because of its omnipresence in real world applications. For such a problem, the MOVER-R (method of variance recovery for the ratio) technique, which is based on the recovery of variance estimates from confidence limits of the numerator and the denominator separately, was proposed as a useful and efficient approach. However, this method implicitly assumes that the confidence interval for the denominator never includes zero, which might be violated in practice. In this article, we first use a new framework to derive the MOVER-R confidence interval, which does not require the above assumption and covers the whole parameter space. We find that MOVER-R can produce an unbounded confidence interval, just like the well-known Fieller method. To overcome this issue, we further propose the penalized MOVER-R. We prove that the new method differs from MOVER-R only at the second order. It, however, always gives a bounded and analytic confidence interval. Through simulation studies and a real data application, we show that the penalized MOVER-R generally provides a better confidence interval than MOVER-R in terms of controlling the coverage probability and the median width.
Sequential monitoring using the Second Generation P-Value with Type I error controlled by monitoring frequency
Chipman JJ, Greevy RA, Mayberry L and Blume JD
The Second Generation P-Value (SGPV) measures the overlap between an estimated interval and a composite hypothesis of parameter values. We develop a sequential monitoring scheme of the SGPV (SeqSGPV) to connect study design intentions with end-of-study inference anchored on scientific relevance. We build upon Freedman's "Region of Equivalence" (ROE) in specifying scientifically meaningful hypotheses called Pre-specified Regions Indicating Scientific Merit (PRISM). We compare PRISM monitoring versus monitoring alternative ROE specifications. Error rates are controlled through the PRISM's indifference zone around the point null and through monitoring frequency strategies. Because the former is fixed by scientific relevance, the latter is a targetable means for designing studies with desirable operating characteristics. Adding an affirmation step to the stopping rules improves frequency properties, including the error rate, the risk of reversing conclusions under delayed outcomes, and bias.
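Editor's note: for orientation, the sketch below computes a single SGPV from an interval estimate and an interval null zone, following the published overlap definition (with its cap at 1/2 for very wide, inconclusive intervals). The sequential monitoring, PRISM specification, and affirmation rules of the paper are not shown, and the numbers are hypothetical.

```python
def sgpv(est_lo, est_hi, null_lo, null_hi):
    """Second Generation P-Value: fraction of the estimated interval
    [est_lo, est_hi] overlapping the interval null zone [null_lo, null_hi],
    with the correction that caps the value near 1/2 when the estimated
    interval is more than twice as wide as the null zone."""
    if est_hi <= est_lo:
        return None
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))
    len_i = est_hi - est_lo
    len_h = null_hi - null_lo
    return (overlap / len_i) * max(len_i / (2 * len_h), 1.0)

# hypothetical indifference zone around the point null, e.g. (-0.2, 0.2)
print(sgpv(0.25, 0.90, -0.2, 0.2))   # interval clears the zone  -> 0
print(sgpv(-0.1, 0.15, -0.2, 0.2))   # interval inside the zone  -> 1
print(sgpv(-0.1, 0.50, -0.2, 0.2))   # partial overlap           -> 0.5
```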
The Sign Test, Paired Data, and Asymmetric Dependence: A Cautionary Tale
Hutson AD and Yu H
In the paired data setting, the sign test is often described in statistical textbooks as a test for comparing differences between the medians of two marginal distributions. There is an implicit assumption that the median of the differences is equivalent to the difference of the medians when employing the sign test in this fashion. We demonstrate, however, that given asymmetry in the bivariate distribution of the paired data, there are often scenarios where the median of the differences is not equal to the difference of the medians. Further, we show that these scenarios will lead to a false interpretation of the sign test for its intended use in the paired data setting. We illustrate the false-interpretation concept via theory, a simulation study, and a real-world example based on breast cancer RNA sequencing data obtained from The Cancer Genome Atlas (TCGA).
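Editor's note: a toy construction (ours, not the TCGA analysis) makes the point directly. Pair X with a measure-preserving shuffle of itself so the two marginals, and hence their medians, are identical, yet the median of the paired differences is far from zero and the sign test rejects emphatically.

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(5)
n = 5000

# Paired data with identical marginals but asymmetric dependence:
# Y is a measure-preserving shuffle of X (shift by 1/3 modulo 1), so
# X and Y are both Uniform(0, 1) and their medians are equal.
x = rng.uniform(0, 1, n)
y = (x + 1/3) % 1
d = x - y                      # equals -1/3 w.p. 2/3 and +2/3 w.p. 1/3

print("median(x) - median(y):", np.median(x) - np.median(y))   # ~ 0
print("median of differences:", np.median(d))                  # ~ -1/3

# Sign test on the paired differences: strongly "significant", even though
# the two marginal medians (indeed the two marginal distributions) coincide.
n_pos = int(np.sum(d > 0))
print("sign test p-value:", binomtest(n_pos, n, 0.5).pvalue)
```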
Proximal MCMC for Bayesian Inference of Constrained and Regularized Estimation
Zhou X, Heng Q, Chi EC and Zhou H
This paper advocates proximal Markov Chain Monte Carlo (ProxMCMC) as a flexible and general Bayesian inference framework for constrained or regularized estimation. Originally introduced in the Bayesian imaging literature, ProxMCMC employs the Moreau-Yosida envelope for a smooth approximation of the total-variation regularization term, fixes variance and regularization strength parameters as constants, and uses the Langevin algorithm for the posterior sampling. We extend ProxMCMC to be fully Bayesian by providing data-adaptive estimation of all parameters including the regularization strength parameter. More powerful sampling algorithms such as Hamiltonian Monte Carlo are employed to scale ProxMCMC to high-dimensional problems. Analogous to the proximal algorithms in optimization, ProxMCMC offers a versatile and modularized procedure for conducting statistical inference on constrained and regularized problems. The power of ProxMCMC is illustrated on various statistical estimation and machine learning tasks, the inference of which is traditionally considered difficult from both frequentist and Bayesian perspectives.
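Editor's note: a minimal sketch of the core mechanism in its simplest form, an unadjusted Langevin sampler in which a non-smooth L1 penalty is replaced by its Moreau-Yosida envelope, whose gradient is available through the soft-thresholding proximal operator. This is not the paper's fully Bayesian, HMC-based implementation: the noise variance and penalty strength are held fixed, and all data and tuning constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(0, 1.0, n)

sigma2, alpha = 1.0, 5.0      # fixed noise variance and L1 strength (not fully Bayesian)
lam, gamma = 0.01, 1e-4       # Moreau-Yosida parameter and Langevin step size

def grad_f(beta):
    """Gradient of the smooth part: Gaussian negative log-likelihood."""
    return X.T @ (X @ beta - y) / sigma2

def prox_l1(beta, t):
    """Proximal operator of t * alpha * ||.||_1 (soft thresholding)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t * alpha, 0.0)

def grad_moreau(beta):
    """Gradient of the Moreau-Yosida envelope of the L1 penalty."""
    return (beta - prox_l1(beta, lam)) / lam

beta = np.zeros(p)
samples = []
for k in range(20_000):
    drift = grad_f(beta) + grad_moreau(beta)
    beta = beta - gamma * drift + np.sqrt(2 * gamma) * rng.normal(size=p)
    if k >= 10_000:                        # discard burn-in
        samples.append(beta.copy())

post_mean = np.mean(samples, axis=0)
print("posterior mean (first 5 coordinates):", np.round(post_mean[:5], 2))
```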
Expressing regret: a unified view of credible intervals
Rice K and Ye L
Posterior uncertainty is typically summarized as a credible interval, an interval in the parameter space that contains a fixed proportion (usually 95%) of the posterior's support. For multivariate parameters, credible sets perform the same role. There are of course many potential 95% intervals from which to choose, yet even standard choices are rarely justified in any formal way. In this paper we give a general method, focusing on the loss function that motivates an estimate (the Bayes rule), around which we construct a credible set. The set contains all points which, as estimates, would have minimally worse expected loss than the Bayes rule: we call this excess expected loss 'regret'. The approach can be used for any model and prior, and we show how it justifies all widely-used choices of credible interval/set. Further examples show how it provides insights into more complex estimation problems.
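Editor's note: a small worked example under squared-error loss (our choice, for illustration only). The Bayes rule is the posterior mean, the regret of reporting a instead is (a - mean)^2, so the regret-based set is an interval centered at the posterior mean whose half-width is chosen to reach the desired posterior probability.

```python
import numpy as np

def regret_interval(post_draws, coverage=0.95):
    """Credible interval of all point estimates whose posterior expected
    squared-error loss exceeds that of the Bayes rule (the posterior mean)
    by at most a common regret r, with r chosen so the interval has the
    requested posterior probability.  Under squared error the regret of
    reporting `a` is (a - posterior mean)^2, so the set is symmetric
    around the posterior mean."""
    m = post_draws.mean()
    # smallest half-width h = sqrt(r) achieving the target coverage
    h = np.quantile(np.abs(post_draws - m), coverage)
    return m - h, m + h, h ** 2   # interval and the implied regret r

# hypothetical skewed posterior: draws from a Gamma distribution
rng = np.random.default_rng(8)
draws = rng.gamma(shape=2.0, scale=1.5, size=200_000)
lo, hi, r = regret_interval(draws)
print(f"regret-based interval: ({lo:.2f}, {hi:.2f}), regret r = {r:.3f}")
print("equal-tailed interval: ", np.quantile(draws, [0.025, 0.975]))
```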
Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement
Nguyen M, Eulalio T, Marafino BJ, Rose C, Chen JH and Baiocchi M
A gap remains between developing risk prediction models and deploying models to support real-world decision making, especially in high-stakes situations. Human experts' reasoning abilities remain critical in identifying potential improvements and ensuring safety. We propose a Thick Data Analytics (TDA) framework for eliciting and combining expert-human insight into the evaluation of models. The insight is threefold: (1) statistical methods are limited to using joint distributions of observable quantities for predictions, but often there is more information available in a real-world setting than is usable by algorithms; (2) domain experts can access more information (e.g., patient files) than an algorithm and bring additional knowledge into their assessments by leveraging insights and experiences; and (3) experts can re-frame and re-evaluate prediction problems to suit real-world situations. Here, we revisit an example of predicting temporal risk for intensive care admission within 24 hours of hospitalization. We propose a sampling procedure for identifying informative cases for deeper inspection. Expert feedback is used to understand sources of information to improve model development and deployment. We recommend model assessment based on objective evaluation metrics derived from subjective evaluations of the problem formulation. TDA insights facilitate iterative model development towards safer, actionable, and acceptable risk predictions.