BIOMETRICAL JOURNAL

Variable Selection via Fused Sparse-Group Lasso Penalized Multi-state Models Incorporating Molecular Data
Miah K, Goeman JJ, Putter H, Kopp-Schneider A and Benner A
In multi-state models based on high-dimensional data, effective modeling strategies are required to determine an optimal, ideally parsimonious model. In particular, linking covariate effects across transitions is needed to conduct joint variable selection. A useful technique to reduce model complexity is to assume homogeneous covariate effects across distinct transitions. We integrate this approach into data-driven variable selection via extended regularization methods within multi-state model building. We propose fused sparse-group lasso (FSGL) penalized Cox-type regression in the framework of multi-state models, combining penalization of pairwise differences of covariate effects with transition-wise grouping. For optimization, we adapt the alternating direction method of multipliers (ADMM) algorithm to transition-specific hazards regression in the multi-state setting. In a simulation study and an application to acute myeloid leukemia (AML) data, we evaluate the algorithm's ability to select a sparse model incorporating relevant transition-specific effects and similar cross-transition effects. We investigate settings in which the combined penalty is beneficial compared to global lasso regularization. Clinical Trial Registration: The AMLSG 09-09 trial is registered with ClinicalTrials.gov (NCT00893399) and has been completed.
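To make the penalty structure concrete, the following minimal numpy sketch evaluates a fused sparse-group lasso term of the kind described above for a matrix of transition-specific coefficients. The function name, matrix layout, transition-wise grouping, and all-pairs fusion term are illustrative assumptions, not the authors' implementation.

```python
# Illustrative FSGL-type penalty for B[j, q] = effect of covariate j on transition q.
import numpy as np
from itertools import combinations

def fsgl_penalty(B, lam_lasso, lam_group, lam_fused):
    lasso = np.abs(B).sum()                                   # element-wise sparsity
    group = np.linalg.norm(B, axis=0).sum()                   # transition-wise grouping
    fused = sum(np.abs(B[:, q] - B[:, r]).sum()               # pairwise cross-transition differences
                for q, r in combinations(range(B.shape[1]), 2))
    return lam_lasso * lasso + lam_group * group + lam_fused * fused
```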
Non-Markov Nonparametric Estimation of Complex Multistate Outcomes After Hematopoietic Stem Cell Transplantation
Vilsmeier J, Schmeller S, Fürst D and Beyersmann J
Often probabilities of nonstandard time-to-event endpoints are of interest, which are more complex than overall survival. One such probability is chronic graft-versus-host disease (GvHD)- and relapse-free survival, the probability of being alive, in remission, and not suffering from chronic GvHD after stem cell transplantation, with chronic GvHD being a recurrent event. Because probabilities for endpoints involving recurrent events need not fall monotonically, the Kaplan-Meier estimator should not be used for estimation; the Aalen-Johansen estimator should be used instead. The Aalen-Johansen estimator is consistent even in non-Markov scenarios, provided state occupation probabilities are being estimated and censoring is random. In some multistate models, it is also possible to use linear combinations of Kaplan-Meier estimators, which do not depend on the Markov assumption but can yield probability estimates that fall out of bounds. For these linear combinations, we propose a wild bootstrap procedure for inference and compare it with the wild bootstrap for the Aalen-Johansen estimator in non-Markov scenarios. In the proposed procedure, the limiting distribution of the Nelson-Aalen estimator is approximated using the wild bootstrap and transformed via the functional delta method. This approach is adaptable to different multistate models. Using real data, confidence bands are generated using the wild bootstrap for chronic GvHD- and relapse-free survival. Additionally, coverage probabilities of confidence intervals and confidence bands generated by Efron's bootstrap and the wild bootstrap are examined with simulations.
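As a rough illustration of the multiplier idea underlying the wild bootstrap mentioned above, the sketch below perturbs the Nelson-Aalen increments with independent standard normal multipliers. It ignores ties and uses hypothetical function names, so it is a sketch of the general technique rather than the authors' procedure.

```python
import numpy as np

def nelson_aalen_increments(times, events):
    """Ordered event-time increments dN_i / Y_i for right-censored data (ties ignored)."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))           # size of the risk set just before each time
    return t, np.where(d == 1, 1.0 / at_risk, 0.0)

def wild_bootstrap_nelson_aalen(times, events, grid, n_boot=1000, seed=None):
    rng = np.random.default_rng(seed)
    t, inc = nelson_aalen_increments(times, events)
    draws = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        g = rng.standard_normal(len(t))            # one mean-zero multiplier per subject
        pert = g * inc
        draws[b] = [pert[t <= s].sum() for s in grid]
    return draws                                    # approximates the fluctuations of the estimator
```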
Efficient Testing Using Surrogate Information
Knowlton R and Parast L
In modern clinical trials, there is immense pressure to use surrogate markers in place of an expensive or long-term primary outcome to make more timely decisions about treatment effectiveness. However, using a surrogate marker to test for a treatment effect can be difficult and controversial. Existing approaches tend either to rely on fully parametric methods that make strict assumptions about the relationship between the surrogate and the outcome, or to assume the surrogate marker is valid for the entire study population. In this paper, we develop a fully nonparametric method for efficient testing using surrogate information (ETSI). Our approach is specifically designed for settings where there is heterogeneity in the utility of the surrogate marker, that is, the surrogate is valid for certain patient subgroups and not others. ETSI enables treatment effect estimation and hypothesis testing via kernel-based estimation for a setting where the surrogate is used in place of the primary outcome for individuals for whom the surrogate is valid, and the primary outcome is purposefully measured only in the remaining patients. In addition, we provide a framework for future study design with power and sample size estimates based on our proposed testing procedure. Throughout, we assume a continuous surrogate and a primary outcome that may be discrete or continuous. We demonstrate the performance of our methods via a simulation study and application to two distinct HIV clinical trials.
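The following minimal sketch illustrates the kind of kernel-based estimation referred to above: a Nadaraya-Watson regression of the primary outcome on the surrogate, fitted where the outcome is observed and used to fill in the surrogate-valid subgroup. The Gaussian kernel, fixed bandwidth, and function names are assumptions for illustration, not the ETSI estimator itself.

```python
import numpy as np

def nadaraya_watson(s_train, y_train, s_new, bandwidth):
    """Kernel regression estimate of E[Y | S = s] with a Gaussian kernel."""
    s_train, y_train, s_new = map(np.asarray, (s_train, y_train, s_new))
    w = np.exp(-0.5 * ((s_new[:, None] - s_train[None, :]) / bandwidth) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# Hypothetical usage: y is measured only where the surrogate is not relied upon;
# predictions for the surrogate-valid subgroup replace the unmeasured outcomes
# before the two treatment arms are contrasted.
```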
Interpretable Machine Learning for Survival Analysis
Langbein SH, Krzyziński M, Spytek M, Baniecki H, Biecek P and Wright MN
With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas such as clinical decision-making, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, and Friedman's H-interaction statistics, can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence in the German Breast Cancer Study Group (GBSG2) serves as a tutorial and guide for researchers on how to utilize the techniques in practice to facilitate understanding of model decisions and predictions.
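As one concrete example of adapting a standard IML tool to survival outcomes, the sketch below computes a partial-dependence-style profile of predicted survival curves. Here `predict_survival(X, times)` is a hypothetical callable returning an (n_samples, n_times) array of survival probabilities; the sketch is not tied to any particular package.

```python
import numpy as np

def survival_pdp(predict_survival, X, feature_idx, grid, times):
    """Average predicted survival curve with one feature fixed at each grid value."""
    profiles = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value              # intervene on the feature of interest
        profiles.append(predict_survival(X_mod, times).mean(axis=0))
    return np.array(profiles)                       # shape: (len(grid), len(times))
```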
Revisiting Hazard Ratios: Can We Define Causal Estimands for Time-Dependent Treatment Effects?
Edelmann D
In this paper, some aspects concerning the causal interpretation of hazard contrasts are revisited. It is first investigated in which sense the hazard ratio constitutes a causal effect. It is demonstrated that the hazard ratio at a timepoint t represents a causal effect for the population at baseline, but in general not for any population at risk at time t. Moreover, the scenario is studied in which the survival curves coincide up to some timepoint and then separate. This investigation provides valuable insight both on the causal interpretation of the conventional hazard ratio and on properties of the recently proposed causal hazard ratio. The findings suggest that, without making further assumptions, there is in general no meaningful estimand for a treatment effect at time t. It is therefore advocated to develop alternative estimands grounded in medically plausible assumptions about the joint distribution of counterfactual survival times.
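For readers who want the objects written out, the display below uses our own (assumed) notation for the counterfactual hazards and the hazard ratio discussed above.

```latex
\lambda^{a}(t) \;=\; \lim_{\Delta t \downarrow 0}
  \frac{P\!\left(t \le T^{a} < t + \Delta t \,\middle|\, T^{a} \ge t\right)}{\Delta t},
\qquad
\mathrm{HR}(t) \;=\; \frac{\lambda^{1}(t)}{\lambda^{0}(t)}.
```

The conditioning events {T^1 >= t} and {T^0 >= t} generally select different subpopulations, which is the source of the interpretational difficulties at later timepoints discussed in the paper.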
Impact of Near-Positivity Violations on IPTW-Estimated Marginal Structural Survival Models With Time-Dependent Confounding
Spreafico M
In longitudinal observational studies, marginal structural models (MSMs) are used to analyze the causal effect of an exposure on the (time-to-event) outcome of interest, while accounting for exposure-affected time-dependent confounding. In the applied literature, inverse probability of treatment weighting (IPTW) has been widely adopted to estimate MSMs. An essential assumption for IPTW-based MSMs is positivity, which requires that, for any combination of measured confounders among individuals, there is a nonzero probability of receiving each treatment strategy. Positivity is crucial for valid causal inference through IPTW-based MSMs, but is often overlooked compared to confounding bias. Near-positivity violations, where certain treatments are theoretically possible but rarely observed due to randomness, are common in practical applications, particularly when the sample size is small, and they pose significant challenges for causal inference. This study investigates the impact of near-positivity violations on estimates from IPTW-based MSMs in survival analysis. Two algorithms are proposed for simulating longitudinal data from hazard-MSMs, accommodating near-positivity violations, a time-varying binary exposure, and a time-to-event outcome. Cases of near-positivity violations, where remaining unexposed is rare within certain confounder levels, are analyzed across various scenarios and weight truncation (WT) strategies. Through comprehensive simulations, this study shows that even minor near-positivity violations in longitudinal survival analyses can substantially destabilize IPTW-based estimators, inflating variance and bias, especially under aggressive WT. This work aims to serve as a critical warning against overlooking the positivity assumption or naively applying WT in causal studies using longitudinal observational data and IPTW.
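To fix ideas, the sketch below computes stabilized inverse probability of treatment weights with percentile-based truncation for a long-format dataset. The column names (`id`, `A`, `V`, `L`) and the use of plain logistic regressions are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def stabilized_iptw(df, exposure="A", base_cols=("V",), conf_cols=("V", "L"), trunc=(0.01, 0.99)):
    """Stabilized weights: per-subject cumulative product of P(A|V) / P(A|V,L), then truncated."""
    num = LogisticRegression().fit(df[list(base_cols)], df[exposure])
    den = LogisticRegression().fit(df[list(conf_cols)], df[exposure])
    p_num = num.predict_proba(df[list(base_cols)])[:, 1]
    p_den = den.predict_proba(df[list(conf_cols)])[:, 1]
    a = df[exposure].to_numpy()
    ratio = np.where(a == 1, p_num, 1 - p_num) / np.where(a == 1, p_den, 1 - p_den)
    w = df.assign(_r=ratio).groupby("id")["_r"].cumprod()        # product over a subject's intervals
    lo, hi = np.quantile(w, trunc)
    return w.clip(lo, hi)                                        # weight truncation (WT)
```

Tighter truncation bounds stabilize the weights at the cost of bias, which is one of the trade-offs examined in the simulations described above.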
Pseudo-Observation Approach for Length-Biased Cox Proportional Hazards Model
Akbari M, Rad NN and Chen DG
Pseudo-observations are used to estimate the expectation of a function of interest in a population when survival data are incomplete due to censoring or truncation. Length-biased sampling is a special case of a left-truncation model, in which the truncation variable follows a uniform distribution. This phenomenon is commonly encountered in various fields such as survival analysis and epidemiology, where the event of interest is related to the length or duration of an underlying process. In such settings, the probability of observing a data point is higher for longer lengths, leading to biased sampling. The goal of this paper is to apply pseudo-observations to estimate the regression coefficients in the Cox proportional hazards model under length-biased right-censored (LBRC) data. We assess the accuracy and efficiency of two approaches that differ in their generation of pseudo-observations, comparing them with two prominent standard methods in the presence of LBRC data. The results demonstrate that the two proposed pseudo-observation methods are comparable to the standard methods in terms of standard error, with advantages in providing confidence intervals that are closer to the nominal level in large sample sizes and specific scenarios. Additionally, as the simulation results show, although length-biased data are a special case of left-truncated data, they must be addressed separately by utilizing the information that the left-truncation variable follows a uniform distribution. We also establish the consistency and asymptotic normality of one of the proposed estimators. Finally, we apply the method to analyze a real LBRC dataset.
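For orientation, the sketch below computes standard jackknife pseudo-observations for a survival probability at a fixed time using the Kaplan-Meier estimator from the lifelines package; the length-bias adjustment that is central to the paper is not reproduced here.

```python
import numpy as np
from lifelines import KaplanMeierFitter

def pseudo_observations(times, events, t0):
    """theta_i = n * theta_hat - (n - 1) * theta_hat_(-i) for theta = S(t0)."""
    times, events = np.asarray(times), np.asarray(events)
    n = len(times)
    theta_full = KaplanMeierFitter().fit(times, events).predict(t0)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i = KaplanMeierFitter().fit(times[keep], events[keep]).predict(t0)
        pseudo[i] = n * theta_full - (n - 1) * theta_i
    return pseudo   # typically regressed on covariates with a GEE to obtain effect estimates
```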
A New Approach to the Nonparametric Behrens-Fisher Problem With Compatible Confidence Intervals
Schüürhuis S, Konietschke F and Brunner E
We propose a new method to address the nonparametric Behrens-Fisher problem, allowing for unequal distribution functions across the two samples. The procedure tests the null hypothesis H0: p = 1/2, where p denotes the Mann-Whitney effect. Apart from the trivial case of one-point distributions, no restrictions are imposed on the underlying data distribution. The test is derived by evaluating the ratio of the true variance of the Mann-Whitney effect estimator to its theoretical maximum, as derived from the Birnbaum-Klose inequality. Through simulations, we demonstrate that the proposed test effectively controls the type-I error rate under various conditions, including small and unbalanced sample sizes, and different data-generating mechanisms. Notably, it provides better control of the type-I error rate than the widely used Brunner-Munzel test, particularly at small significance levels. We further construct range-preserving compatible confidence intervals and show that they exhibit improved coverage compared to the confidence intervals compatible with the Brunner-Munzel test. Finally, we illustrate the application of the method in a clinical trial example.
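The quantity being tested is easy to state in code; the sketch below gives the plug-in estimator of the Mann-Whitney effect p = P(X < Y) + 0.5 P(X = Y), whose null value 1/2 corresponds to the hypothesis above.

```python
import numpy as np

def mann_whitney_effect(x, y):
    """Plug-in estimate of p = P(X < Y) + 0.5 * P(X = Y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    less = (x[:, None] < y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    return less + 0.5 * ties

# Example: mann_whitney_effect([1, 2, 3], [2, 3, 4]) returns 7/9, roughly 0.78.
```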
Intercept Estimation of Semi-Parametric Joint Models in the Context of Longitudinal Data Subject to Irregular Observations
Ledesma L and Pullenayegum E
Longitudinal data are often subject to irregular visiting times, with outcomes and visit times influenced by a latent variable. Semi-parametric joint models that account for this dependence have been proposed; among these, the Sun model is the most suitable for count data as it employs a multiplicative link function. Semi-parametric joint models define an intercept function as the mean outcome when all covariates are set to zero; this is differenced out in the course of estimation and is consequently not estimated. The Sun estimator thus provides estimates of relative covariate effects, but is unable to provide estimates of absolute effects or of longitudinal prognosis in the absence of covariates. We extend the Sun model by additionally estimating the intercept term, showing that our extended estimator is consistent and asymptotically Normal. In simulations, our estimator outperforms the original Sun estimator in terms of bias and standard error and is also more computationally efficient. We apply our estimator to a longitudinal study of tumor recurrence among bladder cancer patients. Provided the intercept term can be adequately captured using splines, we recommend that our extended Sun estimator be used in place of the original estimator, since it leads to smaller bias, smaller standard errors, and allows estimation of the mean outcome trajectories.
The Locally Active-Controlled Optimal Design: Applications in Oncology Clinical Studies
Zhang X and Shen G
Antitumor activity in oncology clinical trials is typically assessed using overall survival (OS) or progression-free survival (PFS) endpoints, which are often imprecise and uninformative in small, noncomparative studies. The tumor growth inhibition (TGI) model, which captures both drug effects and natural tumor growth, quantitatively characterizes tumor size dynamics as a function of drug dosage, offering a more informative framework for comparing cancer treatments. In this work, we study the locally optimal design for a comparative oncology trial in which Dalpiciclib is the investigational agent and Capecitabine is the reference drug under an active control (AC) setting. Our novel approach avoids unrealistic distributional assumptions about response measurements. The resulting locally AC-optimal design minimizes the variance of the estimated matching dose of Dalpiciclib to Capecitabine and may unify Phase II and Phase III objectives by allowing evaluation of a higher Dalpiciclib dose with prespecified superiority.
Evaluating Causal Effects on Time-to-Event Outcomes in an RCT in Oncology With Treatment Discontinuation
Ballerini V, Bornkamp B, Mealli F, Wang C, Zhang Y and Mattei A
In clinical trials, patients may discontinue treatments prematurely, breaking the initial randomization. In our motivating study, a randomized controlled trial in oncology, patients assigned the investigational treatment may discontinue it due to adverse events. The ICH E9(R1) Addendum provides guidelines for handling such "intercurrent events." The right strategy to adopt depends on the questions of interest. We propose adopting a principal stratum strategy and decomposing the overall intention-to-treat effect into principal causal effects for groups of patients defined by their potential discontinuation behavior. We first show how to implement a principal stratum strategy to assess causal effects on a survival outcome in the presence of continuous-time treatment discontinuation, discuss its advantages, and outline the conclusions that can be drawn. Our strategy allows us to properly handle the time-to-event intermediate variable, which is not defined for patients who would not discontinue, and to account for the fact that the discontinuation time and the primary endpoint are subject to censoring. We employ a flexible model-based Bayesian approach to tackle these complexities, providing easily interpretable results. We apply this Bayesian principal stratification framework to analyze synthetic data of the motivating oncology trial. Supported by a simulation study, we shed light on the role of covariates in this framework. Beyond making structural and parametric assumptions more credible, they lead to more precise inference. Also, they can be used to characterize patients' discontinuation behavior, which could help inform clinical practice and future protocols.
Bayesian Structure Learning for Graphical Models With Symmetry Constraints
Li Q, Wang N, Gao X and Pan J
PAM50 gene expression profiling, a popular and widely used tool, is employed to identify and assess the functional relationships and pathways among genes in patients with breast cancer. Motivated by a study aimed at concurrently recovering dependency and symmetric networks for the PAM50 gene data set, we consider the graphical Gaussian model with symmetry constraints on edges and vertices. The symmetry constraints in the model are represented by imposing equality constraints on the concentration matrix. This model allows us to simultaneously explore the dependency relationships and symmetrical structure among the variables. The symmetrical structure of PAM50 gene expression can deepen our understanding of the genes' functional similarities and the inherent symmetrical properties of gene regulatory behavior. Prioritizing candidate genes with high functional similarity will help elucidate the underlying biological mechanisms of disease progression. To effectively capture the network's structure, we utilize a birth-death Markov Chain Monte Carlo method. This method is a continuous-time and transdimensional search algorithm that is particularly effective in this context. To further improve the efficiency of the algorithm, we propose a stepwise model learning strategy combined with an approximation method for the posterior distribution. Finally, to validate the effectiveness of our approach, we apply it in various simulation studies as well as in a practical application involving the PAM50 gene expression data set.
Generalized Bayesian Inference for Causal Effects Using the Covariate Balancing Procedure
Orihara S, Momozaki T and Nakagawa T
In observational studies, the propensity score plays a central role in estimating causal effects of interest. The inverse probability weighting (IPW) estimator is commonly used for this purpose. However, if the propensity score model is misspecified, the IPW estimator may produce biased estimates of causal effects. Previous studies have proposed some robust propensity score estimation procedures. However, these methods require considering parameters that govern the uncertainty of sampling and treatment allocation. This study proposes a novel Bayesian estimating procedure that determines this parameter probabilistically rather than deterministically. Since the IPW estimator and the propensity score estimator can be derived as solutions to certain loss functions, the general Bayesian paradigm, which does not require specifying the full likelihood, can be applied. Therefore, our proposed method only requires the same level of assumptions as ordinary causal inference contexts. The proposed Bayesian method demonstrates results equal or superior to those of some previous methods in simulation experiments and is also applied to real data, namely the Whitehall dataset.
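As background for readers less familiar with the building blocks, the sketch below shows the classical frequentist pipeline the paper starts from, a logistic propensity score model followed by a Hájek-type IPW contrast; the generalized Bayesian layer proposed in the paper is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, a, y):
    """IPW estimate of E[Y(1)] - E[Y(0)] with an estimated logistic propensity score."""
    e = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]   # propensity scores
    w1, w0 = a / e, (1 - a) / (1 - e)                           # inverse probability weights
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```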
Sparse Canonical Correlation Analysis for Multiple Measurements With Latent Trajectories
Senar N, Zwinderman AH and Hof MH
Canonical correlation analysis (CCA) is a widely used multivariate method in omics research for integrating high-dimensional datasets. CCA identifies hidden links by deriving linear projections of observed features that maximally correlate the datasets. An important requirement of standard CCA is that observations are independent of each other. As a result, it cannot properly deal with repeated measurements. Current CCA extensions dealing with these challenges either perform CCA on summarized data or estimate correlations for each measurement separately. While these techniques factor in the correlation between measurements, they are suboptimal for high-dimensional analysis and for exploiting the data's longitudinal qualities. We propose a novel extension of sparse CCA that incorporates time dynamics at the latent variable level through longitudinal models. This approach addresses the correlation of repeated measurements while drawing latent paths, focusing on dynamics in the correlation structures. To aid interpretability and computational efficiency, we implement a penalty that enforces fixed sparsity levels. We estimate these trajectories by fitting longitudinal models to the low-dimensional latent variables, leveraging the clustered structure of high-dimensional datasets and thus exploring shared longitudinal latent mechanisms. Furthermore, modeling time in the latent space significantly reduces the computational burden. We validate our model's performance using simulated data and show its real-world applicability with data from the Human Microbiome Project. This application highlights the model's ability to handle high-dimensional, sparsely and irregularly observed data. Our CCA method for repeated measurements enables efficient estimation of canonical correlations across measurements for clustered data. Compared to existing methods, ours substantially reduces computational time in high-dimensional analyses and provides longitudinal trajectories that yield interpretable and insightful results.
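For intuition about the sparsity mechanism, the sketch below shows a generic soft-thresholded power iteration for a single pair of sparse canonical weight vectors, using the common diagonal-covariance approximation; the latent-trajectory component that distinguishes the proposed method is not included.

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_cca_pair(X, Y, lam_x=0.1, lam_y=0.1, n_iter=100):
    """One pair of sparse canonical weights via thresholded power iterations (columns centered)."""
    C = X.T @ Y / X.shape[0]                         # empirical cross-covariance
    u = np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    for _ in range(n_iter):
        v = soft_threshold(C.T @ u, lam_y)
        v /= np.linalg.norm(v) or 1.0
        u = soft_threshold(C @ v, lam_x)
        u /= np.linalg.norm(u) or 1.0
    return u, v                                       # sparse canonical weight vectors
```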
Censoring and Competing Risks: Avoidable and Non-Avoidable Events. Comment to the Article "Hazards constitute key quantities for analysing, interpreting and understanding time-to-event data" by Beyersmann, Schmoor, and Schumacher
Andersen PK
It is argued that even though censoring and competing events, technically, play similar roles when estimating hazard functions, they are conceptually different and should be treated as such when interpreting time-to-event data.
ADDIS-Graphs for Online Error Control With Application to Platform Trials
Fischer L, Bofill Roig M and Brannath W
In contemporary research, online error control is often required, where an error criterion, such as the familywise error rate (FWER) or the false discovery rate (FDR), shall remain under control while testing an a priori unbounded sequence of hypotheses. The existing online literature has mainly considered large-scale studies and constructed powerful but rigid algorithms for these. However, smaller studies, such as platform trials, require high flexibility and easy interpretability to take study objectives into account and facilitate communication. Another challenge in platform trials is that, due to the shared control arm, some of the p-values are dependent and significance levels need to be prespecified before the decisions for all the past treatments are available. We propose adaptive-discarding-Graphs (ADDIS-Graphs) with FWER control that, due to their graphical structure, perfectly adapt to such settings and provably uniformly improve the state-of-the-art method. We introduce several extensions of these ADDIS-Graphs, including the incorporation of information about the joint distribution of the p-values and a version for FDR control.
Federated Mixed Effects Logistic Regression Based on One-Time Shared Summary Statistics
Limpoco MAA, Faes C and Hens N
Upholding data privacy, especially in medical research, has made accessing individual-level patient data increasingly difficult. Estimating mixed effects binary logistic regression models involving data from multiple data providers, like hospitals, thus becomes more challenging. Federated learning has emerged as an option to preserve the privacy of individual observations while still estimating a global model that can be interpreted on the individual level, but it usually involves iterative communication between the data providers and the data analyst. In this paper, we present a strategy to estimate a mixed effects binary logistic regression model that requires data providers to share summary statistics only once. It involves generating pseudo-data whose summary statistics match those of the actual data and using these in the model estimation process instead of the actual unavailable data. Our strategy is able to include multiple predictors, which can be a combination of continuous and categorical variables. Through simulation, we show that our approach estimates the true model at least as well as the approach that requires the pooled individual observations. An illustrative example using real data is provided. Unlike typical federated learning algorithms, our approach eliminates infrastructure requirements and security issues while being communication-efficient and accounting for heterogeneity.
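A heavily simplified sketch of the pseudo-data idea is shown below for continuous predictors: each provider shares per-class counts, means, and covariances once, and pseudo-covariates are drawn so that their first two moments match those summaries in expectation. The data structure and function name are hypothetical, and the published strategy handles categorical predictors and exact moment matching beyond this sketch.

```python
import numpy as np

def pseudo_data_for_provider(summaries, seed=None):
    """summaries: {class_label: (n, mean_vector, cov_matrix)}, shared a single time."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label, (n, mu, cov) in summaries.items():
        X_parts.append(rng.multivariate_normal(mu, cov, size=n))   # pseudo-covariates
        y_parts.append(np.full(n, label))                          # pseudo-outcomes
    return np.vstack(X_parts), np.concatenate(y_parts)

# The pooled pseudo-data from all providers can then be passed to a standard
# mixed-effects logistic regression routine, with provider as the grouping factor.
```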
Multivariate Bayesian Dynamic Borrowing for Repeated Measures Data With Application to External Control Arms in Open-Label Extension Studies
Hartley BF, Psioda MA and Mander AP
Borrowing analyses are increasingly important in clinical trials. We develop a method for using robust mixture priors in multivariate dynamic borrowing. The method was motivated by a desire to produce causally valid, long-term treatment effect estimates of a continuous endpoint from a single active-arm open-label extension study following a randomized clinical trial, by dynamically incorporating prior beliefs from a long-term external control arm. The proposed method is a generally applicable Bayesian dynamic borrowing analysis for estimates of multivariate summary metrics based on a multivariate normal likelihood function for various parameter models, some of which we describe. There are important connections to estimation that incorporates a prior belief under a hypothetical estimand strategy, that is, had the intercurrent event not occurred, for intercurrent events that lead to missing data.
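A one-dimensional analogue helps convey how a robust mixture prior borrows dynamically: with a normal likelihood and two normal prior components (say, an informative external-control component and a vague robustifying one), the posterior is again a mixture whose weights shift away from the informative component when the new data conflict with it. The sketch below implements this textbook conjugate update; it is a univariate illustration, not the multivariate method of the paper.

```python
import numpy as np
from scipy.stats import norm

def mixture_posterior(ybar, se, prior_weights, prior_means, prior_sds):
    """Posterior mixture for a normal mean with known standard error `se`."""
    w, m, s = map(np.asarray, (prior_weights, prior_means, prior_sds))
    marginal = norm.pdf(ybar, loc=m, scale=np.sqrt(s**2 + se**2))   # prior predictive of ybar
    post_w = w * marginal / np.sum(w * marginal)                    # updated component weights
    post_var = 1.0 / (1.0 / s**2 + 1.0 / se**2)
    post_mean = post_var * (m / s**2 + ybar / se**2)
    return post_w, post_mean, np.sqrt(post_var)                     # posterior mixture parameters
```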
Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes: The Genetic Latent Factor Approach
Melsen KAC, Kunst JF, Crossa J, Krause MR, van Eeuwijk FA, Kruijer W and Peeters CFW
Decreasing costs and new technologies have led to an increase in the amount of data available to plant breeding programs. High-throughput phenotyping (HTP) platforms routinely generate high-dimensional datasets of secondary features that may be used to improve genomic prediction accuracy. However, integration of these data comes with challenges such as multicollinearity, parameter estimation in high-dimensional settings, and the computational complexity of many standard approaches. Several methods have emerged to analyze such data, but interpretation of model parameters often remains challenging. We propose genetic latent factor best linear unbiased prediction (glfBLUP), a prediction pipeline that reduces the dimensionality of the original secondary HTP data using generative factor analysis. In short, glfBLUP uses redundancy-filtered and regularized genetic and residual correlation matrices to fit a maximum likelihood factor model and estimate genetic latent factor scores. These latent factors are subsequently used in multitrait genomic prediction. Our approach performs better than alternatives in extensive simulations and a real-world application, while producing easily interpretable and biologically relevant parameters. We discuss several possible extensions and highlight glfBLUP as the basis for a flexible and modular multitrait genomic prediction framework.
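The overall idea of feeding latent factors from secondary phenotypes into genomic prediction can be illustrated with off-the-shelf tools, as in the sketch below; factor analysis and ridge regression are stand-ins for the regularized genetic factor model and multitrait BLUP used in glfBLUP, so this is only a schematic analogue.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import Ridge

def latent_factor_prediction(secondary, markers, y, n_factors=5, alpha=1.0):
    """Reduce secondary phenotypes to latent factor scores, then predict y from markers + scores."""
    scores = FactorAnalysis(n_components=n_factors).fit_transform(secondary)
    features = np.hstack([markers, scores])          # marker matrix plus latent factor scores
    return Ridge(alpha=alpha).fit(features, y)       # stand-in for the multitrait BLUP step
```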
Weibull Regression With Both Measurement Error and Misclassification in Covariates
Cao Z and Wong MY
The problem of measurement error and misclassification in covariates is ubiquitous in nutritional epidemiology and some other research areas, and often leads to biased estimates and loss of power. However, addressing both measurement error and misclassification simultaneously in a single analysis is challenging and less actively studied, especially in regression models for survival data with censoring. Approximate maximum likelihood estimation (AMLE) has been shown to be an effective method for correcting both measurement error and misclassification simultaneously in a logistic regression model. However, its impact on survival analysis models has not been studied. In this paper, we study the biases caused by both measurement error and misclassification in covariates in a Weibull accelerated failure time model, and explore the use of AMLE and its asymptotic properties to correct these biases. Extensive simulation studies are conducted to evaluate the finite-sample performance of the resulting estimator. The proposed method is then applied to deal with measurement error and misclassification in some nutrients of interest from the EPIC-InterAct Study.
Sharp Bounds for Continuous-Valued Treatment Effects with Unobserved Confounders
Baitairian JB, Sebastien B, Jreich R, Katsahian S and Guilloux A
In causal inference, treatment effects are typically estimated under the ignorability, or unconfoundedness, assumption, which is often unrealistic in observational data. By relaxing this assumption and conducting a sensitivity analysis, we introduce novel bounds and derive confidence intervals for the Average Potential Outcome (APO)-a standard metric for evaluating continuous-valued treatment or exposure effects. We demonstrate that these bounds are sharp under a continuous sensitivity model, in the sense that they give the smallest possible interval under this model, and propose a doubly robust version of our estimators. In a comparative analysis with another method from the literature, using both simulated and real data sets, we show that our approach not only yields sharper bounds but also achieves good coverage of the true APO, with significantly reduced computation times.