Epidemics

A binary prototype for time-series surveillance and intervention
Olejarz J, Hoffmann T, Zapf A, Mugahid D, Molinaro R, Brown C, Boltyenkov A, Dudykevych T, Gupta A, Lipsitch M, Atun R, Onnela JP, Fortune S, Sampath R and Grad YH
Despite much research on early detection of anomalies from surveillance data, a systematic framework for appropriately acting on these signals is lacking. We addressed this gap by formulating a hidden Markov-style model for time-series surveillance, where the system state, the observed data, and the decision rule are all binary. We incur a delayed cost, c, whenever the system is abnormal and no action is taken, or an immediate cost, k, with action, where k
The transmission dynamics of Norovirus in England: A genotype-specific modelling study
Vesga JF, Douglas A, Celma C, Knock ES, Baguelin M and Edmunds WJ
Norovirus is the leading cause of acute gastroenteritis cases in England and worldwide, with diverse co-circulating genotypes. Vaccine candidates targeting multiple genotypes are advancing. However, most transmission models still focus on single-strain dynamics, limiting their ability to assess the role of co-circulating strains on population burden.
A deep learning approach for enhancing pandemic prediction: A retrospective evaluation of transformer neural networks and multi-source data fusion for infectious disease forecasting
Wu J, Tanim S, Woo M, Ahammed T, Bleichrodt AM and Rennert L
This paper introduces a deep learning model for county-level Covid-19 forecasting, presenting it as a retrospective case study. We utilize a transformer neural network with multi-source data fusion, incorporating historical case data, death data, and social media sentiment to capture complex temporal and spatial dynamics. Additionally, we develop multi-level and multi-scale attention mechanisms for adaptive time-frequency analysis. In a retrospective evaluation across three Omicron variant waves (December 2021 through February 2023), the model demonstrated strong performance in predicting county-level Covid-19 cases and deaths, with median county agreement accuracy ranging from 74.0 % to 82.6 % for one-week case forecasts and from 68.7 % to 79.6 % for five-week case forecasts. Median county agreement accuracy for deaths ranged from 83.2 % to 86.3 % for one-week forecasts and from 84.3 % to 87.2 % for five-week forecasts. While these historical results are promising, prospective validation is needed to assess the model's utility under live, evolving data conditions. Incorporating social media data yielded mild to moderate improvement in forecasting accuracy. Overall, the proposed model yielded substantial improvements compared to a baseline persistence model utilizing the last observation carried forward. By integrating real-time data and capturing complex pandemic dynamics, this approach surpasses traditional methods. The results demonstrate the model's strong performance in a retrospective setting, highlighting the utility of multi-source data fusion and attention mechanisms for fine-grained epidemiological forecasting. This work serves as a case study on the application of advanced deep learning techniques to local-level pandemic data, offering a methodological framework for future research.
Supporting LGBTQ+ epidemiologists in the UK during research-related travel and international collaboration
Hicks JT, Daniels BC, Maddren R, Doyle J, Mager T, Atchison CJ and Okell L
Conferences, fieldwork, international positions, and collaborations with international partners are beneficial to any epidemiologist, strengthening relationships with fellow scientists, policymakers, health professionals, and those affected by the studied disease. However, international working can pose unique challenges for minority groups. In the UK, LGBTQ+ scientists have a degree of legal protection against discrimination, and universities often have LGBTQ+ staff-student networks that provide support. By contrast, international work can present barriers that non-LGBTQ+ colleagues may not be aware of, such as stress when travelling to countries with anti-LGBTQ+ laws, policies, or sentiments. Homophobic, biphobic, and transphobic beliefs, policies, and actions fluctuate over time, but persist or are on the rise in many locations across the world, including high-income countries. Without institutional support, work-related travel can present a cognitive burden, threatening both physical safety and mental well-being of LGBTQ+ researchers. At Imperial College London, we have worked to address these challenges by developing resources and training for LGBTQ+ staff, students, and allies. We developed an initiative including the creation of online written resources, integration of these materials into travel safety protocols, and a partnership with an LGBTQ+ mental health organization to offer in-person training. We present our experience developing these resources, describe feedback from training participants, and discuss strategies for institutions to develop their own support resources, fostering greater equity in the research experience for individuals of all identities.
A pilot study to correlate wastewater and clinical surveillance for hepatitis A in New York state
Weidmann MD, Bryant PW, Schoultz L, Jadhav A, Rickerman L, Hill DT, Lamson DM, Larsen DA and St George K
The COVID-19 pandemic prompted a rapid expansion of wastewater-based surveillance in New York State (NYS). Pilot studies were initiated in 2023 to assess the use of this system for the surveillance of hepatitis A virus (HAV) and other pathogens of public health interest. A known cause of outbreaks in the US associated with contaminated food products and transmission between injection drug users, HAV is present in feces for weeks before the onset of symptoms. However, the use of wastewater surveillance as an early warning system has not been assessed outside of the outbreak setting. We compare clinical HAV surveillance with quantitative testing of wastewater samples for HAV RNA from four counties in NYS between September 2022 and November 2023, a period of relatively low HAV incidence. There was a significantly higher mean concentration of HAV RNA in wastewater from sewersheds in districts with reported HAV cases, relative to those without (267 vs. 21 gene copies per microliter, p < 0.05). For 91 % of HAV cases, HAV RNA was detected in the wastewater from the same county between HAV exposure onset and diagnosis, and new HAV RNA detection in wastewater occurred, on average, 41 days before case diagnosis. Our findings demonstrate that wastewater surveillance may provide early warning of case clusters at the county level in low-incidence settings and may allow for detection of otherwise missed asymptomatic or mild illness. Expansion of testing to include all sewersheds in each county may further improve the sensitivity for identifying locations for targeted HAV intervention.
Artificial intelligence for health security in Africa: Benefits, risks and opportunities
Standley CJ, Breugelmans JG, Chaudhari A, Cherian N, Chwalek S, Deol A, Dietrich J, du Moulin L, Otim G, James W, Kloth S, Masmoudi S, Ndembi N, Ndlovu N, Scarponi D, Schnetzinger F, Shapiro M and Hebbeler A
Artificial intelligence (AI) provides paradigm-shifting opportunities to accelerate epidemic preparedness and response and ensure health security. Such benefits may be particularly applicable to countries in Africa, which have to date struggled to meet compliance obligations under international health security frameworks. Here, we build on discussions that took place at the March 2024 Health Security Partnership for Africa workshop in Addis Ababa, Ethiopia, to describe potential applications of AI-enabled approaches to accelerate activities throughout the preparedness ecosystem, with a particular focus on the rapid development and deployment of novel vaccines in support of the 100 Days Mission, focusing on Africa. We also consider the risks and barriers that may challenge successful deployment of AI for health security in African settings, and opportunities to elevate African leadership on governance and implementation.
An optimized geo-hierarchical ensemble model to forecast hospitalizations from respiratory viruses in the United States
Xu S, Du H, Dong E, Wang X, Zhang L and Gardner LM
Accurate forecasting of infectious diseases is crucial for timely public health response. Ensemble frameworks have shown promising outcomes in short-term forecasting of COVID-19, among other respiratory viruses; however, there is a need to further improve these frameworks. Here, we propose the generalized Optimized Geo-Hierarchical Ensemble Model (OGEM), a novel machine learning framework to forecast state-level hospitalizations of influenza, COVID-19, and RSV in the U.S. independently. This framework is multi-resolution: it integrates state, regionally-trained, and nationally-trained models through an ensemble layer and applies various optimization methods to parameterize the model weights and enhance overall predictive accuracy. This proposed framework builds on existing forecasting literature by 1) employing an ensemble of three spatially hierarchical models with state-level forecasts as the output; 2) incorporating four distinct weight optimization methods to generate the ensemble; 3) utilizing clustering methods to dynamically identify multi-state regions as a function of short-term and long-term hospitalization trends for the regionally-trained model; and 4) providing a generalized framework to forecast the expected near-term hospitalizations from influenza, RSV and COVID-19. Results demonstrate OGEM is a robust framework with relatively high performance. Extensive experimentation using historical data highlights the predictive power of our framework compared to existing ensemble approaches. Its robust performance underscores the framework's effectiveness and potential for improving and broadening infectious disease forecasting.
Modeling the spatio-temporal spread of cholera in France in 1892
Perlant C, Weill FX, Paireau J, Scipioni M, Bosetti P and Cauchemez S
From an historical perspective, it is important to understand how past epidemics spread; but such a task is complicated by limited data availability. Here, using unique digitized historical data, we characterized the patterns and drivers of spread of the last major French cholera epidemic in 1892. We found that epidemic dynamics are well captured by a standard gravity model, highlighting the key contribution of human mobility to cholera spread. Our findings also underscore the crucial role of major commercial ports that acted both as points of introduction from external sources (multiple introductions were estimated) and as local transmission hubs (transmission rates increased by a factor of 10 around ports). We also estimated a 2.5-fold increase in transmission rates in mid-August, compensated by a reduction in the duration of infectivity of municipalities, highlighting both seasonality in transmission and the effectiveness of control measures implemented in 1892. Applying modern analytical techniques to historical outbreaks enhances our understanding of past pandemics.
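For readers unfamiliar with the term, a gravity model weights the coupling between two locations by their population sizes and the distance separating them. The sketch below is illustrative only: the function name, the exponents alpha, beta and gamma, and their default values are assumptions made here, not the parameterization estimated in the study.

```python
def gravity_weight(pop_i, pop_j, distance_km, alpha=1.0, beta=1.0, gamma=2.0):
    """Generic gravity coupling between locations i and j.

    weight = pop_i**alpha * pop_j**beta / distance_km**gamma

    The exponents are hypothetical defaults chosen for illustration.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return (pop_i ** alpha) * (pop_j ** beta) / (distance_km ** gamma)


# Two municipalities of equal size: the nearer pair is coupled more strongly.
near = gravity_weight(5000, 5000, distance_km=10)
far = gravity_weight(5000, 5000, distance_km=100)
print(near > far)
```

In a transmission model, a weight of this kind would typically scale the infection pressure that an affected municipality exerts on an unaffected one.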
The impact of household physical distancing and its timing on the transmission of SARS-CoV-2: Insights from a household transmission evaluation study
Coletti P, Hens N, Faes C, McLean HQ, Belongia EA, Rolfes M, Mellis A, Reed C, Biddle J, Kim A, Zhu Y, Talbot HK and Grijalva CG
Studies on SARS-CoV-2 household transmission often assume random mixing, overlooking detailed contact patterns and the timing of physical distancing.
Wastewater-based surveillance for influenza and respiratory syncytial virus: Insights from a 21-month study in Oklahoma
Deshpande G, Rimal B, Shelton K, Vogel J, Stevenson B and Kuhn KG
Upper respiratory infections caused by viruses such as respiratory syncytial virus (RSV) and influenza are major health concerns globally. Traditional surveillance methods of these viruses rely on clinical data, which can miss mild or asymptomatic cases, leading to gaps in understanding of their epidemiology. Wastewater-based surveillance (WBS) offers an alternative monitoring approach, providing real-time, population-representative data on infection levels. This study aimed to evaluate the value of WBS for monitoring influenza A and B and RSV in Oklahoma from August 2022 to May 2024. Wastewater samples were collected weekly from 18 treatment plants statewide, and viral RNA was quantified using RT-qPCR. We compared wastewater data with reported influenza hospitalizations and RSV test positivity. We found significant seasonality in clinical outcomes as well as wastewater concentrations for influenza A and RSV. Our results also showed comparatively high wastewater concentrations during times when influenza hospitalizations and RSV test positivity were at their seasonal highs. Our study demonstrates the potential for WBS to offer timely insights into respiratory virus trends, particularly for underserved communities. This method complements traditional surveillance, offering a broader understanding of viral transmission and supporting public health interventions.
Random Forest of epidemiological models for Influenza forecasting
Aawar MA and Srivastava A
Forecasting the hospitalizations caused by the Influenza virus is vital for public health planning so hospitals can be better prepared for an influx of patients. Many forecasting methods have been used in real-time during the Influenza seasons and submitted to the CDC for public communication. We hypothesize that we can improve forecasting by using multiple mechanistic models to produce potential trajectories and use machine learning to learn how to combine those trajectories into an improved forecast. We propose a Tree Ensemble model design that utilizes the individual predictors of our baseline model SIkJalpha to improve its performance. Each predictor is generated by changing a set of hyperparameters. We compare our prospective forecasts deployed for the FluSight challenge (seasons ending in 2022, 2023, and 2024) to all the other submitted approaches. Our approach is fully automated and does not require any manual tuning. Our submissions remained in the top 33% of the models in all seasons. We demonstrate that our Random Forest-based approach is able to improve upon the forecasts of the individual predictors in terms of mean absolute error, coverage, and weighted interval score. Our method retrospectively outperformed all other models in terms of the mean absolute error and the weighted interval score based on the mean across all weekly submissions of the 2021-22 season.
Investigating the impact of non-pharmaceutical interventions (NPIs) on post-pandemic Respiratory Syncytial Virus (RSV) hospitalisations and seasonality in Wales, UK
Santiago G, White C, Collins B, Cottrell S, Williams C, Lucini B and Gravenor MB
Respiratory Syncytial Virus (RSV) is a single-stranded RNA virus and a major cause of hospitalisations in paediatric and geriatric populations. In the Northern Hemisphere, the RSV season is typically between October and March. Following the introduction of Non-pharmaceutical Interventions (NPIs), in response to the COVID-19 pandemic, disruptions in seasonality have been observed.
Reimagining the serocatalytic model for infectious diseases: A case study of common coronaviruses
Larsen SL, Yang J, Lv H, Huan YW, Teo QW, Pholcharee T, Lei R, Gopal AB, Shao EK, Talmage L, Mok CKP, Takahashi S, Kraay ANM, Wu NC and Martinez PP
Despite the increased availability of serological data, understanding serodynamics remains challenging. Serocatalytic models, which describe the rate of seroconversion (gain of antibodies) and seroreversion (loss of antibodies) within a population, have traditionally been fit to cross-sectional serological data to capture long-term transmission dynamics. However, a key limitation is their binary assumption on serological status, ignoring heterogeneity in optical density levels, antibody titers, and/or exposure history. Here, we implemented Gaussian mixture models - an established statistical tool - to cross-sectional data in order to characterize serological diversity of seasonal human coronaviruses (sHCoVs) across a wide range of age groups. These methods consistently identified multiple distinct seropositive levels, suggesting that among seropositive individuals, the number of prior exposures or response to infection may vary. We fit adapted, multi-compartment serocatalytic models with different assumptions on exposure history and waning of antibodies. The best fit model for each sHCoV was always one that accounted for host variation in the scale of serological response to infection. These models allowed us to estimate the strength and frequency of serological responses, finding that the time for a seronegative individual to become seropositive ranges from 2.40 to 7.03 years across sHCoVs, and most individuals mount a strong antibody response reflected in high optical density values, skipping lower levels of seropositivity. We find that despite frequent infection and strong serological responses, for all sHCoVs except 229E, individuals are likely to become seronegative again at some point after their first infection. Nonetheless, our results also indicate that by age 22, for each sHCoV the probability of having seroconverted at least once is over 95%. Crucially, our reimagined serocatalytic methods can be flexibly adapted across pathogens, having the potential to be broadly applied beyond this work.
The bridge between two worlds: Global South researchers' journeys through Global North academic training and beyond
Djaafara BA, Mutunga M, Eneanya OA, Forna A and Cucunubá ZM
International training of Global South researchers represents a strategic investment that yields substantial returns, rather than the traditional "brain drain" framing. This perspective synthesises the experiences of infectious disease epidemiologists from Colombia, Indonesia, Kenya, Nigeria, and Sierra Leone who completed training in Global North institutions between 2015 and 2024. Despite facing challenges, language barriers, and representational pressures, we demonstrate how Global South researchers transform these obstacles into unique strengths that enhance local research capabilities. Our experiences also show that Global South researchers serve as vital bridges between academic worlds, contributing irreplaceable contextual knowledge while building collaborative networks that advance infectious disease epidemiology research regardless of geographic location. We provide four strategic recommendations for better infectious disease epidemiology research ecosystems: 1) creating supportive institutional environments in Global North institutions, 2) building sustainable partnerships that strengthen home institutions, 3) embracing individual agency and responsibility, and 4) strengthening regional collaborations while adapting to evolving global contexts. Our narrative progresses from challenges to empowerment, demonstrating that Global South researchers are valuable contributors essential to advancing infectious disease epidemiology research.
Epidemiology and environmental risks of antibiotic resistant Enterobacterales isolates in different aquatic matrices from North-Western Romania
Farkas A, Carpa R, Szekeres E, Teban-Man A, Coman C and Butiuc-Keul A
The most menacing sources of environmental contamination with antibiotic resistant bacteria are effluents derived from anthropic activities. Even when wastewater treatment processes are implemented, conventional methods are not able to completely retain the antibiotic resistance determinants. We propose an antibiotic resistance risk assessment, incorporating the characterisation of ARB, ARGs and MGEs in different environmental compartments. Antibiotic susceptibility testing of 678 Enterobacterales isolates revealed an increased degree of intrinsic resistance to erythromycin (77.9 %), high level of resistance to ampicillin (39.7 %), low frequency of carbapenem resistance (2.36 %), and a percentage of 34.4 % MDR strains. The most frequent resistance determinants were bla (26.5 %) and tetA (8.26 %), while the intI1 gene was found in 7.37 % of isolates. Resistant Enterobacterales from aquatic matrices with different degrees of contamination were identified as Citrobacter spp. (n = 46), Enterobacter spp. (n = 35), Klebsiella spp. (n = 54) and Escherichia coli (n = 107). A strong statistical correlation was observed between the presence of intI1 and the ARG index (0.768) in resistant Enterobacter spp. Distinct clustering of strains was not observed across different environmental matrices, especially in those directly impacted by human-derived bacteria. Also, distribution of ARB patterns and diversity of ARGs was stable from the taxonomic perspective. Dendrogram analysis based on ERIC-PCR profiles confirmed the presence of strains with identical DNA fingerprints in non-related aquatic ecosystems. The epidemiology of resistant Citrobacter, Enterobacter, Klebsiella and Escherichia isolates confirmed an extensive migration and environmental dispersion of strains with human health significance, particularly important for water resources.
Explaining the stable coexistence of drug-resistant and -susceptible pathogens: the resistance acquisition purifying selection model
Pennings PS
Drug resistance is a problem in many pathogens. While overall, levels of resistance have risen in recent decades, there are many examples where after an initial rise, levels of resistance have stabilized. The stable coexistence of resistance and susceptibility has proven hard to explain - in most evolutionary models, either resistance or susceptibility ultimately "wins" and takes over the population. Here, we show that a simple model, mathematically akin to mutation-selection balance theory, can explain several key observations about drug resistance: (1) the stable coexistence of resistant and susceptible strains (2) at levels that depend on population-level drug usage and (3) with resistance often due to many different strains (resistance is present on many different genetic backgrounds). The model is applicable to resistance due to both mutations and horizontal gene transfer (HGT). It predicts that new resistant strains should continuously appear (through mutation or HGT and positive selection within treated hosts) and disappear (due to a fitness cost of resistance). The result is that while resistance is stable, which strains carry resistance is constantly changing. We used data from a longitudinal genomic study on E. coli in Norway to test this prediction for resistance to five different drugs and found that, consistent with the model, most resistant strains indeed disappear quickly after they appear in the dataset. Having a model that explains the dynamics of drug resistance will allow us to plan science-backed interventions to reduce the burden of drug resistance.
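The balance between continual acquisition of resistance and its removal by a fitness cost can be illustrated with a toy recursion in the spirit of mutation-selection balance. This is a minimal sketch under stated assumptions, not the model from the paper: the per-generation acquisition fraction (standing in for drug usage), the fitness cost, and both function names are hypothetical.

```python
def next_resistant_fraction(r, acquisition, cost):
    """One generation of acquisition followed by purifying selection.

    r           : current fraction of resistant lineages (0..1)
    acquisition : assumed fraction of susceptible lineages gaining resistance
    cost        : assumed relative fitness cost of resistance (0..1)
    """
    # Step 1: some susceptible lineages acquire resistance.
    r_acquired = r + acquisition * (1.0 - r)
    # Step 2: resistant lineages reproduce at relative fitness (1 - cost).
    resistant = r_acquired * (1.0 - cost)
    susceptible = 1.0 - r_acquired
    return resistant / (resistant + susceptible)


def equilibrium_resistance(acquisition, cost, r0=0.0, generations=2000):
    """Iterate the recursion from r0 for a fixed number of generations."""
    r = r0
    for _ in range(generations):
        r = next_resistant_fraction(r, acquisition, cost)
    return r


# Starting points as different as 0 % and 90 % resistance end at the same level,
# which is close to acquisition / cost when both quantities are small.
print(equilibrium_resistance(0.001, 0.05, r0=0.0))
print(equilibrium_resistance(0.001, 0.05, r0=0.9))
```

In this toy version, raising the acquisition fraction raises the stable level of resistance, mirroring the dependence on population-level drug usage described in the abstract.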
Optimisation of wastewater surveillance for COVID-19 after resumption of normalcy from the pandemic: A case of Hong Kong
Lo ES, So SC, Wong LT, Mohammad KN, Law KY, Chan KS, Tsang SW, Lo D, Kung KH, Au AK and Chuang SK
Wastewater surveillance (WWS) was critical to Hong Kong's COVID-19 response, providing early warning indicators and enabling targeted measures to control the epidemic in the city during the pandemic. As the approach to COVID-19 transitioned from containment to long-term management, maintaining the WWS programme became challenging owing to financial limitations. This article chronicles our efforts to optimize the programme to guarantee its long-term sustainability while preserving its efficacy in tracking disease trends and detecting novel variants. Prior to optimization, our WWS programme gathered samples from 120 locations weekly, encompassing 80 % of the population. Drawing from our experience, we examined several optimization measures, such as decreasing frequency of sampling and altering testing procedures. Nonetheless, these methods were deemed impractical owing to operational and technical difficulties. Ultimately, we determined that a reduction in sampling sites was the most viable method. Statistical analyses utilizing data from April 2023 to March 2024 corroborated this methodology, indicating that despite an 85 % decrease in sample locations (from 120 to 18), the surveillance data retained a high degree of reliability (R² > 0.97) compared to the original model. This optimized methodology decreased expenses by about 80 % while maintaining data reliability. By disseminating our methodology and findings, we aim to provide useful information that may aid other jurisdictions in establishing cost-effective WWS systems as they confront analogous difficulties globally.
Environmental drivers of Ixodes ricinus tick population dynamics: Mechanistic modelling using longitudinal field surveys and climate data
Kim Y, Jaulhac B, Vesga JF, Zilliox L, Boulanger N, Edmunds WJ and Métras R
Ixodes ricinus is the primary vector for Lyme disease and tick-borne encephalitis across Europe. However, the environmental drivers of the tick's complex life cycle have not been quantified with real-world data, making it challenging to incorporate tick demography into tick-borne disease transmission models. To address this gap, we fitted a mechanistic model to a detailed 10-year longitudinal dataset from four sites in Northern France, where I. ricinus is abundant and Lyme disease and tick-borne encephalitis have been reported for decades. By incorporating key demographic processes and the influence of environmental conditions on these processes, our model estimated oviposition, hatching, and moulting rates across a range of temperature or saturation deficit, as well as questing and vertebrate host contact rates. In the studied tick population, moulting peaked at 14.2 °C (95 %HDI: 12.5-16.1 °C), substantially lower than commonly suggested by laboratory-based studies, whereas oviposition and hatching peaked at 24.4 °C (95 %HDI: 10.9-27.2 °C) and 24.7 °C (95 %HDI: 17.8-27.2 °C), respectively. Furthermore, the parameter scaling the empirical baseline vertebrate host contact rates was found to vary significantly between the four study sites, with one site presenting up to 2.90 (95 %HDI: 2.15-3.86) times higher contact rates than the other three sites. Additionally, for ticks overwintering through diapause, moulting in spring more accurately matched the predominantly unimodal questing patterns observed, compared to moulting in summer. Finally, model projections under pessimistic climate change scenarios indicated decreasing tick abundance trends over the next two decades, while no significant decrease was predicted under moderate scenarios. This study provides a foundation for models of I. ricinus-borne pathogen transmission and can be adapted to other Ixodidae populations of public health significance.
Advances in approximate Bayesian inference for models in epidemiology
Li X, Chadwick F and Swallow B
Bayesian inference methods are useful in infectious diseases modeling due to their capability to propagate uncertainty, manage sparse data, incorporate latent structures, and address high-dimensional parameter spaces. However, parameter inference through assimilation of observational data in these models remains challenging. While asymptotically exact Bayesian methods offer theoretical guarantees for accurate inference, they can be computationally demanding and impractical for real-time outbreak analysis. This review synthesizes recent advances in approximate Bayesian inference methods that aim to balance inferential accuracy with scalability. We focus on four prominent families: Approximate Bayesian Computation, Bayesian Synthetic Likelihood, Integrated Nested Laplace Approximation, and Variational Inference. For each method, we evaluate its relevance to epidemiological applications, emphasizing innovations that improve both computational efficiency and inference accuracy. We also offer practical guidance on method selection across a range of modeling scenarios. Finally, we identify hybrid exact-approximate inference as a promising frontier that combines methodological rigor with the scalability needed for the response to outbreaks. This review provides epidemiologists with a conceptual framework to navigate the trade-off between statistical accuracy and computational feasibility in contemporary disease modeling.
Rtglm: Unifying estimation of the time-varying reproduction number, R, under the Generalised Linear and Additive Models
Nouvellet P
Most current methods to estimate the time-varying reproduction number (R), such as EpiEstim, rely on branching processes and the renewal equation. They also require subjective choices to set the level of temporal and spatial heterogeneity assumed. We propose a novel framework to estimate R based on Generalized Linear and Additive Models (GLM/GAM). By integrating the renewal equation model within GLM/GAM, the proposed framework, "Rtglm", allows smooth estimation of R variations over time and space without relying on arbitrary scaling parameters. The performance of Rtglm was evaluated using historical datasets and simulated outbreaks. It demonstrated improved overall performance and accuracy compared to EpiEstim, as measured by the CRPS scores and Mean Square Errors respectively. However, when case incidence was low and R estimation relied on a smoothing term, Rtglm was marginally overconfident in its estimates. The method offers substantial improvement for the real-time estimation of spatio-temporal trends in R, with improved performance and lower reliance on arbitrarily set parameters. The open-source and user-friendly R package developed will also simplify user experience. Finally, the framework bridges gaps between epidemic monitoring methodologies and sets the stage for future extensions to enhance statistical inference and integrate additional epidemiological complexities, including the evaluation of intervention strategies.
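As background, the renewal equation relates expected incidence at time t to past incidence weighted by the serial interval distribution: E[I_t] = R_t * sum_s I_{t-s} w_s. The sketch below computes that weighted sum and a naive ratio estimate of R_t. It is not the Rtglm or EpiEstim implementation; the function names and the toy inputs are assumptions for illustration, and real estimators add smoothing and uncertainty quantification.

```python
def infectiousness(incidence, serial_interval):
    """Lambda_t = sum over s >= 1 of I_{t-s} * w_s.

    serial_interval[0] is w_1, serial_interval[1] is w_2, and so on.
    """
    lam = []
    for t in range(len(incidence)):
        total = 0.0
        for s, w in enumerate(serial_interval, start=1):
            if t - s >= 0:
                total += incidence[t - s] * w
        lam.append(total)
    return lam


def naive_rt(incidence, serial_interval):
    """Ratio estimate R_t = I_t / Lambda_t; None where Lambda_t is zero."""
    lam = infectiousness(incidence, serial_interval)
    return [i / l if l > 0 else None for i, l in zip(incidence, lam)]


# Toy example: incidence doubling each step with a one-step serial interval.
print(naive_rt([1, 2, 4, 8], [1.0]))
```

The ratio is undefined on the first day, when no earlier incidence exists, and becomes unstable when incidence is low, which is one reason smoothed or model-based estimators are preferred in practice.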
Forecasting regional COVID-19 hospitalisation in England using ordinal machine learning method
Wang H, Kwok KO, Li R and Riley S
The COVID-19 pandemic caused substantial pressure on healthcare, with many systems needing to prepare for and mitigate the consequences of surges in demand caused by multiple overlapping waves of infections. Therefore, public health agencies and health system managers also benefitted from short-term forecasts for respiratory infections that allowed them to manage services. While quantitative forecasts treating hospital admissions as continuous variables existed, many health managers prefer discrete levels of demand, similar to approaches used in weather and flooding. However, effective tools for generating precise sub-national forecasts remained limited.