PSYCHOLOGICAL ASSESSMENT

The development and psychometric validation of the Food and Alcohol Disturbance Expectancy Questionnaire (FAD-EQ) in three independent college student samples
Berry KA, De Young KP, Looby A and
Food and alcohol disturbance (FAD; i.e., use of compensatory behaviors to offset alcohol-related calories and/or to enhance the effects of alcohol) is prevalent among college students and associated with negative consequences. Expectancy effects may play a critical role in understanding the phenomenology and trajectory of FAD, as research from the alcohol and disordered eating literatures suggests expectancies are uniquely linked to engagement in these behaviors and their related outcomes. However, little is known about FAD-specific expectancies. This study aimed to develop and psychometrically validate the FAD Expectancy Questionnaire (FAD-EQ). Using three independent multisite samples of U.S. college students (Sample 1: N = 2,594; Sample 2: N = 1,693; Sample 3: N = 3,824), we conducted exploratory and confirmatory factor analyses, tested measurement invariance across sex assigned at birth, evaluated the measure's construct validity, and examined whether FAD expectancy profiles differed by type of past-month FAD engagement. Results supported a two-factor structure (positive and negative FAD expectancies) for the 30-item FAD-EQ, which demonstrated acceptable internal consistency, measurement invariance across sex assigned at birth, and preliminary evidence of convergent and discriminant validity. Similar to results from the alcohol expectancy literature, students engaging in FAD for alcohol enhancement and caloric compensation purposes endorsed the strongest positive and weakest negative expectancies, while students who denied past-month FAD endorsed the weakest positive and strongest negative expectancies. Findings from this study offer a robust tool for assessing FAD expectancies and provide several avenues for future research, intervention, and prevention efforts. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
How often is "often"? Improving assessment of the externalizing spectrum using absolute frequency
Petersen IT, Demko Z, Doebler P, Sabel L, Oleson JJ and Krueger RF
Nearly all questionnaires of externalizing problems use vague quantifiers of relative frequency (e.g., rarely/sometimes/often) or true/false statements. Vague quantifiers have many problems, including imprecision and low interpretability. An alternative is numeric quantifiers that quantify, in absolute frequency, how many times the person engaged in the behavior during a given time frame. This study evaluates whether absolute frequency provides utility for assessing the externalizing spectrum. Participants included adults recruited online and college students, for a combined sample of 1,237 adults (290 males; 947 females) spanning 18-92 years of age. A subset of items was adapted from the Externalizing Spectrum Inventory to assess absolute frequency, supplemented with additional items to ensure broad coverage. Using a 30-day reference period, participants indicated how many times they engaged in each behavior per day, per week, in the past month, or in the prior year. Externalizing problems showed age-related decreases from early to later adulthood. On average, men showed greater externalizing problems than women in early and older adulthood; women showed greater externalizing problems than men in middle adulthood. Latent scores derived from absolute frequency items demonstrated convergent validity with a widely used measure of externalizing problems (Adult Self-Report), discriminant validity with respect to internalizing problems, and criterion and incremental validity in relation to functional impairment and inhibitory control. Count data led to greater precision (less uncertainty in the estimate of each person's level of externalizing problems) than dichotomized versions of the items. Findings suggest there is key utility in assessing absolute frequency of externalizing behavior. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Interpreting the Hospital Anxiety and Depression Scale (HADS) for individuals with traumatic brain injury: Clinical correlates and empirical severity cutoffs
Fox T, Samiotis A, Ponsford J, Spitz G and Carmichael J
Anxiety and depression symptoms are more prevalent in individuals with moderate-severe traumatic brain injury (TBI) than in the general population, underscoring the need for effective screening. The Hospital Anxiety and Depression Scale (HADS) is widely used for this purpose, yet how best to use and interpret the HADS in individuals with TBI remains an open question. This study evaluated the HADS total score, subscale scores, and individual items in 402 individuals with moderate-severe TBI. The sample was on average 13 years postinjury, originally recruited during inpatient rehabilitation. HADS scores were regressed against five clinically important variables to determine their concurrent criterion validity, and regression tree analyses were used to establish severity cutoffs for the total and subscale scores. Results indicated that while the HADS subscale scores provided no meaningful advantage over the total score in accounting for variance in suicidal ideation, self-harm, or mental health treatment, the subscales, particularly depression subscale items, were more informative with respect to functional disability and life satisfaction. Benchmarked against these clinically important variables, we established graded severity cutoffs ("normal," "mild," "moderate," and "severe") for both the HADS total and subscale scores. This study provides clinicians and researchers with empirically derived guidance for using and interpreting the HADS when assessing emotional distress in individuals with moderate-severe TBI. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
The importance of specifying the time period in repeated measures of personality assessments
Southward MW, Kushner ML, Stumpp NE, Cecil SE, Maynard CJ, Barnhill AK, Buchenberger VJ and Sauer-Zavala S
Because most validated personality measures were designed to capture relatively general and stable characteristics, they do not specify a particular timeframe for respondents to consider. It is thus unknown how these measures perform when administered repeatedly or how this performance compares to the same measures with instructions and items adapted to the repeated timeframe of interest. We randomly assigned undergraduate participants (N = 257; Mage = 20.4; 79% female; 77% White; 77% heterosexual) to complete measures of personality (NEO-Five Factor Inventory-3, Level of Personality Functioning Scale-Brief Form-2.0, Personality Inventory for DSM-5-Brief Form, Five Factor Borderline Inventory-Short Form) with validated general instructions and items or measures with instructions and items pertaining to the previous week once per week for 6 weeks. Compared to measures with general instructions, measures with weekly instructions demonstrated greater within-person internal consistency (weekly ωs: .42-.83; general ωs: .44-.72), lower rank-order stability (weekly average 1-week r = .72; general average 1-week r = .86), greater variability (: .08-.94), lower average mean scores across time (: -.96 to .25), and stronger associations with measures of anxiety and depression, well-being, and functioning but similar between-person internal consistencies (weekly ωs: .79-.99; general ωs: .79-.99) and measurement invariance. Researchers assessing personality weekly may thus be able to capture more variability and stronger associations with relevant constructs while still maintaining reliable individual differences and construct validity using personality measures referencing participants' past week. However, nuances such as lower average scores when referencing the past week should be kept in mind when comparing results between studies using different reference timeframes. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Development of a brief adjusting alcohol purchase task as a measure of behavioral economic demand for alcohol in two samples of heavy-drinking young adults
Coelho SG, Belisario KL, Keough MT, Amlung MT, Murphy JG and MacKillop J
Alcohol demand, reflecting the relative reinforcing value of alcohol, is robustly associated with alcohol use and problems. Alcohol demand is typically assessed using alcohol purchase tasks (APTs) in which participants estimate alcohol consumption at escalating prices, with the resulting demand curves yielding multiple indices of value. Although generally efficient, full-length APTs pose some burden, and existing short forms cannot produce individual demand curves or derived indices. Thus, we developed a brief adjusting APT using an algorithm to identify 5-6 items from a full-length APT based on an individual's level of alcohol demand. Two independent samples of heavy-drinking young adults (Sample 1: N = 725, Mage = 21.43 years; Sample 2: N = 588, Mage = 22.64 years) completed an assessment that included a full-length APT and measures of alcohol use and problems. Using a binary-search-style algorithm, brief APT responses were extracted from the full-length APT. In each sample, individual demand curves from the brief APT fit the data well. Observed (intensity, , breakpoint) and derived (elasticity) demand indices robustly corresponded with the full-length APT, including similar mean estimates and high correlations (rs ≥ 0.79) between corresponding indices from the brief and full-length APTs. Demand indices from the brief APT were associated with alcohol use and problems, and associations of corresponding indices from the brief and full-length APTs with alcohol use outcomes were of equivalent magnitude. These findings provide initial support for a brief adjusting APT as a measure of alcohol demand. Future research should further evaluate this measure, including stand-alone administration and detection of within-person changes in alcohol demand. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
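The abstract above does not spell out the adjusting algorithm, so the following is only a minimal sketch of what a binary-search-style item selection could look like. It assumes that consumption never increases as price rises and that the search target is the breakpoint (the first price with zero consumption). The function name `brief_apt_items` and the 17-price ladder are hypothetical and are not taken from the article.

```python
def brief_apt_items(consumption):
    """Pick the price items a binary-search-style probe would ask.

    `consumption` holds one respondent's full-length APT answers, ordered
    from the lowest to the highest price. Assumes answers never increase
    with price. Returns (sorted indices asked, index of the first price
    with zero consumption, or len(consumption) if there is none).
    """
    asked = [0]                      # always ask the lowest price (intensity)
    if consumption[0] == 0:          # no demand even at the lowest price
        return asked, 0
    lo, hi = 1, len(consumption) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        asked.append(mid)
        if consumption[mid] > 0:
            lo = mid + 1             # still buying: breakpoint lies higher
        else:
            hi = mid - 1             # stopped buying: breakpoint is here or lower
    return sorted(asked), lo


# Hypothetical 17-price response vector (drinks purchased at each price).
full_apt = [10, 8, 8, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0]
items, breakpoint_index = brief_apt_items(full_apt)
brief_responses = {i: full_apt[i] for i in items}
```

With a 17-price ladder this sketch asks the lowest price plus four or five probes, which happens to match the 5-6 items mentioned in the abstract, but the authors' actual selection rule may differ.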
Psychometric properties and validity of a Mobile Patient Health Questionnaire-9 (MPHQ-9) for ecological momentary assessment in depressed adults
Haddox D, Mackin DM, Griffin TZ, Heinz MV, Nemesure MD, Collins AC, Price GD, Lekkas D, Pillai A, Nepal S, Campbell AT and Jacobson NC
Ecological momentary assessment is well-suited for capturing rapid symptom dynamics, and it is increasingly used to measure depression symptoms. However, few depression measures are validated for ecological momentary assessment use in the manner expected for traditional questionnaires. Therefore, this study examined the internal consistency, longitudinal stability, and convergent validity of the Mobile Patient Health Questionnaire-9 (MPHQ-9), a version of the Patient Health Questionnaire-9 (PHQ-9) modified for ecological momentary assessment. Depressed participants (N = 280; female = 83.93%; White = 79.29%) completed the MPHQ-9 three times daily for 90 days. Data from the first and last 2 weeks were analyzed to align with a prestudy PHQ-9 and poststudy PHQ-9 and Inventory of Depression and Anxiety Symptoms-II. The MPHQ-9 demonstrated fair to substantial adjusted item-total correlations (r = .42-.83), often exceeding the PHQ-9 (r = .39-.72), with Cronbach's α coefficients of .91 and .81, respectively. Reliability analyses of the MPHQ-9 using generalizability theory and multilevel modeling to account for repeated measures yielded substantial between-person reliability (∼1.0) but mixed within-person reliability estimates of .81 (generalizability theory) and .44 (multilevel modeling). The MPHQ-9 showed moderate stability (r = .69, intraclass correlation coefficient = .58), compared to the slight stability of the PHQ-9 (r = .39, intraclass correlation coefficient = .37). There was moderate agreement between the MPHQ-9 and both the PHQ-9 (r = .71) and the Inventory of Depression and Anxiety Symptoms-II General Depression subscale (r = .65). Supplementary analyses identified short forms with similar convergent validity but reduced symptom-level information. This study provides initial validation of the MPHQ-9 and compares its psychometric properties to the traditional PHQ-9. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
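For readers unfamiliar with the two single-occasion statistics named above, the sketch below computes Cronbach's α and adjusted (item-rest) item-total correlations from a respondents-by-items score table. It is a generic textbook computation under the assumption of one complete row per respondent. It does not account for repeated measures and is not the authors' generalizability-theory or multilevel analysis. The function names are illustrative.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-respondent item-score lists."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjusted_item_total(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    out = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # total score excluding item j
        out.append(pearson(item, rest))
    return out


# Made-up scores for five respondents on three 0-3 items.
scores = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3], [1, 0, 1]]
alpha = cronbach_alpha(scores)
item_rest = adjusted_item_total(scores)
```

Removing the item from the total before correlating avoids inflating the coefficient through the item's overlap with itself, which is the sense of "adjusted" assumed here.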
The Dissociative Symptoms Scale (DSS): Psychometric properties of scores on a German version in clinical samples
Heekerens JB, Biermann M, Mocarz-Kleindienst M, Vonderlin R, Lyssenko L, Hofmann VXC, Carlson EB, Enning F, Schmahl C and Kleindienst N
Dissociation is a widespread phenomenon with significant mental health implications. The 20-item Dissociative Symptoms Scale (Carlson et al., 2018) was developed to measure moderately severe levels of dissociation across a broad range of clinical populations. The factor structure of the Dissociative Symptoms Scale comprises four domains. In this article, we present a German version of the Dissociative Symptoms Scale (G-DSS) and examine the psychometric properties of scores on the G-DSS. Across two studies (N = 257) involving clinical samples primarily composed of individuals with depressive disorders, borderline personality disorder, and posttraumatic stress disorder, we demonstrate that G-DSS scores mostly align with the expected four-factor structure. In addition, G-DSS scores demonstrated adequate internal consistency (ω ≥ .76 for subscales), strong convergent validity with large correlations to scores on other dissociation measures, good discriminant validity with small or nonsignificant correlations to scores on personality facets, and good concurrent validity with positive correlations to scores on psychopathology indicators. We conclude that scores on the G-DSS are reliable and valid for assessing dissociative symptoms in clinical populations similar to the samples studied, which can enhance our understanding of dissociation's structure and clinical implications. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
How do participant preferences, expectancies, and perceptions of ecological momentary assessment impact adherence? A mixed-methods analysis
Piccirillo ML, Rapp EB, Frohe T, Reichman M, Joban M, Lee S, Tingman R, Okesanya A, Spink KM, Volpe L, Walukevich-Dienst K and Foster KT
Clinical researchers using ecological momentary assessment (EMA) methods design study protocols to optimize adherence. These decisions can sacrifice the scope, temporal granularity, and observational power of resultant data and are often based on limited empirical evidence. The present mixed-methods study queried participant preferences on EMA design and expectancies regarding a hypothetical EMA protocol (Sample 1, N = 1,495). We used a concurrent triangulation approach to analyze quantitative survey and qualitative interview data assessing implementation outcomes from a subset of Sample 1 individuals (a majority of whom reported clinical levels of anxiety/stress or frequent alcohol use) who enrolled and completed at least 2 weeks of EMA (Sample 2; n = 59). Participants completed three EMA surveys for up to 112 days (M = 76.8 days, SD = 37.88) with an average within-person compliance of 73.8% (SD = 17.18). Descriptive statistics and a hybrid inductive-deductive coding approach were used to analyze quantitative and qualitative data, respectively, to understand factors that influence adherence. Participants perceived daily variability in most health-related domains (e.g., mental health symptoms, rest) and frequently reported positive expectations for EMA (e.g., anticipated increased awareness). Most participants reported that EMA helped increase awareness of their daily patterns (n = 37, 62.7%) but that study protocols were long and burdensome (n = 44, 74.6%). Qualitative themes were related to deductive implementation outcomes with significant inductive nuances that varied by level of adherence. Results help to guide EMA protocol decisions to improve adherence based on participant preferences, expectations, and experiences during EMA. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Exploring research-participant perceptions of experience sampling studies on self-harm
Morgan RM, Ilagan GS, Conway CC and Soroko E
Experience sampling research is making important contributions to our knowledge base on self-injurious thoughts and behaviors (SITBs). However, there are unresolved questions about how people with SITB histories respond to participating in these studies, which can be demanding, relative to other research designs. The present study explored how people engage with experience sampling methods (ESMs) that target SITB outcomes. Combining data from two studies of community-dwelling adults with a history of significant SITBs, we performed a mixed methods investigation of people's reactions to ESMs in a sample of 109 people (57.8% female; 26.6% trans/gender-expansive; 33.0% heterosexual; 53.2% White; 16.5% Latinx) who described (via closed- and open-ended questions) their perceptions of the research process. Our quantitative analyses replicated prior work in finding that most people rated the study experience as satisfactory, feasible, and not unduly stressful. Also, there was little evidence of iatrogenic effects (i.e., intensifying SITBs across the repeated assessments). Our qualitative investigation, based in reflexive thematic analysis, yielded themes related to self-awareness, confronting previously guarded internal experiences, emotion regulation, biographical or "self"-related change, and inconvenience. Guided by these results, we discuss potential benefits and harms of ESM study participation that we believe should be considered by researchers to promote ethical research practices and a more valid evidence base in the SITB literature. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
A psychometric examination of computerized adaptive measures of posttraumatic stress disorder among military veterans
Lee DJ, Crowe ML, Weathers FW, Bovin MJ, Acierno R, Arenson MB, Collazo EN, Hart S, Hughes CH, Naganuma-Carreras J, Patrick GC, Pompei MC, Sager JC, Wood AE, Schnurr PP and Marx BP
Brenner et al. (2021) developed and initially tested two computerized adaptive measures of posttraumatic stress disorder (PTSD), one that provides a provisional diagnosis (PTSD computerized adaptive diagnostic screen [CAD-PTSD]) and one that estimates symptom severity (PTSD computerized adaptive severity test [CAT-PTSD]). We expanded on the initial psychometric findings by collecting data regarding test-retest reliability, incremental validity, and respondent burden. A sample of veterans (N = 156, 32% women, 36% Black, 13% identified as Spanish, Hispanic, Latino, Puerto Rican, or Cuban) recruited from three Veterans Affairs medical centers completed the CAT-PTSD and CAD-PTSD, the Clinician-Administered PTSD Scale for DSM-5, the PTSD Checklist for DSM-5 (PCL-5), Primary Care PTSD Screen for DSM-5 (PC-PTSD-5), and a battery of other self-rated scales. Fifty-three participants (34%) completed the measures a second time within 7 days (M = 5.41; SD = 1.97) of their first visit. CAT-PTSD scores revealed good convergent validity (r = .78 with Clinician-Administered PTSD Scale for DSM-5 total score), discriminant validity, and test-retest reliability (r = .80). Scores on the PCL-5 and PC-PTSD-5 had similar characteristics. The CAD-PTSD demonstrated poor diagnostic efficiency, κ(.5, 0) = .40, and test-retest reliability (κ = .25), whereas previously established cut-scores for the PCL-5 and PC-PTSD-5 showed fair to good diagnostic utility and adequate to good test-retest reliability. Results suggest that the CAT-PTSD may provide a valid indicator of PTSD symptom severity, but does not offer incremental value beyond the PCL-5 and PC-PTSD-5. The CAD-PTSD was markedly inferior to the use of PCL-5 or PC-PTSD-5 cut scores. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Development and validation of the visual reasoning test for the NIH Toolbox
Slotkin J, Tyner CE, Ho EH, Kaat AJ, Dworak EM, Laforte E, Ma M, Han YC, Aytürk E, Zhang M, Tulsky DS and Gershon R
The Cognition Battery of the NIH Toolbox (NIHTB) is an iPad-based set of brief assessments with documented test score reliability and validity, covering a range of cognitive functions for ages 3-85+. The construct of visual reasoning (VR) was identified as a measurement gap in previous versions of the NIHTB by a contingent of NIH-sponsored researchers. The VR Test was developed de novo to address this gap across the entire age spectrum. Given the intended brevity of NIHTB measures, VR was designed for computer-adaptive test administration. VR items were calibrated on 3,768 participants from the NIHTB Version 3 norming study. To evaluate convergent validity of the derived test score, a subsample of 283 participants was administered the age-appropriate Wechsler Intelligence Scale, including the commonly used Matrix Reasoning subtest to assess VR. A separate subsample of 190 individuals was readministered VR 1-14 days later to evaluate test-retest reliability of VR scores. Results yielded a robust bank of 180 items measuring a wide range on the ability spectrum, with correlations of .45-.55 with Wechsler Matrix Reasoning tests and .48-.65 with Wechsler Full-Scale IQ scores. Test-retest intraclass correlation coefficient for VR scores was .77. The new NIHTB VR Test is a computer-adaptive test that can be used to assess VR from preschool to older adulthood, showing evidence of convergent validity of test scores with those of similar constructs, and test-retest reliability of scores, as well as an overall strong relationship to general cognitive ability. This measure broadens the scope of the NIHTB. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
The importance of examining measurement properties in ecological momentary assessment research: An illustrative example in suicide research
Rogers ML, Robison M, Lawrence OC, Mattera EF and Mitaj D
Ecological momentary assessment is a burgeoning methodology within suicide research, allowing for investigations of dynamics in suicidality and its risk factors in naturalistic settings. Less attention, however, has been paid to carefully examining whether measurement invariance is present across participants and prompts. This study examined the measurement properties of items assessing affective states in 237 adults with severe suicidal ideation (Mage = 27.12, 61.2% cisgender women, 86.9% White, 38.4% bisexual/pansexual). Utilizing latent Markov factor analysis, four questions were addressed: (a) how many measurement models (states) underlie the data? (b) how do these states differ? (c) were momentary suicidal intent and lifetime suicide attempts predictors of transitions between states? and (d) for whom does invariance hold? Results indicated that the best-fitting model had three states (entropy R² = .93). State 1 (Demoralization, 59% of observations) consisted of one factor characterized by high distress and lower arousal. State 2 (Agitated Arousal, 28%) consisted of one factor characterized by high distress and arousal. State 3 (Content, 13%) consisted of one factor characterized by low distress and absent arousal. Although these states were very stable across observations, momentary suicidal intent and suicide attempt history predicted transitions. Only a minority of participants (11.1%) remained in the same state throughout the entire study; thus, within-person and between-person invariance were insufficient. These findings underscore the importance of meticulous inspection of measurement properties when conducting intensive longitudinal research, as observations cannot be validly compared if measurement invariance is not met. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Exploring the predictive value of different affect dynamics for psychological treatment outcome
Hehlmann MI, Siepe B, Mink F, Peña Loray JS, Gomez Penedo JM, Fritz J, Moggia D, Rubel JA and Lutz W
Ecological momentary assessment is increasingly used to assess affect dynamics for predicting treatment outcomes. While multiple methodological approaches exist, their added predictive value remains unclear, with previous results suggesting a low predictive value beyond mean level (M), standard deviation (SD), or initial impairment. To this end, this study aimed to evaluate the predictive value of different affect dynamic measures (ADMs) for psychological treatment outcomes using a naturalistic data set. A total of 140 White outpatients (63.6% female, 33.6% male) from the psychotherapy clinic of Trier University reported positive affect and negative affect four times daily over 2 weeks. We applied continuous-time dynamic modeling, drop and recovery rates, time-varying change point analyses, control charts, and multilevel modeling to each patient's time series. Interdependencies among indicators were analyzed using principal component analysis. Each ADM's performance for predicting treatment outcomes beyond initial impairment was assessed via R². The model with the optimal predictor combination (using elastic net regularization) was compared in predictive performance to a model including M and SD of positive and negative affect. Significant interdependencies were found among ADMs. The predictor selection identified the cross-effect of positive affect on negative affect (from the continuous-time dynamic modeling) as the best predictor, while the model including M and SD of positive and negative affect had the greatest predictive power. In this naturalistic clinical sample, complex ADMs offered limited additional predictive value beyond M or SD for treatment outcome. Considerations and R scripts for each ADM are provided. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Self- and informant-based comparison of Dark Triad scales in German adults
Bromme L and Wetzel E
Self-report measures are the standard for assessing Dark Triad personality traits (narcissism, Machiavellianism, and psychopathy). When informant-report measures are used, they are usually created as ad hoc adaptations of the self-report scales, lacking transparent documentation and evidence of validity. To fill this gap, we systematically created other-report adaptations of established Dark Triad personality measures-the the scale, the the and scale-and compared their self-other invariance, self-other correlations, and criterion validity. Self-report versions were administered to 402 individuals (75% women, Mage = 28 years, highly educated, predominantly white). Informant-report versions were administered to close informants nominated by the targets (N = 335). We found (a) that self-report and informant measures yielded the same latent structure, but that the strict invariance assumption was violated to varying degrees by the different scales. (b) All scales yielded large self-other correlations with no significant differences between scales. (c) The informant-report measures were about as strongly associated with preregistered outcome variables as the self-report measures, with only minor differences between scales. In conclusion, all scales are suitable for the informant-report assessment of the Dark Triad, but some scales do not allow valid comparisons of self- and other-report mean scores. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
The MMSE can yield biased and imprecise estimates of change: A novel IRT analysis of latent change scores from the A4 clinical trial
Thomas ML, Edland SD and Duehring J
The measurement precision of change scores has previously been investigated from the perspective of classical test theory. However, the measurement precision of change scores has not been thoroughly explored from an item response theory (IRT) perspective. In this study, we provide, to our knowledge, one of the first direct investigations of change score precision within an IRT framework. Specifically, using archival data from the antiamyloid treatment in asymptomatic Alzheimer's trial, we examined standard error of estimate for change scores on the mini-mental state examination, one component of the preclinical Alzheimer cognitive composite used to measure change between intervention arms. Multidimensional two-parameter IRT models were fitted to the mini-mental state examination item data with one latent dimension reflecting baseline ability and a second reflecting change in ability over time (i.e., latent change scores). Results showed that standard error depended on change magnitude and that change scores were expected to be biased toward zero when baseline performance scores were near ceiling. The results demonstrate why measures with pronounced ceiling effects should not be used to assess change in clinical trials or other longitudinal studies, and should be used cautiously in clinical settings. This study also demonstrates how IRT can be used to evaluate change score precision. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
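As a rough sketch of the kind of model described above, a two-parameter latent change model for two occasions can be written as follows. This assumes dichotomous items and item parameters held equal across occasions. The notation (discrimination a_j, intercept d_j, baseline ability θ_i, latent change Δ_i, occasion indicator δ_t) is illustrative and is not taken from the article. The `cases` environment requires the amsmath package.

```latex
\[
P(X_{ijt} = 1 \mid \theta_i, \Delta_i)
  = \frac{1}{1 + \exp\{-[a_j(\theta_i + \delta_t \Delta_i) + d_j]\}},
\qquad
\delta_t =
\begin{cases}
  0, & t = 1 \text{ (baseline)} \\
  1, & t = 2 \text{ (follow-up)}
\end{cases}
\]
```

Under this form, a respondent whose baseline ability sits well above the item locations passes nearly every item at both occasions, so the responses carry little information about Δ_i. This is one way to read the reported loss of precision near ceiling.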
Applying transformer models to psychological time-series data: A step-by-step tutorial with an empirical illustration of depression trajectories
Zahn L, Nedderhoff A, Hecht M and Zitzmann S
Transformer models have emerged as powerful tools for analyzing time-series data, yet their application in clinical psychology remains underexplored. With the increasing availability of high-frequency psychological data, these models offer new opportunities for time-series analysis, such as detecting early warning signs of relapse, modeling symptom dynamics, and personalizing treatment strategies. This article provides a gentle introduction to transformer models, guiding researchers and clinicians through their theoretical foundations and practical implementation. Using a step-by-step illustrative walk-through, we demonstrate their potential for capturing complex patterns and long-term dependencies. An empirical example focusing on depression trajectories illustrates their application in psychological research. All analysis code is provided as a documented compressed archive in the journal's Supplemental Material and mirrored on the Open Science Framework (https://osf.io/mj8nh/). (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Assessing the internal consistency reliability of ecological momentary assessment measures: Insights from the WARN-D study
Castro-Alvarez S, Zhou DJ, Bringmann LF, Tutunji R, Proppert RKK, Rieble CL, Fried EI and Liu S
Intensive longitudinal research has become increasingly popular in the social and clinical sciences in recent years. However, this popularity has brought about many challenges for both methodological and empirical researchers, including challenges regarding measurement. In this preregistered study, we are particularly interested in the assessment of reliability when multiple items are used to measure the same construct in intensive longitudinal data. This is important because reliability estimates are necessary (albeit not sufficient) to evaluate the quality of measures. Here, we evaluate the internal consistency reliability of scales used during Stage 2 of the WARN-D study, a 3-month period of daily and weekly measurements. The WARN-D study is a prospective 2-year study of approximately 1,750 students conducted in the Netherlands, aiming at building an early warning system for depression. Stage 2 includes 3 months of data on positive and negative affect measured four times a day and depression and anxiety measured once a week. To assess the reliability of each scale, we use six different statistical approaches including three simpler approaches that estimate the reliability at the between-person and within-person levels and three idiographic approaches that estimate person-specific reliability coefficients. This article also serves as a tutorial guide for substantive researchers, providing annotated code to facilitate estimating and reporting the reliability of ecological momentary assessment measures. We encourage all researchers to report the reliability of their data when applying the introduced statistical approaches, contributing to a collaborative effort toward developing more reliable measures in psychological and behavioral science. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Longitudinal measurement invariance of the Personality Inventory for ICD-11 across Black and White American older adults
Heragu P, Dieujuste N, Mekawi Y and Oltmanns JR
The Personality Inventory for ICD-11 (PiCD) assesses five maladaptive trait domains from the International Classification of Diseases, 11th edition's dimensional model of personality disorder. Validity evidence of PiCD scores has relied primarily on White samples and there have been no evaluations of measurement invariance (MI). Research examining use of PiCD scores with diverse populations is needed. The present study investigated MI of PiCD scores across race and time in a sample of White and Black American older adults (N = 843, ∼20% Black). Cross-sectionally, Marsh et al.'s (2009) 13-step exploratory structural equation modeling was used to determine MI of the five domains across Black and White participants at two waves of data collection about 2 years apart. Findings revealed partial strong invariance across race at both waves. At Wave 1, intercepts for two Anankastia items and two negative affectivity items (only one negative affectivity item at Wave 2) were noninvariant across race. Longitudinal exploratory structural equation modeling suggested strict invariance across time for the entire sample. Domain-level longitudinal confirmatory factor analysis indicated strict invariance across time for Black participants in each PiCD domain. Findings suggest four item means demonstrated noninvariance and require further examination, but the PiCD scores showed a high level of invariance (factor structure, factor loadings, 56 of 60 item intercepts). Reasons for the four noninvariant item intercepts are probed by examining scale score differences with and without the items and external correlates. Findings indicate partial strong invariance for PiCD scores, but the four item mean scores need further exploration across race, and potential revision. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Utilizing qualitative methods to detect validity issues in clinical experience sampling methodology (ESM)
Schorrlepp L, Stadel M, Bringmann LF, Hesselink M and Maciejewski D
Experience sampling methodology (ESM) is used increasingly in clinical research and practice, promising unique insights into people's daily lives and more accurate, ecologically valid clinical assessments. However, there are rising concerns about the validity of ESM studies due to various measurement challenges, including differences and changes in participants' concept and scale interpretation (e.g., whether a 4 on a 7-point scale means the same for two individuals), and their interpretation of the study as a whole. Currently, the ESM literature mainly focuses on quantitative solutions. In this article, we highlight the contribution of qualitative methods to detecting not only the occurrence but also the content of validity issues. We describe how to implement validity checks for ESM studies using focus groups, open-ended items, as well as cognitive and semistructured interviews. Although these methods are already used in other fields, we present a translation to ESM research and describe implementations suitable for different research stages, from ESM material development to study follow-up. To illustrate the usefulness of these qualitative validity checks, we provide concrete examples from the clinical ESM literature and our own mixed-methods studies. Thereby we hope to encourage clinical researchers and practitioners interested in the implementation of ESM to reflect on how validity issues may impact the conclusions drawn from their collected data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Development and validation of the Responsible Drinking Inventory
Gray HM, McCullock SP, Slabczynski JM, Siu AM and LaPlante DA
Responsible drinking is a common term used by a variety of stakeholders. Although many people and organizations discuss responsible drinking, its meaning remains unclear. Researchers have begun to scrutinize the concept, critically questioning its utility, definition, and distinction from other alcohol-related constructs; however, its measurement has remained limited. Accordingly, we present a series of studies describing the development of the Responsible Drinking Inventory (RDI), a new 18-item self-administered measure of responsible drinking beliefs and behaviors. We report upon the creation and the psychometric properties of the RDI across six primary studies. Examinations of the RDI indicated appropriate reliability and validity, including convergent and divergent validity, as well as known groups and predictive validity. The RDI appears to provide information that is consistent with alcohol safety-oriented measures, such as the Protective Behavioral Strategies Scale, and distinct from alcohol harm measures, such as the Alcohol Dependence Scale. The RDI predicts acute consequences of drinking behavior 3 months in the future. This new measure provides unique insights into the nature of responsible drinking and a concise, yet comprehensive way to assess this concept. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Reactivity to experience sampling among adolescents with and without a lifetime or current history of self-harm thoughts or behaviors
Kirtley OJ, Sohier B, Šimsa B, Achterhof R, Myin-Germeys I and Lafit G
Some ethical committees question whether individuals with a history of self-harm thoughts and behaviors (SHTBs) are "too vulnerable" to take part in experience sampling method (ESM) research and if repeatedly asking about SHTBs could be harmful. Past research has focused on whether participating in ESM research influences SHTBs and has overlooked participants' general experience of ESM studies. We explored the relationships between ESM beep disturbance (disruptiveness) and compliance as well as lifetime and current SHTBs. N = 1507 participants completed baseline questionnaires, including about lifetime history of SHTBs, and N = 1788 completed ESM 10 times per day for 6 days, including questions about how much completing the ESM questionnaire disturbed them and their SHTBs in daily life. There were no significant differences in disturbance or compliance between individuals with no lifetime history of SHTBs, self-harm thoughts, or self-harm behaviors. Individuals reporting self-harm behaviors during the ESM period were more likely to experience the ESM questionnaires as more disturbing. Individuals experienced the ESM questionnaires as more disturbing when more intense self-harm thoughts were reported during the ESM period on average and when their current self-harm thoughts were more intense. Our results indicate that lifetime history of SHTBs does not relate to ESM compliance or beep disturbance. However, ESM may be more taxing for individuals experiencing more intense current SHTBs and at moments when their self-harm thoughts are more intense. We suggest that a "static vulnerability" approach to the ethical evaluation of ESM research based on lifetime history of SHTBs is inappropriate and that a dynamic approach is preferable. (PsycInfo Database Record (c) 2025 APA, all rights reserved).