Behavior Research Methods

Triggering just-in-time adaptive interventions based on real-time detection of daily-life stress: Methodological development and longitudinal multicenter evaluation
Bögemann SA, Krause F, van Kraaij A, Marciniak MA, van Leeuwen JM, Weermeijer J, Mituniewicz J, Puhlmann LMC, Zerban M, Reppmann ZC, Kobylińska D, Yuen KSL, Kleim B, Walter H, Myin-Germeys I, Kalisch R, Veer IM, Roelofs K and Hermans EJ
Stress-related disorders present a significant global burden, highlighting the need for effective, preventive measures. Mobile just-in-time adaptive interventions (JITAI) can be applied in real time and context-specifically, precisely when individuals need them most. Yet, they are rarely applied in stress research. This study introduces a novel approach by performing real-time analysis of both psychological and physiological data to trigger interventions during moments of high stress. We evaluated the feasibility of this JITAI algorithm, which integrates ecological momentary assessments (EMA) and ecological physiological assessments (EPA) to generate a stress score that triggers interventions in real time by relating the score to a personalized stress threshold. The feasibility of the technical implementation, participant adherence, and user experience were assessed within a multicenter study with 215 participants conducted across five research sites. The JITAI algorithm successfully processed EMA and EPA data to trigger real-time interventions. A total of 68% (standard deviation [SD] = 29%) of EMA beeps contained extracted EPA features, demonstrating technical feasibility. The algorithm triggered 1.61 (SD = 1.26) interventions per day, with 43% (SD = 27%) of EMA beeps per week leading to triggered interventions. Compliance rates of 43% (SD = 22%) for EMA and 43% (SD = 30%) for the JITAI were achieved, with feedback indicating areas for improvement, particularly for daily-life integration. Our findings provide preliminary support for the feasibility of the developed JITAI algorithm, demonstrating effective data processing and intervention triggering in real time, while also highlighting areas for improvement. Future research should focus on minimizing participant burden, including the intensity of EMA protocols, to improve participant adherence and acceptability while maintaining the benefits of real-time intervention delivery.
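The triggering logic described in this abstract (a combined stress score compared against a personalized threshold) can be illustrated with a minimal sketch. The abstract does not specify how EMA and EPA are combined or how the threshold is set, so everything below is an assumption for illustration: the equal weighting, the person-specific z-scoring, and the names `zscore`, `stress_score`, and `should_trigger` are hypothetical and are not the authors' algorithm.

```python
from statistics import mean, pstdev

def zscore(value, history):
    """Standardize a value against one person's own past values (assumed approach)."""
    sd = pstdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd

def stress_score(ema, epa, ema_history, epa_history):
    """Hypothetical combined score: equal-weight average of the two z-scores."""
    return 0.5 * zscore(ema, ema_history) + 0.5 * zscore(epa, epa_history)

def should_trigger(score, threshold):
    """Trigger an intervention when the score exceeds the personal threshold."""
    return score > threshold

# Illustrative values only (not study data): self-reported stress and a physiological feature.
ema_history = [2, 3, 4, 3, 2, 4]
epa_history = [60, 70, 80, 70, 60, 80]
personal_threshold = 1.0  # assumed threshold in z-score units

high = stress_score(5, 90, ema_history, epa_history)
typical = stress_score(3, 70, ema_history, epa_history)
trigger_high = should_trigger(high, personal_threshold)
trigger_typical = should_trigger(typical, personal_threshold)
```

Standardizing against each person's own history is one simple way to make a threshold "personalized"; other choices, such as a running percentile, would fit the same description.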
A method for measuring closed-loop latency in gaze-contingent rendering without extra equipment
Anderson MD, Cooper EA and Otero-Millan J
In gaze-contingent rendering, the visual stimulus rendered on a display changes based on where the observer is looking. This technique allows researchers to achieve dynamic control over stimulus placement on the retina in the presence of eye movements and is often used to investigate how sensory processing and perception vary across the visual field. Precise stimulus placement using gaze-contingent rendering depends on minimizing the temporal latency between a change in the observer's gaze position, measured using an eye tracker, and the corresponding change to the stimulus. This latency, however, can be challenging to measure reliably. Here, we present a simple method for measuring system latency that requires no additional hardware beyond the eye tracker and display, which are already part of the gaze-contingent system. Two small circles are rendered on the display to simulate the appearance of two pupils. The eye tracker is pointed towards the display to record both pupils simultaneously. One pupil is drawn based on a pre-determined trajectory, for example, moving up and down at a constant speed. The second pupil is "gaze-contingent": it is drawn based on the measured position of the first pupil. The time-lag at which the position of the second pupil matches the first pupil gives the closed-loop latency of the entire system. To validate this method, we added artificial rendering delays to our system and produced measured latencies that precisely corresponded to predictions, given the refresh rate of the display. This method provides a simple, low-cost way of precisely quantifying gaze-contingent rendering latencies, with no additional hardware required.
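The lag-matching step in this abstract can be sketched in a few lines. This is an illustrative simulation, not the authors' analysis code: the first "pupil" follows a constant-speed up-and-down trajectory, the second copies it after a fixed delay, and the delay is recovered as the lag that minimizes the mean squared mismatch between the two traces. The sampling rate, the trajectory period, and the function names are assumptions.

```python
def triangle_wave(n_samples, period):
    """Position moving up and down at constant speed (arbitrary units)."""
    half = period / 2
    return [abs((i % period) - half) for i in range(n_samples)]

def estimate_lag(reference, follower, max_lag):
    """Return the lag (in samples) at which follower[t] best matches reference[t - lag]."""
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = [(follower[t], reference[t - lag]) for t in range(lag, len(follower))]
        err = sum((f - r) ** 2 for f, r in pairs) / len(pairs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag

sample_rate_hz = 1000   # assumed eye-tracker sampling rate
true_lag = 37           # simulated closed-loop delay, in samples

first = triangle_wave(5000, period=800)
# The "gaze-contingent" pupil is drawn at the delayed position of the first pupil.
second = [first[max(t - true_lag, 0)] for t in range(len(first))]

lag = estimate_lag(first, second, max_lag=200)
latency_ms = 1000 * lag / sample_rate_hz
```

The search range (`max_lag`) should stay below the trajectory period; otherwise a periodic trajectory matches itself at more than one lag.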
Measurement of age-of-acquisition in morphologically rich languages: Insights from Kannada and Filipino
Dulay KM, Mirković J, Fua MMRC, Prabhu D and Nag S
In this study, we present age-of-acquisition (AoA) ratings for 885 Kannada and Filipino words as a new resource for research and education purposes. Beyond this, we consider the methodological and theoretical considerations of measuring AoA in morphologically rich, specifically agglutinative, languages, to study child language acquisition. Parents, teachers, and experts provided subjective ratings of when they thought a child acquired each word. Results were generally consistent between the two languages. Mixed-effects models demonstrated that word characteristics, including parts-of-speech category, word length, and age band of first occurrence in a print corpus, were significantly related to AoA ratings, whereas rater characteristics, including participant type, age, gender, and number of languages spoken, had generally non-significant associations with AoA ratings. The number of morphemes was significantly associated with AoA ratings in some analyses; however, crosslinguistic differences in the directionality of the relationships suggested the need to investigate underlying drivers of morphological complexity such as morpheme frequency, transparency/consistency, and function. The age-of-acquisition ratings were internally reliable and demonstrated consistency with the first occurrences of words in print and known trends in child language research. The results demonstrate the potential of these resources and open new directions for AoA research in morphologically rich languages.
The Subliminal Threshold Estimation Procedure (STEP): A calibration method tailored for estimating subliminal thresholds
Elbaz E, Yaron I and Mudrik L
A major challenge in studying unconscious processing is to effectively suppress the critical stimulus while allowing maximal signal strength for adequate sensitivity to detect an effect, if it exists. A possible way to do this is to calibrate stimulus strength. While calibrating stimulus strength is common in psychophysics, current calibration methods are not designed to find the maximal intensity in which the stimulus can still be rendered unconscious (i.e., find the upper subliminal threshold for each participant). Here, we demonstrate how calibration can be utilized to estimate, for each observer, this targeted threshold. We present a novel calibration procedure: the Subliminal Threshold Estimation Procedure (STEP), specifically designed for estimating the upper subliminal threshold for each individual. Using simulations, we showed that STEP outperforms existing calibration methods, which yielded strikingly low accuracy. We then further validated STEP using three empirical experiments. Together, these results establish STEP as highly beneficial for the study of unconscious processing.
Publisher Correction: Chinese Onomatopoeia Database (COD): Concreteness, imageability, context availability, age of acquisition, familiarity, semantic transparency, emotional valence, and emotional arousal for Chinese onomatopoeic words
Zhao Y, Wang H, Tse CS and Chen Q
A method for setting the melanopsin and rhodopsin content in commercial LED sources to investigate the effects of ambient light on behavior
Nugent TW and Zele AJ
Lighting is routinely specified only by its impact on the three cone photoreceptors via the correlated color temperature (CCT), ignoring the visual and non-visual contributions of the melanopsin photoreceptors. Disentangling the behavioral effects of the CCT from those of the melanopsin excitation is complex but necessary to understand melanopsin's effects and to inform the design of new lighting spectra for the built environment. Melanopsin photoreception is important for driving many visual and non-visual functions in humans, including circadian rhythms, mood, attention, and arousal. Here, we introduce a methodology using a widely available LED source (Philips Hue Play, Signify N.V.) to decouple the effects of melanopsin from those of cone photoreceptors. We present a computational algorithm for producing two ambient illuminations with different melanopsin and rhodopsin activation levels, whilst maintaining the same cone excitations, CCT and visual appearance (i.e., the two lighting conditions are cone metamers); this simple and inexpensive method removes the major confounding factor present in approaches that alter the melanopsin excitation of a light by exchanging the wavelength, color, or CCT. The method may find applications in behavioral experiments, including for clinical trials.
Novel method for rubber hand illusion strength measurement based on inverse multidimensional scaling
Litwin P, Kubik K and Longo MR
In the present study, we developed a novel self-report measurement method for the rubber hand illusion (RHI) strength based on inverse multidimensional scaling (MDS). In the preregistered study consisting of two experiments, participants experienced the RHI in synchronous and asynchronous conditions (Experiment 1) as well as the RHI and the arm immobilization imaginative suggestion (Experiment 2). In each condition, participants repeatedly arranged items related to distinct bodily-related experiences (including RHI or suggestion) in accordance with the perceived similarity between them. Proximity data obtained from the arrangements were represented as distances in the multidimensional bodily space. To measure RHI strength, we focused on distances between items representing experimental conditions and two baseline items representing cases of no ownership over an external object and normal bodily feelings. We found that the distance between the rubber hand and an external object was significantly larger in the synchronous than the asynchronous condition, and larger than the distance between the immobilized arm and the normal body, demonstrating stronger shifts in ownership for synchronous RHI. In general, the RHI was associated with moderate ownership and low perceived stimulation, and it clustered with experiences related to a high degree of ownership. MDS-based solutions for the bodily space were consistent within participants and across different experimental conditions. We believe that this method can complement traditional questionnaire-based measurement, offering additional opportunities for a comprehensive self-assessment of RHI strength.
The effect of pupil size on data quality in head-mounted eye trackers
Salari M, Niehorster DC, Nyström M and Bednarik R
Changes in pupil size can lead to apparent gaze shifts in data recorded with video-based eye trackers in the absence of physical eye rotation. This is known as the pupil-size artifact (PSA). While the PSA is widely reported in desktop eye trackers, it is unknown whether and to what extent it occurs in head-mounted eye trackers. In this paper, we examined the effects of pupil size variations on eye-tracking data quality in four head-mounted eye trackers: the Pupil Core, the Pupil Neon, the SMI ETG 2w, and the Tobii Pro Glasses 2, in addition to a widely used desktop eye tracker, the SR Research EyeLink 1000 Plus. Participants viewed a central target on a monitor while we systematically varied the screen brightness to induce controlled pupil size changes. All head-mounted eye trackers exhibited PSA, with apparent gaze shifts ranging from 0.94 for the Pupil Neon to 3.46 for the Pupil Core. Except for the Pupil Neon, all eye trackers exhibited a significant change in accuracy due to pupil size variations. Precision measures showed device-specific effects of pupil size changes, with some eye trackers performing better in the bright condition and others in the dark condition. These findings demonstrated that, just like desktop eye trackers, head-mounted video-based eye trackers exhibited PSA.
The ADVANCE toolkit: Automated descriptive video annotation in naturalistic child environments
Middelmann NK, Calbimonte JP, Wake EB, Jaquerod ME, Junod N, Glaus J, Sidiropoulou O, Plessen KJ, Murray MM and Vowels MJ
Video recordings are commonplace for observing human and animal behaviours, including interindividual interactions. In studies of humans, analyses for clinical applications remain particularly cumbersome, requiring human-based annotation that is time-consuming, bias-prone, and cost-ineffective. Attempts to use machine learning to address these limitations still oftentimes require highly standardised environments, scripted scenarios, and forward-facing individuals. Here, we provide the ADVANCE toolkit, an automated video annotation pipeline. The versatility of ADVANCE is demonstrated with schoolchildren and adults in an unscripted clinical setting within an art classroom environment that included 2-5 individuals, dynamic occlusions, and large variations in actions. We accurately detected each individual, tracked them simultaneously throughout the duration of the recording (including when an individual left and re-entered the field of view), estimated the position of their skeletal joints, and labelled their poses. By resolving challenges of manual annotation, we radically enhance the ability to extract information from video recordings across different scenarios and settings. This toolkit reduces clinical workload and enhances the ethological validity of video-based assessments, offering scalable solutions for behaviour analyses in naturalistic contexts.
Under my umbrella: Rating scales obscure statistical power and effect size heterogeneity
Fünderich JH, Beinhauer LJ and Renkewitz F
Data from rating scales are subject to very specific restrictions: They have a lower limit, an upper limit, and they only consist of a few integers. These characteristics produce particular dependencies between means and standard deviations. A mean that is a non-integer, for example, can never be associated with zero variability, while a mean equal to one of the scale's limits can only be associated with zero variability. The relationship can be described by umbrella plots for which we present a formalization. We use that formalization to explore implications for statistical power and for the relationship between heterogeneity in unstandardized and standardized effect sizes. The analysis illustrates that power is not only affected by the mean difference and sample size, but also by the position of a mean within the respective scale. Further, the umbrella restrictions of rating scales can impede interpretability of meta-analytic heterogeneity. Estimations of relative heterogeneity can diverge between unstandardized and standardized effects, raising questions about which of the two patterns of heterogeneity we would want to explain (for example, through moderators). We reanalyze data from the Many Labs projects to illustrate the issue and finally discuss the implications of our observations as well as ways to utilize these properties of rating scales. To facilitate in-depth exploration and practical application of our formalization, we developed the Shiny Umbrellas app, which is publicly available at https://www.apps.meta-rep.lmu.de/shiny_umbrellas/ .
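The dependency between means and standard deviations on a bounded integer scale can be made concrete with a short sketch. It uses two standard population-level bounds rather than the authors' own formalization, which may differ (for example, by working with sample standard deviations at a given n): for a mean m on a scale from `low` to `high`, the variance is at most (m - low)(high - m), reached when all responses sit at the two limits, and at least f(1 - f), where f is the fractional part of m, reached when all responses sit on the two adjacent integers. The function name `sd_bounds` is hypothetical.

```python
import math

def sd_bounds(mean, low, high):
    """Smallest and largest population SD compatible with a mean on an integer scale."""
    if not low <= mean <= high:
        raise ValueError("mean must lie within the scale limits")
    frac = mean - math.floor(mean)
    min_var = frac * (1 - frac)              # responses on the two nearest integers
    max_var = (mean - low) * (high - mean)   # responses only at the scale limits
    return math.sqrt(min_var), math.sqrt(max_var)

at_limit = sd_bounds(1, 1, 5)       # mean at the lower limit of a 1-5 scale
midpoint = sd_bounds(3, 1, 5)       # integer mean at the scale midpoint
non_integer = sd_bounds(2.5, 1, 5)  # non-integer mean
```

Plotting the upper bound against the mean traces the arch of the umbrella: it is zero at both limits and widest at the midpoint, while the lower bound is zero only at integer means.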
Assessing the validity evidence for habit measures based on time pressure
Martínez-López P, Vázquez-Millán A, Garre-Frutos F and Luque D
Animal research has shown that repeatedly performing a rewarded action leads to its transition into a habit, an inflexible response controlled by stimulus-response associations. Efforts to reproduce this principle in humans have yielded mixed results. Only two laboratory paradigms have demonstrated behavior habitualization following extensive instrumental training compared to minimal training: the forced-response task and the "aliens" outcome-devaluation task. These paradigms assess habitualization through distinct measures. The forced-response task focuses on the persistence of a trained response when a reversal is required, whereas the outcome-devaluation task measures reaction time switch costs, that is, slowdowns in goal-directed responses conflicting with the trained habit. Although both measures have produced results consistent with the learning theory (showing stronger evidence of habits in overtrained conditions), their construct validity remains insufficiently established. In this study, participants completed 4 days of training in each paradigm. We replicated previous results in the forced-response task; in the outcome-devaluation task, a similar pattern emerged, in which we observed the loss of a response speed advantage gained through training. We then examined the reliability of each measure and evaluated their convergent validity. Habitual responses in the forced-response task and reaction time switch costs in the outcome-devaluation task demonstrated good reliability, allowing us to assess whether individual differences remained stable. However, the two measures were not associated, providing no evidence of convergent validity. This suggests that these measures capture distinct aspects of the balance between habitual and goal-directed control. Our results highlight the need for further evaluation of the validity and reliability of current measures of habitual control in humans.
js-mEye: An extension and plugin for the measurement of pupil size in the online platform jsPsych
Jarvis M, Vasarhelyi A, Anderson J, Mulley C, Lipp OV and Ney LJ
The measurement of pupil size has become a topic of interest in psychology research over the past two decades due to its sensitivity to psychological processes such as arousal or cognitive load. However, pupil measurements have been limited by the necessity to conduct experiments in laboratory settings using high-quality and costly equipment. The current article describes the development and use of a jsPsych plugin and extension that incorporates existing software that estimates pupil size using consumer-grade hardware, such as a webcam. We validated this new program (js-mEye) across two separate studies, which each manipulated screen luminance and color using a novel luminance task, as well as different levels of cognitive load using the N-back and the Stroop tasks. Changes in luminance and color produced significant changes in pupil size in the hypothesized direction. Changes in cognitive load induced in the N-back and Stroop tasks produced less clear findings; however, these findings were explained to some extent when participant engagement, indexed by task performance, was controlled for. Most importantly, all data were at least moderately correlated with data simultaneously recorded using an EyeLink 1000, suggesting that mEye was able to effectively substitute for a gold-standard eye-tracking device. This work presents an exciting future direction for pupillometry and, with further validation, may present a platform for measuring pupil size in online research studies, as well as in laboratory-based experiments that require minimal equipment.
Correction: SUBTLEX-AR: Arabic word distributional characteristics based on movie subtitles
Boudelaa S, Carreiras M, Jariya N and Perea M
A modified hidden Markov model for detecting insufficient effort responses in questionnaires
Xu H, Xiong J and Li F
Insufficient effort response (IER) significantly compromises the quality of questionnaire data, affecting the validity of resulting inferences. Traditional methods for detecting IER often fail to adequately capture various types of IER or consider participants' internal state transitions. This study expanded the hidden Markov model for analyzing participants' response strategies by reconstructing response and response time (RT) models that target the identification of IER in the context of questionnaires. The method takes into account the characteristics of IER in terms of response and RT, with the aim of dynamically detecting various types of IER. The simulation study demonstrated that a modified hidden Markov model (M-HMM) effectively recovers parameters, with its detection sensitivity primarily influenced by the prevalence of IER, differences in RT distributions between insufficient and effortful responses, and variations in IER severity and type among participants. Utilizing the M-HMM to analyze empirical data allowed for a deeper understanding of IER occurrences and improved item quality assessment, offering valuable insights for practitioners.
The best fixation target revisited: New insights from retinal eye tracking
Niehorster DC, Tamborski S, Nyström M, Konklewski R, Pryhodiuk V, Tołpa K, Hessels RS, Szkulmowski M and Hooge ITC
In many tasks, participants are instructed to fixate a target. While maintaining fixation, the eyes nonetheless make small fixational eye movements, such as microsaccades and drift. Previous work has examined the effect of fixation point design on fixation stability and the amount and spatial extent of fixational eye movements. However, much of this work used video-based eye trackers, which have insufficient resolution and suffer from artifacts that make them unsuitable for this topic of study. Here, we therefore use a retinal eye tracker, which offers superior resolution and does not suffer from the same artifacts, to reexamine which fixation point design minimizes fixational eye movements. Participants were shown five fixation targets in two target polarity conditions, while the overall spatial spread of their gaze position during fixation, as well as their microsaccades and fixational drift, were examined. We found that gaze was more stable for white-on-black than black-on-grey fixation targets. Gaze was also more stable (lower spatial spread, microsaccade, and drift displacement) for fixation targets with a small central feature, but these targets also yielded higher microsaccade rates than larger fixation targets without such a small central feature. In conclusion, there is not a single best fixation target that minimizes all aspects of fixational eye movements. Instead, if one wishes to optimize for minimal spatial spread of the gaze position, microsaccade or drift displacements, we recommend using a target with a small central feature. If one instead wishes to optimize for the lowest microsaccade rate, we recommend using a larger target without a small central feature.
The Magic Curiosity Arousing Tricks (MagicCATs) database in Italian younger and middle-aged adults: Descriptive statistics and rule-based machine learning
Padulo C, Ponticorvo M and Fairfield B
Epistemic emotions, and in particular curiosity, seem to enhance memory for both the specific information that stimulates the individual's curiosity and information presented in close temporal proximity. Most studies on memory and curiosity have adopted trivia questions to elicit curiosity. However, the amount and range of interest that trivia questions elicit are unclear, and there is no established, universal trivia item pool guaranteed to elicit comparable levels of curiosity across individuals of all ages. Thus, one substantial challenge when studying curiosity is systematically inducing it in controlled experimental settings. Recently, an innovative database called Magic Curiosity Arousing Tricks (MagicCATs) has been published. This database includes 166 short magic-trick video clips that adopt different materials and is designed to induce curiosity, surprise, and interest. Here, we aimed to validate this dataset in the Italian population by reporting the basic characteristics and the norms of the magic-trick video clips in younger and middle-aged adults. We also carried out association rule learning, a rule-based machine learning and data mining method to aid understanding of the co-occurrences between the different epistemic emotions and aid researchers in stimulus selection. Association rules underline relationships or associations between the variables in our datasets and can be used in association with descriptive statistics for stimulus selection in psychological experiments.
Hierarchical Bayesian estimation for cognitive models using Particle Metropolis within Gibbs (PMwG): A tutorial
Kuhne C, Gronau QF, Innes RJ, Cooper G, Stevenson N, Cavallaro JP, Brown SD and Hawkins GE
Estimating quantitative cognitive models from data is a staple of modern psychological science, but can be difficult and inefficient. Particle Metropolis within Gibbs (PMwG) is a robust and efficient sampling algorithm that supports model estimation in a hierarchical Bayesian framework. This tutorial shows how cognitive modeling can proceed efficiently using pmwg, a new open-source package for the R language. We step through implementing the pmwg package, from simple signal detection theory models to more complex cognitive models in which two tasks are jointly modeled. Through this process, we also address questions of model adequacy and model selection, which must be solved in order to answer meaningful psychological questions. PMwG and the pmwg package have the potential to move the field of psychology ahead in new and interesting directions, and to resolve questions that were once too hard to answer with previously available sampling methods.
Spower: A general-purpose Monte Carlo simulation power analysis program
Chalmers RP
This article presents the software Spower, an R package designed as a general-purpose Monte Carlo simulation experiment tool to perform power analyses. The package includes complete customization capabilities with support for five distinct (expected) power analysis criteria (prospective/post hoc, a priori, compromise, sensitivity, and criterion), each of which reports the sampling uncertainty associated with the resulting estimates. Researchers may choose to define their own population generating and analysis function for their tailored simulation experiments, or may choose from a selection of the predefined simulation experiments available within the package. To facilitate comparability and further extensibility, simulation counterparts of the subroutines from the popular stand-alone software G*Power 3.1 (Faul et al., Behavior Research Methods, 41(4), 1149-1160 2009) are included within the package, along with other useful simulation experiment subroutines for improving estimation precision and creating visualizations.
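The general idea of simulation-based power analysis can be sketched as follows. This is not Spower's interface (Spower is an R package, and none of the names below come from it). It is a generic Python illustration under simple assumptions: two independent groups, a known standard deviation of 1, and a two-sided z-test. Power is estimated as the share of simulated experiments with p < alpha, and the Monte Carlo standard error of that share is reported alongside it.

```python
import random
from statistics import NormalDist, mean

def simulate_p_value(rng, n_per_group, effect_size):
    """One simulated experiment: two-sided z-test on two groups with known SD = 1."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
    z = (mean(treated) - mean(control)) / (2.0 / n_per_group) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def estimate_power(n_per_group, effect_size, alpha=0.05, replications=2000, seed=1):
    """Return (estimated power, Monte Carlo standard error of the estimate)."""
    rng = random.Random(seed)
    hits = sum(
        simulate_p_value(rng, n_per_group, effect_size) < alpha
        for _ in range(replications)
    )
    power = hits / replications
    mc_se = (power * (1.0 - power) / replications) ** 0.5
    return power, mc_se

power, mc_se = estimate_power(n_per_group=64, effect_size=0.5)
```

For this simple design the estimate can be checked against the closed-form z-test power, which is about 0.81 for 64 participants per group and a standardized difference of 0.5. The simulation approach earns its keep when the design has no closed form.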
Visual attention graph
Yang KF and Li YJ
Visual attention plays a critical role when our visual system executes active visual tasks by interacting with the physical scene. However, how to encode visual object relationships in the psychological world of the brain deserves exploration. Predicting visual fixations or scanpaths is a common way to explore the visual attention and behaviors of human observers when viewing a scene. Most existing methods encode visual attention using individual fixations or scanpaths derived from raw gaze-shift data collected from human observers. This may not capture the common attention pattern well, because without considering the semantic information of the viewed scene, raw gaze-shift data alone contain high inter- and intra-observer variability. To address this issue, we propose a new attention representation, called visual attention graph (VAG), to simultaneously code the visual saliency and scanpath in a graph-based representation and better reveal the common attention behavior of human observers. In the visual attention graph, the semantic-based scanpath is defined by the path on the graph, while the saliency of objects can be obtained by computing fixation density on each node. Systematic experiments demonstrate that the proposed attention graph combined with our new evaluation metrics provides a better benchmark for evaluating attention prediction methods. Meanwhile, additional experiments demonstrate the promising potential of the proposed attention graph in assessing human cognitive states, such as autism spectrum disorder screening and age classification.
Cross-cultural adaptation of the Language and Social Background Questionnaire: Psychometric properties emerging from the Persian version
Maleki M, Jahanjoo F, Shibafar S, Karimijavan G, Torabi MH and Jarollahi F
The self-reported Language and Social Background Questionnaire (LSBQ) measures an individual's language proficiency and usage quantitatively. This cross-sectional study aims to evaluate the psychometric properties of the LSBQ in the Persian (Farsi) language. A total of 325 adults aged between 15 and 59 years (mean age = 21.00 years, SD = 3.56; 251 females, 70 males) from Tabriz and Tehran participated in this study. To evaluate the LSBQ, exploratory factor analysis (EFA) was employed. The psychometric properties of the Persian LSBQ were assessed through various validity measures, as well as reliability analysis and receiver operating characteristic (ROC) curve analysis. The overall content validity ratio for the questionnaire was 0.98, with an impact score of 4.47. The internal consistency of the scale was satisfactory, with a Cronbach's alpha of 0.707. The EFA identified five key factors: "dominant language at home and community," "non-Persian use," "non-Persian proficiency," "Persian comprehension," and "switching." Using Youden's J criterion, an optimal cut-off point of -1.00 was determined to effectively distinguish between monolinguals and non-monolinguals. To assess the convergent and discriminant validity of the instrument, Spearman's correlation was utilized to analyze the relationships among the variables. The Persian version of the LSBQ is a reliable and valid tool for assessing language proficiency and usage among Persian-speaking participants. It effectively distinguishes between monolingual and non-monolingual individuals. Researchers and clinicians can utilize the LSBQ effectively, provided it aligns with their specific research questions and the language experiences of their target population.
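Youden's J criterion, used above to choose the cut-off, selects the threshold that maximizes sensitivity + specificity - 1. The sketch below shows the computation on made-up scores. The data, the direction of classification (scores at or above the cut-off count as non-monolingual), and the function names are assumptions for illustration and do not reproduce the study's analysis or its reported cut-off.

```python
def youden_j(scores, labels, cutoff):
    """J = sensitivity + specificity - 1, classifying score >= cutoff as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def best_cutoff(scores, labels):
    """Try every observed score as a cut-off and keep the one with the largest J."""
    return max(sorted(set(scores)), key=lambda c: youden_j(scores, labels, c))

# Hypothetical composite scores; label 1 = non-monolingual, 0 = monolingual.
scores = [-2.1, -1.6, -1.3, -1.1, -0.4, 0.2, 0.9, 1.5, 1.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

cutoff = best_cutoff(scores, labels)
j_at_cutoff = youden_j(scores, labels, cutoff)
```

On these toy data the selected cut-off misses one non-monolingual case (sensitivity 0.8) and misclassifies no monolingual case (specificity 1.0).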
Beyond performance: A POMDP-based machine learning framework for expert cognition
He H and Duan Y
This study explores expert-novice differences in anticipation under uncertainty by combining partially observable Markov decision process (POMDP) modeling with machine learning classification. Forty-eight participants (24 experts, 24 novices) completed a basketball pass/shot anticipation task. Through POMDP modeling, two cognitive parameters, sensory precision (SP) and prior belief (pB), were extracted to capture internal decision processes. Results showed that experts fit the POMDP model more closely, requiring more iterations for parameter convergence and achieving higher pseudo R values than novices. Experts demonstrated significantly higher SP, indicating superior ability to filter key cues under noisy conditions. Their pB values remained closer to neutral, suggesting flexible reliance on prior knowledge. In contrast, novices exhibited more biased priors and a lower, more dispersed SP. Machine learning analyses revealed that SP and pB jointly formed distinct clusters for experts and novices in a two-dimensional parameter space, with classification accuracies exceeding 90% across multiple methods. These findings indicate that expertise entails both enhanced perceptual precision and adaptive prior calibration, reflecting deeper cognitive reorganization rather than simple skill increments. Our dual-parameter approach offers a model-based perspective on expert cognition and may inform future research on the multifaceted nature of expertise.