Assessing Pressures Shaping Natural Language Lexica
Human languages balance communicative informativity with complexity, conveying as much as needed through the simplest means required to do so. Yet these concepts, informativity and complexity, have been operationalized in various ways, and it remains unclear which definitions best capture empirical linguistic patterns. A particularly successful operationalization is that offered by the Information Bottleneck framework, which suggests a balance between complexity and informativity across domains like color, kinship, and number. However, we show that the notion of complexity employed by this framework has some counterintuitive consequences. Focusing on color terms, we then study to what extent this and other notions of complexity play a role in explaining cross-linguistic regularity. We propose a method to assess their explanatory contributions and to probe whether they enter into a joint optimization or a trade-off competition. This offers a more general framework to study language change and the forces that shape it, where instead of showing that a given model is compatible with existing data, the data is used to adjudicate between candidate measures.
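For readers unfamiliar with the framework, the trade-off at issue can be sketched as follows, in the standard Information Bottleneck notation as applied to naming systems (the paper's exact formulation may differ):

```latex
% A naming system q(w \mid m) maps speaker meanings M to words W, which a
% listener uses to reconstruct referents U. Complexity is I(M;W);
% informativity (accuracy) is I(W;U); \beta weights informativity.
\min_{q(w \mid m)} \; \mathcal{F} \;=\; I(M;W) \;-\; \beta\, I(W;U), \qquad \beta \geq 1
```

It is the first term, the mutual-information measure of complexity, whose counterintuitive consequences the paper examines.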
The Mechanistic Framework of Alignment: A Unified Model
Conversational alignment, also known as accommodation, entrainment, interpersonal synchrony, and convergence, is defined as the tendency for interlocutors to exhibit similarity in their communicative behaviors. Many theories have been set forth to explain why alignment occurs and, accordingly, the mechanisms that underlie it. To date, however, alignment research has been largely siloed, with different research teams often examining alignment through the lens of a single theoretical account. Considering causal mechanisms in tandem offers a more holistic and nuanced understanding of the dynamic nature of alignment, its purposes, and its consequences. Accordingly, we propose the Mechanistic Framework of Alignment (MFA), a qualitative conceptual model that integrates existing theories of conversational alignment into one unified framework. To explain this framework, we first review five alignment mechanisms, discussing the underlying assumptions, contributions, and supporting evidence for each. We then introduce two overarching factors, conversational goal and alignment type, that are critical for understanding when and how these mechanisms give rise to aligned behavior. Illustrative examples demonstrate how the relative weightings of each mechanism interact with these contextual variables. We conclude with directions for extending and refining the framework and for how the MFA can support future work in this area.
Coming Up Next: The Extent of the Perceptual Window in Comic Reading
Recent models of sequential narratives suggest that readers form predictions about upcoming panels as they read. However, previous work has considered these predictions only in terms of currently viewed information. In the current studies, we investigate the extent to which readers use information from unfixated panels in comic stories. Using the moving-window paradigm, we studied whether reading behavior was disrupted when upcoming panels were unavailable to the reader, in short comic strips (Experiment 1) and multipage comics (Experiment 2). Both studies showed the greatest disruption to reading when all peripheral information was removed, but disruption persisted when only partial peripheral information was available. The results indicate that readers make use of information from at least two panels ahead of the current fixation location. We consider these findings in relation to the PINS model of comic reading and discuss how the role of peripheral information might be further explored.
The Bias-and-Expertise Model: A Bayesian Network Model of Political Source Characteristics
Perceptions of source credibility may play a role in major societal challenges like political polarization and the spread of misinformation as citizens disagree over which sources of political information are credible and sometimes trust untrustworthy sources. Cognitive scientists have developed Bayesian Network models of how people integrate perceptions of source credibility when learning from information provided by sources, but these models do not involve the crucial source characteristic in politics: bias. Biased sources make claims that align with a particular political agenda, whether or not they are true. We present a novel Bayesian Network model which integrates perceptions of a source's bias as well as their expertise. We demonstrate the model's validity for predicting how people will update beliefs and perceptions of bias and expertise in response to testimony across two studies, the second being a preregistered conceptual replication and extension of the first.
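To make the model class concrete, here is a minimal, hypothetical sketch of joint Bayesian updating over a claim's truth and a source's bias and expertise after the source asserts the claim; all priors and likelihoods are illustrative placeholders, not the parameters of the authors' model:

```python
# Hypothetical sketch: joint Bayesian updating over a claim's truth (H),
# a source's bias (B), and its expertise (E) after the source asserts the
# claim. All numbers are illustrative placeholders.
from itertools import product

prior = {
    "H": {True: 0.5, False: 0.5},   # claim true?
    "B": {True: 0.3, False: 0.7},   # source biased toward the claim?
    "E": {True: 0.4, False: 0.6},   # source an expert?
}

def p_assert(h, b, e):
    """P(source asserts the claim | H=h, B=b, E=e) -- illustrative values."""
    if b:                            # a biased source asserts largely regardless of truth
        return 0.9
    if e:                            # an unbiased expert tracks the truth closely
        return 0.95 if h else 0.05
    return 0.65 if h else 0.35       # an unbiased novice tracks it weakly

# Enumerate the joint distribution conditioned on the observed assertion.
joint = {}
for h, b, e in product([True, False], repeat=3):
    joint[(h, b, e)] = (prior["H"][h] * prior["B"][b] * prior["E"][e]
                        * p_assert(h, b, e))
z = sum(joint.values())
posterior = {k: v / z for k, v in joint.items()}

# Marginal posteriors: belief in the claim, perceived bias, perceived expertise.
for var, idx in [("H", 0), ("B", 1), ("E", 2)]:
    print(var, sum(p for k, p in posterior.items() if k[idx]))
```

With these placeholder numbers, the assertion raises belief in the claim (from .50 to about .65) while also raising the inferred probability that the source is biased (from .30 to about .44), illustrating how belief and bias perception can update together.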
Constraints on Exchange Edits During Noisy-Channel Inference
According to the noisy-channel framework of sentence processing, communication can succeed even when the input is corrupted because comprehenders rationally infer the speaker's intended meaning based on the prior probability of the literal interpretation and the probability that the input has been corrupted by noise. To test whether and under what conditions comprehenders consider word exchanges as a possible source of corruption, we ran five experiments on the processing of three types of simple German sentences: subject-before-object sentences (SO), object-before-subject sentences (OS), and passive sentences. Critical sentences had implausible meanings but could be "repaired" by exchanging function words or by exchanging nouns. Experiments 1 through 4 presented sentences along with yes-no questions to probe interpretation. Implausible SO and passive sentences consistently elicited few nonliteral interpretations, whereas many nonliteral interpretations were given to implausible OS sentences. This was true regardless of whether word exchanges had to cross a main verb or an auxiliary, and the contrast was more pronounced when the overall proportion of implausible sentences was low. We conclude that when answering yes-no questions, comprehenders consider exchanges of function words of the same syntactic category, but not of nouns, and only when the exchange results in a more likely syntactic structure. Experiment 5 showed that when explicitly asked to correct implausible sentences, comprehenders use noun exchanges frequently. We propose that the results for both yes-no questions and explicit corrections follow if the prior probability assigned to implausible sentences differs between tasks.
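As background, the core inference in this framework is standardly written as a Bayesian trade-off between the prior and the noise model (notation illustrative; $s_i$ is the intended sentence, $s_p$ the perceived input):

```latex
% Noisy-channel inference: the comprehender's posterior over intended
% sentences combines the prior over sentences with the probability that
% noise turned s_i into the perceived input s_p.
P(s_i \mid s_p) \;\propto\; P(s_i) \,\cdot\, P(s_i \rightarrow s_p)
```

Under this formulation, the experiments above ask which exchange operations (function words vs. nouns) the noise model $P(s_i \rightarrow s_p)$ makes available, and how the prior $P(s_i)$ over implausible sentences shifts with the task.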
With or Without a System: How Category-Specific and System-Wide Cognitive Biases Shape Word Order
Certain recurrent features of language characterize the way a whole language system is structured. By contrast, others target specific categories of items within those wider systems. For example, languages tend to exhibit a consistent order of heads and dependents across different phrases, a system-wide regularity known as harmony. While this tendency is generally robust, some specific syntactic categories appear to deviate from the trend. We examine one such case, the order of adjectives and genitives, which do not exhibit a typological tendency for consistent order with respect to the noun. Instead, adjectives tend to follow the noun and genitives to precede it. Across two silent gesture experiments, we test the hypothesis that these category-specific ordering tendencies reflect cognitive biases that favor (i) conveying objects before the properties that modify them, but (ii) conveying possessors before the items they possess. While our hypothesis is thus that these biases are semantic in nature (they shape preferences for how concepts are ordered), the claim is that they may have downstream effects on conventionalized syntax by contributing to an over-representation of postnominal adjectives and prenominal genitives. We find that these biases affect gesture order in contexts where no conventionalized system is in place. When a system is in place, participants learn that system, and category-specific biases do not impact their learning. Our results suggest that different contexts reveal distinct types of cognitive biases; some are active during learning and others during language creation.
Language-Invariant Strategies of Navigating Transitions in Joint Activities: Forms and Functions of Coordination Markers
Goal-directed tasks unfold in hierarchies of larger and smaller sub-tasks, and pursuing them jointly implies that participants must agree on whether they are continuing an ongoing sub-task (horizontal transition) or switching to the next sub-task (vertical transition). Previous research indicates that humans employ short and efficient coordination markers as procedural conventions to distinguish horizontal (e.g., in English, with yeah and uh-huh) and vertical transitions (with okay, all right). However, it remains unclear (1) whether such words serve as potentially universal coordination devices and (2) which properties make some markers more suitable for horizontal versus vertical transition contexts. We hypothesized that horizontal transitions in ongoing sub-tasks are associated with higher dual-tasking interference between verbal coordination and the nonlinguistic task, thereby constraining the lexicality of coordination markers. In our experimental study, we assessed how speakers of three typologically diverse languages (Swiss French, Vietnamese, and Shipibo-Konibo; N = 232) used coordination markers to navigate a joint LEGO-building task. We found that in each language, coordination markers comprise a system of transition-specific conventions and that participants strategically deployed markers with minimal lexical and acoustic forms (uh-huh, mm), as well as repetitions, in horizontal transitions, but more lexicalized markers (e.g., okay) in vertical transitions. Our findings suggest that (1) coordination markers are potentially universal linguistic devices for navigating joint activities and (2) the forms of coordination markers might be shaped by the constraints of their primary interaction context (here, horizontal and vertical transitions). Our study provides new evidence of how interactional settings might selectively shape language use through the forces of convergent language evolution.
Influence of Visual and Action Experiences on Sensorimotor Simulation During Action Verb Processing: The Roles of Motor Perspective and Personal Pronouns
The theory of embodied simulation posits that semantic processing related to actions involves the simulation of sensorimotor experiences, similar to action recognition, which also activates the action observation network. Visual and action experiences obtained through vision and proprioception can facilitate the processing of action verbs via this simulation process. However, the differential effects of these two types of action representations on the processing of action verbs remain to be explored. This study uses an action-language priming paradigm and three behavioral experiments to explore how visual and action experiences from different perspectives affect sensorimotor simulation in action verb processing. Experiment 1 studied how action image perspectives (first-person vs. third-person) and image-word congruency affect action verb priming. Experiment 2 examined the role of the action agent in perspective priming. Experiment 3 investigated whether motor experience congruency, jointly activated by visual perspective and personal pronouns, influences action verb processing. Experiment 1 showed faster action verb processing with the first-person perspective (1PP) in prime-target incongruent and non-mirrored conditions, indicating that the 1PP affords better action control and prediction, enhancing sensorimotor simulation. Experiment 2 found faster responses with the 1PP during incongruency, with no effect of the action agent on sensorimotor simulation. Experiment 3 showed faster reaction times for prime-target congruency than incongruency, with no effect of perspective congruency. These results show that action verb processing involves simulating sensorimotor experiences from specific perspectives, emphasizing the key role of action experience and offering new evidence for action verb representation theories.
Assessing Others' Knowledge Through Their Speech Disfluencies and Gestures
As part of the multimodal language system, gestures play a vital role for listeners, capturing attention and providing information. Similarly, disfluencies serve as a cue for listeners about a speaker's knowledge of a topic. In two studies, the first with natural and the second with controlled stimuli, we asked whether the combination of gestures and speech disfluencies would affect how listeners made feeling-of-another's-knowing (FOAK) judgments regarding speakers' knowledge states. In Study 1, we showed participants videos of speakers providing navigational instructions. We manipulated the speakers' use of gestures and speech disfluencies, whereas facial expressions, words, and additional visual cues (e.g., background, clothes, objects) occurred naturally. We found that fluent speech elicited higher FOAK ratings than disfluent speech, but no significant effect was found for gestures. In the follow-up Study 2, we examined the same disfluency-gesture interaction in a more controlled setting, using video stimuli featuring an actress and controlling for background, intonation, and word choice, with iconic and beat gestures as gesture subcategories. Participants also filled out the Gesture Awareness Scale. Results replicated those of Study 1: only disfluent speech received significantly lower FOAK ratings, with no effects of gesture use or type. These findings suggest that individuals may use certain communicative cues more than others, particularly when assessing others' knowledge.
Shared and Distinct Phonemic Features in Sound-Shape and Sound-Size Correspondences: A Study of Mandarin Chinese
Certain speech sounds are consistently associated with visual properties such as shape and size, a phenomenon known as crossmodal correspondences. Well-established examples demonstrate that the vowel /u/ is often linked to rounder and larger objects, while /i/ is associated with more angular and smaller ones. However, most previous research utilized English pseudowords, leaving a gap in our understanding of how these correspondences manifest in tonal languages. The current study extends the investigation to Mandarin Chinese, a tonal language, to examine the roles of vowels, consonants, and lexical tones in sound-shape and sound-size correspondences. Participants heard consonant-vowel-tone syllables and rated each on a 5-point scale with rounder/more angular shapes or larger/smaller icons at opposite ends. The results confirmed the established vowel effect: /u/ was associated with rounder and larger patterns than /i/. Results for consonants demonstrated that the voiced-unvoiced contrast predicted sound-shape judgments, while the aspirated-unaspirated contrast, which is less prominent in English, influenced sound-size judgments. Lexical tones also revealed systematic effects, with Tone 1 (flat), Tone 2 (rising), Tone 3 (falling-rising), and Tone 4 (falling) progressively matched from rounder to more angular shapes, while Tones 1 and 2 were linked to larger sizes than Tones 3 and 4. These phonemic features reliably predicted crossmodal correspondences even when controlling for acoustic properties, suggesting robust mappings between phonemic and visual representations. This study highlights the common vowel effects across Mandarin and English while revealing unique influences of consonants and lexical tones, underscoring the role of language experience in shaping crossmodal correspondences.
Humans Select Subgoals That Balance Immediate and Future Cognitive Costs During Physical Assembly
From building a new piece of furniture to replacing a lightbulb, people must often figure out how to assemble an object from its parts. Although these physical assembly problems take on many different forms, they also pose common challenges. Chief among these is the question of how to break a complex problem down into subproblems that are easier to solve. What principles determine why some strategies for decomposing a problem are favored over others? Here, we investigate the decisions that people make when considering different visual subgoals in the context of attempting to build a series of virtual block towers. We hypothesized that people favor subgoals that balance how much progress they afford toward the final goal against how effortful they are to solve. We tested this hypothesis by defining several computational models of planning and subgoal selection, then evaluating how well these models predicted human planning and subgoal selection behavior on the same problems. Our results suggest that participants rapidly differentiated the computational costs of otherwise similarly ambitious subgoals and used these judgments to drive subgoal selection. Moreover, our findings are consistent with the possibility that participants were sensitive not only to the immediate computational costs associated with solving the very next subgoal, but also to future costs that might be incurred when attempting the rest of the problem. Taken together, these results contribute to our understanding of how humans make efficient use of cognitive resources to solve complex, grounded planning problems.
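As a purely illustrative sketch of this progress-versus-cost logic (the scoring function, weights, and numbers below are placeholders, not the paper's fitted models):

```python
# Hypothetical sketch: score each candidate subgoal by the progress it
# contributes toward the final goal, minus weighted estimates of the
# planning cost to reach it and the cost remaining afterward.
from dataclasses import dataclass

@dataclass
class Subgoal:
    name: str
    progress: float       # fraction of the final structure completed
    cost_now: float       # estimated planning cost to solve this subgoal
    cost_future: float    # estimated cost of finishing the rest afterward

def utility(s: Subgoal, w_now: float = 1.0, w_future: float = 0.5) -> float:
    return s.progress - w_now * s.cost_now - w_future * s.cost_future

candidates = [
    Subgoal("small step", progress=0.2, cost_now=0.1, cost_future=0.9),
    Subgoal("balanced",   progress=0.5, cost_now=0.3, cost_future=0.4),
    Subgoal("ambitious",  progress=0.9, cost_now=1.2, cost_future=0.1),
]
best = max(candidates, key=utility)
print(best.name)  # -> "balanced" under these illustrative numbers
```

Under these placeholder numbers, the "balanced" subgoal wins because the ambitious one incurs a steep immediate planning cost while the small step leaves most of the costly work for later, mirroring the sensitivity to both immediate and future costs described above.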
The Co-Structuring of Gesture-Vocal Dynamics: An Exploration in Karnatak Music Performance
In music performance contexts, vocalists tend to gesture with hand and upper body movements as they sing. But how does this gesturing relate to the sung phrases, and how do singers' gesturing styles differ from each other? In this study, we present a quantitative analysis and visualization pipeline that characterizes the multidimensional co-structuring of body movements and vocalizations in vocal performers. We apply this to a dataset of performances within the Karnatak music tradition of South India, including audio and motion tracking data of 44 performances by three expert Karnatak vocalists, openly published with this report. Our results show that time-varying features of head and hand gestures tend to be more similar when the concurrent vocal time-varying features are also more similar. While for each performer we find clear co-structuring of sound and movement, they each show their own characteristic salient dimensions (e.g., hand position, head acceleration) through which movement co-structures with singing. Our time-series analyses thereby provide a computational approach to characterizing individual vocalists' unique multimodal vocal-gesture co-structuring profiles. We also show that co-structuring clearly reduces degrees of freedom of the multimodal performance such that motifs that sound alike tend to co-structure with gestures that move alike. The current method can be applied to any multimodally ensembled signals in both human and nonhuman communication, to determine co-structuring profiles and explore any reduction in degrees of freedom. In the context of Karnatak singing performance, the current analysis is an important starting point for further experimental study of gesture-vocal synergies.
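One way to picture the "motifs that sound alike move alike" result is a similarity-of-similarities analysis; the sketch below is a hypothetical illustration of that logic with random stand-in data, not the authors' pipeline:

```python
# Hypothetical sketch: correlate pairwise distances between vocal feature
# trajectories with pairwise distances between the concurrent gesture
# trajectories. Random data stand in for the audio and motion features.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_motifs, t = 30, 50
voice = rng.normal(size=(n_motifs, t))                        # e.g., pitch trajectories
gesture = voice + rng.normal(scale=0.8, size=(n_motifs, t))   # coupled movement

# If motifs that sound alike move alike, the two distance structures correlate.
rho, p = spearmanr(pdist(voice), pdist(gesture))
print(f"sound-movement co-structuring: rho={rho:.2f}, p={p:.3g}")
```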
Scope of Message Planning: Evidence From Production of Sentences With Heavy Sentence-Final NPs
Speaking begins with the generation of a preverbal message. While a common assumption is that the scope of message-level planning (i.e., the size of message-level increments) can be more extensive than the scope of sentence-level planning, it is unclear how much information is typically encoded at the message level in advance of sentence-level planning during spontaneous production. This study assessed the scope and granularity of early message-level planning in English by tracking production of sentences with light versus heavy sentence-final NPs. Speakers produced SVO sentences to describe pictures showing an agent acting on a patient. Half of the pictures showed one-patient events, eliciting sentences with unmodified patient names (e.g., "The tailor is cutting the dress"), and half showed two-patient events with a target patient and a non-target patient. The presence of a non-target patient required production of a prenominal or postnominal modifier to uniquely identify the target patient (e.g., "The tailor is cutting the long dress" / "the dress with sleeves"). Analyses of speech onsets and eye movements before speech onset showed strong effects of the complexity of the sentence-final character, suggesting that early message-level planning does not proceed strictly word by word (or "from left to right") but instead includes basic information about the identity of both the sentence-initial and sentence-final characters. This is consistent with theories that assume extensive message-level planning before the start of sentence-level encoding and provides new evidence about the level of conceptual detail incorporated into early message plans.
The Ideological Turing Test: A Behavioral Measure of Open-Mindedness and Perspective-Taking
Understanding our ideological opponents is crucial for the effective exchange of arguments, the avoidance of escalation, and the reduction of conflict. We operationalize the idea of an "Ideological Turing Test" to measure the accuracy with which people represent the arguments of their ideological opponents. Crucially, this offers a behavioral measure of open-mindedness that goes beyond mere self-report. We recruited 200 participants from each side of three topics with potential for polarization in the UK of the early 2020s (1,200 participants total). Participants were asked to provide reasons both for and against their position. Their reasons were then rated by participants from the opposite side. Our criterion for "passing" the test was that an argument was agreed with by opponents to the same extent as, or more than, arguments made by proponents. We found evidence for high levels of mutual understanding across all three topics. We also found that those who passed were more open-minded toward their opponents, in that they were less likely to rate them as ignorant, immoral, or irrational. Our method provides a behavioral measure of open-mindedness and of the ability to mimic counterpartisan perspectives that goes beyond self-report measures. Our results offer encouragement that, even in highly polarized debates, high levels of mutual understanding persist.
Characterizing the Large-Scale Structure of Multimodal Semantic Networks
Humans organize semantic knowledge into complex networks that encode relations between concepts. The structure of those networks has broad implications for human cognitive processes and for theories of semantic development. Evidence from large lexical networks, such as those derived from word associations, suggests that semantic networks are characterized by high sparsity and clustering while maintaining short average paths between concepts, the hallmark of a "small-world" network. It has also been argued that those networks are "scale-free," meaning that the number of connections (or degree) per concept follows a power-law distribution, whereby most concepts have few connections while a few have many. However, the scale-free property is still debated, and the extent to which the lexical evidence reflects the naturally occurring semantic regularities of the environment has not been investigated systematically. To address this, we collected and analyzed semantic descriptors, human evaluations, and similarity judgments from four large datasets of naturalistic stimuli across three modalities (visual, auditory, and audio-visual), comprising 7,916 stimuli and 610,841 human responses. By connecting concepts that co-occur as descriptors of the same stimuli, we construct "multimodal" semantic networks. We show that these networks exhibit a clear small-world structure with a degree distribution that is best captured by a truncated power law (i.e., the most-connected concepts are less common than a perfect power law would predict). We further show that these networks are predictive of human sensory judgments in these domains, as well as of reaction times in an independent lexical decision task. Finally, we show that multimodal networks share overlapping themes with previously analyzed lexical networks, which a more rigorous reanalysis reveals to be truncated as well. Our findings shed new light on the origins of the structure of semantic networks by tying it to the semantic regularities of the environment.
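Degree-distribution comparisons of this kind can be run with the powerlaw Python package (Alstott et al., 2014); the sketch below uses a synthetic network as stand-in data, so the fitting choices are illustrative rather than those of the paper:

```python
# Sketch: fit a network's degree distribution and compare a pure power law
# against a truncated power law, using synthetic stand-in data.
import networkx as nx
import powerlaw

G = nx.barabasi_albert_graph(5000, 3)          # placeholder network
degrees = [d for _, d in G.degree() if d > 0]

fit = powerlaw.Fit(degrees, discrete=True)
# Positive R favors the first-named distribution; p gives the significance
# of the likelihood-ratio comparison (Clauset-style test).
R, p = fit.distribution_compare("truncated_power_law", "power_law")
print(f"alpha={fit.power_law.alpha:.2f}, R={R:.2f}, p={p:.3f}")
```

For the multimodal networks described above, it is this kind of comparison that favors the truncated variant over a perfect power law.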
Visual Statistical Learning in Children Aged 3-9 Years
Visual statistical learning (visual SL) is the ability to implicitly extract statistical patterns from visual stimuli. Visual SL can be assessed using online measures, which evaluate reaction times (RTs) to stimuli during task performance, and offline measures, which assess recognition of the presented patterns. We examined 96 children aged 3-9 years using a visual SL task that included online and offline measures. In the online phase, children viewed sequences of cartoon aliens presented one at a time, organized into triplets. The task was to press a button for two target stimuli: one predictable (the last alien in the triplet) and one unpredictable (the first in the triplet). In the offline phase, children performed a two-alternative forced-choice task, in which they viewed two triplets and selected the one matching the sequence from the online phase. In the online measures, we observed a gradual increase in RTs for the unpredictable stimulus and a slight decrease in RTs for the predictable stimulus over the experiment, with fewer errors for the predictable stimulus, indicating an SL effect. In the offline measures, the SL effect was also observed, though it was less robust: recognition rates for correct triplets exceeded chance level only for triplets containing predictable stimuli. Notably, while online measures remained stable across age, offline recognition rates increased with age, suggesting a link to the development of cognitive functions needed for explicit task performance. We propose that SL is not purely an implicit process but rather an active learning process shaped by experimental task requirements and goal setting.
Individualization Without Internalization
What is that "inner" voice that keeps you up at night or that tells you to stop as you reach for another chocolate? Advances in embodied cognitive science raise doubts about explaining the "self" as the result of internalizing our shared world. On that emerging view, there is nothing to transport from outside to inside the skull. But, if not an inner state of mind, then how should we understand the experience of a self? This paper develops a relational approach to individualization by aligning ecological thinking with practice theory through Meadian considerations. On this account, we continuously experience a meaningful world, filled with possibilities for action, tied to things in places and practices. Practices are intergenerational processes in which materials get organized by what we do, while in turn organizing us. Becoming a "self" requires learning to attend to such communal organizations as one's relation to the world expands across development. As we learn to engage various such organizations skillfully, we can experience them responding to us. Situated across practices, the "self" develops as a reciprocal relation between multiple timescales: notably between communal practices and a person's skilled activities. When we close our eyes and our thoughts come to the fore, we experience this reciprocal relation directly. To get this relational self into view, psychology needs to get out of our heads and study the worldly conditions that make us.
Time Spent Thinking in Online Chess Reflects the Value of Computation
Human planning tends to be efficient, focusing on a relatively small number of options when considering future paths. Recent proposals have suggested that this efficiency reflects intelligent deployment of the limited resources available for planning. A prediction of this and related proposals is that how much time individuals spend thinking should depend on the benefits and costs of additional computation. We tested this hypothesis by measuring how much time humans spent thinking before acting in over 12 million online chess games. Players spent more time thinking in board positions where additional computation was more beneficial. This relationship was greater in stronger players and was strengthened by considering only the information available to the player at the time of choice. A simple model based on measuring the actual cost of spending time thinking in online chess was able to capture qualitative features of this relationship. These results provide evidence that the amount of time humans spend thinking is appropriately sensitive to the value of computation.
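The value-of-computation idea can be caricatured as choosing a think time where the marginal benefit of extra computation stops exceeding its marginal cost; the functional forms and constants below are hypothetical, not the paper's model:

```python
# Hypothetical sketch: net value of thinking for t seconds is the
# win-probability gain from extra computation (with diminishing returns)
# minus the cost of consuming a finite clock.
import numpy as np

def benefit(t, gain=0.08, rate=1.5):
    """Diminishing returns: win-probability gain from t seconds of thought."""
    return gain * (1 - np.exp(-t / rate))

def cost(t, per_second=0.004):
    """Linear cost of spending t seconds of a finite clock."""
    return per_second * t

t_grid = np.linspace(0, 30, 301)
net = benefit(t_grid) - cost(t_grid)
print(f"optimal think time ~ {t_grid[np.argmax(net)]:.1f}s")
```

In positions where extra computation is more beneficial (a larger gain), the optimum shifts toward longer think times, which is the qualitative pattern the paper reports.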
Communicating Through Acting: Affording Communicative Intention in Pantomimes
How do people intuitively recognize communicative intention in pantomimes, even though such actions kinematically resemble instrumental behaviors directed at changing the world? We focus on two alternative hypotheses: one posits that instrumental intention competes with communicative intention, such that the weaker the former, the stronger the latter; the other suggests that instrumental intention is nested within communicative intention, such that the presence of the former facilitates the latter. To test these hypotheses, we compiled a video dataset of action-object pairs with varying frequencies in the English corpus. Using the concept of affordance, we qualitatively varied the degree to which a scene visually supports the execution of an action. Across two experiments, we found a nonmonotonic relationship between affordance and communicative ratings: partial affordance, where the scene provides some support for an action's instrumental purpose, elicited the strongest perception of communicative intention. In contrast, full affordance or no affordance resulted in weaker interpretations of communicative intention. We also found that recognizing the instrumental components of pantomime-like actions predicted higher communicativeness ratings. Our study, on top of confirming humans' ability to interpret novel pantomimes, reveals a novel mechanism of communicative intention: recognizing an instrumental goal and perceiving suboptimal conditions for achieving it together enhance the communicative signal. This work contributes to an integrated theory of pantomimes, demonstrating how the rationality principle not only aids in distinguishing communicative intention but also supports the identification of instrumental content embedded within it.
Between Two Grammatical Gender Systems: Exploring the Impact of Grammatical Gender on Memory Recall in Ukrainian-Russian Simultaneous Bilinguals
This study examines the impact of grammatical gender on memory recall among simultaneous bilinguals with two three-gendered languages (Ukrainian and Russian). Ukrainian-Russian bilinguals and English monolingual controls were tested on their ability to remember names assigned to objects with either matching or mismatching grammatical genders across their two languages. Results showed that bilinguals recalled names more accurately when the biological sex of the names was congruent with the grammatical gender of objects in both languages (e.g., recalling a male name assigned to a noun with masculine grammatical gender in both L1s, rather than a female name). English monolinguals, in contrast, showed no difference in recall. However, when grammatical gender mismatched across Ukrainian and Russian, the expected influence of the more proficient language on recall accuracy was not observed. These findings suggest that converging grammatical information from two L1s creates stronger memory associations, enhancing recall accuracy of simultaneous bilinguals. Conversely, mismatching grammatical genders appear to negate this effect. Taken together, these findings highlight the interconnected nature of bilingual conceptual representation.
Can We Really "See" Through Others' Eyes? Evidence of Embodied Visual-Spatial Representation From an Altercentric Viewpoint
Social interactions often require the ability to "stand in others' shoes" and perceive the world "through others' eyes," but the extent to which we can actually see others' visual worlds remains unclear. Prior research has primarily focused on mental-body transformation in visual-spatial perspective taking (VSPT), yet the subsequent visual processing under the adopted perspective has been less explored. Addressing this gap, our study investigated the mental representation of the visual scene as a direct outcome of perceiving from another's viewpoint. Using modified VSPT tasks, we paired avatar-perspective trials with self-perspective trials to create opportunities for observing priming effects resulting from potential mental representations formed under the avatar's perspective. We hypothesized that if individuals form embodied representations of visual scenes while explicitly processing stimuli from the avatar's viewpoint, these representations should be stored in memory and elicit priming effects when similar scenes are later encountered from their own perspective. Across four experiments, we provide the first evidence that (1) explicitly engaging in embodied VSPT produces robust mental representations of the visual scene from the adopted perspective, (2) these representations are visual-spatial rather than semantic in nature, and (3) these representations arise from embodied processing rather than from self-perspective strategies. Additionally, our findings reveal that individuals implicitly process visual stimuli from their own perspective during other-perspective tasks, forming distinct but weaker self-perspective representations. Overall, our findings demonstrate the existence of embodied representations in VSPT and offer significant insights into the processing mechanisms involved when we "stand in others' shoes."
Gestural and Verbal Evidence of Conceptual Representation Differences in Blind and Sighted Individuals
This preregistered study examined whether visual experience influences conceptual representations by examining both gestural expression and feature listing. Gestures, mostly driven by analog mappings of visuospatial and motoric experiences onto the body, offer a unique window into conceptual representations and provide complementary information not offered by language-based features, which have been the focus of previous work. Thirty congenitally or early blind and 30 sighted Turkish speakers produced silent gestures and features for concepts from semantic categories that rely differentially on visual (non-manipulable objects and animals) versus motor (manipulable objects) experience. Blind individuals were less likely than sighted individuals to produce gestures for non-manipulable objects and animals, but not for manipulable objects. Overall, the tendency to use a particular gesture strategy for specific semantic categories was similar across groups. However, blind participants relied less than sighted participants on drawing and personification strategies depicting visuospatial aspects of concepts. Feature listing revealed that blind participants share considerable conceptual knowledge with sighted participants, but their understanding differs in fine-grained details, particularly for animals. Thus, while concepts appear broadly similar in blind and sighted individuals, this study also reveals nuanced differences, highlighting the intricate role of visual experience in conceptual representations.
Social Context Matters for Turn-Taking Dynamics: A Comparative Study of Autistic and Typically Developing Children
Engaging in fluent conversation is a surprisingly complex task that requires interlocutors to promptly respond to each other in a way that is appropriate to the social context. In this study, we disentangled different dimensions of turn-taking by investigating how the dynamics of child-adult interactions changed according to the activity (task-oriented vs. freer conversation) and the familiarity of the interlocutor (familiar vs. unfamiliar). Twenty-eight autistic children (16 male; mean age = 10.8 years) and 20 age-matched typically developing children (8 male; mean age = 9.6 years) participated in seven task-oriented face-to-face conversations with their caregivers (336 total conversations) and seven more telephone conversations alternately with their caregivers (144 total conversations, 60 with the typical development group) and an experimenter (191 total conversations, 112 with the autism group). By modeling inter-turn response latencies in multi-level Bayesian location-scale models, we found that inter-turn response latencies were consistent across repeated measures within social contexts but exhibited substantial differences across social contexts. Autistic children exhibited more overlaps and produced faster response latencies and shorter pauses than typically developing children, and these group differences were stronger when conversing with the unfamiliar experimenter. Unfamiliarity also made the relation between individual differences and latencies evident: only in conversations with the experimenter were higher sociocognitive skills and lower social awareness associated with faster responses. Information flow and shared tempo were also influenced by familiarity: children adapted their response latencies to the predictability and tempo of their interlocutor's turn, but only when interacting with their caregivers and not with the experimenter. These results highlight the need to construe turn-taking as a multicomponential construct that is shaped by individual differences, interpersonal dynamics, and the affordances of the context.
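A Bayesian location-scale model of this general kind predicts both the mean and the variability of latencies from the same predictors; a generic sketch (not the paper's exact likelihood, predictors, or priors) is:

```latex
% y_{ij}: i-th inter-turn latency of child j (latencies can be negative
% under overlap, hence a Gaussian likelihood in this sketch);
% x_{ij}: context predictors (activity, familiarity, group). Both the
% location mu and the scale sigma get their own regression, with
% child-level random effects u_j and v_j.
y_{ij} \sim \mathcal{N}\!\left(\mu_{ij},\, \sigma_{ij}\right), \qquad
\mu_{ij} = \alpha + u_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}, \qquad
\log \sigma_{ij} = \gamma + v_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\delta}
```

Modeling $\log \sigma$ directly is what allows the analysis to separate consistency within a social context (small $\sigma$) from shifts across contexts (changes in $\mu$).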
From Human Child to Grey Parrot: Exploring a Common Model of Word Meaning Extension Across Species
Word meaning extension refers to the process by which a single word form develops multiple related meanings. Prior studies demonstrate that meaning extension at diverse timescales, from decades-long historical change and to month-long changes in child overextension, is accounted for by models grounded in conceptual relations across knowledge types. Whether this framework generalizes to other species remains an open question. We address this question with a probabilistic model of overextension based on various knowledge types to predict word choice of nonhuman animals. As a starting point, we compared cases of overextension from Apollo - a grey parrot who has acquired some English words - to the cases of overextension documented in child language acquisition. We apply an established model of child overextension to a novel parrot dataset of over 200 referent-utterance pairs (e.g., bead-"ball") collected from Apollo's YouTube channel and test whether the child model can predict parrot word choice. Our results show that Apollo's overextension can be predicted by the multimodal model of child overextension better than baseline models that rely on frequency or sound similarity. We also find independent evidence supporting the role of different knowledge types from Alex, a grey parrot, who features prominently in prior research on animal acquisition of human language. Our findings suggest that a common model might account for the cognitive ability of word overextension identifiable in a species that diverged from humans about 320 million years ago. We discuss potential limitations and future research directions that may further strengthen the current findings.
Coordinating Attention in Face-to-Face Collaboration: The Dynamics of Gaze, Pointing, and Verbal Reference
During real-world interactions, people rely on gaze, gestures, and verbal references to coordinate attention and establish shared understanding. Yet, it remains unclear if and how these modalities couple within and between interacting individuals in face-to-face settings. The current study addressed this issue by analyzing dyadic face-to-face interactions, where participants (n = 52) collaboratively ranked paintings while their gaze, pointing gestures, and verbal references were recorded. Using cross-recurrence quantification analysis, we found that participants readily used pointing gestures to complement gaze and verbal reference cues and that gaze directed toward the partner followed canonical conversational patterns, that is, more looks to the other's face when listening than speaking. Further, gaze, pointing, and verbal references showed significant coupling both within and between individuals, with pointing gestures and verbal references guiding the partner's gaze to shared targets and speaker gaze leading listener gaze. Moreover, simultaneous pointing and verbal referencing led to more sustained attention coupling compared to pointing alone. These findings highlight the multimodal nature of joint attention coordination, extending theories of embodied, interactive cognition by demonstrating how gaze, gestures, and language dynamically integrate into a shared cognitive system.
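Categorical cross-recurrence, the core of the analysis, can be sketched in a few lines; the event series below are made-up placeholders, and the windowing, sampling, and significance testing of the actual study are omitted:

```python
# Sketch: categorical cross-recurrence between two event series (e.g.,
# speaker and listener gaze targets sampled at fixed intervals). The
# diagonal profile shows at which lags the two series visit the same target.
import numpy as np

def cross_recurrence(a, b):
    """Binary cross-recurrence matrix: R[i, j] = 1 iff a[i] == b[j]."""
    a, b = np.asarray(a), np.asarray(b)
    return (a[:, None] == b[None, :]).astype(int)

def diagonal_profile(R, max_lag):
    """Mean recurrence along each diagonal, i.e., coupling at each lag."""
    return {lag: np.mean(np.diagonal(R, offset=lag))
            for lag in range(-max_lag, max_lag + 1)}

speaker = ["painting1", "painting2", "painting2", "face", "painting3"] * 20
listener = ["face", "painting1", "painting2", "painting2", "painting3"] * 20
profile = diagonal_profile(cross_recurrence(speaker, listener), max_lag=5)
peak = max(profile, key=profile.get)
print(f"peak coupling at lag {peak} samples")
```

A peak at a positive lag means the listener revisits the speaker's target some samples later, i.e., speaker gaze leads listener gaze, as reported above.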
The Compass of Commitment: Control Mechanisms Underpinning the Sense of Individual and Joint Commitment
The sense of commitment directs us toward our goals, shielding us from distractions and temptations, and thereby facilitates a wide range of cooperative activities and institutions characteristic of our species. Building upon recent research, this paper identifies cognitive, motivational, and social factors that elicit or enhance the sense of commitment. It surveys studies on cognitive and motivational mechanisms, including control mechanisms, that may support the sense of commitment. This research is organized into a framework that enables us to relate these distinct mechanisms to one another. It also allows us to formulate novel hypotheses about how these mechanisms may interact to help us stay on course toward our goals.
Does Body-Specificity Stand on Solid Ground? Z-Curving the Association Between Emotional Valence and Lateral Space
The body-specificity hypothesis proposes that people with different bodies should also have different conceptual systems. The test case of this hypothesis has been the association of emotional valence (good vs. bad) with lateral space (left vs. right) in people of different handedness. As expected, right-handers tend to associate the good with the right side of space, whereas left-handers show the opposite association. This body-specific effect has been very influential and has been followed up by a substantial number of studies. Here, we undertake a systematic examination of the quality of this literature by means of z-curve analysis. The results show that the expected replicability rate (statistical power) of this literature is reasonably high (71-76%), especially for studies using binomial tasks and those that entail the most severe tests of the hypothesis, whereas it is lower for reaction time studies. Moreover, the presence of publication bias cannot be statistically asserted. All in all, the literature on space-valence body-specificity appears solid, although there is still room for improvement.
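The entry point of a z-curve analysis is converting reported two-sided p-values into absolute z-scores, whose distribution beyond the significance threshold is then fit with a mixture model to estimate expected replicability; the sketch below shows only this first step, with made-up p-values (the full fit is typically done with the zcurve R package):

```python
# Sketch of the z-transformation step of a z-curve analysis; p-values
# below are illustrative placeholders, not the literature's data.
import numpy as np
from scipy.stats import norm

p_values = np.array([0.001, 0.004, 0.012, 0.030, 0.049, 0.020, 0.008])
z_scores = norm.isf(p_values / 2)                    # |z| for a two-sided test
significant = z_scores[z_scores > norm.isf(0.025)]   # z > 1.96
print(np.round(z_scores, 2), f"{len(significant)} significant results")
```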
The Relationship Between Surprisal and Prosodic Prominence in Conversation Reflects Intelligibility-Oriented Pressures
Conversation is a dynamic, multimodal activity involving the exchange of complex streams of information like words, prosody, gesture, eye contact, and backchannels. Understanding how these different channels interact in naturalistic scenarios is essential for understanding the mechanisms governing human communication. Past studies have suggested that the duration of words is tied to their predictability in context, but it remains unclear whether this relationship is speaker-oriented (e.g., retrieval- or production-based) or due to listener-oriented, intelligibility-based pressures (i.e., emphasizing unpredictable words to ease comprehension). This study examines the relationship between predictability and additional acoustic variables to test how much intelligibility-oriented principles impact conversation. We use the GPT-2 large language model to assess the relationship between surprisal, a measure of unpredictability, and several variables known to play an important role in conversation: the prosodic features of duration, intensity, and pitch. We perform this analysis on the CANDOR corpus of naturalistic spoken video call conversations between strangers in English. In keeping with previous results using n-gram predictability, we find that GPT-2 surprisal predicts significantly higher values for duration. Moreover, surprisal also predicts maximum pitch and pitch range even when controlling for duration, with mixed evidence for an effect of surprisal on intensity. Additionally, we investigated listener backchannels (short interjections like "yeah" or "mhm") and found that they tended to be accompanied and followed by a spike in the surprisal of speakers' words. Finally, we demonstrate a divergence between the effect of context window size on the model fit of surprisal to maximum pitch and to other variables. The results provide additional support for intelligibility-based accounts, which hold that language production is sensitive to a pressure for successful communication, not just speaker-oriented pressures. Our data and analysis code are shared at: https://osf.io/sqpn6/?view_only=e4d9e36c68b54863bc781e359463e1fe.
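Per-token surprisal of the kind used here can be computed with GPT-2 via the Hugging Face transformers library; this is a minimal sketch, and the paper's model variant, context handling, and word-level aggregation may differ:

```python
# Sketch: per-token GPT-2 surprisal (in bits) given the preceding context.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprisals(text: str):
    """Surprisal of each token given its preceding tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predict next token
    targets = ids[0, 1:]
    nats = -log_probs[torch.arange(len(targets)), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()),
                    bits.tolist()))

for tok, s in surprisals("The cat sat on the mat"):
    print(f"{tok!r}: {s:.2f} bits")
```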
Are Transposed-Phoneme Effects Observed When Listening to Sentences?
In the present study, we asked a simple question: Can transposed-phoneme effects, previously found with nonwords presented in isolation, be observed when the transposed-phoneme nonwords are embedded in a sequence of spoken words that, apart from those nonwords, forms a correct sentence? The results are clear-cut. We found no evidence for a transposed-phoneme effect during spoken sentence processing, either in a nonword detection task (Experiments 1-3) or in a correct/incorrect decision task (Experiment 4), where "correctness" could concern either individual words (i.e., the presence of a nonword in the sequence) or the entire sequence (i.e., a grammatical decision). Hence, nonwords in spoken sentences were no harder to detect when they were created by transposing (e.g., /ʃoloka/) than by substituting (e.g., /ʃoropa/) two consonants of the corresponding base words (e.g., /ʃokola/, chocolat, "chocolate"). In contrast, a robust transposed-letter effect was observed during sentence reading (Experiment 5), using the same word/nonword sequences and the same correct/incorrect decision task as in Experiment 4. We discuss the possibility that the greater seriality imposed by spoken sentences on the processing of spoken words leads to a more precise encoding of phoneme order, thus cancelling the transposed-phoneme effect. Sentence reading, on the other hand, would involve more parallel processing, hence the robust transposed-letter effect found with written sentences.
Broadening Cognitive Science in Nigeria: Foundation for a New Discipline
Cognitive science has matured into an established discipline, and its development has advanced our understanding of the human brain and cognitive processes. Despite these advances and the field's popularity, its established norms have largely favored cognitive universals: the idea that cognitive processes are consistent and shared across all humans irrespective of sociocultural or environmental variation. This has limited opportunities to understand variation in cognitive development, particularly among individuals from the majority of the world's population, and may have led to oversight of the unique characteristics of cognitive adaptations shaped by sociocultural and environmental factors. The objective of this paper is to draw insights from a 2-day workshop on broadening cognitive science in Nigeria. Inspired by the workshop discussions, we identify critical challenges and opportunities at the researcher, participant, and process levels, offering practical strategies for advancing cognitive science in underrepresented regions. We discuss the challenges facing cognitive science research in Nigeria and strategies for addressing them, focusing in particular on themes that emerged from our workshop. We then discuss pathways for future directions and conclude with final thoughts.
