LANGUAGE AND SPEECH

Articulatory Settings of Japanese-English Bilinguals
Wilson I, Perkins J, Sato A and Ishii D
Articulatory setting, the underlying tendency of the articulators to assume a certain overall configuration during speech, is language-specific and can be measured by observing the inter-speech posture (ISP) of the articulators during the brief pauses between utterances. To determine a given language's ISP, observing bilingual speakers in each of their languages is ideal, so that questionable normalization across different vocal tracts does not have to be done. In this study, four Japanese-English bilinguals of various English proficiencies participated, and ultrasound movies of the tongue were recorded and analyzed. We quantitatively tested for the existence of two settings in each bilingual, and the results showed no systematic differences between the tongue ISPs of Japanese and English. We also tried to shed light on the origin of articulatory setting, which is thought to emerge from the configurations of the most frequent phonemes of a language. We qualitatively compared the predicted ISP, based on frequency-of-occurrence data for spoken Japanese, with the mean measured ISP in Japanese. For three participants, the predicted ISP's tongue tip and blade were substantially lower than in the mean measured ISP. The mean measured ISPs also displayed greater variability in English (L2) than in Japanese (L1), much more so in the lower-proficiency English speakers.
Cantonese Tone Perception by Punjabi Speakers of Cantonese: Evidence and Implications for the Perceptual Assimilation Model of Second Language Speech Learning
Choi W, Yu D, Chan KH and Lai VKW
This study tested whether the Perceptual Assimilation Model of Second Language Speech Learning (PAM-L2) predicts second language (L2) Cantonese tone discrimination across different perceptual modes. Punjabi speakers of Cantonese completed the Cantonese tone assimilation and discrimination tasks. In the assimilation task, the Punjabi listeners assimilated the Cantonese tones as two-category (TC), single-category (SC), uncategorized-categorized without overlap (UC-no), and uncategorized-categorized with partial overlap (UC-po) pairs, yielding testable predictions for PAM-L2 in the discrimination task (TC = UC-no > UC-po > SC). In the discrimination task, the model-driven predictions were largely supported in the double-talker context but not in the single-talker and pure tone contexts. These results suggest that PAM-L2 applies to phonological but not non-phonological discrimination of L2 tones. Moreover, our findings indicate that the distinction between partial and complete overlap may not be necessary for UC pairs.
Articulatory Methods for the Study of Second Language Speech
Colantoni L, Kochetov A and Steele J
In the introduction to this special issue on articulatory approaches to the study of L2 speech, we first highlight the interest and unique contributions of such methods to the investigation of speech production among second language speakers. This is followed by a brief overview of the four articulatory methods (electropalatography, nasometry, magnetic resonance imaging, and ultrasound) featured in the experimental studies presented in the seven articles that constitute the issue. We then turn to an overview of the speech phenomena investigated, namely consonants (laterals, rhotics), vowels (individual as well as entire inventories), and sequences (both phonemic vowel-nasal sequences and coarticulation in phonetic sequences), as produced by L2 speakers of various target languages (L1s: English[-Croatian], Czech, French, Japanese, Mandarin, Spanish; target languages: English, French, Swedish). This introduction concludes with a summary of recurring acquisition themes (L1-based crosslinguistic influence, relative difficulty and target-likeness, inter-learner variability including as conditioned by individual differences) and the general speech phenomena studied (articulatory settings, gestural timing/coarticulation, effects of phonetic context).
Disfluencies in Public and Private Speech
Verdonik D, Rupnik P and Ljubešić N
This study investigates how speakers adapt their use of disfluencies in public versus private speech settings. Existing studies suggest systematic differences in disfluency rates, depending on who we are communicating with, how interactive the communication is, how difficult the topic is, whether the interaction is broadcast or not, and whether the speech is pre-scripted or not. We aim to improve this understanding through analysis in the Slovenian language, using data from the Training Corpus of Spoken Slovenian ROG-Artur. We investigate whether quantitative differences in the use of disfluency exist between private and public speech, and aim to explain these differences by investigating the relationship between disfluency functions and the physical, social, cognitive, and other factors influencing communication behavior. Our results revealed significant differences in disfluency patterns: disfluencies, in general, are more frequent in private speech, whereas filled pauses, unrepaired pronunciations, and blocks are more common in public speech. We group disfluency functions into two general categories. In the contextual analysis, we interpret speakers as reducing disfluencies in public speech due to its high relevance, formal expectations, partial pre-scripting, time constraints, and advanced speaker skills, while the higher frequency of filled pauses, unrepaired pronunciations, and blocks in public speech reflects the impact of longer dialog turns, time constraints, and emotional stress. The findings of this study should be interpreted with caution, given the interpretative nature of qualitative analysis and the potential confounding effect of the involvement of different speakers in the public and private speech samples.
Just How Contrastive Is Word-Initial Consonant Length? Exploring the Itunyoso Triqui Spontaneous Speech Corpus
DiCanio C and Sharp J
Itunyoso Triqui (Otomanguean: Mexico) has a typologically uncommon contrast between singleton and geminate consonants which occurs in word-initial position of monosyllabic words. In this paper, we examine how functional factors contribute to the observed phonetic variation in production for this marked contrast. Geminate and singleton onsets are not equally distributed in the language: singleton onsets greatly outnumber geminate onsets. Moreover, the distribution of geminate and singleton onsets varies by manner of articulation and consonant onset. Functional factors also vary across the contrast space. In our first study, we focus on durational data from a smaller, 2-hr corpus and test the degree to which functional factors (Shannon entropy, functional status, lexical competitor size, and segment frequency) influence the production of the contrast. With the exception of entropy, we find that several of these factors play a role in predicting the robustness/hyperarticulation of the contrast realization. Content words with onset singleton obstruents are more likely to be lengthened than content words with onset singleton sonorants. Segments with a larger token frequency from a larger, 90K-word corpus are more likely to be hyperarticulated. In our second study, we examine how the observed durational factors lead to differential patterns of consonant undershoot by examining patterns of lenition. Combined, our findings demonstrate how functional factors influence language variation and may lead toward particular diachronic trajectories in the evolution of rare sound contrasts like these in human language.
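As a reading aid for the Shannon entropy factor mentioned above, the following is a minimal sketch of how entropy over a two-way onset distribution might be computed. The abstract does not say how the authors operationalized entropy, and the counts below are hypothetical illustrations, not corpus figures.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Return the Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )


# Hypothetical counts: a skewed contrast (singletons far outnumber geminates)
# versus an evenly balanced one. These are NOT figures from the Triqui corpus.
skewed_onsets = Counter({"singleton": 900, "geminate": 100})
balanced_onsets = Counter({"singleton": 500, "geminate": 500})

print(f"skewed:   {shannon_entropy(skewed_onsets):.3f} bits")
print(f"balanced: {shannon_entropy(balanced_onsets):.3f} bits")
```

A balanced two-way contrast yields the maximum of 1 bit, while a skewed distribution yields less, which is the sense in which an unevenly distributed contrast carries less information per token.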
The Role of Prosody in Expressing Subjective and Objective Causality in English
Hu N, Chen A, Quené H and Sanders TJM
Numerous studies have established that prosody plays an important role in expressing meanings and functions. However, it remains unknown whether prosody is employed to convey the distinction between subjective causality (CLAIM-ARGUMENT) and objective causality (CONSEQUENCE-CAUSE). This study aimed to address this issue in English, where both types of causality are typically expressed using the same connective. Two production experiments were conducted, focusing on causality in backward order ("because") and in forward order ("so"), respectively. The results show that subjective causality exhibited a larger F0 range, less integrated prosody, and a distinctive F0 contour shape compared with objective causality. These findings highlight the role of prosody in expressing subjective and objective causality in the absence of explicit lexical markers in English.
Biases for Vowel Harmony Over Disharmony in Phoneme Monitoring
Finley S
Previous research exploring the learnability of vowel harmony (a phonetically natural pattern) and vowel disharmony (a phonetically unnatural pattern) has shown mixed evidence for a naturalness bias. This study aims to clarify these mixed results by introducing a more sensitive and indirect measure of learning: a modified phoneme monitoring task. Participants listened to CV-me/CV-mo words and pressed a button to indicate the final vowel (either [e] or [o]). In the first set of trials, participants responded to words that either always obeyed harmony (HarmonyFirst) or always obeyed disharmony (DisharmonyFirst). In the second set of trials, the rule switched. Results from two studies support a learning bias for vowel harmony; participants generally showed greater decreases in response times for harmonic blocks, and greater increases in response times when the rule switched from vowel harmony to disharmony.
French Speakers Prefer Prosody Over Statistics to Segment Speech
Birulés J, Marimon M, Duroyal A, Vilain A, Bailly G and Fort M
To segment words in unfamiliar speech, listeners are known to exploit both native prosodic cues and statistical cues available in the speech signal. However, how and when these cues are combined remains a matter of debate. Here, we studied how transitional probabilities (TPs) and prosodic phrasal boundaries are combined by French speakers to segment words. Since French does not have lexical stress, prosodic phrasal boundaries unambiguously signal word boundaries, providing a unique possibility to test whether prosodic cues can overcome statistical ones and constrain further statistically based segmentation. We tested French adults in an artificial speech segmentation task, manipulating the consistency between prosodic and TP cues, which signaled either the same or different word boundaries. Results showed that participants favored prosodic phrasal boundaries over TPs, regardless of exposure time to the speech stream (Experiment 1: 3.5 min; Experiment 2: 7 min), supporting a prosodically driven statistical segmentation of the speech stream.
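For readers unfamiliar with transitional probabilities, the sketch below shows the standard forward TP, that is, the frequency of the pair XY divided by the frequency of X, computed over a toy syllable stream. The syllables and the stream are invented for illustration and are not the stimuli used in the study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP for each adjacent pair: count(x followed by y) / count(x)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Only syllables that have a successor can start a pair.
    first_counts = Counter(syllables[:-1])
    return {
        (x, y): n / first_counts[x]
        for (x, y), n in pair_counts.items()
    }


# Hypothetical stream built from three made-up trisyllabic "words".
stream = "tu pi ro go la bu tu pi ro pa do ti go la bu tu pi ro".split()
tps = transitional_probabilities(stream)

print(tps[("tu", "pi")])  # within-word transition
print(tps[("ro", "go")])  # transition across a word boundary
```

In this toy stream the within-word transition has a TP of 1.0 and the cross-boundary transition a TP of 0.5; dips of this kind are the statistical cue that the abstract contrasts with prosodic phrasal boundaries.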
Learning Accurate Onset Clusters: Perception Lags Behind Production
Moore-Cantwell C, Tessier AM and Farris-Trimble A
This study investigates young school-aged children's knowledge (at 4-7 years) of accurate English word-initial onset clusters. By this age, we expect children to be mostly accurate in producing #CC clusters (rather than repairing them with deletion or epenthesis). We ask how well they can recognize and reject cluster repair errors, in both real and nonce word tasks. The results suggest that these learners' cluster judgment skills lag behind their cluster production abilities, and that asymmetries in error types do not overall align between the two domains. Perceptual errors are made most often when comparing clusters with epenthesis repairs, not deletion, and the cluster's sonority profile does not directly influence error rates. After comparing these findings with similar results from adult L2 English speakers, we discuss the ways in which issues like recoverability, salience, and contiguity can account for our findings. We also suggest that more work on phonological knowledge and judgments in older children will provide a broader understanding of sound pattern acquisition across development.
Perceptual Salience of Tones, Vowels, and Consonants in Mandarin Speech Errors
Liu Z, Chitoran I and Turco G
The present study examines the perceptual salience of tonal speech errors compared with segmental errors (consonant and vowel). Tonal errors are observed less often than segmental errors. We thus hypothesize that tone errors are more easily ignored during transcription tasks because tones may have lower perceptual salience relative to segments. We test this hypothesis in Mandarin, via a number reconstruction task. Sixty-nine Mandarin native listeners heard sequences of numbers in which one number was altered by substituting its vowel, consonant, or tone. They were asked to identify which number that was. We found that Mandarin listeners identified the original number most accurately when consonants were substituted. They were the least accurate when vowels were substituted. For tone substitution, the accuracy was lower than for consonant substitution, but not significantly different from vowel substitution. Reaction times to identify a number with tone substitution were comparable to those for other types of substitutions. The results show that, contrary to our hypothesis, tone errors are not perceptually less salient than segmental errors. Specifically, tone errors are as salient as vowel errors and more salient than consonant errors, suggesting a similar phonological status shared by tone, vowel, and consonant in constraining word selection.
The Perception of Lexical Pitch Accent in South Kyungsang Korean: The Relevance of Accent Shape
Joo H and D'Imperio M
In this study, we tested whether the perception of pitch contours within a lexical pitch accent can be better understood through tonal targets in the Autosegmental-Metrical (AM) model or as the identification of an entire tonal configuration. Specifically, a categorization experiment was conducted to see how South Kyungsang Korean (SKK) listeners perceive their high (H) and rising (LH) lexical pitch accents. Auditory stimuli were manipulated depending on H peak alignment (earlier vs. later), rise shape (domed or "convex" vs. scooped or "concave"), or segmental duration (shorter vs. longer). Results showed that F0 rise shape and segmental duration influenced SKK listeners' categorization, while no effect of peak alignment was observed. Specifically, they responded to more scooped shapes as an LH, while more domed shapes were mainly assigned to H responses. Moreover, shorter duration induced an H categorization, while longer duration was related to an LH. Results suggest that SKK listeners use both F0 shape and segmental duration as important cues for tonal contrast, though F0 shape shows a stronger categorical effect than duration. Thus, F0 shape information is important for determining the phonological representation of lexical pitch accents, as opposed to strict tonal alignment as defined in Autosegmental-Metrical theory.
The Effects of Perceived Ethnicity and Prosodic Accuracy on Intelligibility, Comprehensibility, and Accentedness in L2 Mandarin Chinese
Squizzero R
Separate traditions of research have examined the impact of linguistic factors and social factors on the intelligibility, comprehensibility, and accentedness of second language (L2) speech, but studies that simultaneously investigate social and linguistic factors are rarely conducted on L2 languages other than English and outside of Western social and cultural environments. This study explores the effects of utterance-level prosody and speaker ethnicity on perception of L2 Mandarin Chinese speech. First language (L1) Mandarin listeners (N = 292) were asked to select the correct transcriptions of each of six sentences spoken by two male L2 Mandarin speakers who differed in their prosodic accuracy. While listeners heard each set of sentences, a picture of an Asian face or a White face was displayed on their screen. Results indicate that participants were significantly more likely to select the correct transcription of each sentence both when they heard the speaker with high prosodic accuracy and when they believed that the speaker was ethnically Chinese. Listeners also rated speakers' comprehensibility, accentedness, and perceived personal characteristics; listeners rated a speaker with higher prosodic accuracy or believed to be ethnically Chinese as more comprehensible, less accented, and higher on perceived personal characteristics. This study demonstrates that a link between linguistic and social factors exists in processing L2 speech, even outside of the social, cultural, and linguistic environments typically used as a setting for investigation of L2 speech perception, and it explores implications for L2 Mandarin pronunciation teaching.
Unveiling the Relationship Between L2 Utterance Fluency and Perceived Fluency in Monologic and Dialogic Speaking
Gao J and Sun PP
The study explored the relationship between L2 utterance fluency and perceived fluency in monologic and dialogic speaking. A total of 136 Chinese university English learners with diverse L2 proficiency levels and three experienced raters participated in the study. The study employed a mixed-methods approach integrating quantitative (regression analysis) and qualitative (stimulated recalls) analyses. In the monologic task, all utterance fluency dimensions (speed, breakdown, and repair fluency measures) significantly predicted perceived fluency ratings, except for filled pause rate and false start rate. Breakdown fluency measures, particularly silent pause measures, had the most substantial impact on perceived fluency ratings. In the dialogic task, breakdown fluency emerged as the sole significant predictor for perceived fluency scores, overshadowing the predictive impact of speed and repair fluency measures. The temporal measure of turn-taking did not significantly affect perceived fluency scores. Stimulated recalls were generally consistent with the quantitative results and revealed additional factors (content quality, pronunciation, and comprehensibility) that influenced fluency perceptions. The study highlighted the contextual effect on the relationship between utterance fluency and perceived fluency, suggesting that L2 speaking proficiency rating rubrics should be adjusted to account for differences between monologic and dialogic speaking.
Language Attitudes and Stereotypes Condition the Processing of Contact-Induced Linguistic Variants
Barnes S and Chappell W
This study examines the effect of language attitudes and stereotypes on vowel perceptions by two groups of listeners from Asturias, Spain that differ in their relationship with the languages present in their communities: Spanish, the national majority language, and Asturian, the regional minority language. The responses of 165 participants from the Nalón Valley (a mining area in the region) were compared with those of 156 listeners from Gijón (the largest urban area) as they categorized words containing synthesized productions of Spanish [o] and Asturian [u] in a task that combined binary forced-choice identification and visual priming. The results of a mixed-effects regression model reveal that, for the Nalón Valley group, positive attitudes toward Asturian result in higher rates of /u/ selection, while, for the Gijón group, positive attitudes toward Asturian intersect with negative stereotypes about urban Asturians who avoid the regional language. We propose that spreading activation and weighting in exemplar-based models can account for these different findings: the greater use of Asturian in the Nalón Valley results in weaker and more varied links between vowel exemplars and social properties, limiting the effect of visual priming. However, a heavier weight exists between stereotypes of urban Asturians and Spanish exemplars in the city, resulting in a priming effect in Gijón that does not emerge in the Nalón Valley. We conclude that individual experience, attitudes, and stereotypes work together to condition speech perception uniquely in light of the local context.
Perceptual Style-Shifting Across Singing and Speech: Music Activates Pop Song English for NZ Listeners
Gibson A and Hay J
American singing accents are prevalent in popular music throughout the English-speaking world. Singing with an American-influenced phonological style is a supralocal norm, referred to here as Pop Song English (PSE). This article presents two perception experiments that explore New Zealand (NZ) listeners' speech processing in musical and non-musical contexts. An analysis of the Phonetics of Popular Song corpus provides the foundation for the first experiment, revealing that sung DRESS and spoken TRAP have similar values for F1 in NZ. Experiment 1 then examines the categorization of these phonemes for words that fall on a continuum between the two. In Experiment 2, a lexical decision task, NZ listeners hear words and nonwords produced by a New Zealand and an American speaker. In both experiments, results show that listeners are influenced by the presence of music, undergoing a perceptual style-shift. In Experiment 1, their perceptual phoneme boundary shifts to a more open position in the Music condition, and in Experiment 2, they exhibit a facilitation in reaction time to the US voice in the musical compared with the non-musical conditions. PSE is thus not only the norm for singing in NZ, it is also a norm for listening to song, represented in the minds of the general music-listening public. This finding extends our understanding of how speech perception depends on context. Speech and song are two highly distinct and perceptually contrastive contexts of language use, and listeners employ knowledge of how linguistic variation maps onto these contexts to resolve ambiguities in the speech signal.
Second-Language Acquisition and First-Language Attrition of Speech: The Production of Arabic and English Short Vowels
Kornder L, Alharbi AS and Foltz A
This study investigates whether two groups of experienced late bilinguals (Arabic-English, English-Arabic) produce the Arabic vowels /ɪ, u, a/ and the English vowels /ɪ, ʊ, æ/ with nativelike formant values (F1, F2) compared with Arabic and English monolinguals, respectively. We aimed to characterize the relationship between second-language (L2) acquisition and first-language (L1) attrition of vowels, that is, does nativelike acquisition of an L2 vowel correspond to attrition of a phonetically similar L1 vowel, and vice versa? Moreover, we explored whether the nativelikeness of bilingual vowel productions is influenced by the predictor variable sound discrimination aptitude. Results show that bilinguals who produce nativelike L2 vowels are also able to maintain native L1 productions, suggesting that an increased L2 proficiency does not inevitably entail a decline in L1 proficiency.
The Role of Phonological Factors in the Processing of Polish Phonotactics
Orzechowska P, Porębski A and Nowak M
One of the predominant questions asked in phonological research refers to the way in which strings of vowels and consonants are perceived and processed by native speakers. In this paper, we make an attempt at uncovering the mental processes that underlie the online processing of phonotactics in Polish, a language featuring an unusual array of strings of consonants. We report on a reaction time experiment using nonce monosyllables with final consonant clusters and identify phonological factors that determine their acceptability. The factors include cluster (non-)existence in the lexicon of Polish, cluster well-formedness in terms of the universally preferred sonority slope, and the quality of the nuclear vowel. The findings testify to the facilitative role of cluster existence and well-formedness in phonotactic intuitions. That is, universally preferred and existent clusters are easily identified as possible and involve the shortest reaction times. Moreover, we detected a systematic perceptual contribution of vowels, whereby the front-back dimension (rather than the sonority-related high-low dimension) seems to facilitate the decision-making process.
The Articulatory and Acoustic Variability in Putonghua Onset /r/
Luo S
The debate about the rhotic sound in Standard Mandarin (i.e., Putonghua) focuses on its articulation as a retroflex and its classification as either a fricative or an approximant. To address these questions, this study examines the syllable-initial r-sound, quantifying tongue contours for the r-phoneme itself and in relation to the retroflex sibilants (i.e., /ʂ, tʂ, tʂʰ/). Both established and novel articulatory and acoustic measures are employed to assess their effectiveness in distinguishing phonetic contrasts. The ultrasound imaging results reveal that Putonghua onset /r/ is articulated with either a tip-up retroflex or a tip-down bunched tongue posture, specifying both coronal and dorsal gestures. Compared to /ʂ, tʂ, tʂʰ/, the syllable-initial /r/ is produced with a greater degree of tip-up retroflexion and more pronounced tongue inflections, supported by vertical tongue displacement and discrete Fourier transform measurements. Acoustically, Putonghua /r/ is most often produced without frication and is characterized by low F3, F3-F2 distance, and zero crossing rates. The results show that even the fricated /r/ variant remains substantially distinct from sibilants both in tongue gestures and acoustic properties. The study argues that this phoneme should be classified as a retroflex approximant, transcribed as [ɻ], rather than a fricative [ʐ]. The results contribute substantial evidence to the limited articulatory corpus and enhance the understanding of the Putonghua rhotic's articulatory-acoustic correspondence, highlighting the importance of contextualizing phonetic variability within the phonology of the language.
Sources of Intelligibility of Distant Languages: An Empirical Study
Milička J, Marklová A, Láznička M, Diatka V, Bednářová H, Matela J and Škrabal M
Research into iconicity, systematicity, and sound-symbolism has revealed that the connection between linguistic form and meaning is not completely arbitrary. In the present study, native Czech speakers, unfamiliar with Hindi, were presented with a task in which they had to match Hindi words with their corresponding Czech translations. The words were randomly selected from a Hindi corpus. Despite the considerable linguistic gap between the two languages, the analysis showed that the Czech participants were able to accurately discern the meanings of approximately 60% of the Hindi word pairs, surpassing the 50% success rate that would be expected by random guessing alone. This experiment was subsequently replicated using Turkish, Japanese, and Latvian words, demonstrating the robustness of this phenomenon across different languages. In the case of a closer language like Latvian, the success rate reached 80%. However, even a distant language such as Japanese reached the 60% success level. Furthermore, the study explored potential factors influencing intelligibility. Data collected from a total of 1,128 participants showed that the phonological similarity of Czech words and their translations, word length alignment, presence of cognates, and the way the trials were presented had a significant effect on the success rate of guessing the correct translation across all four languages. In addition, language-specific effects were identified.
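To make the comparison against chance concrete, the sketch below computes an exact one-sided binomial tail probability for a 60% success rate against a 50% guessing baseline. The trial count of 100 is a hypothetical figure chosen for illustration; the abstract does not report the per-language trial counts or the statistical test the authors used.

```python
from math import comb


def binomial_tail(successes, trials, p_chance=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 60 correct matches out of 100 two-choice trials.
p_value = binomial_tail(60, 100)
print(f"P(>= 60 correct by guessing) = {p_value:.4f}")
```

With more trials, the same 60% rate becomes far less likely under pure guessing, which is why a modest margin above 50% can still be reliable in a large sample.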
Systematicity in Variability: English Coda Laterals of English-Malay Bilinguals in Multi-Accent Singapore
Sim JH and Post B
Outcomes of early phonological acquisition in multi-accent contexts can be especially wide-ranging, raising the question of whether children exposed to multiple accents in one community are building the same linguistic systems. The present study investigates the English coda clear laterals in the spontaneous, mother-directed speech of English-Malay early bilingual preschoolers raised in multi-accent Singapore. Previous work has shown that these children were exposed to highly variable input involving three different English coda /l/ variants within and outside of their ethnic community. To elucidate the complex nature of language acquisition in such diverse settings, we examine both individual differences and group behaviors. Our findings reveal that despite the considerable between- and within-child variation, production patterns are generally systematic. Malay children with close Chinese peers, however, exhibited greater variability and unpredictability in their production, revealing word-specific inconsistencies that suggest a restructuring of or instability in their phonological representations. This study underscores the complexity of phonological development in multi-accent contexts and highlights the challenges in predicting the contributors to these variable outcomes.
Exploring the Relationship Between Mental Boundary Strength and Phonetic Accommodation
Gessinger I, Becker N and Cowan BR
We motivate a possible relationship between the psychological concept of mental boundaries and the linguistic phenomenon of phonetic accommodation, proposing that thinner boundaries may indicate a greater disposition to phonetically adapt to an interlocutor. To enable research on this relationship with German speakers, we translated the English short version of the Boundary Questionnaire (BQ-Sh), an established instrument for measuring the strength of mental boundaries, and demonstrated that the resulting German adaptation (BQ-Sh-G) can be used equivalently to the BQ-Sh. As the Big Five personality traits have previously been considered in research on both mental boundaries and phonetic accommodation, we explored the relationship between the BQ-Sh-G and the NEO-Five Factor Inventory. Consistent with previous literature, the BQ-Sh-G score correlated positively with Neuroticism and Openness, as well as negatively with Conscientiousness. We collected BQ-Sh-G scores from participants of an experiment on phonetic accommodation in a human-computer interaction context, specifically investigating the realization of the word ending 〈-ig〉 and the intonation of wh-questions in German. The analysis revealed a tendency for thicker mental boundaries to correspond with more convergence to 〈-ig〉 variants. Taking into account the results of previous work exploring the influence of the Big Five on the same data, we conclude that speakers may accommodate to different types of phonetic features depending on different personality traits. We encourage future work to investigate this further, while also exploring the predictive potential of the boundary construct with respect to a general disposition to phonetic accommodation, that is, examining a large number of phonetic features simultaneously.