Reliability in Measuring Head Related Transfer Functions of Hearing Aids
Experimental and Theoretical Investigations of Phonation Threshold Pressure as a Function of Vocal Fold Elongation
The relationship between vocal fold elongation and phonation threshold pressure (PTP) was investigated experimentally and theoretically. The PTP values of seventeen excised canine larynges with 0% to 15% bilateral vocal fold elongation, in 5% steps, were measured using an excised larynx phonation system. Twelve larynges exhibited a monotonic relationship between PTP and elongation; in these larynges, the 0% elongation condition had the lowest PTP. Five larynges exhibited a PTP minimum at 5% elongation. To provide a theoretical explanation of these phenomena, a two-mass model was modified to simulate vibration of the elongated vocal folds. Two pairs of longitudinal springs were used to represent the longitudinal elastin in the vocal folds. This model showed that when the vocal folds were elongated, the increased longitudinal tension raised the PTP value while the increased vocal fold length lowered it. The antagonistic effects of these two factors were found to be able to produce either a monotonic or a non-monotonic relationship between PTP and elongation, consistent with the experimental observations. Because PTP describes the ease of phonation, this study suggests that a nonzero optimal vocal fold elongation for the greatest ease of phonation may exist in some larynges.
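Since the abstract's central argument is the competition between rising tension (raising PTP) and increasing length (lowering PTP), a toy calculation can make the two regimes concrete. The functional forms and parameter values below are illustrative assumptions, not the authors' two-mass model:

```python
import numpy as np

def ptp(strain, b1, b2):
    """Toy PTP: an assumed tension/length ratio. Tension grows with
    elongation (raising PTP); fold length grows too (lowering PTP)."""
    tension = 1.0 + b1 * strain + b2 * strain**2  # assumed stress-strain curve
    length = 1.0 + strain                         # elongated fold length
    return tension / length

strains = np.array([0.00, 0.05, 0.10, 0.15])      # 0% to 15% elongation
print(np.round(ptp(strains, b1=2.0, b2=8.0), 3))  # monotonic increase
print(np.round(ptp(strains, b1=0.2, b2=8.0), 3))  # minimum at 5% elongation
```

With a stiff low-strain response the tension dominates from the start (monotonic case); with a compliant low-strain response the length gain wins first and the tension catches up later, producing a PTP minimum at a nonzero elongation.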
Impact of Vocal Tract Resonance on the Perception of Voice Quality Changes Caused by Varying Vocal Fold Stiffness
Experiments using animal and human larynx models are often conducted without a vocal tract. While it is often assumed that the absence of a vocal tract has only small effects on vocal fold vibration, it is not actually known how sound production and quality are affected. In this study, the validity of using data obtained in the absence of a vocal tract for voice perception studies was investigated. Using a two-layer self-oscillating physical model, three series of voice stimuli were created: one produced with left-right symmetric vocal fold stiffness, and two with left-right asymmetries in vocal fold body stiffness. Each series included one set of stimuli created with a physical vocal tract and a second set created without one. Stimuli were re-synthesized to equalize the mean F0 within each series and normalized for amplitude. Listeners were asked to evaluate the three series in a sort-and-rate task. Multidimensional scaling analysis was applied to examine the perceptual interaction between the voice source and the vocal tract resonances. The results showed that the presence or absence of a vocal tract can significantly affect the perception of voice quality changes due to parametric changes in vocal fold properties, except when those changes produced an abrupt shift in vocal fold vibratory pattern, resulting in a salient quality change.
Evaluation of the starting point of the Lombard Effect
Speakers increase their vocal effort when their communication is disturbed by noise. This adaptation is termed the Lombard effect. The aim of the present study was to determine whether this effect has a starting point. Hence, the effects of noise at levels between 20 and 65 dB(A) on vocal effort (quantified by sound pressure level) and on both perceived noise disturbance and perceived vocal discomfort were evaluated. Results indicate that there is a Lombard effect change-point at a background noise level (Ln) of 43.3 dB(A). This change-point is preceded by the onset of perceived noise disturbance and is followed by high levels of perceived vocal discomfort.
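A change-point like the reported 43.3 dB(A) is commonly estimated with a two-segment ("broken-stick") regression of vocal SPL on background noise level. The sketch below is a generic grid-search version of that idea, demonstrated on synthetic data; it is not necessarily the statistical method used in the study:

```python
import numpy as np

def broken_stick_changepoint(noise_db, spl_db, n_grid=200):
    """Fit SPL = a + b1*x + b2*max(0, x - cp) for candidate change-points
    cp and return the cp with the smallest residual sum of squares."""
    best_err, best_cp = np.inf, None
    for cp in np.linspace(noise_db.min() + 1, noise_db.max() - 1, n_grid):
        X = np.column_stack([np.ones_like(noise_db), noise_db,
                             np.maximum(0.0, noise_db - cp)])
        coef, *_ = np.linalg.lstsq(X, spl_db, rcond=None)
        err = np.sum((X @ coef - spl_db) ** 2)
        if err < best_err:
            best_err, best_cp = err, cp
    return best_cp

# Synthetic example: nearly flat vocal effort below 43.3 dB(A), rising above.
rng = np.random.default_rng(0)
noise = rng.uniform(20, 65, 200)
spl = 60 + 0.05 * noise + 0.3 * np.maximum(0, noise - 43.3) \
      + rng.normal(0, 0.5, 200)
print(round(broken_stick_changepoint(noise, spl), 1))  # close to 43.3
```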
Effects of hearing loss on maintaining and switching attention
The ability to intentionally control attention based on task goals and stimulus properties is critical to communication in many environments. However, when the auditory system is damaged, as with hearing loss, perceptual organization may also be impaired, making it more difficult to direct attention to different auditory objects in the environment. Here we examined the behavioral cost associated with maintaining and switching attention in people with hearing loss compared with the normal-hearing population, and found a cost associated with attending to a target stream in a multi-talker environment that cannot be attributed solely to audibility issues.
Evidence for Gain Reduction by a Precursor in an On-Frequency Forward Masking Paradigm
A forward masking technique was used to measure cochlear gain reduction that might be consistent with the medial olivocochlear reflex (MOCR). A 4-kHz signal was set at 20 dB SL, and an on-frequency forward masker was adjusted to just mask the signal. Adding a pink noise precursor before the signal and masker increased the level of the masker needed to mask the signal, in contrast to what would be expected from theories such as additivity of masking. The magnitude and pattern of this increase were similar to the increase in signal threshold seen with an off-frequency masker following a precursor.
Challenging One Model With Many Stimuli: Simulating Responses in the Inferior Colliculus
Existing models of human psychophysics or neural responses are typically designed for a specific stimulus type and often fail for other stimuli. The ultimate goal for a neural model is to simulate responses to many stimuli, which may provide better insight into neural mechanisms. We tested the ability of modified same-frequency inhibition-excitation (SFIE) models of inferior colliculus neurons to simulate individual neurons' responses to both amplitude-modulated sounds and tone-in-noise stimuli. Modifications to the model were guided by receptive fields computed with 2nd-order Wiener-kernel analysis. This approach successfully simulated many individual neurons' responses to different types of stimuli; the responses of other neurons point to limitations and future directions for modeling efforts.
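For readers unfamiliar with the SFIE architecture (Nelson and Carney, 2004, is the usual starting point), the core of one model stage is excitation minus a delayed, slower, stronger inhibition derived from the same on-CF input. A minimal sketch, with illustrative time constants rather than fitted values:

```python
import numpy as np

def alpha_kernel(tau, fs, dur=0.05):
    """Normalized alpha-function synaptic kernel, t * exp(-t / tau)."""
    t = np.arange(0.0, dur, 1.0 / fs)
    k = t * np.exp(-t / tau)
    return k / k.sum()

def sfie_stage(drive, fs, tau_exc=0.5e-3, tau_inh=2e-3,
               inh_strength=1.5, inh_delay=2e-3):
    """One same-frequency inhibition-excitation stage: excitatory and
    inhibitory inputs share the same (on-CF) drive; the inhibition is
    slower, delayed, and stronger, producing modulation tuning."""
    exc = np.convolve(drive, alpha_kernel(tau_exc, fs))[:len(drive)]
    inh = np.convolve(drive, alpha_kernel(tau_inh, fs))[:len(drive)]
    inh = np.concatenate([np.zeros(int(inh_delay * fs)), inh])[:len(drive)]
    return np.maximum(0.0, exc - inh_strength * inh)  # rectified output
```

Driving such a stage with a rectified amplitude-modulated envelope and sweeping the modulation rate yields the band-pass modulation transfer functions these models are known for.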
Spatial speech detection for binaural hearing aids using deep phoneme classifiers
Current hearing aids are limited in their ability to perform speech enhancement that is optimized for specific spatial sound sources. In this study, we therefore propose an approach for spatial detection of speech based on sound source localization and blind optimization of speech enhancement for binaural hearing aids. We combined an estimator of the direction of arrival (DOA), featuring high spatial resolution but no specialization to speech, with a measure of speech quality with low spatial resolution obtained after directional filtering. The DOA estimator provides spatial sound source probability in the frontal horizontal plane. The measure of speech quality is based on phoneme representations obtained from a deep neural network that is part of a hybrid automatic speech recognition (ASR) system. Three ASR-based speech quality measures (ASQMs) are explored: entropy, mean temporal distance (M-Measure), and matched phoneme (MaP) filtering. We tested the approach in four acoustic scenes with one speaker and either a localized or a diffuse noise source at various signal-to-noise ratios (SNRs) in anechoic or reverberant conditions. The effects of incorrect spatial filtering and noise were analyzed. We show that two of the three ASQMs (M-Measure and MaP filtering) are suited to reliably identify the speech target in different conditions. The system is not adapted to the environment and requires neither a-priori information about the acoustic scene nor a reference signal to estimate the quality of the enhanced speech signal. Nevertheless, our approach performs well in all tested acoustic scenes at varying SNRs and reliably detects incorrect spatial filtering angles.
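To make the ASQM idea concrete, here are minimal sketches of two of the three measures as they are commonly defined in ASR confidence work: frame entropy of the DNN phoneme posteriorgram, and the M-Measure as a mean distance between posteriors across time lags. The exact definitions and distance metrics used in the paper may differ:

```python
import numpy as np

def entropy_asqm(posteriors, eps=1e-12):
    """posteriors: (frames, phonemes), rows sum to 1. Clean speech from
    the correct direction yields confident (low-entropy) frames, so the
    negated mean entropy serves as a quality score (higher = better)."""
    h = -np.sum(posteriors * np.log(posteriors + eps), axis=1)
    return -float(np.mean(h))

def m_measure(posteriors, max_lag=50):
    """Mean temporal distance: average distance between posterior
    vectors separated by increasing lags; intelligible speech shows
    large phonetic variation over time (higher = better)."""
    dists = [np.mean(np.abs(posteriors[lag:] - posteriors[:-lag]))
             for lag in range(1, max_lag + 1)]
    return float(np.mean(dists))
```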
Revisiting Models of Concurrent Vowel Identification: The Critical Case of No Pitch Differences
When presented with two vowels simultaneously, humans are often able to identify the constituent vowels. Computational models exist that simulate this ability; however, they predict listener confusions poorly, particularly in the case where the two vowels have the same fundamental frequency. Presented here is a model that is uniquely able to predict the combined representation of concurrent vowels. The model predicts listeners' systematic perceptual decisions with a high degree of accuracy.
Pitch of Harmonic Complex Tones: Rate Coding of Envelope Repetition Rate in the Auditory Midbrain
Envelope repetition rate (ERR) is an important cue for the pitch of harmonic complex tones (HCT), especially when the tone consists entirely of unresolved harmonics. Neural synchronization to the stimulus envelope provides a prominent cue for ERR in the auditory periphery, but this temporal code becomes degraded and gives way to rate codes in higher centers. The inferior colliculus (IC) likely plays a key role in this temporal-to-rate code transformation. Here we recorded single IC neuron responses to HCT at varying fundamental frequencies (F0). ERR was manipulated by applying different inter-harmonic phase relationships. We identified a subset of neurons that showed a 'non-tonotopic' rate tuning to ERR between 160 and 1500 Hz. A comparison of neural responses to HCT and sinusoidally amplitude-modulated (SAM) noise suggests that this tuning depends on the shape of the stimulus envelope. A phenomenological model reproduces the non-tonotopic tuning to ERR and suggests that it arises in the IC via synaptic inhibition.
Over-representation of speech in older adults originates from early response in higher order auditory cortex
Previous research has found that, paradoxically, while older adults have more difficulty than younger adults in comprehending speech under challenging circumstances, their brain responses track the envelope of the acoustic signal more robustly. Here we investigate this puzzle using magnetoencephalography (MEG) source localization to determine the anatomical origin of this difference. Our results indicate that the robust tracking in older adults does not arise merely from producing the same responses as younger adults at larger amplitudes; instead, older adults recruit additional regions, inferior to core auditory cortex, with a short latency of ~30 ms relative to the acoustic signal.
Auditory brainstem response wave III is correlated with extracellular field potentials from nucleus laminaris of the barn owl
The auditory brainstem response (ABR) is generated in the auditory brainstem by local current sources, which also give rise to extracellular field potentials (EFPs). The origins of both the ABR and the EFP are not well understood. We have recently found that EFPs, especially their dipole behavior, may be dominated by the branching patterns and the activity of axonal terminal zones [1]. To test the hypothesis that axons also shape the ABR, we used the well-described barn owl early auditory system. We recorded the ABR and a series of EFPs between the brain surface and nucleus laminaris (NL) in response to binaural clicks. The ABR and the EFP within and around NL are correlated. Together, our data suggest that axonal dipoles within the barn owl nucleus laminaris contribute to the ABR wave III.
Across-frequency processing of interaural time and level differences in perceived lateralization
Interaural time and level differences (ITDs and ILDs) contribute to the localization of sound sources; however, reverberation or the use of cochlear implants diminishes the role of ITDs. Intracranial lateralization was investigated in normal-hearing listeners using correlated or uncorrelated narrowband noises, where ITDs and/or ILDs from a typical head-related transfer function were applied. Results showed that ITDs and ILDs contributed to lateralization for correlated noises, whereas for uncorrelated noises ILDs contributed to lateralization. Frequency-dependent ITD and ILD weighting occurred. These data help to clarify the across-channel processing of ITDs and ILDs, particularly when ITDs may not be available to the listener.
A Model for Statistical Regularity Extraction from Dynamic Sounds
To understand our surroundings, we effortlessly parse our sound environment into sound sources, extracting invariant information, or regularities, over time to build an internal representation of the world around us. Previous experimental work has shown that the brain is sensitive to many types of regularities in sound, but theoretical models that capture the underlying principles of regularity tracking across diverse sequence structures remain scarce. Existing efforts often focus on sound patterns rather than on the stochastic nature of sequences. In the current study, we employ a perceptual model for regularity extraction based on a Bayesian framework that posits the brain collects statistical information over time. We show that this model can simulate various results from the literature with stimuli exhibiting a wide range of predictability. The model can thus serve as a useful tool both for interpreting existing experimental results under a unified framework and for generating predictions for new experiments using more complex stimuli.
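A minimal sketch of the statistical-tracking idea: maintain running estimates of the sufficient statistics of an assumed Gaussian generative model and score each incoming sound feature (e.g., a tone frequency) by its surprisal. This is a deliberate simplification in the spirit of the Bayesian model described, not its implementation:

```python
import numpy as np

def surprisal_track(seq, eta=0.05):
    """Exponentially weighted estimates of mean and variance; returns
    the Gaussian surprisal (-log likelihood) of each observation. A
    regularity violation shows up as a jump in surprisal."""
    mean, var = float(seq[0]), 1.0
    out = []
    for x in seq:
        out.append(0.5 * np.log(2 * np.pi * var) + 0.5 * (x - mean)**2 / var)
        var = (1 - eta) * var + eta * (x - mean)**2  # update after scoring
        mean = (1 - eta) * mean + eta * x
    return np.array(out)

# A regular tone sequence that switches to a much wider distribution:
rng = np.random.default_rng(1)
freqs = np.concatenate([rng.normal(1000, 10, 60), rng.normal(1000, 200, 60)])
print(surprisal_track(freqs)[55:70].round(1))  # surprisal jumps at the switch
```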
Predicting Speech Intelligibility Based on Across-Frequency Contrast in Simulated Auditory-Nerve Fluctuations
The present study proposes a modeling approach for predicting speech intelligibility for normal-hearing (NH) and hearing-impaired (HI) listeners in conditions of stationary and fluctuating interferers. The model combines a non-linear model of the auditory periphery with a decision process based on the contrast across characteristic frequency (CF) after modulation analysis in the range of the fundamental frequency of speech. Specifically, the short-term across-CF correlation between noisy speech and noise alone is assumed to be inversely related to speech intelligibility. The model provided highly accurate predictions for NH listeners as well as largely plausible effects in response to changes in presentation level. Furthermore, the model could account for some of the main features in the HI data solely by adapting the peripheral model using a simplistic interpretation of the listeners' hearing thresholds. The model's predictive power may be substantially improved by refining the interpretation of the HI listeners' profiles, and the model may thus provide a valuable basis for quantitatively modeling the effects of outer hair-cell and inner hair-cell loss on speech intelligibility.
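The decision metric can be sketched as follows: given simulated auditory-nerve fluctuation profiles (CF × time) for noisy speech and for noise alone, correlate the across-CF profiles in short windows and average. The windowing and modulation-analysis details below are schematic assumptions, not the paper's exact parameters:

```python
import numpy as np

def across_cf_contrast(noisy_speech, noise_alone, win=64):
    """Inputs: (n_cf, n_time) fluctuation-magnitude arrays. Returns the
    mean short-term across-CF correlation between the two conditions;
    per the abstract, lower correlation (more speech-specific contrast
    across CF) corresponds to higher predicted intelligibility."""
    n_cf, n_t = noisy_speech.shape
    corrs = []
    for start in range(0, n_t - win + 1, win):
        a = noisy_speech[:, start:start + win].mean(axis=1)  # CF profile
        b = noise_alone[:, start:start + win].mean(axis=1)
        corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.nanmean(corrs))
```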
Towards a self-rating tool of the inability to produce soft voice based on nonlinear events: a preliminary study
The purpose of this preliminary study was to investigate the feasibility of a tool that compares a severity index of nonlinear events with vocal self-ratings over a long period of time. One hundred and ninety-seven phonations were analyzed to quantify the severity of instabilities in the voice attributed to nonlinear dynamic phenomena, including voice breaks, subharmonics, and frequency jumps. Instabilities were first counted; a severity index was then calculated for the instabilities in each phonation. The two quantities were compared with the subject's autoperceptual rating. Generally speaking, the measures derived from nonlinear dynamic analysis of the high-pitched, soft phonations followed the subject's own rating of the inability to produce soft voice. Although limited by the number of observations, these single-subject results still reveal general relationships, and such preliminary evidence is valuable before undertaking a multi-subject study with complex analysis (i.e., individually selecting the nonlinear events) and a long observation duration (days, weeks, and months) per subject. These results thus provide a foundation for future multi-subject studies to formulate acoustic and autoperceptual measures of the fatiguing effects of prolonged speaking in vocally demanding professions.
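One plausible form of such a severity index, shown purely as an illustration (the weights and normalization are hypothetical, not those used in the study), is a duration-normalized weighted count of detected events:

```python
# Hypothetical severity index: weighted event counts per second of phonation.
EVENT_WEIGHTS = {"voice_break": 3.0, "frequency_jump": 2.0, "subharmonics": 1.0}

def severity_index(event_counts, phonation_duration_s):
    """event_counts: dict of event type -> count for one phonation.
    Returns a weighted event rate that can be compared with the
    subject's autoperceptual rating of soft-voice difficulty."""
    weighted = sum(EVENT_WEIGHTS[kind] * n for kind, n in event_counts.items())
    return weighted / phonation_duration_s

print(severity_index({"voice_break": 2, "subharmonics": 5}, 4.0))  # 2.75
```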
A comparative study of eight human auditory models of monaural processing
A number of auditory models have been developed using diverging approaches, either physiological or perceptual, but they share comparable signal processing stages, as they are inspired by the same constitutive parts of the auditory system. We compare eight monaural models that are openly accessible in the Auditory Modelling Toolbox. We discuss the considerations required to make the model outputs comparable to each other, as well as the results for the following model processing stages or their equivalents: outer and middle ear, cochlear filter bank, inner hair cell, auditory nerve synapse, cochlear nucleus, and inferior colliculus. The discussion includes a list of recommendations for future applications of auditory models.
