
14 Nov 2022 at 02:03

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography 14 Nov 2022 at 02:03 Created: 

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
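As a rough illustration of how the size of the resonating structure sets formant frequencies, the sketch below uses the textbook idealization of a vocal tract as a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/4L. The tube model, the function name `tube_formants`, the tract lengths, and the default speed of sound are assumptions made for illustration and do not come from the sources cited here; real vocal tracts are not uniform tubes.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end.

    Assumption: quarter-wavelength resonator, F_n = (2n - 1) * c / (4 * L).
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# Hypothetical tract lengths: a longer tube gives lower resonances.
for label, length in [("shorter tract, 0.14 m", 0.14),
                      ("longer tract, 0.175 m", 0.175)]:
    print(label, [round(f) for f in tube_formants(length)])
```

Under this idealization a 0.175 m tube resonates near 500, 1500, and 2500 Hz, and shortening the tube raises every resonance, which is one way to read the "sounds big" cue.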

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)


RevDate: 2022-11-09

Kim Y, A Thompson (2022)

An Acoustic-Phonetic Approach to Effects of Face Masks on Speech Intelligibility.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study aimed to examine the effects of wearing a face mask on speech acoustics and intelligibility, using an acoustic-phonetic analysis of speech. In addition, the effects of speakers' behavioral modification while wearing a mask were examined.

METHOD: Fourteen female adults were asked to read a set of words and sentences under three conditions: (a) conversational, mask-off; (b) conversational, mask-on; and (c) clear, mask-on. Seventy listeners rated speech intelligibility using two methods: orthographic transcription and visual analog scale (VAS). Acoustic measures for vowels included duration, first (F1) and second (F2) formant frequency, and intensity ratio of F1/F2. For consonants, spectral moment coefficients and consonant-vowel (CV) boundary (intensity ratio between consonant and vowel) were measured.

RESULTS: Face masks had a negative impact on speech intelligibility as measured by both intelligibility ratings. However, speech intelligibility was recovered in the clear speech condition for VAS but not for transcription scores. Analysis of orthographic transcription showed that listeners tended to frequently confuse consonants (particularly fricatives, affricates, and stops), rather than vowels in the word-initial position. Acoustic data indicated a significant effect of condition on CV intensity ratio only.

CONCLUSIONS: Our data demonstrate a negative effect of face masks on speech intelligibility, mainly affecting consonants. However, intelligibility can be enhanced by speaking clearly, likely driven by prosodic alterations.

RevDate: 2022-11-02

Baker CP, Sundberg J, Purdy SC, et al (2022)

Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering.

Logopedics, phoniatrics, vocology [Epub ahead of print].

Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence.

Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences.

Results: Wide intersubject variation was seen between dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal.

Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.

RevDate: 2022-10-31

Easwar V, Purcell D, Eeckhoutte MV, et al (2022)

The Influence of Male- and Female-Spoken Vowel Acoustics on Envelope-Following Responses.

Seminars in hearing, 43(3):223-239 pii:sih00931.

The influence of male and female vowel characteristics on the envelope-following responses (EFRs) is not well understood. This study explored the role of vowel characteristics on the EFR at the fundamental frequency (f0) in response to the vowel /ε/ (as in "head"). Vowel tokens were spoken by five males and five females and EFRs were measured in 25 young adults (21 females). An auditory model was used to estimate changes in auditory processing that might account for talker effects on EFR amplitude. There were several differences between male and female vowels in relation to the EFR. For male talkers, EFR amplitudes were correlated with the bandwidth and harmonic count of the first formant, and the amplitude of the trough below the second formant. For female talkers, EFR amplitudes were correlated with the range of f0 frequencies and the amplitude of the trough above the second formant. The model suggested that the f0 EFR reflects a wide distribution of energy in speech, with primary contributions from high-frequency harmonics mediated from cochlear regions basal to the peaks of the first and second formants, not from low-frequency harmonics with energy near f0. Vowels produced by female talkers tend to produce lower-amplitude EFR, likely because they depend on higher-frequency harmonics where speech sound levels tend to be lower. This work advances auditory electrophysiology by showing how the EFR evoked by speech relates to the acoustics of speech, for both male and female voices.

RevDate: 2022-10-27

Choi MK, Yoo SD, EJ Park (2022)

Destruction of Vowel Space Area in Patients with Dysphagia after Stroke.

International journal of environmental research and public health, 19(20): pii:ijerph192013301.

Dysphagia is associated with dysarthria in stroke patients. Vowel space decreases in stroke patients with dysarthria; destruction of the vowel space is often observed. We determined the correlation of destruction of acoustic vowel space with dysphagia in stroke patients. Seventy-four individuals with dysphagia and dysarthria who had experienced stroke were enrolled. For the /a/, /ae/, /i/, and /u/ vowels, we determined the formant parameter (which represents the vocal tract resonance frequencies as a two-dimensional coordinate point), the formant centralization ratio (FCR), and the quadrilateral vowel space area (VSA). Swallowing function was assessed using the videofluoroscopic dysphagia scale (VDS) during videofluoroscopic swallowing studies. Pearson's correlation and linear regression were used to determine the correlation between VSA, FCR, and VDS. Subgroups were created based on VSA; vowel space destruction groups were compared using ANOVA and Scheffe's test. VSA and FCR were negatively and positively correlated with VDS, respectively. Groups were separated based on mean and standard deviation of VSA. One-way ANOVA revealed significant differences in VDS, FCR, and age between the VSA groups and no significant differences in VDS between mild and moderate VSA reduction and vowel space destruction groups. VSA and FCR values correlated with swallowing function. Vowel space destruction has characteristics similar to VSA reduction at a moderate-to-severe degree and has utility as an indicator of dysphagia severity.
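The abstract does not spell out how the two measures are computed. The sketch below shows one common way to do it, assuming the quadrilateral VSA is the polygon area in the (F1, F2) plane obtained with the shoelace formula and the FCR follows the widely used definition (F2u + F2a + F1i + F1u) / (F2i + F1a). The function names and the formant values are hypothetical and are not taken from the study.

```python
def quadrilateral_vsa(i, ae, a, u):
    """Area (Hz^2) of the vowel quadrilateral via the shoelace formula.

    Each argument is an (F1, F2) pair in Hz. Corners are visited in the
    order /i/ -> /ae/ -> /a/ -> /u/, which traces a simple polygon for
    typical vowel values.
    """
    corners = [i, ae, a, u]
    twice_area = 0.0
    for k in range(len(corners)):
        x1, y1 = corners[k]
        x2, y2 = corners[(k + 1) % len(corners)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def formant_centralization_ratio(i, a, u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); rises as vowels centralize."""
    return (u[1] + a[1] + i[0] + u[0]) / (i[1] + a[0])


# Hypothetical formant values (F1, F2) in Hz, not taken from the study.
vowels = {"i": (300, 2300), "ae": (700, 1800), "a": (750, 1200), "u": (350, 900)}
vsa = quadrilateral_vsa(vowels["i"], vowels["ae"], vowels["a"], vowels["u"])
fcr = formant_centralization_ratio(vowels["i"], vowels["a"], vowels["u"])
print(f"VSA = {vsa:.0f} Hz^2, FCR = {fcr:.3f}")
```

A shrinking vowel space lowers the VSA and raises the FCR, which matches the opposite signs of their correlations with VDS reported above.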

RevDate: 2022-10-24

Hussain RO, Kumar P, NK Singh (2022)

Subcortical and Cortical Electrophysiological Measures in Children With Speech-in-Noise Deficits Associated With Auditory Processing Disorders.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The aim of this study was to analyze the subcortical and cortical auditory evoked potentials for speech stimuli in children with speech-in-noise (SIN) deficits associated with auditory processing disorder (APD) without any reading or language deficits.

METHOD: The study included 20 children in the age range of 9-13 years. Ten children were recruited to the APD group; they had below-normal scores on the speech-perception-in-noise test and were diagnosed as having APD. The remaining 10 were typically developing (TD) children and were recruited to the TD group. Speech-evoked subcortical (brainstem) and cortical (auditory late latency) responses were recorded and compared across both groups.

RESULTS: The results showed a statistically significant reduction in the amplitudes of the subcortical potentials (both for stimulus in quiet and in noise) and the magnitudes of the spectral components (fundamental frequency and the second formant) in children with SIN deficits in the APD group compared to the TD group. In addition, the APD group displayed enhanced amplitudes of the cortical potentials compared to the TD group.

CONCLUSION: Children with SIN deficits associated with APD exhibited impaired coding/processing of the auditory information at the level of the brainstem and the auditory cortex.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21357735.

RevDate: 2022-10-24

Bochner J, Samar V, Prud'hommeaux E, et al (2022)

Phoneme Categorization in Prelingually Deaf Adult Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Phoneme categorization (PC) for voice onset time and second formant transition was studied in adult cochlear implant (CI) users with early-onset deafness and hearing controls.

METHOD: Identification and discrimination tasks were administered to 30 participants implanted before 4 years of age, 21 participants implanted after 7 years of age, and 21 hearing individuals.

RESULTS: Distinctive identification and discrimination functions confirmed PC within all groups. Compared to hearing participants, the CI groups generally displayed longer/higher category boundaries, shallower identification function slopes, reduced identification consistency, and reduced discrimination performance. A principal component analysis revealed that identification consistency, discrimination accuracy, and identification function slope, but not boundary location, loaded on a single factor, reflecting general PC performance. Earlier implantation was associated with better PC performance within the early CI group, but not the late CI group. Within the early CI group, earlier implantation age but not PC performance was associated with better speech recognition. Conversely, within the late CI group, better PC performance but not earlier implantation age was associated with better speech recognition.

CONCLUSIONS: Results suggest that implantation timing within the sensitive period before 4 years of age partly determines the level of PC performance. They also suggest that early implantation may promote development of higher level processes that can compensate for relatively poor PC performance, as can occur in challenging listening conditions.

RevDate: 2022-10-20

Skrabal D, Rusz J, Novotny M, et al (2022)

Articulatory undershoot of vowels in isolated REM sleep behavior disorder and early Parkinson's disease.

NPJ Parkinson's disease, 8(1):137.

Imprecise vowels represent a common deficit associated with hypokinetic dysarthria resulting from a reduced articulatory range of motion in Parkinson's disease (PD). It is not yet known whether the vowel articulation impairment is already evident in the prodromal stages of synucleinopathy. We aimed to assess whether vowel articulation abnormalities are present in isolated rapid eye movement sleep behaviour disorder (iRBD) and early-stage PD. A total of 180 male participants, including 60 iRBD, 60 de-novo PD and 60 age-matched healthy controls, performed reading of a standardized passage. The first and second formant frequencies of the corner vowels /a/, /i/, and /u/ extracted from predefined words were utilized to construct articulatory-acoustic measures of Vowel Space Area (VSA) and Vowel Articulation Index (VAI). Compared to controls, VSA was smaller in both iRBD (p = 0.01) and PD (p = 0.001) while VAI was lower only in PD (p = 0.002). The iRBD subgroup with abnormal olfactory function had smaller VSA compared to the iRBD subgroup with preserved olfactory function (p = 0.02). In PD patients, the extent of bradykinesia and rigidity correlated with VSA (r = -0.33, p = 0.01), while no correlation between axial gait symptoms or tremor and vowel articulation was detected. Vowel articulation impairment represents an early prodromal symptom in the disease process of synucleinopathy. Acoustic assessment of vowel articulation may provide a surrogate marker of synucleinopathy in scenarios where a single robust feature to monitor the dysarthria progression is needed.
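For readers unfamiliar with the two corner-vowel measures, here is a minimal sketch under stated assumptions: the VSA is taken as the area of the /a/-/i/-/u/ triangle in the (F1, F2) plane, and the VAI follows the commonly cited definition (F2i + F1a) / (F1i + F1u + F2u + F2a). The abstract does not give the formulas, so the exact definitions, the function names, and the formant values below are illustrative only.

```python
def triangular_vsa(i, a, u):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle; each vowel is an (F1, F2) pair."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = i, a, u
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2.0


def vowel_articulation_index(i, a, u):
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); falls as vowels centralize."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = i, a, u
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)


# Hypothetical corner-vowel formants (F1, F2) in Hz, not from the study.
i_vowel, a_vowel, u_vowel = (280, 2250), (730, 1250), (320, 850)
print(f"VSA = {triangular_vsa(i_vowel, a_vowel, u_vowel):.0f} Hz^2")
print(f"VAI = {vowel_articulation_index(i_vowel, a_vowel, u_vowel):.3f}")
```

Both values drop when the corner vowels move toward the centre of the formant space, which is the articulatory undershoot the study looks for.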

RevDate: 2022-10-20

Zhang T, He M, Li B, et al (2022)

Acoustic Characteristics of Cantonese Speech Through Protective Facial Coverings.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00269-7 [Epub ahead of print].

OBJECTIVES: Protective facial coverings (PFCs) such as surgical masks attenuate speech transmission and affect speech intelligibility, which is reported in languages such as English and German. The present study intended to verify the detrimental impacts on production of tonal languages such as Cantonese, by examining realization of speech correlates in Cantonese under PFCs including facial masks and shields.

METHODS: We recorded scripted speech in Hong Kong Cantonese produced by three adult speakers who wore various PFCs, including surgical masks, KF94 masks, and face shields (with and without surgical masks). Spectral and temporal parameters were measured, including mean intensity, speaking rate, long-term amplitude spectrum, formant frequencies of vowels, and duration and fundamental frequency (F0) of tone-bearing parts.

RESULTS: Significant changes were observed in all acoustic correlates of Cantonese speech under PFCs. Sound pressure levels were attenuated more intensely at ranges of higher frequencies in speech through face masks, whereas sound transmission was affected more at ranges of lower frequencies in speech under face shields. Vowel spaces derived from formant frequencies shrank under all PFCs, with the vowel /aa/ demonstrating largest changes in the first two formants. All tone-bearing parts were shortened and showed increments of F0 means in speech through PFCs. The decrease of tone duration was statistically significant in High-level and Low-level tones, while the increment of F0 means was significant in High-level tone only.

CONCLUSIONS: General filtering effect of PFCs is observed in Cantonese speech data, confirming language-universal patterns in acoustic attenuation by PFCs. The various coverings lower overall intensity levels of speech and degrade speech signal in higher frequency regions. Modification patterns specific to Hong Kong Cantonese are also identified. Vowel space area is reduced and found associated with increased speaking rates. Tones are produced with higher F0s under PFCs, which may be attributed to vocal tension caused by tightened vocal tract during speaking through facial coverings.

RevDate: 2022-10-10

Urzúa AR, KB Wolf (2022)

Unitary rotation of pixellated polychromatic images.

Journal of the Optical Society of America. A, Optics, image science, and vision, 39(8):1323-1329.

Unitary rotations of polychromatic images on finite two-dimensional pixellated screens provide invertibility, group composition, and thus conservation of information. Rotations have been applied on monochromatic image data sets, where we now examine closer the Gibbs-like oscillations that appear due to discrete "discontinuities" of the input images under unitary transformations. Extended to three-color images, we examine here the display of color at the pixels where, due to oscillations, some pixel color values may fall outside their required common numerical range [0,1], between absence and saturation of the red, green, and blue formant colors we choose to represent the images.

RevDate: 2022-10-01

Rothenberg M, S Rothenberg (2022)

Measuring the distortion of speech by a facemask.

JASA express letters, 2(9):095203.

Most prior research focuses on the reduced amplitude of speech caused by facemasks. This paper argues that the interaction between the acoustic properties of a facemask and the acoustic properties of the vocal tract contributes to speech distortion by changing the formants of the voice. Speech distortion of a number of masks was tested by measuring the increase in damping of the first formant. Results suggest that masks dampen the first formant and that increasing the distance between the mask wall and mouth can reduce this distortion. These findings contribute to the research studying the impact of masks on speech.

RevDate: 2022-10-01

Winn MB, RA Wright (2022)

Reconsidering commonly used stimuli in speech perception experiments.

The Journal of the Acoustical Society of America, 152(3):1394.

This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.

RevDate: 2022-09-28

Borodkin K, Gassner T, Ershaid H, et al (2022)

tDCS modulates speech perception and production in second language learners.

Scientific reports, 12(1):16212.

Accurate identification and pronunciation of nonnative speech sounds can be particularly challenging for adult language learners. The current study tested the effects of a brief musical training combined with transcranial direct current stimulation (tDCS) on speech perception and production in a second language (L2). The sample comprised 36 native Hebrew speakers, aged 18-38, who studied English as L2 in a formal setting and had little musical training. Training encompassed musical perception tasks with feedback (i.e., timbre, duration, and tonal memory) and concurrent tDCS applied over the left posterior auditory-related cortex (including posterior superior temporal gyrus and planum temporale). Participants were randomly assigned to anodal or sham stimulation. Musical perception, L2 speech perception (measured by a categorical AXB discrimination task) and speech production (measured by a speech imitation task) were tested before and after training. There were no tDCS-dependent effects on musical perception post-training. However, only participants who received active stimulation showed increased accuracy of L2 phoneme discrimination and greater change in the acoustic properties of L2 speech sound production (i.e., second formant frequency in vowels and center of gravity in consonants). The results of this study suggest neuromodulation can facilitate the processing of nonnative speech sounds in adult learners.

RevDate: 2022-09-26

Ying Liu Y, Polka L, Masapollo M, et al (2021)

Disentangling the roles of formant proximity and stimulus prototypicality in adult vowel perception.

JASA express letters, 1(1):015201.

The present investigation examined the extent to which asymmetries in vowel perception derive from a sensitivity to focalization (formant proximity), stimulus prototypicality, or both. English-speaking adults identified, rated, and discriminated a vowel series that spanned a less-focal/prototypic English /u/ and a more-focal/prototypic French /u/ exemplar. Discrimination pairs included one-step, two-step, and three-step intervals along the series. Asymmetries predicted by both focalization and prototype effects emerged when discrimination step-size was varied. The findings indicate that both generic/universal and language-specific biases shape vowel perception in adults; the latter are challenging to isolate without well-controlled stimuli and appropriately scaled discrimination tasks.

RevDate: 2022-09-28
CmpDate: 2022-09-28

Morse RP, Holmes SD, Irving R, et al (2022)

Noise helps cochlear implant listeners to categorize vowels.

JASA express letters, 2(4):042001.

Theoretical studies demonstrate that controlled addition of noise can enhance the amount of information transmitted by a cochlear implant (CI). The present study is a proof-of-principle for whether stochastic facilitation can improve the ability of CI users to categorize speech sounds. Analogue vowels were presented to CI users through a single electrode with independent noise on multiple electrodes. Noise improved vowel categorization, particularly in terms of an increase in information conveyed by the first and second formant. Noise, however, did not significantly improve vowel recognition: the miscategorizations were just more consistent, giving the potential to improve with experience.

RevDate: 2022-09-13

Nault DR, Mitsuya T, Purcell DW, et al (2022)

Perturbing the consistency of auditory feedback in speech.

Frontiers in human neuroscience, 16:905365.

Sensory information, including auditory feedback, is used by talkers to maintain fluent speech articulation. Current models of speech motor control posit that speakers continually adjust their motor commands based on discrepancies between the sensory predictions made by a forward model and the sensory consequences of their speech movements. Here, in two within-subject design experiments, we used a real-time formant manipulation system to explore how reliant speech articulation is on the accuracy or predictability of auditory feedback information. This involved introducing random formant perturbations during vowel production that varied systematically in their spatial location in formant space (Experiment 1) and temporal consistency (Experiment 2). Our results indicate that, on average, speakers' responses to auditory feedback manipulations varied based on the relevance and degree of the error that was introduced in the various feedback conditions. In Experiment 1, speakers' average production was not reliably influenced by random perturbations that were introduced every utterance to the first (F1) and second (F2) formants in various locations of formant space that had an overall average of 0 Hz. However, when perturbations were applied that had a mean of +100 Hz in F1 and -125 Hz in F2, speakers demonstrated reliable compensatory responses that reflected the average magnitude of the applied perturbations. In Experiment 2, speakers did not significantly compensate for perturbations of varying magnitudes that were held constant for one and three trials at a time. Speakers' average productions did, however, significantly deviate from a control condition when perturbations were held constant for six trials. Within the context of these conditions, our findings provide evidence that the control of speech movements is, at least in part, dependent upon the reliability and stability of the sensory information that it receives over time.

RevDate: 2022-09-05

Frankford SA, Cai S, Nieto-Castañón A, et al (2022)

Auditory feedback control in adults who stutter during metronome-paced speech II. Formant Perturbation.

Journal of fluency disorders, 74:105928 pii:S0094-730X(22)00033-X [Epub ahead of print].

PURPOSE: Prior work has shown that adults who stutter (AWS) have reduced and delayed responses to auditory feedback perturbations. This study aimed to determine whether external timing cues, which increase fluency, resolve auditory feedback processing disruptions.

METHODS: Fifteen AWS and sixteen adults who do not stutter (ANS) read aloud a multisyllabic sentence either with natural stress and timing or with each syllable paced at the rate of a metronome. On random trials, an auditory feedback formant perturbation was applied, and formant responses were compared between groups and pacing conditions.

RESULTS: During normally paced speech, ANS showed a significant compensatory response to the perturbation by the end of the perturbed vowel, while AWS did not. In the metronome-paced condition, which significantly reduced the disfluency rate, the opposite was true: AWS showed a significant response by the end of the vowel, while ANS did not.

CONCLUSION: These findings indicate a potential link between the reduction in stuttering found during metronome-paced speech and changes in auditory motor integration in AWS.

RevDate: 2022-09-01

Lee SH, GS Lee (2022)

Long-term Average Spectrum and Nasal Accelerometry in Sentences of Differing Nasality and Forward-Focused Vowel Productions Under Altered Auditory Feedback.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00228-4 [Epub ahead of print].

OBJECTIVES AND BACKGROUND: To investigate whether voice focus adjustments can alter the audio-vocal feedback and consequently modulate speech/voice motor control. Speaking with a forward-focused voice was expected to enhance audio-vocal feedback and thus decrease the variability of vocal fundamental frequency (F0).

MATERIALS AND METHOD: Twenty-two healthy, untrained adults (10 males and 12 females) were requested to sustain vowel /a/ with their natural focus and a forward focus and to naturally read the nasal, oral, and mixed oral-nasal sentences in normal and noise-masked auditory conditions. Meanwhile, a miniature accelerometer was externally attached to the nose to detect the nasal vibrations during vocalization. Audio recordings were made and analyzed using the long-term average spectrum (LTAS) and power spectral analysis of F0.

RESULTS: Compared with naturally focused vowel production and oral sentences, forward-focused vowel productions and nasal sentences both showed significant increases in nasal accelerometric amplitude and the spectral power within the range of 200-300 Hz, and significantly decreased the F0 variability below 3 Hz, which has been reported to be associated with enhanced auditory feedback in our previous research. The auditory masking not only significantly increased the low-frequency F0 variability, but also significantly decreased the ratio of the spectral power within 200-300 Hz to the power within 300-1000 Hz for the vowel and sentence productions. Gender differences were found in the correlations between the degree of nasal coupling and F0 stability as well as in the LTAS characteristics in response to noise.

CONCLUSIONS: Variations in nasal-oral acoustic coupling not only change the formant features of speech signals, but involuntarily influence the auditory feedback control of vocal fold vibrations. Speakers tend to show improved F0 stability in response to a forward-focused voice adjustment.

RevDate: 2022-09-01

Krumbiegel J, Ufer C, H Blank (2022)

Influence of voice properties on vowel perception depends on speaker context.

The Journal of the Acoustical Society of America, 152(2):820.

Different speakers produce the same intended vowel with very different physical properties. Fundamental frequency (F0) and formant frequencies (FF), the two main parameters that discriminate between voices, also influence vowel perception. While it has been shown that listeners comprehend speech more accurately if they are familiar with a talker's voice, it is still unclear how such prior information is used when decoding the speech stream. In three online experiments, we examined the influence of speaker context via F0 and FF shifts on the perception of /o/-/u/ vowel contrasts. Participants perceived vowels from an /o/-/u/ continuum shifted toward /u/ when F0 was lowered or FF increased relative to the original speaker's voice and vice versa. This shift was reduced when the speakers were presented in a block-wise context compared to random order. Conversely, the original base voice was perceived to be shifted toward /u/ when presented in the context of a low F0 or high FF speaker, compared to a shift toward /o/ with high F0 or low FF speaker context. These findings demonstrate that F0 and FF jointly influence vowel perception in speaker context.

RevDate: 2022-09-01

Whalen DH, Chen WR, Shadle CH, et al (2022)

Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986).

The Journal of the Acoustical Society of America, 152(2):933.

Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.
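Recommendation (4) can be approached in several ways, and the paper's own procedure is not given in the abstract. One simple, hypothetical screening check is sketched below: for each measured (F0, F1) pair, find the nearest harmonic of F0 and see how often F1 lands within a narrow band around it. The function names, the 10% tolerance, and the sample values are assumptions for illustration.

```python
def harmonic_offset(f0_hz, f1_hz):
    """Return (k, offset): nearest harmonic number k >= 1 and F1 - k*F0 in Hz."""
    k = max(1, round(f1_hz / f0_hz))
    return k, f1_hz - k * f0_hz


def fraction_near_harmonic(pairs, tolerance=0.1):
    """Share of (F0, F1) pairs with F1 within tolerance * F0 of a harmonic."""
    if not pairs:
        raise ValueError("need at least one (F0, F1) pair")
    hits = 0
    for f0_hz, f1_hz in pairs:
        _, offset = harmonic_offset(f0_hz, f1_hz)
        if abs(offset) <= tolerance * f0_hz:
            hits += 1
    return hits / len(pairs)


# Hypothetical (F0, F1) measurements in Hz.
measurements = [(200, 600), (200, 610), (200, 690), (100, 455)]
print(fraction_near_harmonic(measurements))
```

If F1 estimates were spread evenly between harmonics, roughly 2 × tolerance of them (about 20% here) would be expected inside the band; a much larger share in a real corpus would be a reason to look more closely at the analysis settings, not proof of harmonic bias.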

RevDate: 2022-08-26

Beeck VC, Heilmann G, Kerscher M, et al (2022)

Sound Visualization Demonstrates Velopharyngeal Coupling and Complex Spectral Variability in Asian Elephants.

Animals : an open access journal from MDPI, 12(16): pii:ani12162119.

Sound production mechanisms set the parameter space available for transmitting biologically relevant information in vocal signals. Low-frequency rumbles play a crucial role in coordinating social interactions in elephants' complex fission-fusion societies. By emitting rumbles through either the oral or the three-times longer nasal vocal tract, African elephants alter their spectral shape significantly. In this study, we used an acoustic camera to visualize the sound emission of rumbles in Asian elephants, which have received far less research attention than African elephants. We recorded nine adult captive females and analyzed the spectral parameters of 203 calls, including vocal tract resonances (formants). We found that the majority of rumbles (64%) were nasally emitted, 21% orally, and 13% simultaneously through the mouth and trunk, demonstrating velopharyngeal coupling. Some of the rumbles were combined with orally emitted roars. The nasal rumbles concentrated most spectral energy in lower frequencies exhibiting two formants, whereas the oral and mixed rumbles contained higher formants, higher spectral energy concentrations and were louder. The roars were the loudest, highest and broadest in frequency. This study is the first to demonstrate velopharyngeal coupling in a non-human animal. Our findings provide a foundation for future research into the adaptive functions of the elephant acoustic variability for information coding, localizability or sound transmission, as well as vocal flexibility across species.

RevDate: 2022-08-24

Easwar V, Aiken S, Beh K, et al (2022)

Variability in the Estimated Amplitude of Vowel-Evoked Envelope Following Responses Caused by Assumed Neurophysiologic Processing Delays.

Journal of the Association for Research in Otolaryngology : JARO [Epub ahead of print].

Vowel-evoked envelope following responses (EFRs) reflect neural encoding of the fundamental frequency of voice (f0). Accurate analysis of EFRs elicited by natural vowels requires the use of methods like the Fourier analyzer (FA) to consider the production-related f0 changes. The FA's accuracy in estimating EFRs is, however, dependent on the assumed neurophysiological processing delay needed to time-align the f0 time course and the recorded electroencephalogram (EEG). For male-spoken vowels (f0 ~ 100 Hz), a constant 10-ms delay correction is often assumed. Since processing delays vary with stimulus and physiological factors, we quantified (i) the delay-related variability that would occur in EFR estimation, and (ii) the influence of stimulus frequency, non-f0 related neural activity, and the listener's age on such variability. EFRs were elicited by the low-frequency first formant, and mid-frequency second and higher formants of /u/, /a/, and /i/ in young adults and 6- to 17-year-old children. To time-align with the f0 time course, EEG was shifted by delays between 5 and 25 ms to encompass plausible response latencies. The delay-dependent range in EFR amplitude did not vary by stimulus frequency or age and was significantly smaller when interference from low-frequency activity was reduced. On average, the delay-dependent range was < 22% of the maximum variability in EFR amplitude that could be expected by noise. Results suggest that using a constant EEG delay correction in FA analysis does not substantially alter EFR amplitude estimation. In the present study, the lack of substantial variability was likely facilitated by using vowels with small f0 ranges.

RevDate: 2022-08-22

Clarke H, Leav S, Zestic J, et al (2022)

Enhanced Neonatal Pulse Oximetry Sounds for the First Minutes of Life: A Laboratory Trial.

Human factors [Epub ahead of print].

OBJECTIVE: Auditory enhancements to the pulse oximetry tone may help clinicians detect deviations from target ranges for oxygen saturation (SpO2) and heart rate (HR).

BACKGROUND: Clinical guidelines recommend target ranges for SpO2 and HR during neonatal resuscitation in the first 10 minutes after birth. The pulse oximeter currently maps HR to tone rate, and SpO2 to tone pitch. However, deviations from target ranges for SpO2 and HR are not easy to detect.

METHOD: Forty-one participants were presented with 30-second simulated scenarios of an infant's SpO2 and HR levels in the first minutes after birth. Tremolo marked distinct HR ranges and formants marked distinct SpO2 ranges. Participants were randomly allocated to conditions: (a) No Enhancement control, (b) Enhanced HR Only, (c) Enhanced SpO2 Only, and (d) Enhanced Both.

RESULTS: Participants in the Enhanced HR Only and Enhanced SpO2 Only conditions identified HR and SpO2 ranges, respectively, more accurately than participants in the No Enhancement condition, ps < 0.001. In the Enhanced Both condition, the tremolo enhancement of HR did not affect participants' ability to identify SpO2 range, but the formants enhancement of SpO2 may have attenuated participants' ability to identify tremolo-enhanced HR range.

CONCLUSION: Tremolo and formant enhancements improve range identification for HR and SpO2, respectively, and could improve clinicians' ability to identify SpO2 and HR ranges in the first minutes after birth.

APPLICATION: Enhancements to the pulse oximeter tone to indicate clinically important ranges could improve the management of oxygen delivery to the neonate during resuscitation in the first 10 minutes after birth.

RevDate: 2022-08-22

Titze IR, A Palaparthi (2016)

Sensitivity of Source-Filter Interaction to Specific Vocal Tract Shapes.

IEEE/ACM transactions on audio, speech, and language processing, 24(12):2507-2515.

A systematic variation of length and cross-sectional area of specific segments of the vocal tract (trachea to lips) was conducted computationally to quantify the effects of source-filter interaction. A one-dimensional Navier-Stokes (transmission line) solution was used to compute peak glottal airflow, maximum flow declination rate, and formant ripple on glottal flow for Level 1 (aero-acoustic) interactions. For Level 2 (tissue movement) interaction, peak glottal area, phonation threshold pressure, and deviation in fo were quantified. Results show that the ventricle, the false-fold glottis, the conus elasticus entry, and the laryngeal vestibule are the regions to which acoustic variables are most sensitive. Generally, any narrow section of the vocal tract increases the degree of interaction, both in terms of its length and its cross-sectional area. The closer the narrow section is to the vocal folds, the greater the effect.

RevDate: 2022-08-12

Nascimento GFD, Silva HJD, Oliveira KGSC, et al (2022)

Relationship Between Oropharyngeal Geometry and Acoustic Parameters in Singers: A Preliminary Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00214-4 [Epub ahead of print].

OBJECTIVE: To verify possible correlations between formant and cepstral parameters and oropharyngeal geometry in singers, stratified by sex.

METHOD: Voice records and oropharyngeal measures of 31 singers - 13 females and 18 males, mean age of 28 (±5.0) years - were retrieved from a database and analyzed. The oropharyngeal geometry measures were collected with acoustic pharyngometry, and the voice records consisted of sustained vowel /Ԑ/ phonation, which were exported to Praat software and edited to obtain the formant and cepstral parameters, stratified by sex. The Pearson linear correlation test was applied to relate voice parameters to oropharyngeal geometry, at the 5% significance level; the linear regression test was used to justify the variable related to the second formant.

RESULTS: Differences between the sexes were identified only in the oral cavity length (greater in males) and pharyngeal cavity length (greater in females). There was a linear correlation between the third formant and the cepstrum in the female group. In the male group, there was a linear correlation between the cepstrum and the third and fourth formants. A positive linear correlation with up to 95% confidence was also identified between the pharyngeal cavity volume and the second formant in the female group, making it possible to estimate a regression model for the second formant (R² = 0.70).

CONCLUSION: There are correlations between the oropharyngeal geometry and formant and cepstral parameters in relation to sex. The pharyngeal cavity volume showed the greatest correlation between females and the second formant.

RevDate: 2022-08-11

Nishimura T, Tokuda IT, Miyachi S, et al (2022)

Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech.

Science (New York, N.Y.), 377(6607):760-763.

Human speech production obeys the same acoustic principles as vocal production in other animals but has distinctive features: A stable vocal source is filtered by rapidly changing formant frequencies. To understand speech evolution, we examined a wide range of primates, combining observations of phonation with mathematical modeling. We found that source stability relies upon simplifications in laryngeal anatomy, specifically the loss of air sacs and vocal membranes. We conclude that the evolutionary loss of vocal membranes allows human speech to mostly avoid the spontaneous nonlinear phenomena and acoustic chaos common in other primate vocalizations. This loss allows our larynx to produce stable, harmonic-rich phonation, ideally highlighting formant changes that convey most phonetic information. Paradoxically, the increased complexity of human spoken language thus followed simplification of our laryngeal anatomy.

RevDate: 2022-08-09

Suresh CH, A Krishnan (2022)

Frequency-Following Response to Steady-State Vowel in Quiet and Background Noise Among Marching Band Participants With Normal Hearing.

American journal of audiology [Epub ahead of print].

OBJECTIVE: Human studies enrolling individuals at high risk for cochlear synaptopathy (CS) have reported difficulties in speech perception in adverse listening conditions. The aim of this study is to determine if these individuals show a degradation in the neural encoding of speech in quiet and in the presence of background noise as reflected in neural phase-locking to both envelope periodicity and temporal fine structure (TFS). To our knowledge, there are no published reports that have specifically examined the neural encoding of both envelope periodicity and TFS of speech stimuli (in quiet and in adverse listening conditions) among a sample with loud-sound exposure history who are at risk for CS.

METHOD: Using scalp-recorded frequency-following response (FFR), the authors evaluated the neural encoding of envelope periodicity (FFRENV) and TFS (FFRTFS) for a steady-state vowel (English back vowel /u/) in quiet and in the presence of speech-shaped noise presented at +5- and 0 dB SNR. Participants were young individuals with normal hearing who participated in the marching band for at least 5 years (high-risk group) and non-marching band group with low-noise exposure history (low-risk group).

RESULTS: The results showed no group differences in the neural encoding of either the FFRENV or the first formant (F1) in the FFRTFS in quiet and in noise. Paradoxically, the high-risk group demonstrated enhanced representation of F2 harmonics across all stimulus conditions.

CONCLUSIONS: These results appear to be in line with a music experience-dependent enhancement of F2 harmonics. However, due to sound overexposure in the high-risk group, the role of homeostatic central compensation cannot be ruled out. A larger scale data set with different noise exposure background, longitudinal measurements with an array of behavioral and electrophysiological tests is needed to disentangle the nature of the complex interaction between the effects of central compensatory gain and experience-dependent enhancement.

RevDate: 2022-08-09

McAllister T, Eads A, Kabakoff H, et al (2022)

Baseline Stimulability Predicts Patterns of Response to Traditional and Ultrasound Biofeedback Treatment for Residual Speech Sound Disorder.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study aimed to identify predictors of response to treatment for residual speech sound disorder (RSSD) affecting English rhotics. Progress was tracked during an initial phase of traditional motor-based treatment and a longer phase of treatment incorporating ultrasound biofeedback. Based on previous literature, we focused on baseline stimulability and sensory acuity as predictors of interest.

METHOD: Thirty-three individuals aged 9-15 years with residual distortions of /ɹ/ received a course of individual intervention comprising 1 week of intensive traditional treatment and 9 weeks of ultrasound biofeedback treatment. Stimulability for /ɹ/ was probed prior to treatment, after the traditional treatment phase, and after the end of all treatment. Accuracy of /ɹ/ production in each probe was assessed with an acoustic measure: normalized third formant (F3)-second formant (F2) distance. Model-based clustering analysis was applied to these acoustic measures to identify different average trajectories of progress over the course of treatment. The resulting clusters were compared with respect to acuity in auditory and somatosensory domains.

RESULTS: All but four individuals were judged to exhibit a clinically significant response to the combined course of treatment. Two major clusters were identified. The "low stimulability" cluster was characterized by very low accuracy at baseline, minimal response to traditional treatment, and strong response to ultrasound biofeedback. The "high stimulability" group was more accurate at baseline and made significant gains in both traditional and ultrasound biofeedback phases of treatment. The clusters did not differ with respect to sensory acuity.

CONCLUSIONS: This research accords with clinical intuition in finding that individuals who are more stimulable at baseline are more likely to respond to traditional intervention, whereas less stimulable individuals may derive greater relative benefit from biofeedback.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.20422236.

RevDate: 2022-08-05

Mills HE, Shorey AE, Theodore RM, et al (2022)

Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3.

The Journal of the Acoustical Society of America, 152(1):55.

Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443-1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent.

RevDate: 2022-08-03

Feng Y, G Peng (2022)

Development of categorical speech perception in Mandarin-speaking children and adolescents.

Child development [Epub ahead of print].

Although children develop categorical speech perception at a very young age, the maturation process remains unclear. A cross-sectional study in Mandarin-speaking 4-, 6-, and 10-year-old children, 14-year-old adolescents, and adults (n = 104, 56 males, all Asians from mainland China) was conducted to investigate the development of categorical perception of four Mandarin phonemic contrasts: lexical tone contrast Tone 1-2, vowel contrast /u/-/i/, consonant aspiration contrast /p/-/pʰ/, and consonant formant transition contrast /p/-/t/. The results indicated that different types of phonemic contrasts, and even the identification and discrimination of the same phonemic contrast, matured asynchronously. The observation that tone and vowel perception are achieved earlier than consonant perception supports the phonological saliency hypothesis.

RevDate: 2022-08-02

Song J, Wan Q, Wang Y, et al (2022)

Establishment of a Multi-parameter Evaluation Model for Risk of Aspiration in Dysphagia: A Pilot Study.

Dysphagia [Epub ahead of print].

It's difficult for clinical bedside evaluations to accurately determine the occurrence of aspiration in patients. Although VFSS and FEES are the gold standards for clinical diagnosis of dysphagia, which are mainly used to evaluate people at high risk of dysphagia found by bedside screening, the operation is complicated and time-consuming. The aim of this pilot study was to present an objective measure based on a multi-parameter approach to screen for aspiration risk in patients with dysphagia. Objective evaluation techniques based on speech parameters were used to assess the oral motor function, vocal cord function, and voice changes before and after swallowing in 32 patients with dysphagia (16 low-risk aspiration group, 16 high-risk aspiration group). Student's t test combined with stepwise logistic regression were used to determine the optimal index. The best model consists of three parameters, and the equation is: logit(P) = - 3.824 - (0.504 × maximum phonation time) + (0.008 × second formant frequency of /u/) - 0.085 × (fundamental frequency difference before and after swallowing). An additional eight patients with dysphagia were randomly selected as the validation group of the model. When applied to validation, this model can accurately identify the risk of aspiration in 87.5% of patients, and the sensitivity is as high as 100%. Therefore, it has certain clinical practical value that may help clinicians to assess the risk of aspiration in patients with dysphagia, especially for silent aspiration.
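The reported equation can be turned into a predicted probability by applying the logistic function. The Python sketch below uses the coefficients exactly as printed in the abstract; everything else is an assumption made for illustration. The abstract does not state the units (seconds and Hz are assumed here), the sign convention of the F0 difference, or which outcome P refers to (high aspiration risk is assumed). The function names are hypothetical, and this is not a validated clinical tool.

```python
import math

# Coefficients as printed in the abstract above.
INTERCEPT = -3.824
COEF_MPT = -0.504      # maximum phonation time (assumed to be in seconds)
COEF_F2_U = 0.008      # second formant frequency of /u/ (assumed to be in Hz)
COEF_F0_DIFF = -0.085  # F0 difference before and after swallowing (assumed Hz)


def aspiration_logit(mpt, f2_u, f0_diff):
    """Linear predictor logit(P) from the three speech parameters."""
    return INTERCEPT + COEF_MPT * mpt + COEF_F2_U * f2_u + COEF_F0_DIFF * f0_diff


def aspiration_probability(mpt, f2_u, f0_diff):
    """Convert logit(P) to P with a numerically stable logistic function."""
    z = aspiration_logit(mpt, f2_u, f0_diff)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With these signs, a longer maximum phonation time lowers the predicted probability and a higher second formant of /u/ raises it, which is all the abstract itself allows one to conclude about the direction of each effect.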

RevDate: 2022-07-29

Lee GS, CW Chang (2022)

Comparisons of auditory brainstem response elicited by compound click-sawtooths sound and synthetic consonant-vowel /da/.

Physiology & behavior pii:S0031-9384(22)00228-1 [Epub ahead of print].

The auditory brainstem response to complex sounds (cABR) can be evoked using speech sounds such as the 40-ms synthetic consonant-vowel syllable /da/ (CV-da), which is commonly used in basic and clinical research. cABR consists of responses to formant energy as well as to the energy of the fundamental frequency. The coexistence of these two energy components makes cABR a mixed response. We introduced a new stimulus of click-sawtooths (CSW) with similar time-lock patterns but without formant or harmonic energy. Ten young healthy volunteers were recruited and the cABRs of CV-da and CSW of their 20 ears were acquired. The response latencies, amplitudes, and frequency-domain analytic results were compared pairwise between stimuli. The response amplitudes were significantly greater for CSW and the latencies were significantly shorter for CSW. The latency-intensity functions were also greater for CSW. For CSW, adjustments to one energy component can be made without causing biased changes to the other. CSW may be used in future basic research and clinical applications.

RevDate: 2022-07-25

Wang H, L Max (2022)

Inter-Trial Formant Variability in Speech Production Is Actively Controlled but Does Not Affect Subsequent Adaptation to a Predictable Formant Perturbation.

Frontiers in human neuroscience, 16:890065.

Despite ample evidence that speech production is associated with extensive trial-to-trial variability, it remains unclear whether this variability represents merely unwanted system noise or an actively regulated mechanism that is fundamental for maintaining and adapting accurate speech movements. Recent work on upper limb movements suggests that inter-trial variability may be not only actively regulated based on sensory feedback, but also provide a type of workspace exploration that facilitates sensorimotor learning. We therefore investigated whether experimentally reducing or magnifying inter-trial formant variability in the real-time auditory feedback during speech production (a) leads to adjustments in formant production variability that compensate for the manipulation, (b) changes the temporal structure of formant adjustments across productions, and (c) enhances learning in a subsequent adaptation task in which a predictable formant-shift perturbation is applied to the feedback signal. Results show that subjects gradually increased formant variability in their productions when hearing auditory feedback with reduced variability, but subsequent formant-shift adaptation was not affected by either reducing or magnifying the perceived variability. Thus, findings provide evidence for speakers' active control of inter-trial formant variability based on auditory feedback from previous trials, but, at least for the current short-term experimental manipulation of feedback variability, not for a role of this variability regulation mechanism in subsequent auditory-motor learning.

RevDate: 2022-07-23

Mailhos A, Egea-Caparrós DA, Guerrero Rodríguez C, et al (2022)

Vocal Cues to Male Physical Formidability.

Frontiers in psychology, 13:879102.

Animal vocalizations convey important information about the emitter, including sex, age, biological quality, and emotional state. Early on, Darwin proposed that sex differences in auditory signals and vocalizations were driven by sexual selection mechanisms. In humans, studies on the association between male voice attributes and physical formidability have thus far reported mixed results. Hence, with a view to furthering our understanding of the role of human voice in advertising physical formidability, we sought to identify acoustic attributes of male voices associated with physical formidability proxies. Mean fundamental frequency (F0), formant dispersion (Df), formant position (Pf), and vocal tract length (VTL) data from a sample of 101 male voices were analyzed for potential associations with height, weight, and maximal handgrip strength (HGS). F0 correlated negatively with HGS; Pf showed negative correlations with HGS, height and weight, whereas VTL positively correlated with HGS, height and weight. All zero-order correlations remained significant after controlling for false discovery rate (FDR) with the Benjamini-Hochberg method. After controlling for height and weight, and controlling for FDR, the correlation between F0 and HGS remained significant. In addition, to evaluate the ability of human male voices to advertise physical formidability to potential mates, 151 heterosexual female participants rated the voices of the 10 strongest and the 10 weakest males from the original sample for perceived physical strength, and given that physical strength is a desirable attribute in male partners, perceived attractiveness. Generalized linear mixed model analyses, which allow for generalization of inferences to other samples of both raters and targets, failed to support a significant association between perceived strength or attractiveness from voices alone and actual physical strength. These results add to the growing body of work on the role of human voices in conveying relevant biological information.
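The Benjamini-Hochberg method mentioned in this abstract is a step-up procedure for controlling the false discovery rate. The following is a minimal Python sketch of the standard procedure, not the authors' analysis code; the function name `benjamini_hochberg` and the default level of 0.05 are choices made here for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below that k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold. For example, four tests that all return p = 0.04 are all rejected at alpha = 0.05, since the fourth-ranked value meets the threshold of 0.05.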

RevDate: 2022-07-20

Shao J, Bakhtiar M, C Zhang (2022)

Impaired Categorical Perception of Speech Sounds Under the Backward Masking Condition in Adults Who Stutter.

Journal of speech, language, and hearing research : JSLHR, 65(7):2554-2570.

PURPOSE: Evidence increasingly indicates that people with developmental stuttering have auditory perception deficits. Our previous research has indicated similar but slower performance in categorical perception of the speech sounds under the quiet condition in children who stutter and adults who stutter (AWS) compared with their typically fluent counterparts. We hypothesized that the quiet condition may not be sufficiently sensitive to reveal subtle perceptual deficiencies in people who stutter. This study examined this hypothesis by testing the categorical perception of speech and nonspeech sounds under backward masking condition (i.e., a noise was presented immediately after the target stimuli).

METHOD: Fifteen Cantonese-speaking AWS and 15 adults who do not stutter (AWNS) were tested on the categorical perception of four stimulus continua, namely, consonant varying in voice onset time (VOT), vowel, lexical tone, and nonspeech, under the backward masking condition using identification and discrimination tasks.

RESULTS: AWS demonstrated a broader boundary width than AWNS in the identification task. AWS also exhibited a worse performance than AWNS in the discrimination of between-category stimuli but a comparable performance in the discrimination of within-category stimuli, indicating reduced sensitivity to sounds that belonged to different phonemic categories among AWS. Moreover, AWS showed similar patterns of impaired categorical perception across the four stimulus types, although the boundary location on the VOT continuum occurred at an earlier point in AWS than in AWNS.

CONCLUSIONS: The findings provide robust evidence that AWS exhibit impaired categorical perception of speech and nonspeech sounds under the backward masking condition. Temporal processing (i.e., VOT manipulation), frequency/spectral/formant processing (i.e., lexical tone or vowel manipulations), and nonlinguistic pitch processing were all found to be impaired in AWS. Altogether, the findings support the hypothesis that AWS might be less efficient in accessing the phonemic representations when exposed to a demanding listening condition.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.20249718.

RevDate: 2022-07-20

Baciadonna L, Solvi C, Del Vecchio F, et al (2022)

Vocal accommodation in penguins (Spheniscus demersus) as a result of social environment.

Proceedings. Biological sciences, 289(1978):20220626.

The ability to vary the characteristics of one's voice is a critical feature of human communication. Understanding whether and how animals change their calls will provide insights into the evolution of language. We asked to what extent the vocalizations of penguins, a phylogenetically distant species from those capable of explicit vocal learning, are flexible and responsive to their social environment. Using a principal components (PCs) analysis, we reduced 14 vocal parameters of penguins' contact calls to four PCs, each comprising highly correlated parameters and which can be categorized as fundamental frequency, formant frequency, frequency modulation, and amplitude modulation rate and duration. We compared how these differed between individuals with varying degrees of social interactions: same-colony versus different-colony, same colony over 3 years and partners versus non-partners. Our analyses indicate that the more penguins experience each other's calls, the more similar their calls become over time, that vocal convergence requires a long time and relative stability in colony membership, and that partners' unique social bond may affect vocal convergence differently than non-partners. Our results suggest that this implicit form of vocal plasticity is perhaps more widespread across the animal kingdom than previously thought and may be a fundamental capacity of vertebrate vocalization.

RevDate: 2022-07-08

Easwar V, L Chung (2022)

The influence of phoneme contexts on adaptation in vowel-evoked envelope following responses.

The European journal of neuroscience [Epub ahead of print].

Repeated stimulus presentation leads to neural adaptation and consequent amplitude reduction in vowel-evoked envelope following responses (EFRs), a response that reflects neural activity phase-locked to envelope periodicity. EFRs are elicited by vowels presented in isolation or in the context of other phonemes such as in syllables. While context phonemes could exert some forward influence on vowel-evoked EFRs, they may reduce the degree of adaptation. Here, we evaluated whether the properties of context phonemes between consecutive vowel stimuli influence adaptation. EFRs were elicited by the low-frequency first formant (resolved harmonics) and mid-to-high frequency second and higher formants (unresolved harmonics) of a male-spoken /i/ when the presence, number, and predictability of context phonemes (/s/, /a/, /ʃ/, /u/) between vowel repetitions varied. Monitored over four iterations of /i/, adaptation was evident only for EFRs elicited by the unresolved harmonics. EFRs elicited by the unresolved harmonics decreased in amplitude by ~16-20 nV (10-17%) after the first presentation of /i/ and remained stable thereafter. EFR adaptation was reduced by the presence of a context phoneme, but the reduction did not change with their number or predictability. The presence of a context phoneme, however, attenuated EFRs by a degree similar to that caused by adaptation (~21-23 nV). Such a trade-off in the short- and long-term influence of context phonemes suggests that the benefit of interleaving EFR-eliciting vowels with other context phonemes depends on whether the use of consonant-vowel syllables is critical to improve the validity of EFR applications.

RevDate: 2022-07-08

Teferra BG, Borwein S, DeSouza DD, et al (2022)

Acoustic and Linguistic Features of Impromptu Speech and Their Association With Anxiety: Validation Study.

JMIR mental health, 9(7):e36828 pii:v9i7e36828.

BACKGROUND: The measurement and monitoring of generalized anxiety disorder requires frequent interaction with psychiatrists or psychologists. Access to mental health professionals is often difficult because of high costs or insufficient availability. The ability to assess generalized anxiety disorder passively and at frequent intervals could be a useful complement to conventional treatment and help with relapse monitoring. Prior work suggests that higher anxiety levels are associated with features of human speech. As such, monitoring speech using personal smartphones or other wearable devices may be a means to achieve passive anxiety monitoring.

OBJECTIVE: This study aims to validate the association of previously suggested acoustic and linguistic features of speech with anxiety severity.

METHODS: A large number of participants (n=2000) were recruited and participated in a single web-based study session. Participants completed the Generalized Anxiety Disorder 7-item scale assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. Acoustic and linguistic speech features were a priori selected based on the existing speech and anxiety literature, along with related features. Associations between speech features and anxiety levels were assessed using age and personal income as covariates.

RESULTS: Word count and speaking duration were negatively correlated with anxiety scores (r=-0.12; P<.001), indicating that participants with higher anxiety scores spoke less. Several acoustic features were also significantly (P<.05) associated with anxiety, including the mel-frequency cepstral coefficients, linear prediction cepstral coefficients, shimmer, fundamental frequency, and first formant. In contrast to previous literature, second and third formant, jitter, and zero crossing rate for the z score of the power spectral density acoustic features were not significantly associated with anxiety. Linguistic features, including negative-emotion words, were also associated with anxiety (r=0.10; P<.001). In addition, some linguistic relationships were sex dependent. For example, the count of words related to power was positively associated with anxiety in women (r=0.07; P=.03), whereas it was negatively associated with anxiety in men (r=-0.09; P=.01).

CONCLUSIONS: Both acoustic and linguistic speech measures are associated with anxiety scores. The amount of speech, acoustic quality of speech, and gender-specific linguistic characteristics of speech may be useful as part of a system to screen for anxiety, detect relapse, or monitor treatment.

RevDate: 2022-07-01

Lin YC, Yan HT, Lin CH, et al (2022)

Predicting frailty in older adults using vocal biomarkers: a cross-sectional study.

BMC geriatrics, 22(1):549.

BACKGROUND: Frailty is a common issue in the aging population. Given that frailty syndrome is little discussed in the literature on the aging voice, the current study aims to examine the relationship between frailty and vocal biomarkers in older people.

METHODS: Participants aged ≥ 60 years visiting geriatric outpatient clinics were recruited. They underwent frailty assessment (Cardiovascular Health Study [CHS] index; Study of Osteoporotic Fractures [SOF] index; and Fatigue, Resistance, Ambulation, Illness, and Loss of weight [FRAIL] index) and were asked to pronounce a sustained vowel /a/ for approximately 1 s. Four voice parameters were assessed: average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4).

RESULTS: Among 277 older adults, increased A1 was associated with a lower likelihood of frailty as defined by SOF (odds ratio [OR] 0.84, 95% confidence interval [CI] 0.74-0.96). Participants with larger A2 values were more likely to be frail, as defined by FRAIL and CHS (FRAIL: OR 1.41, 95% CI 1.12-1.79; CHS: OR 1.38, 95% CI 1.10-1.75). Sex differences were observed across the three frailty indices. In male participants, an increase in A3 by 10 points increased the odds of frailty by almost 7% (SOF: OR 1.07, 95% CI 1.02-1.12), 6% (FRAIL: OR 1.06, 95% CI 1.02-1.11), or 6% (CHS: OR 1.06, 95% CI 1.01-1.11). In female participants, an increase in A4 by 0.1 conferred a significant 2.8-fold (SOF: OR 2.81, 95% CI 1.71-4.62), 2.3-fold (FRAIL: OR 2.31, 95% CI 1.45-3.68), or 2.8-fold (CHS: OR 2.82, 95% CI 1.76-4.51) increased odds of frailty.

CONCLUSIONS: Vocal biomarkers, especially spectral-domain voice parameters, might have potential for estimating frailty, as a non-invasive, instantaneous, objective, and cost-effective estimation tool, and demonstrating sex differences for individualised treatment of frailty.
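
As a reading aid for the odds ratios in the abstract above: an OR reported per fixed increment corresponds to a percent change in odds of (OR - 1) x 100, which is how OR 1.07 becomes "almost 7%". The sketch below is illustrative only. The function names are hypothetical, and the compounding step assumes the odds are log-linear in the predictor (the usual logistic-regression reading), which the abstract does not state.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (OR 1.07 -> about 7%)."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_over_steps(odds_ratio_per_step: float, steps: float) -> float:
    """Odds ratio after `steps` increments, ASSUMING odds are log-linear in the predictor."""
    return odds_ratio_per_step ** steps


if __name__ == "__main__":
    # OR 1.07 per 10-point increase in A3 is the SOF figure quoted for men above.
    print(f"{percent_change_in_odds(1.07):.1f}% higher odds per 10 points")
    # Hypothetical extrapolation to a 20-point increase (two 10-point steps).
    print(f"OR over 20 points: {odds_ratio_over_steps(1.07, 2):.4f}")
```

Note that these are changes in odds, not in probability; the two only approximate each other when the outcome is rare.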

RevDate: 2022-07-01

Jibson J (2022)

Formant detail needed for identifying, rating, and discriminating vowels in Wisconsin English.

The Journal of the Acoustical Society of America, 151(6):4004.

Neel [(2004). Acoust. Res. Lett. Online 5, 125-131] asked how much time-varying formant detail is needed for vowel identification. In that study, multiple stimuli were synthesized for each vowel: 1-point (monophthongal with midpoint frequencies), 2-point (linear from onset to offset), 3-point, 5-point, and 11-point. Results suggested that a 3-point model was optimal. This conflicted with the dual-target hypothesis of vowel inherent spectral change research, which has found that two targets are sufficient to model vowel identification. The present study replicates and expands upon the work of Neel. Ten English monophthongs were chosen for synthesis. One-, two-, three-, and five-point vowels were created as described above, and another 1-point stimulus was created with onset frequencies rather than midpoint frequencies. Three experiments were administered (n = 18 for each): vowel identification, goodness rating, and discrimination. The results ultimately align with the dual-target hypothesis, consistent with most vowel inherent spectral change studies.

RevDate: 2022-06-24

Groll MD, Dahl KL, Cádiz MD, et al (2022)

Resynthesis of Transmasculine Voices to Assess Gender Perception as a Function of Testosterone Therapy.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The goal of this study was to use speech resynthesis to investigate the effects of changes to individual acoustic features on speech-based gender perception of transmasculine voice samples following the onset of hormone replacement therapy (HRT) with exogenous testosterone. We hypothesized that mean fundamental frequency (fo) would have the largest effect on gender perception of any single acoustic feature.

METHOD: Mean fo, fo contour, and formant frequencies were calculated for three pairs of transmasculine speech samples before and after HRT onset. Sixteen speech samples with unique combinations of these acoustic features from each pair of speech samples were resynthesized. Twenty young adult listeners evaluated each synthesized speech sample for gender perception and synthetic quality. Two analyses of variance were used to investigate the effects of acoustic features on gender perception and synthetic quality.

RESULTS: Of the three acoustic features, mean fo was the only single feature that had a statistically significant effect on gender perception. Differences between the speech samples before and after HRT onset that were not captured by changes in fo and formant frequencies also had a statistically significant effect on gender perception.

CONCLUSION: In these transmasculine voice samples, mean fo was the most important acoustic feature for voice masculinization as a result of HRT; future investigations in a larger number of transmasculine speakers and on the effects of behavioral therapy-based changes in concert with HRT are warranted.

RevDate: 2022-06-23

Ham J, Yoo HJ, Kim J, et al (2022)

Vowel speech recognition from rat electroencephalography using long short-term memory neural network.

PloS one, 17(6):e0270405 pii:PONE-D-21-40838.

Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition. In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats subjected to microelectrode implantation surgery to record EEG signals from the bilateral anterior auditory fields were used. Five different vowel speech stimuli were chosen, /a/, /e/, /i/, /o/, and /u/, which have highly different formant frequencies. EEG recorded under randomly given vowel stimuli was minimally preprocessed and normalized by a z-score transformation to be used as input for the classification of speech recognition. The BiLSTM network showed the best performance among the classifiers by achieving an overall accuracy, f1-score, and Cohen's κ values of 75.18%, 0.75, and 0.68, respectively, using a 10-fold cross-validation approach. These results indicate that LSTM layers can effectively model sequential data, such as EEG; hence, informative features can be derived through BiLSTM trained with end-to-end learning without any additional hand-crafted feature extraction methods.
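
The three figures reported in the abstract above (overall accuracy, F1-score, and Cohen's κ) can all be derived from a classifier's confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code. The function names and the toy matrix are hypothetical, and macro-averaging of F1 is an assumption, since the abstract does not say which averaging was used.

```python
from typing import List

Matrix = List[List[int]]  # cm[i][j] = number of trials with true class i predicted as class j


def accuracy(cm: Matrix) -> float:
    """Fraction of trials on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


def cohens_kappa(cm: Matrix) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    p_e = sum(sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)) / total**2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    toy = [[40, 10], [10, 40]]  # hypothetical two-class example, not data from the study
    print(accuracy(toy), macro_f1(toy), cohens_kappa(toy))
```

Cohen's κ is lower than raw accuracy because it discounts the agreement expected by chance, which is why it is often reported alongside accuracy for multi-class problems such as the five-vowel task.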

RevDate: 2022-06-22

Pravitharangul N, Miyamoto JJ, Yoshizawa H, et al (2022)

Vowel sound production and its association with cephalometric characteristics in skeletal Class III subjects.

European journal of orthodontics pii:6613233 [Epub ahead of print].

BACKGROUND: This study aimed to evaluate differences in vowel production using acoustic analysis in skeletal Class III and Class I Japanese participants and to identify the correlation between vowel sounds and cephalometric variables in skeletal Class III subjects.

MATERIALS AND METHODS: Japanese males with skeletal Class III (ANB < 0°) and Class I skeletal anatomy (0.62° < ANB < 5.94°) were recruited (n = 18/group). Acoustic analysis of vowel sounds and cephalometric analysis of lateral cephalograms were performed. For sound analysis, an isolated Japanese vowel (/a/, /i/, /u/, /e/, /o/) pattern was recorded. Praat software was used to extract acoustic parameters such as fundamental frequency (F0) and the first four formants (F1, F2, F3, and F4). The formant graph area was calculated. Cephalometric values were obtained using ImageJ. Correlations between acoustic and cephalometric variables in skeletal Class III subjects were then investigated.

RESULTS: Skeletal Class III subjects exhibited significantly higher /o/ F2 and lower /o/ F4 values. Mandibular length, SNB, and overjet of Class III subjects were moderately negatively correlated with acoustic variables.

LIMITATIONS: This study did not take into account vertical skeletal patterns and tissue movements during sound production.

CONCLUSION: Skeletal Class III males produced different /o/ (back and rounded vowel), possibly owing to their anatomical positions or adaptive changes. Vowel production was moderately associated with cephalometric characteristics of Class III subjects. Thus, changes in speech after orthognathic surgery may be expected. A multidisciplinary team approach that included the input of a speech pathologist would be useful.
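
The abstract above mentions that a "formant graph area" was calculated but does not say how. One common approach, assumed here purely for illustration, is to treat each vowel as a point in the F1-F2 plane and take the area of the polygon joining them with the shoelace formula. The function names, the vertex order, and the formant values below are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (F1 in Hz, F2 in Hz)


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula; vertices must be in perimeter order."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_space_area(formants: Dict[str, Point], order: List[str]) -> float:
    """Area (Hz^2) of the polygon whose vertices are the vowels listed in `order`."""
    return polygon_area([formants[v] for v in order])


if __name__ == "__main__":
    # Hypothetical (F1, F2) values in Hz, for illustration only.
    example = {
        "i": (300.0, 2300.0),
        "e": (450.0, 1900.0),
        "a": (750.0, 1200.0),
        "o": (480.0, 800.0),
        "u": (330.0, 1300.0),
    }
    print(vowel_space_area(example, ["i", "e", "a", "o", "u"]))
```

The result depends on the vertex order, so the vowels should be listed around the perimeter of the vowel space rather than in dictionary order.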

RevDate: 2022-06-21

Kabakoff H, Gritsyk O, Harel D, et al (2022)

Characterizing sensorimotor profiles in children with residual speech sound disorder: a pilot study.

Journal of communication disorders, 99:106230 pii:S0021-9924(22)00049-1 [Epub ahead of print].

PURPOSE: Children with speech errors who have reduced motor skill may be more likely to develop residual errors associated with lifelong challenges. Drawing on models of speech production that highlight the role of somatosensory acuity in updating motor plans, this pilot study explored the relationship between motor skill and speech accuracy, and between somatosensory acuity and motor skill in children. Understanding the connections among sensorimotor measures and speech outcomes may offer insight into how somatosensation and motor skill cooperate during speech production, which could inform treatment decisions for this population.

METHOD: Twenty-five children (ages 9-14) produced syllables in an /ɹ/ stimulability task before and after an ultrasound biofeedback treatment program targeting rhotics. We first tested whether motor skill (as measured by two ultrasound-based metrics of tongue shape complexity) predicted acoustically measured accuracy (the normalized difference between the second and third formant frequencies). We then tested whether somatosensory acuity (as measured by an oral stereognosis task) predicted motor skill, while controlling for auditory acuity.

RESULTS: One measure of tongue shape complexity was a significant predictor of accuracy, such that higher tongue shape complexity was associated with lower accuracy at pre-treatment but higher accuracy at post-treatment. Based on the same measure, children with better somatosensory acuity produced /ɹ/ tongue shapes that were more complex, but this relationship was only present at post-treatment.

CONCLUSION: The predicted relationships among somatosensory acuity, motor skill, and acoustically measured /ɹ/ production accuracy were observed after treatment, but unexpectedly did not hold before treatment. The surprising finding that greater tongue shape complexity was associated with lower accuracy at pre-treatment highlights the importance of evaluating tongue shape patterns (e.g., using ultrasound) prior to treatment, and has the potential to suggest that children with high tongue shape complexity at pre-treatment may be good candidates for ultrasound-based treatment.

RevDate: 2022-06-21

González-Alvarez J, R Sos-Peña (2022)

Perceiving Body Height From Connected Speech: Higher Fundamental Frequency Is Associated With the Speaker's Height.

Perceptual and motor skills [Epub ahead of print].

To a certain degree, human listeners can perceive a speaker's body size from their voice. The speaker's voice pitch or fundamental frequency (Fo) and the vocal formant frequencies are the voice parameters that have been most intensively studied in past body size perception research (particularly for body height). Artificially lowering the Fo of isolated vowels from male speakers improved listeners' accuracy of binary (i.e., tall vs not tall) body height perceptions. This has been explained by the theory that a denser harmonic spectrum provided by a low pitch improved the perceptual resolution of formants that aid formant-based size assessments. In the present study, we extended this research using connected speech (i.e., words and sentences) pronounced by speakers of both sexes. Unexpectedly, we found that raising Fo, not lowering it, increased the participants' perceptual performance in two binary discrimination tasks of body size. We explain our new finding in the temporal domain by the dynamic and time-varying acoustic properties of connected speech. Increased Fo might increase the sampling density of sound wave acoustic cycles and provide more detailed information, such as higher resolution, on the envelope shape.

RevDate: 2022-06-17

Sugiyama Y (2022)

Identification of Minimal Pairs of Japanese Pitch Accent in Noise-Vocoded Speech.

Frontiers in psychology, 13:887761.

The perception of lexical pitch accent in Japanese was assessed using noise-excited vocoder speech, which contained no fundamental frequency (fo) or its harmonics. While prosodic information such as in lexical stress in English and lexical tone in Mandarin Chinese is known to be encoded in multiple acoustic dimensions, such multidimensionality is less understood for lexical pitch accent in Japanese. In the present study, listeners were tested under four different conditions to investigate the contribution of non-fo properties to the perception of Japanese pitch accent: noise-vocoded speech stimuli consisting of 10 3-ERBN-wide bands and 15 2-ERBN-wide bands created from a male and female speaker. Results found listeners were able to identify minimal pairs of final-accented and unaccented words at a rate better than chance in all conditions, indicating the presence of secondary cues to Japanese pitch accent. Subsequent analyses were conducted to investigate if the listeners' ability to distinguish minimal pairs was correlated with duration, intensity or formant information. The results found no strong or consistent correlation, suggesting the possibility that listeners used different cues depending on the information available in the stimuli. Furthermore, the comparison of the current results with equivalent studies in English and Mandarin Chinese suggests that, although lexical prosodic information exists in multiple acoustic dimensions in Japanese, the primary cue is more salient than in other languages.

RevDate: 2022-06-14

Preisig B, Riecke L, A Hervais-Adelman (2022)

Speech sound categorization: The contribution of non-auditory and auditory cortical regions.

NeuroImage pii:S1053-8119(22)00494-3 [Epub ahead of print].

Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.

RevDate: 2022-06-13

Sayyahi F, V Boulenger (2022)

A temporal-based therapy for children with inconsistent phonological disorder: A case-series.

Clinical linguistics & phonetics [Epub ahead of print].

Deficits in temporal auditory processing, and in particular higher gap detection thresholds, have been reported in children with inconsistent phonological disorder (IPD). Here we hypothesized that providing these children with extra time for phoneme identification may in turn enhance their phonological planning abilities for production, and accordingly improve not only consistency but also accuracy of their speech. We designed and tested a new temporal-based therapy, inspired by Core Vocabulary Therapy and called T-CVT, in which we digitally lengthened formant transitions between phonemes of words used for therapy. This allowed us to target both temporal auditory processing and word phonological planning. Four preschool native Persian-speaking children with IPD received T-CVT for eight weeks. We measured changes in speech consistency (% inconsistency) and accuracy (percentage of consonants correct, PCC) to assess the effects of the intervention. Therapy significantly improved both consistency and accuracy of word production in the four children: % inconsistency decreased from 59% on average before therapy to 2% post-T-CVT, and PCC increased from 61% to 92% on average. Consistency and accuracy were furthermore maintained or even further improved at three-month follow-up (2% inconsistency and 99% PCC). Results in a nonword repetition task showed the generalization of these effects to non-treated material: % inconsistency for nonwords decreased from 67% to 10% post-therapy, and PCC increased from 63% to 90%. These preliminary findings support the efficacy of the T-CVT intervention for children with IPD who show temporal auditory processing deficits as reflected by higher gap detection thresholds.

RevDate: 2022-06-08

Di Dona G, Scaltritti M, S Sulpizio (2022)

Formant-invariant voice and pitch representations are pre-attentively formed from constantly varying speech and non-speech stimuli.

The European journal of neuroscience [Epub ahead of print].

The present study investigated whether listeners can form abstract voice representations while ignoring constantly changing phonological information and if they can use the resulting information to facilitate voice-change detection. Further, the study aimed at understanding whether the use of abstraction is restricted to the speech domain, or can be deployed also in non-speech contexts. We ran an EEG experiment including one passive and one active oddball task, each featuring a speech and a rotated-speech condition. In the speech condition, participants heard constantly changing vowels uttered by a male speaker (standard stimuli) which were infrequently replaced by vowels uttered by a female speaker with higher pitch (deviant stimuli). In the rotated-speech condition, participants heard rotated vowels, in which the natural formant structure of speech was disrupted. In the passive task, the Mismatch Negativity was elicited after the presentation of the deviant voice in both conditions, indicating that listeners could successfully group together different stimuli into a formant-invariant voice representation. In the active task, participants showed shorter RTs, higher accuracy and a larger P3b in the speech condition with respect to the rotated-speech condition. Results showed that whereas at a pre-attentive level the cognitive system can track pitch regularities while presumably ignoring constantly changing formant information both in speech and in rotated-speech, at an attentive level the use of such information is facilitated for speech. This facilitation was also testified by a stronger synchronization in the theta band (4-7 Hz), potentially pointing towards differences in encoding/retrieval processes.

RevDate: 2022-06-06

Hampsey E, Meszaros M, Skirrow C, et al (2022)

Protocol for Rhapsody: a longitudinal observational study examining the feasibility of speech phenotyping for remote assessment of neurodegenerative and psychiatric disorders.

BMJ open, 12(6):e061193 pii:bmjopen-2022-061193.

INTRODUCTION: Neurodegenerative and psychiatric disorders (NPDs) confer a huge health burden, which is set to increase as populations age. New, remotely delivered diagnostic assessments that can detect early stage NPDs by profiling speech could enable earlier intervention and fewer missed diagnoses. The feasibility of collecting speech data remotely in those with NPDs should be established.

METHODS AND ANALYSIS: The present study will assess the feasibility of obtaining speech data, collected remotely using a smartphone app, from individuals across three NPD cohorts: neurodegenerative cognitive diseases (n=50), other neurodegenerative diseases (n=50) and affective disorders (n=50), in addition to matched controls (n=75). Participants will complete audio-recorded speech tasks and both general and cohort-specific symptom scales. The battery of speech tasks will serve several purposes, such as measuring various elements of executive control (eg, attention and short-term memory), as well as measures of voice quality. Participants will then remotely self-administer speech tasks and follow-up symptom scales over a 4-week period. The primary objective is to assess the feasibility of remote collection of continuous narrative speech across a wide range of NPDs using self-administered speech tasks. Additionally, the study evaluates if acoustic and linguistic patterns can predict diagnostic group, as measured by the sensitivity, specificity, Cohen's kappa and area under the receiver operating characteristic curve of the binary classifiers distinguishing each diagnostic group from each other. Acoustic features analysed include mel-frequency cepstrum coefficients, formant frequencies, intensity and loudness, whereas text-based features such as number of words, noun and pronoun rate and idea density will also be used.

ETHICS AND DISSEMINATION: The study received ethical approval from the Health Research Authority and Health and Care Research Wales (REC reference: 21/PR/0070). Results will be disseminated through open access publication in academic journals, relevant conferences and other publicly accessible channels. Results will be made available to participants on request.


RevDate: 2022-06-06

Coughler C, Quinn de Launay KL, Purcell DW, et al (2022)

Pediatric Responses to Fundamental and Formant Frequency Altered Auditory Feedback: A Scoping Review.

Frontiers in human neuroscience, 16:858863.

Purpose: The ability to hear ourselves speak has been shown to play an important role in the development and maintenance of fluent and coherent speech. Despite this, little is known about the developing speech motor control system throughout childhood, in particular if and how vocal and articulatory control may differ throughout development. A scoping review was undertaken to identify and describe the full range of studies investigating responses to frequency altered auditory feedback in pediatric populations and their contributions to our understanding of the development of auditory feedback control and sensorimotor learning in childhood and adolescence.

Method: Relevant studies were identified through a comprehensive search strategy of six academic databases for studies that included (a) real-time perturbation of frequency in auditory input, (b) an analysis of immediate effects on speech, and (c) participants aged 18 years or younger.

Results: Twenty-three articles met inclusion criteria. Across studies, there was a wide variety of designs, outcomes and measures used. Manipulations included fundamental frequency (9 studies), formant frequency (12), frequency centroid of fricatives (1), and both fundamental and formant frequencies (1). Study designs included contrasts across childhood, between children and adults, and between typical, pediatric clinical and adult populations. Measures primarily explored acoustic properties of speech responses (latency, magnitude, and variability). Some studies additionally examined the association of these acoustic responses with clinical measures (e.g., stuttering severity and reading ability), and neural measures using electrophysiology and magnetic resonance imaging.

Conclusion: Findings indicated that children above 4 years generally compensated in the opposite direction of the manipulation; however, in several cases they did not compensate as effectively as adults. Overall, results varied greatly due to the broad range of manipulations and designs used, making generalization challenging. Differences found between age groups in the features of the compensatory vocal responses, latency of responses, vocal variability and perceptual abilities suggest that maturational changes may be occurring in the speech motor control system, affecting the extent to which auditory feedback is used to modify internal sensorimotor representations. Varied findings suggest vocal control develops prior to articulatory control. Future studies with multiple outcome measures, manipulations, and more expansive age ranges are needed to elucidate findings.

RevDate: 2022-05-31

Wang X, T Wang (2022)

Voice Recognition and Evaluation of Vocal Music Based on Neural Network.

Computational intelligence and neuroscience, 2022:3466987.

Artistic voice is the artistic life of professional voice users. In the process of selecting and cultivating artistic performing talents, the evaluation of voice occupies a very important position. Therefore, an appropriate evaluation of the artistic voice is crucial. With the development of art education, there is an urgent need for objective methods to scientifically evaluate artistic voice training and to fairly select artistic voice talents. The current evaluation methods for artistic voices are time-consuming, laborious, and highly subjective. In the objective evaluation of artistic voice, the selection of acoustic evaluation parameters is very important. This work attempts to extract the average energy, average frequency error, and average range error of the singing voice using speech analysis technology as objective acoustic evaluation parameters, to evaluate the singing quality of artistic voice objectively with a neural network method, and to compare the result with the subjective evaluation of senior professional teachers. In this paper, voice analysis technology is used to extract the first formant, third formant, fundamental frequency, sound range, fundamental frequency perturbation, first formant perturbation, third formant perturbation, and average energy as singing acoustic parameters. Using BP neural network methods, the quality of singing was evaluated objectively and compared with the subjective evaluation of senior vocal professional teachers. The results show that the BP neural network method can accurately and objectively evaluate the quality of singing voice by using the evaluation parameters, which is helpful in scientifically guiding the selection and training of artistic voice talents.

RevDate: 2022-05-13

Tomaschek F, M Ramscar (2022)

Understanding the Phonetic Characteristics of Speech Under Uncertainty-Implications of the Representation of Linguistic Knowledge in Learning and Processing.

Frontiers in psychology, 13:754395.

The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families that have shown that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs, as our theoretical analysis suggests they ought to, their uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production, and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network, this maintains the distinction between those parts of the network that represent message and those that represent signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.

RevDate: 2022-05-09

Haiduk F, WT Fitch (2022)

Understanding Design Features of Music and Language: The Choric/Dialogic Distinction.

Frontiers in psychology, 13:786899.

Music and spoken language share certain characteristics: both consist of sequences of acoustic elements that are combinatorically combined, and these elements partition the same continuous acoustic dimensions (frequency, formant space and duration). However, the resulting categories differ sharply: scale tones and note durations of small integer ratios appear in music, while speech uses phonemes, lexical tone, and non-isochronous durations. Why did music and language diverge into the two systems we have today, differing in these specific features? We propose a framework based on information theory and a reverse-engineering perspective, suggesting that design features of music and language are a response to their differential deployment along three different continuous dimensions. These include the familiar propositional-aesthetic ('goal') and repetitive-novel ('novelty') dimensions, and a dialogic-choric ('interactivity') dimension that is our focus here. Specifically, we hypothesize that music exhibits specializations enhancing coherent production by several individuals concurrently (the 'choric' context). In contrast, language is specialized for exchange in tightly coordinated turn-taking ('dialogic' contexts). We examine the evidence for our framework, both from humans and non-human animals, and conclude that many proposed design features of music and language follow naturally from their use in distinct dialogic and choric communicative contexts. Furthermore, the hybrid nature of intermediate systems like poetry, chant, or solo lament follows from their deployment in the less typical interactive context.

RevDate: 2022-05-06

Hall A, Kawai K, Graber K, et al (2021)

Acoustic analysis of surgeons' voices to assess change in the stress response during surgical in situ simulation.

BMJ simulation & technology enhanced learning, 7(6):471-477 pii:bmjstel-2020-000727.

Introduction: Stress may serve as an adjunct (challenge) or hindrance (threat) to the learning process. Determining the effect of an individual's response to situational demands in either a real or simulated situation may enable optimisation of the learning environment. Studies of acoustic analysis suggest that mean fundamental frequency and formant frequencies of voice vary with an individual's response during stressful events. This hypothesis is reviewed within the otolaryngology (ORL) simulation environment to assess whether acoustic analysis could be used as a tool to determine participants' stress response and cognitive load in medical simulation. Such an assessment could lead to optimisation of the learning environment.

Methodology: ORL simulation scenarios were performed to teach the participants teamwork and refine clinical skills. Each was performed in an actual operating room (OR) environment (in situ) with a multidisciplinary team consisting of ORL surgeons, OR nurses and anaesthesiologists. Ten of the scenarios were led by an ORL attending and ten were led by an ORL fellow. The vocal communication of each of the 20 individual leaders was analysed using long-term pitch analysis in PRAAT software (autocorrelation method) to obtain mean fundamental frequency (F0) and the first four formant frequencies (F1, F2, F3 and F4). In reviewing individual scenarios, each leader's voice was analysed during a non-stressful environment (WHO sign-out procedure) and compared with their voice during a stressful portion of the scenario (responding to deteriorating oxygen saturations in the manikin).

Results: The mean unstressed F0 for the male voice was 161.4 Hz and for the female voice was 217.9 Hz. The mean fundamental frequency of speech in the ORL fellow (lead surgeon) group increased by 34.5 Hz between the scenario's baseline and stressful portions. This was significantly different to the mean change of -0.5 Hz noted in the attending group (p=0.01). No changes were seen in F1, F2, F3 or F4.

Conclusions: This study demonstrates a method of acoustic analysis of the voices of participants taking part in medical simulations. It suggests acoustic analysis of participants may offer a simple, non-invasive, non-intrusive adjunct in evaluating and titrating the stress response during simulation.

RevDate: 2022-05-02

Jarollahi F, Valadbeigi A, Jalaei B, et al (2022)

Comparing Sound-Field Speech-Auditory Brainstem Response Components between Cochlear Implant Users with Different Speech Recognition in Noise Scores.

Iranian journal of child neurology, 16(2):93-105.

Objectives: Many studies have suggested that cochlear implant (CI) users vary in terms of speech recognition in noise. Studies in this field attribute this variety partly to subcortical auditory processing. Studying speech-Auditory Brainstem Response (speech-ABR) provides good information about speech processing; thus, this work was designed to compare speech-ABR components between two groups of CI users with good and poor speech recognition in noise scores.

Materials & Methods: The present study was conducted on two groups of CI users aged 8-10 years old. The first group (CI-good) consisted of 15 children with prelingual CI who had good speech recognition in noise performance. The second group (CI-poor) was matched with the first group, but they had poor speech recognition in noise performance. The speech-ABR test in a sound-field presentation was performed for all the participants.

Results: The speech-ABR response showed longer C, D, E, F, and O latencies in CI-poor than in CI-good users (P < 0.05), whereas no significant difference was observed in the initial waves V (t = -0.293, p = 0.771) and A (t = -1.051, p = 0.307). Analysis in the spectral domain showed a weaker representation of the fundamental frequency as well as the first formant and high-frequency component of speech stimuli in the CI users with poor auditory performance.

Conclusions: Results revealed that CI users with poor auditory performance in noise had deficits in encoding the periodic portion of speech signals at the brainstem level. This study may also serve as physiological evidence for poorer pitch processing in CI users with poor speech recognition in noise.

RevDate: 2022-04-22

Houle N, Goudelias D, Lerario MP, et al (2022)

Effect of Anchor Term on Auditory-Perceptual Ratings of Feminine and Masculine Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

BACKGROUND: Studies investigating auditory perception of gender expression vary greatly in the specific terms applied to gender expression in rating scales.

PURPOSE: This study examined the effects of different anchor terms on listeners' auditory perceptions of gender expression in phonated and whispered speech. Additionally, token and speaker cues were examined to identify predictors of the auditory-perceptual ratings.

METHOD: Inexperienced listeners (n = 105) completed an online rating study in which they were asked to use one of five visual analog scales (VASs) to rate cis men, cis women, and transfeminine speakers in both phonated and whispered speech. The VASs varied by anchor term (very female/very male, feminine/masculine, feminine female/masculine male, very feminine/not at all feminine, and not at all masculine/very masculine).

RESULTS: Linear mixed-effects models revealed significant two-way interactions of gender expression by anchor term and gender expression by condition. In general, the feminine female/masculine male scale resulted in the most extreme ratings (closest to the end points), and the feminine/masculine scale resulted in the most central ratings. As expected, for all speakers, whispered speech was rated more centrally than phonated speech. Additionally, ratings of phonated speech were predicted by mean fundamental frequency (fo) within each speaker group and by smoothed cepstral peak prominence in cisgender speakers. In contrast, ratings of whispered speech, which lacks an fo, were predicted by indicators of vocal tract resonance (second formant and speaker height).

CONCLUSIONS: The current results indicate that differences in the terms applied to rating scales limit generalization of results across studies. Identifying the patterns across listener ratings of gender expression provides a rationale for researchers and clinicians when making choices about terms. Additionally, beyond fo and vocal tract resonance, predictors of listener ratings vary based on the anchor terms used to describe gender expression.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19617564.

RevDate: 2022-04-14

Kırbac A, Turkyılmaz MD, S Yağcıoglu (2022)

Gender Effects on Binaural Speech Auditory Brainstem Response.

The journal of international advanced otology, 18(2):125-130.

BACKGROUND: The speech auditory brainstem response is a tool that provides direct information on how speech sound is temporally and spectrally coded by the auditory brainstem. Speech auditory brainstem response is influenced by many variables, but the effect of gender is unclear, particularly in the binaural recording. Studies on speech auditory brainstem response evoked by binaural stimulation are limited, but gender studies are even more limited and contradictory. This study aimed at examining the effect of gender on speech auditory brainstem response in adults.

METHODS: Time- and frequency-domain analyses of speech auditory brainstem response recordings of 30 healthy participants (15 women and 15 men) aged 18-35 years with normal hearing and no musical education were obtained. For each adult, speech auditory brainstem response was recorded with the syllable /da/ presented binaurally. Peaks of time (V, A, C, D, E, F, and O) and frequency (fundamental frequency, first formant frequency, and high frequency) domains of speech auditory brainstem response were compared between men and women.

RESULTS: V, A, and F peak latencies of women were significantly shorter than those of men (P < .05). However, no difference was found in the peak amplitude of the time (P > .05) or frequency domain between women and men (P > .05).

CONCLUSION: Gender differences in binaural speech auditory brainstem response are significant in adults, particularly in the time domain. When speech stimuli are used for auditory brainstem responses, normative data specific to gender are required. Preliminary normative data from this study could serve as a reference for future studies on binaural speech auditory brainstem response among Turkish adults.

RevDate: 2022-04-13

Cangokce Yasar O, Ozturk S, Kemal O, et al (2021)

Effects of Subthalamic Nucleus Deep Brain Stimulation Surgery on Voice and Formant Frequencies of Vowels in Turkish.

Turkish neurosurgery [Epub ahead of print].

AIM: This study aimed to investigate the effects of deep brain stimulation (DBS) of the subthalamic nucleus (STN) on acoustic characteristics of voice production in Turkish patients with Parkinson's disease (PD).

MATERIAL AND METHODS: This study recruited 20 patients diagnosed with PD. Voice samples were recorded under the "stimulation on" and "stimulation off" conditions of STN-DBS. Acoustic recordings of the patients were made during the production of vowels /a/, /o/, and /i/ and repetition of the syllables /pa/-/ta/-/ka/. Acoustic analyses were performed using Praat.

RESULTS: A significant difference in the parameters was observed among groups for vowels. A positive significant difference was observed between preoperative med-on and postoperative med-on/stim-on groups for /a/ and the postoperative med-on/stim-on and postoperative med-on/stim-off groups for /o/ and /i/ for frequency perturbation (jitter) and noise-to-harmonics ratio. No significant difference was noted between the preoperative med-on and postoperative med-on/stim-off groups for any vowels.

CONCLUSION: STN-DBS surgery has an acute positive effect on voice. Studies on formant frequency analysis in STN-DBS may be expanded with both articulation and intelligibility tests to enable us to combine patient abilities in various perspectives and to obtain precise results.

RevDate: 2022-04-11

Quatieri TF, Talkar T, JS Palmer (2020)

A Framework for Biomarkers of COVID-19 Based on Coordination of Speech-Production Subsystems.

IEEE open journal of engineering in medicine and biology, 1:203-206.

Goal: We propose a speech modeling and signal-processing framework to detect and track COVID-19 through asymptomatic and symptomatic stages. Methods: The approach is based on complexity of neuromotor coordination across speech subsystems involved in respiration, phonation and articulation, motivated by the distinct nature of COVID-19 involving lower (i.e., bronchial, diaphragm, lower tracheal) versus upper (i.e., laryngeal, pharyngeal, oral and nasal) respiratory tract inflammation, as well as by the growing evidence of the virus' neurological manifestations. Preliminary results: An exploratory study with audio interviews of five subjects provides Cohen's d effect sizes between pre-COVID-19 (pre-exposure) and post-COVID-19 (after positive diagnosis but presumed asymptomatic) using: coordination of respiration (as measured through acoustic waveform amplitude) and laryngeal motion (fundamental frequency and cepstral peak prominence), and coordination of laryngeal and articulatory (formant center frequencies) motion. Conclusions: While there is a strong subject-dependence, the group-level morphology of effect sizes indicates a reduced complexity of subsystem coordination. Validation is needed with larger more controlled datasets and to address confounding influences such as different recording conditions, unbalanced data quantities, and changes in underlying vocal status from pre-to-post time recordings.
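Illustrative sketch (not from the paper): the abstract above reports Cohen's d effect sizes between pre- and post-COVID-19 measurements. One common form of Cohen's d divides the difference in means by the pooled standard deviation. Whether the authors used this variant is an assumption, and the function name `cohens_d` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(before, after):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Positive values mean the `after` sample has the larger mean.
    """
    n1, n2 = len(before), len(after)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(after) - mean(before)) / sqrt(pooled_var)

# Made-up feature values for illustration only.
d = cohens_d([2.0, 4.0, 6.0], [4.0, 6.0, 8.0])
```

Here both samples have variance 4, so the pooled standard deviation is 2 and the two-unit shift in the mean gives d = 1.0.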

RevDate: 2022-04-08

Dahl KL, François FA, Buckley DP, et al (2022)

Voice and Speech Changes in Transmasculine Individuals Following Circumlaryngeal Massage and Laryngeal Reposturing.

American journal of speech-language pathology [Epub ahead of print].

PURPOSE: The purpose of this study was to measure the short-term effects of circumlaryngeal massage and laryngeal reposturing on acoustic and perceptual characteristics of voice in transmasculine individuals.

METHOD: Fifteen transmasculine individuals underwent one session of sequential circumlaryngeal massage and laryngeal reposturing with a speech-language pathologist. Voice recordings were collected at three time points: baseline, postmassage, and postreposturing. Fundamental frequency (fo), formant frequencies, and relative fundamental frequency (RFF; an acoustic correlate of laryngeal tension) were measured. Estimates of vocal tract length (VTL) were derived from formant frequencies. Twelve listeners rated the perceived masculinity of participants' voices at each time point. Repeated-measures analyses of variance measured the effect of time point on fo, estimated VTL, RFF, and perceived voice masculinity. Significant effects were evaluated with post hoc Tukey's tests.

RESULTS: Between baseline and end of the session, fo decreased, VTL increased, and participant voices were perceived as more masculine, all with statistically significant differences. RFF did not differ significantly at any time point. Outcomes were highly variable at the individual level.

CONCLUSION: Circumlaryngeal massage and laryngeal reposturing have short-term effects on select acoustic (fo, estimated VTL) and perceptual characteristics (listener-assigned voice masculinity) of voice in transmasculine individuals.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19529299.
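Illustrative sketch (not from the paper): the abstract above says vocal tract length (VTL) was estimated from formant frequencies but does not give the formula. A textbook approximation treats the vocal tract as a uniform tube closed at the glottis and open at the lips, so that the n-th resonance is F_n = (2n - 1)c / (4L). The Python fragment below inverts that relation and averages over the available formants. The tube model, the speed-of-sound constant, and the function names are all assumptions made for illustration.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, humid air

def vtl_from_formant(freq_hz, n):
    """VTL (cm) implied by the n-th formant under the uniform-tube model.

    Inverts F_n = (2n - 1) * c / (4 * L) for L.
    """
    if freq_hz <= 0 or n < 1:
        raise ValueError("formant frequency must be positive and n >= 1")
    return (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * freq_hz)

def estimate_vtl(formants_hz):
    """Average the per-formant VTL estimates for F1, F2, F3, ..."""
    if not formants_hz:
        raise ValueError("at least one formant is required")
    estimates = [vtl_from_formant(f, n) for n, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Idealised neutral-vowel formants for a 17.5 cm tube.
vtl_cm = estimate_vtl([500.0, 1500.0, 2500.0, 3500.0])
```

Under this model, lower formant frequencies imply a longer tube, which is consistent with the reported increase in estimated VTL after reposturing.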

RevDate: 2022-04-04

Zhang G, Shao J, Zhang C, et al (2022)

The Perception of Lexical Tone and Intonation in Whispered Speech by Mandarin-Speaking Congenital Amusics.

Journal of speech, language, and hearing research : JSLHR, 65(4):1331-1348.

PURPOSE: A fundamental feature of human speech is variation, including the manner of phonation, as exemplified in the case of whispered speech. In this study, we employed whispered speech to examine an unresolved issue about congenital amusia, a neurodevelopmental disorder of musical pitch processing, which also affects speech pitch processing such as lexical tone and intonation perception. The controversy concerns whether amusia is a pitch-processing disorder or can affect speech processing beyond pitch.

METHOD: We examined lexical tone and intonation recognition in 19 Mandarin-speaking amusics and 19 matched controls in phonated and whispered speech, where fundamental frequency (fo) information is either present or absent.

RESULTS: The results revealed that the performance of congenital amusics was inferior to that of controls in lexical tone identification in both phonated and whispered speech. These impairments were also detected in identifying intonation (statements/questions) in phonated and whispered modes. Across the experiments, regression models revealed that fo and non-fo (duration, intensity, and formant frequency) acoustic cues predicted tone and intonation recognition in phonated speech, whereas non-fo cues predicted tone and intonation recognition in whispered speech. There were significant differences between amusics and controls in the use of both fo and non-fo cues.

CONCLUSION: The results provided the first evidence that the impairments of amusics in lexical tone and intonation identification prevail into whispered speech and support the hypothesis that the deficits of amusia extend beyond pitch processing.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19302275.

RevDate: 2022-04-01

Carl M, Levy ES, M Icht (2022)

Speech treatment for Hebrew-speaking adolescents and young adults with developmental dysarthria: A comparison of mSIT and Beatalk.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Individuals with developmental dysarthria typically demonstrate reduced functioning of one or more of the speech subsystems, which negatively impacts speech intelligibility and communication within social contexts. A few treatment approaches are available for improving speech production and intelligibility among individuals with developmental dysarthria. However, these approaches have only limited application and research findings among adolescents and young adults.

AIMS: To determine and compare the effectiveness of two treatment approaches, the modified Speech Intelligibility Treatment (mSIT) and the Beatalk technique, on speech production and intelligibility among Hebrew-speaking adolescents and young adults with developmental dysarthria.

METHODS & PROCEDURES: Two matched groups of adolescents and young adults with developmental dysarthria participated in the study. Each received one of the two treatments, mSIT or Beatalk, over the course of 9 weeks. Measures of speech intelligibility, articulatory accuracy, voice and vowel acoustics were assessed both pre- and post-treatment.

OUTCOMES & RESULTS: Both the mSIT and Beatalk groups demonstrated gains in at least some of the outcome measures. Participants in the mSIT group exhibited improvement in speech intelligibility and voice measures, while participants in the Beatalk group demonstrated increased articulatory accuracy and gains in voice measures from pre- to post-treatment. Significant increases were noted post-treatment for first formant values for select vowels.

CONCLUSIONS & IMPLICATIONS: Results of this preliminary study are promising for both treatment approaches. The differentiated results indicate their distinct application to speech intelligibility deficits. The current findings also hold clinical significance for treatment among adolescents and young adults with motor speech disorders and application for a language other than English.

WHAT THIS PAPER ADDS: What is already known on the subject: Developmental dysarthria (e.g., secondary to cerebral palsy) is a motor speech disorder that negatively impacts speech intelligibility, and thus communication participation. Select treatment approaches are available with the aim of improving speech intelligibility in individuals with developmental dysarthria; however, these approaches are limited in number and have only seldom been applied specifically to adolescents and young adults. What this paper adds to existing knowledge: The current study presents preliminary data regarding two treatment approaches, the mSIT and Beatalk technique, administered to Hebrew-speaking adolescents and young adults with developmental dysarthria in a group setting. Results demonstrate the initial effectiveness of the treatment approaches, with different gains noted for each approach across speech and voice domains. What are the potential or actual clinical implications of this work? The findings add to the existing literature on potential treatment approaches aiming to improve speech production and intelligibility among individuals with developmental dysarthria. The presented approaches also show promise for group-based treatments as well as the potential for improvement among adolescents and young adults with motor speech disorders.

RevDate: 2022-03-30

Sen A, Thakkar H, Vincent V, et al (2022)

Endothelial colony forming cells' tetrahydrobiopterin level in coronary artery disease patients and its association with circulating endothelial progenitor cells.

Canadian journal of physiology and pharmacology [Epub ahead of print].

Endothelial colony forming cells (ECFCs) participate in neovascularization. Endothelial nitric oxide synthase (eNOS) derived NO· helps in homing of endothelial progenitor cells (EPCs) at the site of vascular injury. The enzyme cofactor tetrahydrobiopterin (BH4) stabilizes the catalytic active state of eNOS. Association of intracellular ECFCs biopterins and ratio of reduced to oxidized biopterin (BH4:BH2) with circulatory EPCs and ECFCs functionality have not been studied. We investigated ECFCs biopterin levels and its association with circulatory EPCs as well as ECFCs proliferative potential in terms of day of appearance in culture. Circulatory EPCs were enumerated by flow cytometry in 53 coronary artery disease (CAD) patients and 42 controls. ECFCs were cultured, characterized, and biopterin levels assessed by high performance liquid chromatography. Appearance of ECFCs' colony and their number were recorded. Circulatory EPCs were significantly lower in CAD and ECFCs appeared in 56% and 33% of CAD and control subjects, respectively. Intracellular BH4 and BH4:BH2 were significantly reduced in CAD. BH4:BH2 was positively correlated with circulatory EPCs (p = 0.01), and negatively with day of appearance of ECFCs (p = 0.04). Circulatory EPCs negatively correlated with ECFCs appearance (p = 0.02). These findings suggest the role of biopterins in maintaining circulatory EPCs and functional integrity of ECFCs.

RevDate: 2022-03-28

Ho GY, Kansy IK, Klavacs KA, et al (2022)

Effect of FFP2+3 masks on voice range profile measurement and voice acoustics in routine voice diagnostics.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000524299 [Epub ahead of print].

INTRODUCTION: Voice diagnostics including voice range profile measurement (VRP) and acoustic voice analysis is essential in laryngology and phoniatrics. Due to the Covid-19 pandemic, wearing filtering face masks (FFP2/3) is recommended when high-risk aerosol-generating procedures like singing and speaking are being performed. The goal of this study was to compare VRP parameters measured without and with FFP2/3 masks. Further, formant analysis for sustained vowels, singer's formant and analysis of reading standard text samples were performed without/with FFP2/3 masks.

METHODS: 20 subjects (6 male and 14 female) were enrolled in this study with an average age of 36±16 y (mean ± SD). 14 patients were rated as euphonic/not hoarse and 6 patients as mildly hoarse. All subjects underwent the VRP measurements, vowel and text recordings without/with FFP2/3 mask using the software DiVAS by XION medical (Berlin, Germany). Voice range of singing voice, equivalent of voice extension measure (eVEM), fundamental frequency (F0), sound pressure level (SPL) of soft speaking and shouting were calculated and analyzed. Maximum phonation time (MPT) and jitter-% were included for Dysphonia Severity Index (DSI) measurement. Analyses of singer's formant were performed. Spectral analyses of sustained vowels /a:/, /i:/ and /u:/ (first=F1 and second=F2 formants), intensity of long term average spectrum (LTAS) and alpha-ratio (α-ratio) were calculated using the freeware Praat.

RESULTS: For all subjects the mean values of routine voice parameters without/with mask were analyzed: no significant differences were found in results of singing voice range, eVEM, SPL and frequency of soft speaking/shouting, except a significantly lower mean SPL of shouting with FFP2/3 mask, in particular that of the female subjects (p=0.002). Results of MPT, jitter and DSI without/with FFP2/3 mask showed no significant differences. Further mean values analyzed without/with mask were: ratio singer's formant/loud singing, with lower ratio with FFP2/3 mask (p=0.001). F1 and F2 of /a:/, /i:/, /u:/, with no significant differences of the results, with the exception of F2 of /i:/ with lower value with FFP2/3 mask (p=0.005). With the exceptions mentioned, the t-test revealed no significant differences for each of the routine parameters tested in the recordings without and with wearing a FFP2/3 mask.

CONCLUSION: It can be concluded that VRP measurements including DSI performed with FFP2/3 masks provide reliable data in clinical routine with respect to voice condition/constitution. Spectral analyses of sustained vowel, text and singer's formant will be affected by wearing FFP2/3 masks.

RevDate: 2022-03-28

Chauvette L, Fournier P, A Sharp (2022)

The frequency-following response to assess the neural representation of spectral speech cues in older adults.

Hearing research, 418:108486 pii:S0378-5955(22)00057-0 [Epub ahead of print].

Older adults often present difficulties understanding speech that cannot be explained by age-related changes in sound audibility. Psychoacoustic and electrophysiologic studies have linked these suprathreshold difficulties to age-related deficits in the auditory processing of temporal and spectral sound information. These studies suggest the existence of an age-related temporal processing deficit in the central auditory system, but the existence of such deficit in the spectral domain remains understudied. The FFR is an electrophysiological evoked response that assesses the ability of the neural auditory system to reproduce the spectral and temporal patterns of a sound. The main goal of this short review is to investigate if the FFR can identify and measure spectral processing deficits in the elderly compared to younger adults (for both, without hearing loss or competing noise). Furthermore, we want to determine what stimuli and analyses have been used in the literature to assess the neural encoding of spectral cues in older adults. Almost all reviewed articles showed an age-related decline in the auditory processing of spectral acoustic information. Even when using different speech and non-speech stimuli, studies reported an age-related decline at the fundamental frequency, at the first formant, and at other harmonic components using different metrics, such as the response's amplitude, inter-trial phase coherence, signal-to-response correlation, and signal-to-noise ratio. These results suggest that older adults may present age-related spectral processing difficulties, but further FFR studies are needed to clarify the effect of advancing age on the neural encoding of spectral speech cues. Spectral processing research on aging would benefit from using a broader variety of stimuli and from rigorously controlling for hearing thresholds even in the absence of disabling hearing loss. Advances in the understanding of the effect of age on FFR measures of spectral encoding could lead to the development of new clinical tools, with possible applications in the field of hearing aid fitting.

RevDate: 2022-03-22

Zaltz Y, L Kishon-Rabin (2022)

Difficulties Experienced by Older Listeners in Utilizing Voice Cues for Speaker Discrimination.

Frontiers in psychology, 13:797422.

Human listeners are assumed to apply different strategies to improve speech recognition in background noise. Young listeners with normal hearing (NH), e.g., have been shown to follow the voice of a particular speaker based on the fundamental (F0) and formant frequencies, which are both influenced by the gender, age, and size of the speaker. However, the auditory and cognitive processes that underlie the extraction and discrimination of these voice cues across speakers may be subject to age-related decline. The present study aimed to examine the utilization of F0 and formant cues for voice discrimination (VD) in older adults with hearing expected for their age. Difference limens (DLs) for VD were estimated in 15 healthy older adults (65-78 years old) and 35 young adults (18-35 years old) using only F0 cues, only formant frequency cues, and a combination of F0 + formant frequencies. A three-alternative forced-choice paradigm with an adaptive-tracking threshold-seeking procedure was used. Wechsler backward digit span test was used as a measure of auditory working memory. Trail Making Test (TMT) was used to provide cognitive information reflecting a combined effect of processing speed, mental flexibility, and executive control abilities. The results showed that (a) the mean VD thresholds of the older adults were poorer than those of the young adults for all voice cues, although larger variability was observed among the older listeners; (b) both age groups found the formant cues more beneficial for VD, compared to the F0 cues, and the combined (F0 + formant) cues resulted in better thresholds, compared to each cue separately; (c) significant associations were found for the older adults in the combined F0 + formant condition between VD and TMT scores, and between VD and hearing sensitivity, supporting the notion that a decline with age in both top-down and bottom-up mechanisms may hamper the ability of older adults to discriminate between voices. The present findings suggest that older listeners may have difficulty following the voice of a specific speaker, and thus in using this as a strategy for listening amid noise. This may contribute to understanding their reported difficulty listening in adverse conditions.

RevDate: 2022-03-15

Paulino CEB, Silva HJD, Gomes AOC, et al (2022)

Relationship Between Oropharyngeal Geometry and Vocal Parameters in Subjects With Parkinson's Disease.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00021-2 [Epub ahead of print].

OBJECTIVE: To verify whether the dimensions of different segments of the oropharyngeal cavity have different proportions between Parkinson's disease patients and vocally healthy subjects and investigate whether the measurements of these subjects' oropharyngeal geometry associate with their acoustic measurements of voice.

METHOD: Quantitative, descriptive, cross-sectional, and retrospective study with secondary data, approved by the Human Research Ethics Committee under no. 4.325.029. We used vocal samples and data from the oropharyngeal geometry of 40 subjects - 20 with Parkinson's disease stages I to III and 20 who formed the control group, matched for sex and age. Each group had 10 males and 10 females, mean age of 61 years (±6.0). Formant (F1, F2, and F3) and cepstral measures of the sustained vowel /ε/ were extracted and arranged in the database to determine their values using Praat software. The data were descriptively analyzed, with statistics generated with R software. The proportion of oropharyngeal geometry measurements was arranged by mean values and coefficients of variation. Pearson's linear correlation test was applied to relate voice parameters to oropharyngeal geometry, considering P < 0.05, and linear regression test, to justify F2.

RESULTS: The Parkinson's disease group showed a linear relationship between oral cavity length and F1 in males (P = 0.04) and between glottal area and F2 in females (P = 0.00); linear relationships were established according to age in both groups, and a regression model for F2 was estimated (R2 = 0.61). There was no difference between pathological and healthy voices; there was a difference in the proportional relationship of oropharyngeal geometry between the groups.

CONCLUSION: The proportional relationship of oropharyngeal geometry differs between the Parkinson's disease group and the control group, as well as the relationship between oropharyngeal geometry and formant and cepstral values of voice according to the subjects' sex and age.

RevDate: 2022-03-11

Jüchter C, Beutelmann R, GM Klump (2022)

Speech sound discrimination by Mongolian gerbils.

Hearing research, 418:108472 pii:S0378-5955(22)00043-0 [Epub ahead of print].

The present study establishes the Mongolian gerbil (Meriones unguiculatus) as a model for investigating the perception of human speech sounds. We report data on the discrimination of logatomes (CVCs - consonant-vowel-consonant combinations with outer consonants /b/, /d/, /s/ and /t/ and central vowels /a/, /aː/, /ɛ/, /eː/, /ɪ/, /iː/, /ɔ/, /oː/, /ʊ/ and /uː/, VCVs - vowel-consonant-vowel combinations with outer vowels /a/, /ɪ/ and /ʊ/ and central consonants /b/, /d/, /f/, /g/, /k/, /l/, /m/, /n/, /p/, /s/, /t/ and /v/) by gerbils. Four gerbils were trained to perform an oddball target detection paradigm in which they were required to discriminate a deviant CVC or VCV in a sequence of CVC or VCV standards, respectively. The experiments were performed with an ICRA-1 noise masker with speech-like spectral properties, and logatomes of multiple speakers were presented at various signal-to-noise ratios. Response latencies were measured to generate perceptual maps employing multidimensional scaling, which visualize the gerbils' internal maps of the sounds. The dimensions of the perceptual maps were correlated to multiple phonetic features of the speech sounds for evaluating which features of vowels and consonants are most important for the discrimination. The perceptual representation of vowels and consonants in gerbils was similar to that of humans, although gerbils needed higher signal-to-noise ratios for the discrimination of speech sounds than humans. The gerbils' discrimination of vowels depended on differences in the frequencies of the first and second formant determined by tongue height and position. Consonants were discriminated based on differences in combinations of their articulatory features. The similarities in the perception of logatomes by gerbils and humans render the gerbil a suitable model for human speech sound discrimination.

RevDate: 2022-03-08

Tamura T, Tanaka Y, Watanabe Y, et al (2022)

Relationships between maximum tongue pressure and second formant transition in speakers with different types of dysarthria.

PloS one, 17(3):e0264995 pii:PONE-D-21-32058.

The effects of muscle weakness on speech are currently not fully known. We investigated the relationships between maximum tongue pressure and second formant transition in adults with different types of dysarthria. It focused on the slope in the second formant transition because it reflects the tongue velocity during articulation. Sixty-three Japanese speakers with dysarthria (median age, 68 years; interquartile range, 58-77 years; 44 men and 19 women) admitted to acute and convalescent hospitals were included. Thirty neurologically normal speakers aged 19-85 years (median age, 22 years; interquartile range, 21.0-23.8 years; 14 men and 16 women) were also included. The relationship between the maximum tongue pressure and speech function was evaluated using correlation analysis in the dysarthria group. Speech intelligibility, the oral diadochokinesis rate, and the second formant slope were based on the impaired speech index. More than half of the speakers had mild to moderate dysarthria. Speakers with dysarthria showed significantly lower maximum tongue pressure, speech intelligibility, oral diadochokinesis rate, and second formant slope than neurologically normal speakers. Only the second formant slope was significantly correlated with the maximum tongue pressure (r = 0.368, p = 0.003). The relationship between the second formant slope and maximum tongue pressure showed a similar correlation in the analysis of subgroups divided by sex. The oral diadochokinesis rate, which is related to the speed of articulation, is affected by voice on/off, mandibular opening/closing, and range of motion. In contrast, the second formant slope was less affected by these factors. These results suggest that the maximum isometric tongue strength is associated with tongue movement speed during articulation.

RevDate: 2022-03-07

Georgiou GP (2022)

Acoustic markers of vowels produced with different types of face masks.

Applied acoustics. Acoustique applique. Angewandte Akustik, 191:108691.

The wide spread of SARS-CoV-2 led to the extensive use of face masks in public places. Although masks offer significant protection from infectious droplets, they also impact verbal communication by altering the speech signal. The present study examines how two types of face masks affect the speech properties of vowels. Twenty speakers were recorded producing their native vowels in a /pVs/ context, maintaining a normal speaking rate. Speakers were asked to produce the vowels in three conditions: (a) with a surgical mask, (b) with a cotton mask, and (c) without a mask. The speakers' output was analyzed through Praat speech acoustics software. We fitted three linear mixed-effects models to investigate the mask-wearing effects on the first formant (F1), second formant (F2), and duration of vowels. The results demonstrated that F1 and duration of vowels remained intact in the masked conditions compared to the unmasked condition, while F2 was altered for three out of five vowels (/e a u/) in the surgical mask and two out of five vowels (/e a/) in the cotton mask. So, both types of masks altered the speech signal to some extent, and they mostly affected the same vowel qualities. It is concluded that some acoustic properties are more sensitive than others to speech signal modification when speech is filtered through masks, while various sounds are affected in different ways. The findings may have significant implications for second/foreign language instructors who teach pronunciation and for speech therapists who teach sounds to individuals with language disorders.

RevDate: 2022-03-04

Anikin A, Pisanski K, D Reby (2022)

Static and dynamic formant scaling conveys body size and aggression.

Royal Society open science, 9(1):211496 pii:rsos211496.

When producing intimidating aggressive vocalizations, humans and other animals often extend their vocal tracts to lower their voice resonance frequencies (formants) and thus sound big. Is acoustic size exaggeration more effective when the vocal tract is extended before, or during, the vocalization, and how do listeners interpret within-call changes in apparent vocal tract length? We compared perceptual effects of static and dynamic formant scaling in aggressive human speech and nonverbal vocalizations. Acoustic manipulations corresponded to elongating or shortening the vocal tract either around (Experiment 1) or from (Experiment 2) its resting position. Gradual formant scaling that preserved average frequencies conveyed the impression of smaller size and greater aggression, regardless of the direction of change. Vocal tract shortening from the original length conveyed smaller size and less aggression, whereas vocal tract elongation conveyed larger size and more aggression, and these effects were stronger for static than for dynamic scaling. Listeners familiarized with the speaker's natural voice were less often 'fooled' by formant manipulations when judging speaker size, but paid more attention to formants when judging aggressive intent. Thus, within-call vocal tract scaling conveys emotion, but a better way to sound large and intimidating is to keep the vocal tract consistently extended.
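
As a rough illustration of why elongating the vocal tract lowers formants, the sketch below treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / 4L. This idealization is not the acoustic manipulation used by Anikin et al.; the function names, the 17.5 cm tract length, and the 10% elongation are assumptions chosen only for illustration.

```python
# Idealized quarter-wavelength tube model of the vocal tract (an assumption,
# not the procedure from the study above).
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, humid air


def tube_formants(vtl_cm, n_formants=4):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) c / 4L."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * vtl_cm)
        for n in range(1, n_formants + 1)
    ]


def scale_formants(formants_hz, vtl_scale):
    """Lengthening the tube by vtl_scale divides every formant by that factor."""
    return [f / vtl_scale for f in formants_hz]


# Illustrative values: a 17.5 cm tract, then the same tract elongated by 10%.
neutral = tube_formants(17.5)
elongated = scale_formants(neutral, 1.10)

for n, (f_rest, f_long) in enumerate(zip(neutral, elongated), start=1):
    print(f"F{n}: {f_rest:.0f} Hz at rest, {f_long:.0f} Hz elongated")
```

In this model, static scaling corresponds to applying one value of vtl_scale to the whole call, whereas dynamic scaling would vary vtl_scale over time within the call.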

RevDate: 2022-03-03

Haider CL, Suess N, Hauswald A, et al (2022)

Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker.

NeuroImage pii:S1053-8119(22)00173-2 [Epub ahead of print].

Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. But it still remains inconclusive which levels of speech processing are affected under which circumstances by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model on the clear AV speech was trained and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher level features of speech segmentation (phoneme and word onsets) were especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results, by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.

RevDate: 2022-03-02

Hoyer P, Riedler M, Unterhofer C, et al (2022)

Vocal Tract and Subglottal Impedance in High Performance Singing: A Case Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00015-7 [Epub ahead of print].

OBJECTIVES/HYPOTHESIS: The respiratory process is important in vocal training and in professional singing, the airflow is highly important. It is hypothesized that subglottal resonances are important to the singing voice in high performance singing.

STUDY DESIGN: Single subject, prospective.

METHOD: A professional soprano singer shaped her vocal tract to form the vowels [a], [e], [i], [o], and [u] at the pitch d4. We measured phonated vowels and the vocal tract impedance spectra with a deterministic noise supplied by an iPhone buzzer in the range of 200 to 4,000 Hz at closed glottis, during exhalation and during inhalation while maintaining the shape of the vocal tract.

RESULTS: Measurements of the phonated vowels before and after the different glottal adjustments were highly reproducible. Vocal tract resonances and the ones resulting during respiration are reported. The impedance spectra show vowel-dependent resonances with closed and open glottis. The formants of the vocal spectra are explained by including both the vocal tract and the subglottal resonances.

CONCLUSION: The findings indicate that subglottal resonances influence the first formant as well as the singer's formant cluster in high-performance singing. The instrumental setup used for the impedance measurement allows a simple and lightweight procedure for a measurement of vocal tract and subglottal resonances.

RevDate: 2022-03-02

Saba JN, JHL Hansen (2022)

The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners.

The Journal of the Acoustical Society of America, 151(2):1007.

Natural compensation of speech production in challenging listening environments is referred to as the Lombard effect (LE). The resulting acoustic differences between neutral and Lombard speech have been shown to provide intelligibility benefits for normal hearing (NH) and cochlear implant (CI) listeners alike. Motivated by this outcome, three LE perturbation approaches consisting of pitch, duration, formant, intensity, and spectral contour modifications were designed specifically for CI listeners to combat speech-in-noise performance deficits. Experiment 1 analyzed the effects of loudness, quality, and distortion of approaches on speech intelligibility with and without formant-shifting. Significant improvements of +9.4% were observed in CI listeners without the formant-shifting approach at +5 dB signal-to-noise ratio (SNR) large-crowd-noise (LCN) when loudness was controlled, however, performance was found to be significantly lower for NH listeners. Experiment 2 evaluated the non-formant-shifting approach with additional spectral contour and high pass filtering to reduce spectral smearing and decrease distortion observed in Experiment 1. This resulted in significant intelligibility benefits of +30.2% for NH and +21.2% for CI listeners at 0 and +5 dB SNR LCN, respectively. These results suggest that LE perturbation may be useful as front-end speech modification approaches to improve intelligibility for CI users in noise.

RevDate: 2022-02-15

Nguyen DD, Chacon A, Payten C, et al (2022)

Acoustic characteristics of fricatives, amplitude of formants and clarity of speech produced without and with a medical mask.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Previous research has found that high-frequency energy of speech signals decreased while wearing face masks. However, no study has examined the specific spectral characteristics of fricative consonants and vowels and the perception of clarity of speech in mask wearing.

AIMS: To investigate acoustic-phonetic characteristics of fricative consonants and vowels and auditory perceptual rating of clarity of speech produced with and without wearing a face mask.

METHODS & PROCEDURES: A total of 16 healthcare workers read the Rainbow Passage using modal phonation in three conditions: without a face mask, with a standard surgical mask and with a KN95 mask (China GB2626-2006, a medical respirator with higher barrier level than the standard surgical mask). Speech samples were acoustically analysed for root mean square (RMS) amplitude (ARMS) and spectral moments of four fricatives /f/, /s/, /ʃ/ and /z/; and amplitude of the first three formants (A1, A2 and A3) measured from the reading passage and extracted vowels. Auditory perception of speech clarity was performed. Data were compared across mask and non-mask conditions using linear mixed models.

OUTCOMES & RESULTS: The ARMS of all included fricatives was significantly lower in surgical mask and KN95 mask compared with non-mask condition. Centre of gravity of /f/ decreased in both surgical and KN95 mask while other spectral moments did not show systematic significant linear trends across mask conditions. None of the formant amplitude measures was statistically different across conditions. Speech clarity was significantly poorer in both surgical and KN95 mask conditions.

CONCLUSIONS & IMPLICATIONS: Speech produced while wearing either a surgical mask or KN95 mask was associated with decreased fricative amplitude and poorer speech clarity.

WHAT THIS PAPER ADDS: What is already known on the subject: Previous studies have shown that the overall spectral levels in high frequency ranges and intelligibility are decreased for speech produced with a face mask. It is unclear how different types of speech signals, that is, fricatives and vowels, are presented in speech produced while wearing either a medical surgical or KN95 mask. It is also unclear whether ratings of speech clarity are similar for speech produced with these face masks.

What this paper adds to existing knowledge: Speech data collected in real-world, clinical and non-laboratory-controlled settings showed differences in the amplitude of fricatives and speech clarity ratings between non-mask and mask-wearing conditions. Formant amplitude did not show significant differences in mask-wearing conditions compared with non-mask.

What are the potential or actual clinical implications of this work? Wearing a surgical mask or a KN95 mask had different effects on consonants and vowels. It appeared from the findings in this study that these masks only affected fricative consonants and did not affect vowel production. The poorer speech clarity in these mask-wearing conditions has important implications for speech perception in communication between clinical staff and between medical officers and patients in clinics, and between people in everyday situations. The impact of these masks on speech perception may be more pronounced in people with hearing impairment and communication disorders. In voice evaluation and/or therapy sessions, the effects of wearing a medical mask can occur bidirectionally for both the clinician and the patient. The patient may find it more challenging to understand the speech conveyed by the clinician while the clinician may not perceptually assess the patient's speech and voice accurately.

Given the significant correlation between clarity ratings and fricative amplitude, improving fricative signals would be useful to improve speech clarity while wearing these medical face masks.

RevDate: 2022-02-10

Gábor A, Kaszás N, Faragó T, et al (2022)

The acoustic bases of human voice identity processing in dogs.

Animal cognition [Epub ahead of print].

Speech carries identity-diagnostic acoustic cues that help individuals recognize each other during vocal-social interactions. In humans, fundamental frequency, formant dispersion and harmonics-to-noise ratio serve as characteristics along which speakers can be reliably separated. The ability to infer a speaker's identity is also adaptive for members of other species (like companion animals) for whom humans (as owners) are relevant. The acoustic bases of speaker recognition in non-humans are unknown. Here, we tested whether dogs can recognize their owner's voice and whether they rely on the same acoustic parameters for such recognition as humans use to discriminate speakers. Stimuli were pre-recorded sentences spoken by the owner and control persons, played through loudspeakers placed behind two non-transparent screens (with each screen hiding a person). We investigated the association between acoustic distance of speakers (examined along several dimensions relevant in intraspecific voice identification) and dogs' behavior. Dogs chose their owner's voice more often than that of control persons', suggesting that they can identify it. Choosing success and time spent looking in the direction of the owner's voice were positively associated, showing that looking time is an index of the ease of choice. Acoustic distance of speakers in mean fundamental frequency and jitter were positively associated with looking time, indicating that the shorter the acoustic distance between speakers with regard to these parameters, the harder the decision. So, dogs use these cues to discriminate their owner's voice from unfamiliar voices. These findings reveal that dogs use some but probably not all acoustic parameters that humans use to identify speakers. Although dogs can detect fine changes in speech, their perceptual system may not be fully attuned to identity-diagnostic cues in the human voice.

RevDate: 2022-02-07

Rishiq D, Harkrider AW, Springer C, et al (2022)

Effects of Spectral Shaping on Speech Auditory Brainstem Responses to Stop Consonant-Vowel Syllables.

Journal of the American Academy of Audiology [Epub ahead of print].

BACKGROUND: Spectral shaping is employed by hearing aids to make consonantal information, such as formant transitions, audible for listeners with hearing loss. How manipulations of the stimuli, such as spectral shaping, may alter encoding in the auditory brainstem has not been thoroughly studied.

PURPOSE: To determine how spectral shaping of synthetic consonant-vowel (CV) syllables, varying in their second formant (F2) onset frequency, may affect encoding of the syllables in the auditory brainstem.

RESEARCH DESIGN: We employed a repeated measure design.

STUDY SAMPLE: Sixteen young adults (mean = 20.94 years, 6 males) and 11 older adults (mean = 58.60 years, 4 males) participated in this study.

DATA COLLECTION AND ANALYSIS: Speech-evoked auditory brainstem responses (speech-ABRs) were obtained from each participant using three CV exemplars selected from synthetic stimuli generated for a /ba-da-ga/ continuum. Brainstem responses were also recorded to corresponding three CV exemplars that were spectrally shaped to decrease low-frequency information and provide gain for middle and high frequencies according to a Desired Sensation Level function. In total, six grand average waveforms [3 phonemes (/ba/, /da/, /ga/) X 2 shaping conditions (unshaped, shaped)] were produced for each participant. Peak latencies and amplitudes, referenced to pre-stimulus baseline, were identified for 15 speech-ABR peaks. Peaks were marked manually using the program cursor on each individual waveform. Repeated-measures ANOVAs were used to determine the effects of shaping on the latencies and amplitudes of the speech-ABR peaks.

RESULTS: Shaping effects produced changes within participants in ABR latencies and amplitudes involving onset and major peaks of the speech-ABR waveform for certain phonemes. Specifically, data from onset peaks showed that shaping decreased latency for /ga/ in older listeners, and decreased amplitude onset for /ba/ in younger listeners. Shaping also increased the amplitudes of major peaks for /ga/ stimuli in both groups.

CONCLUSIONS: Encoding of speech in the ABR waveform may be more complex and multidimensional than a simple demarcation of source and filter information, and may also be influenced by cue intensity and age. These results suggest a more complex subcortical encoding of vocal tract filter information in the ABR waveform, which may also be influenced by cue intensity and age.

RevDate: 2022-02-04

Easwar V, Boothalingam S, E Wilson (2022)

Sensitivity of Vowel-Evoked Envelope Following Responses to Spectra and Level of Preceding Phoneme Context.

Ear and hearing pii:00003446-900000000-98357 [Epub ahead of print].

OBJECTIVE: Vowel-evoked envelope following responses (EFRs) could be a useful noninvasive tool for evaluating neural activity phase-locked to the fundamental frequency of voice (f0). Vowel-evoked EFRs are often elicited by vowels in consonant-vowel syllables or words. Considering neural activity is susceptible to temporal masking, EFR characteristics elicited by the same vowel may vary with the features of the preceding phoneme. To this end, the objective of the present study was to evaluate the influence of the spectral and level characteristics of the preceding phoneme context on vowel-evoked EFRs.

DESIGN: EFRs were elicited by a male-spoken /i/ (stimulus; duration = 350 msec), modified to elicit two EFRs, one from the region of the first formant (F1) and one from the second and higher formants (F2+). The stimulus, presented at 65 dB SPL, was preceded by one of the four contexts: /ʃ/, /m/, /i/ or a silent gap of duration equal to that of the stimulus. The level of the context phonemes was either 50 or 80 dB SPL, 15 dB lower and higher than the level of the stimulus /i/. In a control condition, EFRs to the stimulus /i/ were elicited in isolation without any preceding phoneme contexts. The stimulus and the contexts were presented monaurally to a randomly chosen test ear in 21 young adults with normal hearing. EFRs were recorded using single-channel electroencephalogram between the vertex and the nape.

RESULTS: A repeated measures analysis of variance indicated a significant three-way interaction between context type (/ʃ/, /i/, /m/, silent gap), level (50, 80 dB SPL), and EFR-eliciting formant (F1, F2+). Post hoc analyses indicated no influence of the preceding phoneme context on F1-elicited EFRs. Relative to a silent gap as the preceding context, F2+-elicited EFRs were attenuated by /ʃ/ and /m/ presented at 50 and 80 dB SPL, as well as by /i/ presented at 80 dB SPL. The average attenuation ranged from 14.9 to 27.9 nV. When the context phonemes were presented at matched levels of 50 or 80 dB SPL, F2+-elicited EFRs were most often attenuated when preceded by /ʃ/. At 80 dB SPL, relative to the silent preceding gap, the average attenuation was 15.7 nV, and at 50 dB SPL, relative to the preceding context phoneme /i/, the average attenuation was 17.2 nV.

CONCLUSION: EFRs elicited by the second and higher formants of /i/ are sensitive to the spectral and level characteristics of the preceding phoneme context. Such sensitivity, measured as an attenuation in the present study, may influence the comparison of EFRs elicited by the same vowel in different consonant-vowel syllables or words. However, the degree of attenuation with realistic context levels exceeded the minimum measurable change only 12% of the time. Although the impact of the preceding context is statistically significant, it is likely to be clinically insignificant a majority of the time.

RevDate: 2022-02-03

Chiu C, Weng Y, BW Chen (2021)

Tongue Postures and Tongue Centers: A Study of Acoustic-Articulatory Correspondences Across Different Head Angles.

Frontiers in psychology, 12:768754.

Recent research on body and head positions has shown that postural changes may induce varying degrees of changes on acoustic speech signals and articulatory gestures. While the preservation of formant profiles across different postures is suitably accounted for by the two-tube model and perturbation theory, it remains unclear whether it results from the accommodation of tongue postures. Specifically, whether the tongue accommodates the changes in head angle to maintain the target acoustics is yet to be determined. The present study examines vowel acoustics and their correspondence with the articulatory maneuvers of the tongue, including both tongue postures and movements of the tongue center, across different head angles. The results show that vowel acoustics, including pitch and formants, are largely unaffected by upward or downward tilting of the head. These preserved acoustics may be attributed to the lingual gestures that compensate for the effects of gravity. Our results also reveal that the tongue postures in response to head movements appear to be vowel-dependent, and the tongue center may serve as an underlying drive that covaries with the head angle changes. These results imply a close relationship between vowel acoustics and tongue postures as well as a target-oriented strategy for different head angles.

RevDate: 2022-02-02

Merritt B, T Bent (2022)

Revisiting the acoustics of speaker gender perception: A gender expansive perspective.

The Journal of the Acoustical Society of America, 151(1):484.

Examinations of speaker gender perception have primarily focused on the roles of fundamental frequency (fo) and formant frequencies from structured speech tasks using cisgender speakers. Yet, there is evidence to suggest that fo and formants do not fully account for listeners' perceptual judgements of gender, particularly from connected speech. This study investigated the perceptual importance of fo, formant frequencies, articulation, and intonation in listeners' judgements of gender identity and masculinity/femininity from spontaneous speech from cisgender male and female speakers as well as transfeminine and transmasculine speakers. Stimuli were spontaneous speech samples from 12 speakers who are cisgender (6 female and 6 male) and 12 speakers who are transgender (6 transfeminine and 6 transmasculine). Listeners performed a two-alternative forced choice (2AFC) gender identification task and masculinity/femininity rating task in two experiments that manipulated which acoustic cues were available. Experiment 1 confirmed that fo and formant frequency manipulations were insufficient to alter listener judgements across all speakers. Experiment 2 demonstrated that articulatory cues had greater weighting than intonation cues on the listeners' judgements when the fo and formant frequencies were in a gender ambiguous range. These findings counter the assumptions that fo and formant manipulations are sufficient to effectively alter perceived speaker gender.

RevDate: 2022-02-01

Kim Y, Chung H, A Thompson (2022)

Acoustic and Articulatory Characteristics of English Semivowels /ɹ, l, w/ Produced by Adult Second-Language Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study presents the results of acoustic and kinematic analyses of word-initial semivowels (/ɹ, l, w/) produced by second-language (L2) speakers of English whose native language is Korean. In addition, the relationship of acoustic and kinematic measures to the ratings of foreign accent was examined by correlation analyses.

METHOD: Eleven L2 speakers and 10 native speakers (first language [L1]) of English read The Caterpillar passage. Acoustic and kinematic data were simultaneously recorded using an electromagnetic articulography system. In addition to speaking rate, two acoustic measures (ratio of third-formant [F3] frequency to second-formant [F2] frequency and duration of steady states of F2) and two kinematic measures (lip aperture and duration of lingual maximum hold) were obtained from individual target sounds. To examine the degree of contrast among the three sounds, acoustic and kinematic Euclidean distances were computed on the F2-F3 and x-y planes, respectively.

RESULTS: Compared with L1 speakers, L2 speakers exhibited a significantly slower speaking rate. For the three semivowels, L2 speakers showed a reduced F3/F2 ratio during constriction, increased lip aperture, and reduced acoustic Euclidean distances among semivowels. Additionally, perceptual ratings of foreign accent were significantly correlated with three measures: duration of steady F2, acoustic Euclidean distance, and kinematic Euclidean distance.

CONCLUSIONS: The findings provide acoustic and kinematic evidence for challenges that L2 speakers experience in the production of English semivowels, especially /ɹ/ and /w/. The robust and consistent finding of reduced contrasts among semivowels and their correlations with perceptual accent ratings suggests using sound contrasts as a potentially effective approach to accent modification paradigms.
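
The two acoustic measures named in the METHOD section above (the F3/F2 ratio and the Euclidean distance between sounds on the F2-F3 plane) can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' analysis pipeline; the function names and the formant values are hypothetical and were not taken from the study.

```python
import math
from itertools import combinations


def f3_f2_ratio(f2_hz, f3_hz):
    """Ratio of third-formant to second-formant frequency."""
    return f3_hz / f2_hz


def acoustic_distance(sound_a, sound_b):
    """Euclidean distance between two (F2, F3) points, in Hz."""
    return math.dist(sound_a, sound_b)


# Hypothetical (F2, F3) values in Hz, invented for illustration only.
semivowels = {
    "r": (1200.0, 1700.0),
    "l": (1200.0, 2600.0),
    "w": (800.0, 2300.0),
}

for label, (f2, f3) in semivowels.items():
    print(f"/{label}/ F3/F2 ratio: {f3_f2_ratio(f2, f3):.2f}")

# Mean pairwise distance as one simple summary of contrast among the sounds.
pairs = list(combinations(semivowels.values(), 2))
mean_distance = sum(acoustic_distance(a, b) for a, b in pairs) / len(pairs)
print(f"Mean pairwise F2-F3 distance: {mean_distance:.0f} Hz")
```

A smaller mean pairwise distance would correspond to the reduced contrast among semivowels that the abstract reports for L2 speakers.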

RevDate: 2022-01-30

Takemoto N, Sanuki T, Esaki S, et al (2022)

Rabbit model with vocal fold hyperadduction.

Auris, nasus, larynx pii:S0385-8146(22)00026-8 [Epub ahead of print].

OBJECTIVE: Adductor spasmodic dysphonia (AdSD) is caused by hyperadduction of the vocal folds during phonation, resulting in a strained voice. Animal models are not yet used to elucidate this intractable disease because AdSD has a difficult pathology without a definitive origin. For the first step, we established an animal model with vocal fold hyperadduction and evaluated its validity by assessing laryngeal function.

METHODS: In this experimental animal study, three adult Japanese 20-week-old rabbits were used. The models were created using a combination of cricothyroid approximation, forced airflow, and electrical stimulation of the recurrent laryngeal nerves (RLNs). Cricothyroid approximation was added to produce a glottal slit. Thereafter, both RLNs were electrically stimulated to induce vocal fold hyperadduction. Finally, the left RLN was transected to relieve hyperadduction. The sound, endoscopic images, and subglottal pressure were recorded, and acoustic analysis was performed.

RESULTS: Subglottal pressure increased significantly, and the strained sound was produced after the electrical stimulation of the RLNs. After transecting the left RLN, the subglottal pressure decreased significantly, and the strained sound decreased. Acoustic analysis revealed an elevation of the standard deviation of F0 (SDF0) and degree of voice breaks (DVB) through stimulation of the RLNs, and degradation of SDF0 and DVB through RLN transection. Formant bands in the sound spectrogram were interrupted by the stimulation and appeared again after the RLN section.

CONCLUSION: This study developed a rabbit model with vocal fold hyperadduction. The subglottal pressure and acoustic analysis of this model resembled the characteristics of patients with AdSD. This model could be helpful to elucidate the pathology of the larynx caused by hyperadduction, and evaluate and compare the treatments for strained phonation.

RevDate: 2022-01-28

Heeringa AN, C Köppl (2022)

Auditory nerve fiber discrimination and representation of naturally-spoken vowels in noise.

eNeuro pii:ENEURO.0474-21.2021 [Epub ahead of print].

To understand how vowels are encoded by auditory nerve fibers, a number of representation schemes have been suggested that extract the vowel's formant frequencies from auditory nerve-fiber spiking patterns. The current study aims to apply and compare these schemes for auditory nerve-fiber responses to naturally-spoken vowels in a speech-shaped background noise. Responses to three vowels were evaluated; based on behavioral experiments in the same species, two of these were perceptually difficult to discriminate from each other (/e/ vs /i/) and one was perceptually easy to discriminate from the other two (/a:/). Single-unit auditory nerve fibers were recorded from ketamine/xylazine-anesthetized Mongolian gerbils of either sex (n = 8). First, single-unit discrimination between the three vowels was studied. Compared to the perceptually easy discriminations, the average spike timing-based discrimination values were significantly lower for the perceptually difficult vowel discrimination. This was not true for an average rate-based discrimination metric, the rate d-prime. Consistently, spike timing-based representation schemes, plotting the temporal responses of all recorded units as a function of their best frequency, i.e. dominant component schemes, average localized interval rate, and fluctuation profiles, revealed representation of the vowel's formant frequencies, whereas no such representation was apparent in the rate-based excitation pattern. Making use of perceptual discrimination data, this study reveals that discrimination difficulties of naturally-spoken vowels in speech-shaped noise originate peripherally and can be studied in the spike timing patterns of single auditory nerve fibers.

Significance statement: Understanding speech in noisy environments is an everyday challenge. This study investigates how single auditory nerve fibers, recorded in the Mongolian gerbil, discriminate and represent naturally-spoken vowels in a noisy background approximating real-life situations. Neural discrimination metrics were compared to the known behavioral performance by the same species, comparing easy to difficult vowel discriminations. A spike-timing-based discrimination metric agreed well with perceptual performance, while mean discharge rate was a poor predictor. Furthermore, only spike-timing-based, but not the rate-based, representation schemes revealed peaks at the formant frequencies, which are paramount for perceptual vowel identification and discrimination. This study reveals that vowel discrimination difficulties in noise originate peripherally and can be studied in the spike-timing patterns of single auditory nerve fibers.
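
The abstract above refers to a rate-based discrimination metric, the rate d-prime, without giving its formula. As a rough illustration only, the sketch below uses one common definition of d-prime (difference of mean firing rates divided by the root of the averaged variances); whether the study used exactly this form is an assumption, and the function name and firing rates are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def rate_d_prime(rates_a, rates_b):
    """d' between two sets of per-trial firing rates (spikes/s).

    Assumed definition: (mean_a - mean_b) / sqrt((var_a + var_b) / 2),
    using sample variances. Not taken from the cited paper.
    """
    pooled_sd = sqrt((variance(rates_a) + variance(rates_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("both conditions have zero variance")
    return (mean(rates_a) - mean(rates_b)) / pooled_sd


# Hypothetical per-trial rates of one fiber for two different vowels.
rates_vowel_1 = [52.0, 48.0, 50.0, 54.0, 46.0]
rates_vowel_2 = [42.0, 38.0, 40.0, 44.0, 36.0]
d = rate_d_prime(rates_vowel_1, rates_vowel_2)
print(f"rate d-prime: {d:.3f}")
```

A larger absolute value means the two vowels are easier to tell apart from mean rate alone; swapping the arguments flips the sign.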

RevDate: 2022-01-25

Yüksel M (2022)

Reliability and Efficiency of Pitch-Shifting Plug-Ins in Voice and Hearing Research.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Auditory feedback perturbation with voice pitch manipulation has been widely used in previous studies. There are several hardware and software tools for such manipulations, but audio plug-ins developed for music, movies, and radio applications that operate in digital audio workstations may be extremely beneficial and are easy to use, accessible, and cost effective. However, it is unknown whether these plug-ins can perform similarly to tools that have been described in previous literature. Hence, this study aimed to evaluate the reliability and efficiency of these plug-ins.

METHOD: Six different plug-ins were used at +1 and -1 semitone (st) pitch shifts, with formant correction on and off, to pitch shift the sustained /ɑ/ voice recording sample of 12 healthy participants (six cisgender males and six cisgender females). Pitch-shifting accuracy, formant shifting amount, intensity changes, and total latency values were reported.

RESULTS: Some variability was observed between different plug-ins and pitch shift settings. One plug-in managed to perform similarly in all four measured aspects with well-known hardware and software units with 1-cent pitch-shifting accuracy, low latency values, negligible intensity difference, and preserved formants. Other plug-ins performed similarly in some respects.

CONCLUSIONS: Audio plug-ins may be used effectively in pitch-shifting applications. Researchers and clinicians can access these plug-ins easily and test whether the features also fit their aims.
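
The abstract above reports pitch shifts in semitones and accuracy in cents. For readers unfamiliar with these units, the sketch below shows the standard equal-tempered relations (one semitone is a frequency ratio of 2^(1/12), and one semitone equals 100 cents). The function names and the 220 Hz example value are hypothetical and are not taken from the study.

```python
from math import log2


def shift_by_semitones(f_hz, semitones):
    """Frequency after shifting f_hz by a (possibly negative) number of semitones."""
    return f_hz * 2.0 ** (semitones / 12.0)


def cents_between(f_ref_hz, f_hz):
    """Signed interval from f_ref_hz to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * log2(f_hz / f_ref_hz)


f0 = 220.0                          # hypothetical fundamental frequency in Hz
up = shift_by_semitones(f0, +1)     # target for a +1 st shift
down = shift_by_semitones(f0, -1)   # target for a -1 st shift

# Shifting accuracy can be expressed as the deviation, in cents,
# of a measured output frequency from the intended target.
measured = 233.2                    # hypothetical measured output in Hz
error_cents = cents_between(up, measured)
print(f"target {up:.2f} Hz, error {error_cents:+.2f} cents")
```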

RevDate: 2022-01-21

Suess N, Hauswald A, Reisinger P, et al (2022)

Cortical Tracking of Formant Modulations Derived from Silently Presented Lip Movements and Its Decline with Age.

Cerebral cortex (New York, N.Y. : 1991) pii:6513733 [Epub ahead of print].

The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuophonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation processes of these more fine-grained acoustic details and assessed how they change as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while the participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging affects especially the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.

RevDate: 2022-01-17

Almaghrabi SA, Thewlis D, Thwaites S, et al (2022)

The reproducibility of bio-acoustic features is associated with sample duration, speech task and gender.

IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society, PP: [Epub ahead of print].

Bio-acoustic properties of speech show evolving value in analyzing psychiatric illnesses. Obtaining a sufficient speech sample length to quantify these properties is essential, but the impact of sample duration on the stability of bio-acoustic features has not been systematically explored. We aimed to evaluate bio-acoustic features' reproducibility against changes in speech durations and tasks. We extracted source, spectral, formant, and prosodic features in 185 English-speaking adults (98 w, 87 m) for reading-a-story and counting tasks. We compared features at 25% of the total sample duration of the reading task to those obtained from non-overlapping randomly selected sub-samples shortened to 75%, 50%, and 25% of total duration using intraclass correlation coefficients. We also compared the features extracted from entire recordings to those measured at 25% of the duration and features obtained from 50% of the duration. Further, we compared features extracted from reading-a-story to counting tasks. Our results show that the number of reproducible features (out of 125) decreased stepwise with duration reduction. Spectral shape, pitch, and formants reached excellent reproducibility. Mel-frequency cepstral coefficients (MFCCs), loudness, and zero-crossing rate achieved excellent reproducibility only at a longer duration. Reproducibility of source, MFCC derivatives, and voicing probability (VP) was poor. Significant gender differences existed in jitter, MFCC first-derivative, spectral skewness, pitch, VP, and formants. Around 97% of features in both genders were not reproducible across speech tasks, in part due to the short counting task duration. In conclusion, bio-acoustic features are less reproducible in shorter samples and are affected by gender.
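
The abstract above assesses reproducibility with intraclass correlation coefficients (ICCs) but does not state which ICC form was used. As a minimal sketch, assuming the one-way random-effects form ICC(1,1), the code below computes the coefficient for one feature measured on several speakers under several conditions. The function name and data are hypothetical.

```python
def icc_one_way(table):
    """ICC(1,1) for a table with one row per subject and one column per measurement.

    Assumed form: (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares of a one-way ANOVA.
    """
    n = len(table)
    k = len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need at least 2 subjects and 2 measurements per subject")
    row_means = [sum(row) / k for row in table]
    grand_mean = sum(row_means) / n
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(table, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical feature values: each row is one speaker, measured once on the
# full recording and once on a shortened sub-sample.
feature = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
icc = icc_one_way(feature)
print(f"ICC(1,1) = {icc:.3f}")
```

Values near 1 indicate that the shortened sample reproduces the full-length measurement; other ICC forms (two-way, consistency or agreement) can give different numbers on the same data.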

RevDate: 2022-01-10

Gaines JL, Kim KS, Parrell B, et al (2021)

Discrete constriction locations describe a comprehensive range of vocal tract shapes in the Maeda model.

JASA express letters, 1(12):124402.

The Maeda model was used to generate a large set of vocoid-producing vocal tract configurations. The resulting dataset (a) produced a comprehensive range of formant frequencies and (b) displayed discrete tongue body constriction locations (palatal, velar/uvular, and lower pharyngeal). The discrete parameterization of constriction location across the vowel space suggests this is likely a fundamental characteristic of the human vocal tract, and not limited to any specific set of vowel contrasts. These findings suggest that in addition to established articulatory-acoustic constraints, fundamental biomechanical constraints of the vocal tract may also explain such discreteness.

RevDate: 2022-01-06

Cheng FY, Xu C, Gold L, et al (2021)

Rapid Enhancement of Subcortical Neural Responses to Sine-Wave Speech.

Frontiers in neuroscience, 15:747303.

The efferent auditory nervous system may be a potent force in shaping how the brain responds to behaviorally significant sounds. Previous human experiments using the frequency following response (FFR) have shown efferent-induced modulation of subcortical auditory function online and over short- and long-term time scales; however, a contemporary understanding of FFR generation presents new questions about whether previous effects were constrained solely to the auditory subcortex. The present experiment used sine-wave speech (SWS), an acoustically-sparse stimulus in which dynamic pure tones represent speech formant contours, to evoke FFRSWS. Due to the higher stimulus frequencies used in SWS, this approach biased neural responses toward brainstem generators and allowed for three stimuli (/bɔ/, /bu/, and /bo/) to be used to evoke FFRSWS before and after listeners in a training group were made aware that they were hearing a degraded speech stimulus. All SWS stimuli were rapidly perceived as speech when presented with a SWS carrier phrase, and average token identification reached ceiling performance during a perceptual training phase. Compared to a control group which remained naïve throughout the experiment, training group FFRSWS amplitudes were enhanced post-training for each stimulus. Further, linear support vector machine classification of training group FFRSWS significantly improved post-training compared to the control group, indicating that training-induced neural enhancements were sufficient to bolster machine learning classification accuracy. These results suggest that the efferent auditory system may rapidly modulate auditory brainstem representation of sounds depending on their context and perception as non-speech or speech.

RevDate: 2022-01-04

Meykadeh A, Golfam A, Nasrabadi AM, et al (2021)

First Event-Related Potentials Evidence of Auditory Morphosyntactic Processing in a Subject-Object-Verb Nominative-Accusative Language (Farsi).

Frontiers in psychology, 12:698165.

While most studies on neural signals of online language processing have focused on a few, usually western, subject-verb-object (SVO) languages, corresponding knowledge on subject-object-verb (SOV) languages is scarce. Here we studied Farsi, a language with canonical SOV word order. Because we were interested in the consequences of second-language acquisition, we compared monolingual native Farsi speakers and equally proficient bilinguals who had learned Farsi only after entering primary school. We analyzed event-related potentials (ERPs) to correct and morphosyntactically incorrect sentence-final syllables in a sentence correctness judgment task. Incorrect syllables elicited a late posterior positivity at 500-700 ms after the final syllable, resembling the P600 component, as previously observed for syntactic violations at sentence-middle positions in SVO languages. There was no sign of a left anterior negativity (LAN) preceding the P600. Additionally, we provide evidence for a real-time discrimination of phonological categories associated with morphosyntactic manipulations (between 35 and 135 ms), manifesting the instantaneous neural response to unexpected perturbations. The L2 Farsi speakers were indistinguishable from L1 speakers in terms of performance and neural signals of syntactic violations, indicating that exposure to a second language at school entry may result in native-like performance and neural correlates. In nonnative (but not native) speakers verbal working memory capacity correlated with the late posterior positivity and performance accuracy. Hence, this first ERP study of morphosyntactic violations in a spoken SOV nominative-accusative language demonstrates ERP effects in response to morphosyntactic violations and the involvement of executive functions in non-native speakers in computations of subject-verb agreement.

RevDate: 2021-12-30

Yamada Y, Shinkawa K, Nemoto M, et al (2021)

Automatic Assessment of Loneliness in Older Adults Using Speech Analysis on Responses to Daily Life Questions.

Frontiers in psychiatry, 12:712251.

Loneliness is a perceived state of social and emotional isolation that has been associated with a wide range of adverse health effects in older adults. Automatically assessing loneliness by passively monitoring daily behaviors could potentially contribute to early detection and intervention for mitigating loneliness. Speech data has been successfully used for inferring changes in emotional states and mental health conditions, but its association with loneliness in older adults remains unexplored. In this study, we developed a tablet-based application and collected speech responses of 57 older adults to daily life questions regarding, for example, one's feelings and future travel plans. From audio data of these speech responses, we automatically extracted speech features characterizing acoustic, prosodic, and linguistic aspects, and investigated their associations with self-rated scores of the UCLA Loneliness Scale. Consequently, we found that with increasing loneliness scores, speech responses tended to have fewer inflections, longer pauses, reduced second formant frequencies, reduced variances of the speech spectrum, more filler words, and fewer positive words. The cross-validation results showed that regression and binary-classification models using speech features could estimate loneliness scores with an R² of 0.57 and detect individuals with high loneliness scores with 95.6% accuracy, respectively. Our study provides the first empirical results suggesting the possibility of using speech data that can be collected in everyday life for the automatic assessments of loneliness in older adults, which could help develop monitoring technologies for early detection and intervention for mitigating loneliness.

RevDate: 2021-12-20

Zheng Z, Li K, Feng G, et al (2021)

Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition.

Frontiers in neuroscience, 15:744959.

Objectives: Mandarin-speaking users of cochlear implants (CI) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition to provide information for speech coding schemes specific to Mandarin. Design: Eleven normal hearing subjects were studied using acoustic temporal E cues that were extracted from 30 continuous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions and their relative weights calculated using the least-square approach. Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43-84.82%, 76.27-95.24%, and 96.58%, respectively; for consonant recognition 35.49-63.77%, 67.75-78.87%, and 87.87%; for lexical tone recognition 60.80-97.15%, 73.16-96.87%, and 96.73%. For frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition 0.10, 0.16, 0.18, 0.23, and 0.33; in lexical tone recognition 0.38, 0.18, 0.14, 0.16, and 0.14. Conclusion: The region that contributed most to vowel recognition was Region 2 (502-1,022 Hz), which contains first formant (F1) information; Region 5 (3,856-7,562 Hz) contributed most to consonant recognition; Region 1 (80-502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.

RevDate: 2021-12-13

Cap H, Deleporte P, Joachim J, et al (2008)

Male vocal behavior and phylogeny in deer.

Cladistics : the international journal of the Willi Hennig Society, 24(6):917-931.

The phylogenetic relationships among 11 species of the Cervidae family were inferred from an analysis of male vocalizations. Eighteen characters, including call types (e.g. antipredator barks, mating loudcalls) and acoustic characteristics (call composition, fundamental frequency and formant frequencies), were used for phylogeny inference. The resulting topology and the phylogenetic consistency of behavioral characters were compared with those of current molecular phylogenies of Cervidae and with separate and simultaneous parsimony analyses of molecular and behavioral data. Our results indicate that male vocalizations constitute plausible phylogenetic characters in this taxon. Evolutionary scenarios for the vocal characters are discussed in relation with associated behaviors.

RevDate: 2021-12-03

Sundberg J, Lindblom B, AM Hefele (2021)

Voice source, formant frequencies and vocal tract shape in overtone singing. A case study.

Logopedics, phoniatrics, vocology [Epub ahead of print].

Purpose: In overtone singing a singer produces two pitches simultaneously, a low-pitched, continuous drone plus a melody played on the higher, flutelike and strongly enhanced overtones of the drone. The purpose of this study was to analyse underlying acoustical, phonatory and articulatory phenomena. Methods: The voice source was analyzed by inverse filtering the sound, the articulation from a dynamic MRI video of the vocal tract profile, and the lip opening from a frontal-view video recording. Vocal tract cross-distances were measured in the MR recording and converted to area functions, the formant frequencies of which were computed. Results: Inverse filtering revealed that the overtone enhancement resulted from a close clustering of formants 2 and 3. The MRI material showed that for low enhanced overtone frequencies (FE) the tongue tip was raised and strongly retracted, while for high FE the tongue tip was less retracted but forming a longer constriction. Thus, the tongue configuration changed from an apical/anterior to a dorsal/posterior articulation. The formant frequencies derived from the area functions matched almost perfectly those used for the inverse filtering. Further, analyses of the area functions revealed that the second formant frequency was strongly dependent on the back cavity, and the third on the front cavity, which acted like a Helmholtz resonator, tuned by the tongue tip position and lip opening. Conclusions: This type of overtone singing can be fully explained by the well-established source-filter theory of voice production, as recently found by Bergevin et al. [1] for another type of overtone singing.
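
The abstract above describes the front cavity as behaving like a Helmholtz resonator tuned by tongue tip position and lip opening. As a rough illustration of why cavity volume and opening size change the resonance, the sketch below evaluates the textbook Helmholtz formula f = (c / 2π) · sqrt(A / (V · L)). The dimensions are hypothetical round numbers, not measurements from the study; end corrections are ignored and a sound speed of 350 m/s is assumed.

```python
from math import pi, sqrt

SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def helmholtz_frequency(neck_area_m2, neck_length_m, cavity_volume_m3,
                        c=SPEED_OF_SOUND_M_S):
    """Resonance frequency in Hz of an ideal Helmholtz resonator (no end correction)."""
    return (c / (2.0 * pi)) * sqrt(neck_area_m2 / (cavity_volume_m3 * neck_length_m))


# Hypothetical front-cavity geometry: 0.5 cm^2 opening, 1 cm long, 5 cm^3 volume.
f_small_opening = helmholtz_frequency(0.5e-4, 0.01, 5e-6)
# Doubling the opening area raises the resonance by a factor of sqrt(2).
f_large_opening = helmholtz_frequency(1.0e-4, 0.01, 5e-6)
print(f"{f_small_opening:.0f} Hz -> {f_large_opening:.0f} Hz")
```

In this idealized model a wider opening or a smaller cavity raises the resonance, which is in line with the tuning by lip opening and tongue tip position that the abstract describes.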

RevDate: 2021-12-02

Roberts B, Summers RJ, PJ Bailey (2021)

Mandatory dichotic integration of second-formant information: Contralateral sine bleats have predictable effects on consonant place judgments.

The Journal of the Acoustical Society of America, 150(5):3693.

Speech-on-speech informational masking arises because the interferer disrupts target processing (e.g., capacity limitations) or corrupts it (e.g., intrusions into the target percept); the latter should produce predictable errors. Listeners identified the consonant in monaural buzz-excited three-formant analogues of approximant-vowel syllables, forming a place of articulation series (/w/-/l/-/j/). There were two 11-member series; the vowel was either high-front or low-back. Series members shared formant-amplitude contours, fundamental frequency, and F1+F3 frequency contours; they were distinguished solely by the F2 frequency contour before the steady portion. Targets were always presented in the left ear. For each series, F2 frequency and amplitude contours were also used to generate interferers with altered source properties-sine-wave analogues of F2 (sine bleats) matched to their buzz-excited counterparts. Accompanying each series member with a fixed mismatched sine bleat in the contralateral ear produced systematic and predictable effects on category judgments; these effects were usually largest for bleats involving the fastest rate or greatest extent of frequency change. Judgments of isolated sine bleats using the three place labels were often unsystematic or arbitrary. These results indicate that informational masking by interferers involved corruption of target processing as a result of mandatory dichotic integration of F2 information, despite the grouping cues disfavoring this integration.

RevDate: 2021-12-02

Lodermeyer A, Bagheri E, Kniesburges S, et al (2021)

The mechanisms of harmonic sound generation during phonation: A multi-modal measurement-based approach.

The Journal of the Acoustical Society of America, 150(5):3485.

Sound generation during voiced speech remains an open research topic because the underlying process within the human larynx is hardly accessible for direct measurements. In the present study, harmonic sound generation during phonation was investigated with a model that replicates the fully coupled fluid-structure-acoustic interaction (FSAI). The FSAI was captured using a multi-modal approach by measuring the flow and acoustic source fields based on particle image velocimetry, as well as the surface velocity of the vocal folds based on laser vibrometry and high-speed imaging. Strong harmonic sources were localized near the glottis, as well as further downstream, during the presence of the supraglottal jet. The strongest harmonic content of the vocal fold surface motion was verified for the area near the glottis, which directly interacts with the glottal jet flow. Also, the acoustic back-coupling of the formant frequencies onto the harmonic oscillation of the vocal folds was verified. These findings verify that harmonic sound generation is the result of a strong interrelation between the vocal fold motion, modulated flow field, and vocal tract geometry.

RevDate: 2021-12-02

Barreda S, PF Assmann (2021)

Perception of gender in children's voices.

The Journal of the Acoustical Society of America, 150(5):3949.

To investigate the perception of gender from children's voices, adult listeners were presented with /hVd/ syllables, in isolation and in sentence context, produced by children between 5 and 18 years. Half the listeners were informed of the age of the talker during trials, while the other half were not. Correct gender identifications increased with talker age; however, performance was above chance even for age groups where the cues most often associated with gender differentiation (i.e., average fundamental frequency and formant frequencies) were not consistently different between boys and girls. The results of acoustic models suggest that cues were used in an age-dependent manner, whether listeners were explicitly told the age of the talker or not. Overall, results are consistent with the hypothesis that talker age and gender are estimated jointly in the process of speech perception. Furthermore, results show that the gender of individual talkers can be identified accurately well before reliable anatomical differences arise in the vocal tracts of females and males. In general, results support the notion that the transmission of gender information from voice depends substantially on gender-dependent patterns of articulation, rather than following deterministically from anatomical differences between male and female talkers.

RevDate: 2021-11-27

Hedwig D, Poole J, P Granli (2021)

Does Social Complexity Drive Vocal Complexity? Insights from the Two African Elephant Species.

Animals : an open access journal from MDPI, 11(11): pii:ani11113071.

The social complexity hypothesis (SCH) for communication states that the range and frequency of social interactions drive the evolution of complex communication systems. Surprisingly, few studies have empirically tested the SCH for vocal communication systems. Filling this gap is important because a co-evolutionary runaway process between social and vocal complexity may have shaped the most intricate communication system, human language. We here propose the African elephant Loxodonta spec. as an excellent study system to investigate the relationships between social and vocal complexity. We review how the distinct differences in social complexity between the two species of African elephants, the forest elephant L. cyclotis and the savanna elephant L. africana, relate to repertoire size and structure, as well as complex communication skills in the two species, such as call combination or intentional formant modulation including the trunk. Our findings suggest that Loxodonta may contradict the SCH, as well as other factors put forth to explain patterns of vocal complexity across species. We propose that life history traits, a factor that has gained little attention as a driver of vocal complexity, and the extensive parental care associated with a uniquely low and slow reproductive rate, may have led to the emergence of pronounced vocal complexity in the forest elephant despite their less complex social system compared to the savanna elephant. Conclusions must be drawn cautiously, however. A better understanding of vocal complexity in the genus Loxodonta will depend on continuing advancements in remote data collection technologies to overcome the challenges of observing forest elephants in their dense rainforest habitat, as well as the availability of directly comparable data and methods, quantifying both structural and contextual variability in the production of rumbles and other vocalizations in both species of African elephants.

RevDate: 2021-11-23

Du X, Zhang X, Wang Y, et al (2021)

Highly sensitive detection of plant growth regulators by using terahertz time-domain spectroscopy combined with metamaterials.

Optics express, 29(22):36535-36545.

The rapid and sensitive detection of plant-growth-regulator (PGR) residue is essential for ensuring food safety for consumers. However, there are many disadvantages in current approaches to detecting PGR residue. In this paper, we demonstrate a highly sensitive PGR detection method by using terahertz time-domain spectroscopy combined with metamaterials. We propose a double formant metamaterial resonator based on a split-ring structure with titanium-gold nanostructure. The metamaterial resonator is a split-ring structure composed of a titanium-gold nanostructure based on polyimide film as the substrate. Also, terahertz spectral response and electric field distribution of metamaterials under different analyte thickness and refractive index were investigated. The simulation results showed that the theoretical sensitivity of resonance peak 1 and peak 2 of the refractive index sensor based on our designed metamaterial resonator approaches 780 and 720 gigahertz per refractive index unit (GHz/RIU), respectively. In experiments, a rapid solution analysis platform based on the double formant metamaterial resonator was set up and PGR residues in aqueous solution were directly and rapidly detected through terahertz time-domain spectroscopy. The results showed that metamaterials can successfully detect butylhydrazine and N-N diglycine at a concentration as low as 0.05 mg/L. This study paves a new way for sensitive, rapid, low-cost detection of PGRs. It also means that the double formant metamaterial resonator has significant potential for other applications in terahertz sensing.

RevDate: 2021-11-22

Li P, Ross CF, ZX Luo (2021)

Morphological disparity and evolutionary transformations in the primate hyoid apparatus.

Journal of human evolution, 162:103094 pii:S0047-2484(21)00146-9 [Epub ahead of print].

The hyoid apparatus plays an integral role in swallowing, respiration, and vocalization in mammals. Most placental mammals have a rod-shaped basihyal connected to the basicranium via both soft tissues and a mobile bony chain (the anterior cornu), whereas anthropoid primates have broad, shield-like or even cup-shaped basihyals suspended from the basicranium by soft tissues only. How the unique anthropoid hyoid morphology evolved is unknown, and hyoid morphology of nonanthropoid primates is poorly documented. Here we use phylogenetic comparative methods and linear morphometrics to address knowledge gaps in hyoid evolution among primates and their euarchontan outgroups. We find that dermopterans have variable reduction of cornu elements. Cynocephalus volans are sexually dimorphic in hyoid morphology. Tupaia and all lemuroids except Daubentonia have a fully ossified anterior cornu connecting a rod-shaped basihyal to the basicranium; this is the ancestral mammalian pattern that is also characteristic of the last common ancestor of Primates. Haplorhines exhibit a reduced anterior cornu, and anthropoids underwent further increase in basihyal aspect ratio values and in relative basihyal volume. Convergent with haplorhines, lorisoid strepsirrhines independently evolved a broad basihyal and reduced anterior cornua. While a reduced anterior cornu is hypothesized to facilitate vocal tract lengthening and lower formant frequencies in some mammals, our results suggest vocalization adaptations alone are unlikely to drive the iterative reduction of anterior cornua within Primates. Our new data on euarchontan hyoid evolution provide an anatomical basis for further exploring the form-function relationships of the hyoid across different behaviors, including vocalization, chewing, and swallowing.

RevDate: 2021-11-20

Xu L, Luo J, Xie D, et al (2021)

Reverberation Degrades Pitch Perception but Not Mandarin Tone and Vowel Recognition of Cochlear Implant Users.

Ear and hearing pii:00003446-900000000-98400 [Epub ahead of print].

OBJECTIVES: The primary goal of this study was to investigate the effects of reverberation on Mandarin tone and vowel recognition of cochlear implant (CI) users and normal-hearing (NH) listeners. To understand the performance of Mandarin tone recognition, this study also measured participants' pitch perception and the availability of temporal envelope cues in reverberation.

DESIGN: Fifteen CI users and nine NH listeners, all Mandarin speakers, were asked to recognize Mandarin single-vowels produced in four lexical tones and rank harmonic complex tones in pitch with different reverberation times (RTs) from 0 to 1 second. Virtual acoustic techniques were used to simulate rooms with different degrees of reverberation. Vowel duration and correlation between amplitude envelope and fundamental frequency (F0) contour were analyzed for different tones as a function of the RT.

RESULTS: Vowel durations of different tones significantly increased with longer RTs. Amplitude-F0 correlation remained similar for the falling Tone 4 but greatly decreased for the other tones in reverberation. NH listeners had robust pitch-ranking, tone recognition, and vowel recognition performance as the RT increased. Reverberation significantly degraded CI users' pitch-ranking thresholds but did not significantly affect the overall scores of tone and vowel recognition with CIs. Detailed analyses of tone confusion matrices showed that CI users reduced the flat Tone-1 responses but increased the falling Tone-4 responses in reverberation, possibly due to the falling amplitude envelope of late reflections after the original vowel segment. CI users' tone recognition scores were not correlated with their pitch-ranking thresholds.

CONCLUSIONS: NH listeners can reliably recognize Mandarin tones in reverberation using salient pitch cues from spectral and temporal fine structures. However, CI users have poorer pitch perception using F0-related amplitude modulations that are reduced in reverberation. Reverberation distorts speech amplitude envelopes, which affect the distribution of tone responses but not the accuracy of tone recognition with CIs. Recognition of vowels with stationary formant trajectories is not affected by reverberation for both NH listeners and CI users, regardless of the available spectral resolution. Future studies should test how the relatively stable vowel and tone recognition may contribute to sentence recognition in reverberation of Mandarin-speaking CI users.
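
The abstract above analyzes the correlation between the amplitude envelope and the F0 contour of each tone. As a minimal sketch, assuming a plain Pearson correlation over time-aligned frames (the abstract does not specify the exact measure), the code below shows how a falling tone with a falling envelope yields a strong positive correlation, while a rising tone paired with the same envelope does not. All contours are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical frame-by-frame contours (F0 in Hz, envelope in arbitrary units).
envelope = [1.0, 0.9, 0.8, 0.7, 0.6]             # decaying amplitude
f0_falling = [220.0, 210.0, 200.0, 190.0, 180.0]  # falling tone
f0_rising = [180.0, 190.0, 200.0, 210.0, 220.0]   # rising tone

r_falling = pearson_r(envelope, f0_falling)
r_rising = pearson_r(envelope, f0_rising)
print(f"falling tone r = {r_falling:+.2f}, rising tone r = {r_rising:+.2f}")
```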


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.

This is a must-read book for anyone with an interest in invasion biology. The full title of the book, The New Wild: Why Invasive Species Will Be Nature's Salvation, lays out the author's premise. Not only is species movement not bad for ecosystems, it is the way that ecosystems respond to perturbation; it is the way ecosystems heal. Even if you are one of those who are absolutely convinced that invasive species are actually "a blight, pollution, an epidemic, or a cancer on nature", you should read this book to clarify your own thinking. True scientific understanding never comes from just interacting with those with whom you already agree. R. Robbins

963 Red Tail Lane
Bellingham, WA 98226


E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

ResearchGate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)