
Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography, 28 Sep 2021 at 01:44

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add additional generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2021-09-20

Wang Y, Qiu X, Wang F, et al (2021)

Single-crystal ordered macroporous metal-organic framework as support for molecularly imprinted polymers and their integration in membrane formant for the specific recognition of zearalenone.

Journal of separation science [Epub ahead of print].

Zearalenone is a fungal contaminant that is widely present in grains. Here, a novel molecularly imprinted membrane based on SOM-ZIF-8 was developed for the rapid and highly selective identification of zearalenone in grain samples. The molecularly imprinted membrane was prepared using polyvinylidene fluoride, cyclododecyl 2,4-dihydroxybenzoate as a template and SOM-ZIF-8 as a carrier. The factors influencing the extraction of zearalenone using this membrane, including the solution pH, extraction time, elution solvent, elution time and elution volume, were studied in detail. The optimized conditions were 5 mL of sample solution at pH 6, extraction time of 45 min, 4 mL of acetonitrile:methanol (9:1) as elution solvent, and elution time of 20 min. This method displayed a good linear range of 12-120 ng·g⁻¹ (R² = 0.998), with limits of detection and quantification of 1.7 ng·g⁻¹ and 5.5 ng·g⁻¹, respectively. In addition, the membrane was used to selectively identify zearalenone in grain samples with percent recoveries ranging from 87.9% to 101.0% and relative standard deviation of less than 6.6%. Overall, this study presents a simple and effective chromatographic pretreatment method for detecting zearalenone in food samples. This article is protected by copyright. All rights reserved.

RevDate: 2021-09-20

Erdur OE, BS Yilmaz (2021)

Voice changes after surgically assisted rapid maxillary expansion.

American journal of orthodontics and dentofacial orthopedics : official publication of the American Association of Orthodontists, its constituent societies, and the American Board of Orthodontics pii:S0889-5406(21)00563-1 [Epub ahead of print].

INTRODUCTION: This study aimed to investigate voice changes in patients who had surgically assisted rapid maxillary expansion (SARME).

METHODS: Nineteen adult patients with maxillary transverse deficiency were asked to pronounce the sounds "[a], [ϵ], [ɯ], [i], [ɔ], [œ], [u], [y]" for 3 seconds. Voice records were taken before the expansion appliance was placed (T0) and 5.8 weeks after removal (T1, after 5.2 months of retention). The same records were taken for the control group (n = 19). The formant frequencies (F0, F1, F2, and F3), shimmer, jitter, and noise-to-harmonics ratio (NHR) parameters were considered with Praat (version 6.0.43).

RESULTS: In the SARME group, significant differences were observed in the F1 of [a] (P = 0.005), F2 of [ϵ] (P = 0.008), and [œ] sounds (P = 0.004). The postexpansion values were lower than those recorded before. In contrast, the F1 of [y] sound (P = 0.02), F2 of [u] sound (P = 0.01), the jitter parameter of [ɯ] and [i] sounds (P = 0.04; P = 0.002), and the NHR value of [ϵ] sound (P = 0.04) were significantly higher than the baseline values. In the comparison with the control group, significant differences were found in the F0 (P = 0.025) and F1 (P = 0.046) of the [u] sound, the F1 of the [a] sound (P = 0.03), and the F2 of the [ϵ] sound (P = 0.037). Significant differences were also found in the shimmer of [i] (P = 0.017) and [ɔ] (P = 0.002), the jitter of [ϵ] (P = 0.046) and [i] (P = 0.017), and the NHR of [i] (P = 0.012) and [ɔ] (P = 0.009).

CONCLUSION: SARME led to significant differences in some of the acoustics parameters.

RevDate: 2021-09-09

Perlman M, Paul J, G Lupyan (2021)

Vocal communication of magnitude across language, age, and auditory experience.

Journal of experimental psychology. General pii:2021-82980-001 [Epub ahead of print].

Like many other vocalizing vertebrates, humans convey information about their body size through the sound of their voice. Vocalizations of larger animals are typically longer in duration, louder in intensity, and lower in frequency. We investigated people's ability to use voice-size correspondences to communicate about the magnitude of external referents. First, we asked hearing children, as well as deaf children and adolescents, living in China to improvise nonlinguistic vocalizations to distinguish between paired items contrasting in magnitude (e.g., a long vs. short string, a big vs. small ball). Then we played these vocalizations back to adult listeners in the United States and China to assess their ability to correctly guess the intended referents. We find that hearing and deaf producers both signaled greater magnitude items with longer and louder vocalizations and with smaller formant spacing. Only hearing producers systematically used fundamental frequency, communicating greater magnitude with higher fo. The vocalizations of both groups were understandable to Chinese and American listeners, although accuracy was higher with vocalizations from older producers. American listeners relied on the same acoustic properties as Chinese listeners: both groups interpreted vocalizations with longer duration and greater intensity as referring to greater items; neither American nor Chinese listeners consistently used fo or formant spacing as a cue. These findings show that the human ability to use vocalizations to communicate about the magnitude of external referents is highly robust, extending across listeners of disparate linguistic and cultural backgrounds, as well as across age and auditory experience. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

RevDate: 2021-09-07

Stansbury AL, VM Janik (2021)

The role of vocal learning in call acquisition of wild grey seal pups.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 376(1836):20200251.

Pinnipeds have been identified as one of the best available models for the study of vocal learning. Experimental evidence for their learning skills is demonstrated with advanced copying skills, particularly in formant structure when copying human speech sounds and melodies. By contrast, almost no data are available on how learning skills are used in their own communication systems. We investigated the impact of playing modified seal sounds in a breeding colony of grey seals (Halichoerus grypus) to study how acoustic input influenced vocal development of eight pups. Sequences of two or three seal pup calls were edited so that the average peak frequency between calls in a sequence changed up or down. We found that seals copied the specific stimuli played to them and that copies became more accurate over time. The differential response of different groups showed that vocal production learning was used to achieve conformity, suggesting that geographical variation in seal calls can be caused by horizontal cultural transmission. While learning of pup calls appears to have few benefits, we suggest that it also affects the development of the adult repertoire, which may facilitate social interactions such as mate choice. This article is part of the theme issue 'Vocal learning in animals and humans'.

RevDate: 2021-09-02

Stehr DA, Hickok G, Ferguson SH, et al (2021)

Examining vocal attractiveness through articulatory working space.

The Journal of the Acoustical Society of America, 150(2):1548.

Robust gender differences exist in the acoustic correlates of clearly articulated speech, with females, on average, producing speech that is acoustically and phonetically more distinct than that of males. This study investigates the relationship between several acoustic correlates of clear speech and subjective ratings of vocal attractiveness. Talkers were recorded producing vowels in /bVd/ context and sentences containing the four corner vowels. Multiple measures of working vowel space were computed from continuously sampled formant trajectories and were combined with measures of speech timing known to co-vary with clear articulation. Partial least squares regression (PLS-R) modeling was used to predict ratings of vocal attractiveness for male and female talkers based on the acoustic measures. PLS components that loaded on size and shape measures of working vowel space (including the quadrilateral vowel space area, convex hull area, and bivariate spread of formants), along with measures of speech timing, were highly successful at predicting attractiveness in female talkers producing /bVd/ words. These findings are consistent with a number of hypotheses regarding human attractiveness judgments, including the role of sexual dimorphism in mate selection, the significance of traits signalling underlying health, and perceptual fluency accounts of preferences.
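Editorial illustration (not part of the abstract, and not the authors' code): of the vowel space measures named above, the quadrilateral vowel space area is the simplest to sketch. The Python example below assumes the area is taken as the area of the polygon whose vertices are the four corner vowels plotted in the (F2, F1) plane, computed with the shoelace formula. The function name polygon_area and all formant values are hypothetical and chosen only for illustration.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: list of (x, y) vertices listed in order around the polygon,
    here (F2, F1) pairs in Hz, so the result is in Hz squared.
    """
    n = len(points)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical mean (F2, F1) values in Hz for four corner vowels,
# listed in order around the quadrilateral. Illustration only.
corner_vowels = [
    (2300.0, 300.0),  # /i/
    (1800.0, 700.0),  # /ae/
    (1100.0, 750.0),  # /a/
    (900.0, 350.0),   # /u/
]

vowel_space_area = polygon_area(corner_vowels)
print(f"Quadrilateral vowel space area: {vowel_space_area:.0f} Hz^2")
```

The vertices must be listed in order around the quadrilateral; an arbitrary ordering can produce a self-intersecting shape for which the shoelace result is not the intended area. A convex hull area, the second measure named in the abstract, would avoid that ordering requirement but is not shown here.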

RevDate: 2021-09-02

Sahoo S, S Dandapat (2021)

Analyzing the vocal tract characteristics for out-of-breath speech.

The Journal of the Acoustical Society of America, 150(2):1524.

In this work, vocal tract characteristic changes under the out-of-breath condition are explored. Speaking under the influence of physical exercise is called out-of-breath speech. The change in breathing pattern results in perceptual changes in the produced sound. For vocal tract, the first four formants show a lowering in their average frequency. The bandwidths BF1 and BF2 widen, whereas the other two get narrowed. The change in bandwidth is small for the last three. For a speaker, the change in frequency and bandwidth may not be uniform across formants. Subband analysis is carried out around formants for comparing the variation of the vocal tract with the source. A vocal tract adaptive empirical wavelet transform is used for extracting formant specific subbands from speech and source. The support vector machine performs the subband-based binary classification between the normal and out-of-breath speech. For all speakers, it shows an F1-score improvement of 4% over speech subbands. Similarly, a performance improvement of 5% can be seen for both male and female speakers. Furthermore, the misclassification amount is less for source compared to speech. These results suggest that physical exercise influences the source more than the vocal tract.

RevDate: 2021-09-01

Dastolfo-Hromack C, Bush A, Chrabaszcz A, et al (2021)

Articulatory Gain Predicts Motor Cortex and Subthalamic Nucleus Activity During Speech.

Cerebral cortex (New York, N.Y. : 1991) pii:6362001 [Epub ahead of print].

Speaking precisely is important for effective verbal communication, and articulatory gain is one component of speech motor control that contributes to achieving this goal. Given that the basal ganglia have been proposed to regulate the speed and size of limb movement, that is, movement gain, we explored the basal ganglia contribution to articulatory gain, through local field potentials (LFP) recorded simultaneously from the subthalamic nucleus (STN), precentral gyrus, and postcentral gyrus. During STN deep brain stimulation implantation for Parkinson's disease, participants read aloud consonant-vowel-consonant syllables. Articulatory gain was indirectly assessed using the F2 Ratio, an acoustic measurement of the second formant frequency of /i/ vowels divided by /u/ vowels. Mixed effects models demonstrated that the F2 Ratio correlated with alpha and theta activity in the precentral gyrus and STN. No correlations were observed for the postcentral gyrus. Functional connectivity analysis revealed that higher phase locking values for beta activity between the STN and precentral gyrus were correlated with lower F2 Ratios, suggesting that higher beta synchrony impairs articulatory precision. Effects were not related to disease severity. These data suggest that articulatory gain is encoded within the basal ganglia-cortical loop.
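Editorial illustration (not part of the abstract, and not the authors' code): the F2 Ratio described above is a quotient of second formant frequencies. The Python sketch below assumes that the F2 values of several /i/ tokens and several /u/ tokens are each averaged before dividing; the abstract does not say how tokens were aggregated, so that step, the function name f2_ratio, and the numbers are all hypothetical.

```python
from statistics import mean


def f2_ratio(f2_i_hz, f2_u_hz):
    """Mean F2 of /i/ tokens divided by mean F2 of /u/ tokens.

    Averaging across tokens is an assumption made for this sketch.
    """
    if not f2_i_hz or not f2_u_hz:
        raise ValueError("need at least one F2 value for each vowel")
    return mean(f2_i_hz) / mean(f2_u_hz)


# Hypothetical F2 measurements in Hz, for illustration only.
f2_i = [2200.0, 2300.0]
f2_u = [900.0, 1100.0]

ratio = f2_ratio(f2_i, f2_u)
print(f"F2 Ratio: {ratio:.2f}")
```

Because /i/ is a front vowel with a high F2 and /u/ a back vowel with a low F2, a larger ratio indicates a wider separation between the two vowels, which is the sense in which the abstract treats the measure as an index of articulatory precision.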

RevDate: 2021-08-17

Aires MM, de Vasconcelos D, Lucena JA, et al (2021)

Effect of Wendler glottoplasty on voice and quality of life of transgender women.

Brazilian journal of otorhinolaryngology pii:S1808-8694(21)00134-8 [Epub ahead of print].

OBJECTIVE: To investigate the effect of Wendler glottoplasty on voice feminization, voice quality and voice-related quality of life.

METHODS: Prospective interventional cohort of transgender women submitted to Wendler glottoplasty. Acoustic analysis of the voice included assessment of fundamental frequency, maximum phonation time, formant frequencies (F1 and F2), frequency range, jitter and shimmer. Voice quality was blindly assessed using the GRBAS scale. Voice-related quality of life was measured using the Trans Woman Voice Questionnaire and the self-perceived femininity of the voice.

RESULTS: A total of 7 patients were included. The mean age was 35.4 years, and the mean postoperative follow-up time was 13.7 months. There was a mean increase of 47.9 ± 46.6 Hz (p = 0.023) in sustained /e/ F0 and a mean increase of 24.6 ± 27.5 Hz (p = 0.029) in speaking F0 after glottoplasty. There was no statistically significant difference between pre- and postoperative maximum phonation time, formant frequencies, frequency range, jitter, shimmer, or scores on the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale. Trans Woman Voice Questionnaire scores decreased following surgery from 98.3 ± 9.2 to 54.1 ± 25.0 (p = 0.007) and mean self-perceived femininity of the voice increased from 2.8 ± 1.8 to 7.7 ± 2.4 (p = 0.008). One patient (14%) presented a postoperative granuloma and there was 1 (14%) premature suture dehiscence.

CONCLUSION: Glottoplasty is safe and effective for feminizing the voice of transgender women. There was an increase in fundamental frequency, without aggravating other acoustic parameters or voice quality. Voice-related quality of life improved after surgery.

RevDate: 2021-08-16

Chung H (2021)

Acoustic Characteristics of Pre- and Post-vocalic /l/: Patterns from One Southern White Vernacular English.

Language and speech [Epub ahead of print].

This study examined acoustic characteristics of the phoneme /l/ produced by young female and male adult speakers of Southern White Vernacular English (SWVE) from Louisiana. F1, F2, and F2-F1 values extracted at the /l/ midpoint were analyzed by word position (pre- vs. post-vocalic) and vowel contexts (/i, ɪ/ vs. /ɔ, a/). Descriptive analysis showed that SWVE /l/ exhibited characteristics of the dark /l/ variant. The formant patterns of /l/, however, differed significantly by word position and vowel context, with pre-vocalic /l/ showing significantly higher F2-F1 values than post-vocalic /l/, and /l/ in the high front vowel context showing significantly higher F2-F1 values than those in the low back vowel context. Individual variation in the effects of word position and vowel contexts on /l/ pattern was also observed. Overall, the findings of the current study showed a gradient nature of SWVE /l/ variants whose F2-F1 patterns generally fell into the range of the dark /l/ variant, while varying by word position and vowel context.

RevDate: 2021-09-02

Yang L, Fu K, Zhang J, et al (2021)

Non-native acoustic modeling for mispronunciation verification based on language adversarial representation learning.

Neural networks : the official journal of the International Neural Network Society, 142:597-607.

Non-native mispronunciation verification is designed to provide feedback to guide language learners to correct their pronunciation errors in their further learning and it plays an important role in the computer-aided pronunciation training (CAPT) system. Most existing approaches focus on establishing the acoustic model directly using non-native corpus thus they are suffering the data sparsity problem due to time-consuming non-native speech data collection and annotation tasks. In this work, to address this problem, we propose a pre-trained approach to utilize the speech data of two native languages (the learner's native and target languages) for non-native mispronunciation verification. We set up an unsupervised model to extract knowledge from a large scale of unlabeled raw speech of the target language by making predictions about future observations in the speech signal, then the model is trained with language adversarial training using the learner's native language to align the feature distribution of two languages by confusing a language discriminator. In addition, sinc filter is incorporated at the first convolutional layer to capture the formant-like feature. Formant is relevant to the place and manner of articulation. Therefore, it is useful not only for pronunciation error detection but also for providing instructive feedback. Then the pre-trained model serves as the feature extractor in the downstream mispronunciation verification task. 
Through the experiments on the Japanese part of the BLCU inter-Chinese speech corpus, the experimental results demonstrate that for the non-native phone recognition and mispronunciation verification tasks (1) the knowledge learned from two native languages speech with the proposed unsupervised approach is useful for these two tasks (2) our proposed language adversarial representation learning is effective to improve the performance (3) formant-like feature can be incorporated by introducing sinc filter to further improve the performance of mispronunciation verification.

RevDate: 2021-08-13

Leyns C, Corthals P, Cosyns M, et al (2021)

Acoustic and Perceptual Effects of Articulation Exercises in Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00242-3 [Epub ahead of print].

PURPOSE: This study measured the impact of articulation exercises using a cork and articulation exercises for lip spreading on the formant frequencies of vowels and listener perceptions of femininity in transgender women.

METHODS: Thirteen transgender women were recorded before and after the cork exercise and before and after the lip spreading exercise. Speech samples included continuous speech during reading and were analyzed using Praat software. Vowel formant frequencies (F1, F2, F3, F4, F5) and vowel space were determined. A listening experiment was organized using naïve cisgender women and cisgender men rating audio samples of continuous speech. Masculinity/femininity, vocal quality and age were rated, using a visual analogue scale (VAS).

RESULTS: Concerning vowel formant frequencies, F2 /a/ and F5 /u/ significantly increased after the lip spreading exercise, as well as F3 /a/, F3 /u/ and F4 /a/ after the cork exercise. The lip spreading exercise had more impact on the F2 /a/ than the cork exercise. Vowel space did not change after the exercises. The fundamental frequency (fo) increased simultaneously during both exercises. Both articulation exercises were associated with significantly increased listener perceptions of femininity of the voice.

CONCLUSION: Subtle changes in formant frequencies can be observed after performing articulation exercises, but not in every formant frequency or vowel. Cisgender listeners rated the speech of the transgender women more feminine after the exercises. Further research with a more extensive therapy program and listening experiment is needed to examine these preliminary findings.

RevDate: 2021-08-05
CmpDate: 2021-08-05

Yang JJ, Cheng LY, W Xu (2021)

[Study on changes of voice characteristics after adenotonsillectomy or adenoidectomy in children].

Zhonghua er bi yan hou tou jing wai ke za zhi = Chinese journal of otorhinolaryngology head and neck surgery, 56(7):724-729.

Objective: To study voice changes in children after adenotonsillectomy or adenoidectomy and their relationship with vocal tract structure. Methods: Fifty patients aged 4 to 12 years (median age 6) were recruited prospectively. They underwent adenotonsillectomy or adenoidectomy at Beijing Tongren Hospital, Capital Medical University, from July 2019 to August 2020. There were 31 males and 19 females. Thirty-six patients underwent adenotonsillectomy and 14 patients underwent adenoidectomy alone. Twenty-two children (13 males, 9 females) with degree Ⅰ bilateral tonsils, without adenoid hypertrophy and without snoring, were selected as normal controls. Adenoid and tonsil sizes were evaluated. Subjective changes of voice were recorded after surgery. Moreover, voice data including fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio (NHR), maximum phonation time (MPT), and formant frequencies (F1-F5) and bandwidths (B1-B5) of the vowels /a/ and /i/ were analyzed before surgery and at 3 days and 1 month after surgery. SPSS 23.0 was used for statistical analysis. Results: Thirty-six patients (72.0%, 36/50) complained of postoperative voice changes. The incidence was inversely correlated with age. In children aged 4-6, 7-9, and 10-12, the incidence was 83.3% (25/30), 63.6% (7/11) and 44.4% (4/9), respectively. Voice changes appeared more common in children who underwent adenotonsillectomy (77.8%, 28/36) than in those who underwent adenoidectomy alone (57.1%, 8/14), but the difference was not statistically significant. After surgery, for vowel /a/, MPT (Z=2.18, P=0.041) and F2 (t=2.13, P=0.040) increased, and B2 (Z=2.04, P=0.041) and B4 (Z=2.00, P=0.046) decreased. For vowel /i/, F2 (t=2.035, P=0.050) and F4 (t=4.44, P=0.0001) increased, and B2 (Z=2.36, P=0.019) decreased. Other acoustic parameters were not significantly different from those before surgery. The F2 (r=-0.392, P=0.032) of vowel /a/ and the F2 (r=-0.279, P=0.048) and F4 (r=-0.401, P=0.028) of vowel /i/ after adenotonsillectomy were significantly higher than those after adenoidectomy alone. Half of the patients with postoperative voice changes can recover spontaneously 1 month after surgery. Conclusions: Voice changes in children who underwent adenotonsillectomy or adenoidectomy might be related to changes in formants and bandwidths. The effect of adenotonsillectomy on voice was more significant than that of adenoidectomy alone. The acoustic parameters did not change significantly after surgery except MPT.

RevDate: 2021-08-03

Frey R, Wyman MT, Johnston M, et al (2021)

Roars, groans and moans: Anatomical correlates of vocal diversity in polygynous deer.

Journal of anatomy [Epub ahead of print].

Eurasian deer are characterized by the extraordinary diversity of their vocal repertoires. Male sexual calls range from roars with relatively low fundamental frequency (hereafter fo) in red deer Cervus elaphus, to moans with extremely high fo in sika deer Cervus nippon, and almost infrasonic groans with exceptionally low fo in fallow deer Dama dama. Moreover, while both red and fallow males are capable of lowering their formant frequencies during their calls, sika males appear to lack this ability. Female contact calls are also characterized by relatively less pronounced, yet strong interspecific differences. The aim of this study is to examine the anatomical bases of these inter-specific and inter-sexual differences by identifying if the acoustic variation is reflected in corresponding anatomical variation. To do this, we investigated the vocal anatomy of male and female specimens of each of these three species. Across species and sexes, we find that the observed acoustic variability is indeed related to expected corresponding anatomical differences, based on the source-filter theory of vocal production. At the source level, low fo is associated with larger vocal folds, whereas high fo is associated with smaller vocal folds: sika deer have the smallest vocal folds and male fallow deer the largest. Red and sika deer vocal folds do not appear to be sexually dimorphic, while fallow deer exhibit strong sexual dimorphism (after correcting for body size differences). At the filter level, the variability in formants is related to the configuration of the vocal tract: in fallow and red deer, both sexes have evolved a permanently descended larynx (with a resting position of the larynx much lower in males than in females). Both sexes also have the potential for momentary, call-synchronous vocal tract elongation, again more pronounced in males than in females. 
In contrast, the resting position of the larynx is high in both sexes of sika deer and the potential for further active vocal tract elongation is virtually absent in both sexes. Anatomical evidence suggests an evolutionary reversal in larynx position within sika deer, that is, a secondary larynx ascent. Together, our observations confirm that the observed diversity of vocal behaviour in polygynous deer is supported by strong anatomical differences, highlighting the importance of anatomical specializations in shaping mammalian vocal repertoires. Sexual selection is discussed as a potential evolutionary driver of the observed vocal diversity and sexual dimorphisms.

RevDate: 2021-08-05
CmpDate: 2021-08-05

Strycharczuk P, Ćavar M, S Coretta (2021)

Distance vs time. Acoustic and articulatory consequences of reduced vowel duration in Polish.

The Journal of the Acoustical Society of America, 150(1):592.

This paper presents acoustic and articulatory (ultrasound) data on vowel reduction in Polish. The analysis focuses on the question of whether the change in formant value in unstressed vowels can be explained by duration-driven undershoot alone or whether there is also evidence for additional stress-specific articulatory mechanisms that systematically affect vowel formants. On top of the expected durational differences between the stressed and unstressed conditions, the duration is manipulated by inducing changes in the speech rate. The observed vowel formants are compared to expected formants derived from the articulatory midsagittal tongue data in different conditions. The results show that the acoustic vowel space is reduced in size and raised in unstressed vowels compared to stressed vowels. Most of the spectral reduction can be explained by reduced vowel duration, but there is also an additional systematic effect of F1-lowering in unstressed non-high vowels that does not follow from tongue movement. The proposed interpretation is that spectral vowel reduction in Polish behaves largely as predicted by the undershoot model of vowel reduction, but the effect of undershoot is enhanced for low unstressed vowels, potentially by a stress marking strategy which involves raising the fundamental frequency.

RevDate: 2021-08-03

Petersen EA, Colinot T, Silva F, et al (2021)

The bassoon tonehole lattice: Links between the open and closed holes and the radiated sound spectrum.

The Journal of the Acoustical Society of America, 150(1):398.

The acoustics of the bassoon has been the subject of relatively few studies compared with other woodwind instruments. One reason for this may lie in its complicated resonator geometry, which includes irregularly spaced toneholes with chimney heights ranging from 3 to 31 mm. The current article evaluates the effect of the open and closed tonehole lattice (THL) on the acoustic response of the bassoon resonator. It is shown that this response can be divided into three distinct frequency bands that are determined by the open and closed THL: below 500 Hz, 500-2200 Hz, and above 2200 Hz. The first is caused by the stopband of the open THL, where the low frequency effective length of the instrument is determined by the location of the first open tonehole. The second is due to the passband of the open THL, such that the modes are proportional to the total length of the resonator. The third is due to the closed THL, where part of the acoustical power is trapped within the resonator. It is proposed that these three frequency bands impact the radiated spectrum by introducing a formant in the vicinity of 500 Hz and suppressing radiation above 2200 Hz for most first register fingerings.

RevDate: 2021-08-05
CmpDate: 2021-08-05

Uezu Y, Hiroya S, T Mochida (2021)

Articulatory compensation for low-pass filtered formant-altered auditory feedback.

The Journal of the Acoustical Society of America, 150(1):64.

Auditory feedback while speaking plays an important role in stably controlling speech articulation. Its importance has been verified in formant-altered auditory feedback (AAF) experiments where speakers utter while listening to speech with perturbed first (F1) and second (F2) formant frequencies. However, the contribution of the frequency components higher than F2 to the articulatory control under the perturbations of F1 and F2 has not yet been investigated. In this study, a formant-AAF experiment was conducted in which a low-pass filter was applied to speech. The experimental results showed that the deviation in the compensatory response was significantly larger when a low-pass filter with a cutoff frequency of 3 kHz was used compared to that when cutoff frequencies of 4 and 8 kHz were used. It was also found that the deviation in the 3-kHz condition correlated with the fundamental frequency and spectral tilt of the produced speech. Additional simulation results using a neurocomputational model of speech production (SimpleDIVA model) and the experimental data showed that the feedforward learning rate increased as the cutoff frequency decreased. These results suggest that high-frequency components of the auditory feedback would be involved in the determination of corrective motor commands from auditory errors.

RevDate: 2021-07-23

Lynn E, Narayanan SS, AC Lammert (2021)

Dark tone quality and vocal tract shaping in soprano song production: Insights from real-time MRI.

JASA express letters, 1(7):075202.

Tone quality termed "dark" is an aesthetically important property of Western classical voice performance and has been associated with lowered formant frequencies, lowered larynx, and widened pharynx. The present study uses real-time magnetic resonance imaging with synchronous audio recordings to investigate dark tone quality in four professionally trained sopranos with enhanced ecological validity and a relatively complete view of the vocal tract. Findings differ from traditional accounts, indicating that labial narrowing may be the primary driver of dark tone quality across performers, while many other aspects of vocal tract shaping are shown to differ significantly in a performer-specific way.

RevDate: 2021-07-18

Joshi A, Procter T, PA Kulesz (2021)

COVID-19: Acoustic Measures of Voice in Individuals Wearing Different Facemasks.

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

AIM: The global health pandemic caused by the SARS-coronavirus 2 (COVID-19) has led to the adoption of facemasks as a necessary safety precaution. Depending on the level of risk for exposure to the virus, the facemasks that are used can vary. The aim of this study was to examine the effect of different types of facemasks, typically used by healthcare professionals and the public during the COVID-19 pandemic, on measures of voice.

METHODS: Nineteen adults (ten females, nine males) with a normal voice quality completed sustained vowel tasks. All tasks were performed for each of the six mask conditions: no mask, cloth mask, surgical mask, KN95 mask, and surgical mask over a KN95 mask with and without a face shield. Intensity measurements were obtained at a 1 ft and 6 ft distance from the speaker with sound level meters. Tasks were recorded with a 1 ft mouth-to-microphone distance. Acoustic variables of interest were fundamental frequency (F0), formant frequencies (F1, F2) for /a/ and /i/, and smoothed cepstral peak prominence (CPPs) for /a/.

RESULTS: Data were analyzed to compare differences between sex and mask types. There was statistical significance between males and females for intensity measures and all acoustic variables except F2 for /a/ and F1 for /i/. Few pairwise comparisons between masks reached significance even though main effects for mask type were observed. These are further discussed in the article.

CONCLUSION: The masks tested in this study did not have a significant impact on intensity, fundamental frequency, CPPs, first or second formant frequency compared to voice output without a mask. Use of a face shield seemed to affect intensity and CPPs to some extent. Implications of these findings are discussed further in the article.

RevDate: 2021-08-04

Easwar V, Birstler J, Harrison A, et al (2021)

The Influence of Sensation Level on Speech-Evoked Envelope Following Responses.

Ear and hearing pii:00003446-900000000-98474 [Epub ahead of print].

OBJECTIVES: To evaluate sensation level (SL)-dependent characteristics of envelope following responses (EFRs) elicited by band-limited speech dominant in low, mid, and high frequencies.

DESIGN: In 21 young normal hearing adults, EFRs were elicited by 8 male-spoken speech stimuli: the first formant, and second and higher formants of /u/, /a/ and /i/, and modulated fricatives, /ʃ/ and /s/. Stimulus SL was computed from behaviorally measured thresholds.

RESULTS: At 30 dB SL, the amplitude and phase coherence of fricative-elicited EFRs were ~1.5 to 2 times higher than all vowel-elicited EFRs, whereas fewer and smaller differences were found among vowel-elicited EFRs. For all stimuli, EFR amplitude and phase coherence increased by roughly 50% for every 10 dB increase in SL between ~0 and 50 dB.

CONCLUSIONS: Stimulus and frequency dependency in EFRs exist despite accounting for differences in audibility of speech sounds. The growth rate of EFR characteristics with SL is independent of stimulus and its frequency.

RevDate: 2021-07-17

Zealouk O, Satori H, Hamidi M, et al (2021)

Analysis of COVID-19 Resulting Cough Using Formants and Automatic Speech Recognition System.

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

As part of our contribution to research on the ongoing worldwide COVID-19 pandemic, we studied changes in the cough of infected people using Hidden Markov Model (HMM) speech recognition classification together with formant frequency and pitch analysis. In this paper, an HMM-based cough recognition system was implemented with 5 HMM states, 8 Gaussian Mixture Distributions (GMMs), and 13 basic Mel-Frequency Cepstral Coefficients (MFCC) within an overall feature vector of 39 dimensions. Formant frequency and pitch values extracted from the coughs of COVID-19 infected and healthy people were compared to confirm the results of the cough recognition system. The experimental results show that the recognition rates of infected and non-infected people differ by 6.7%. The formant analysis shows a clear difference between the coughs of infected and non-infected people for F1, F3, and F4, and a smaller difference for F0 and F2.

RevDate: 2021-07-15
CmpDate: 2021-07-15

Zhang C, Jepson K, Lohfink G, et al (2021)

Comparing acoustic analyses of speech data collected remotely.

The Journal of the Acoustical Society of America, 149(6):3910.

Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.

RevDate: 2021-08-08

Diamant N, O Amir (2021)

Examining the voice of Israeli transgender women: Acoustic measures, voice femininity and voice-related quality-of-life.

International journal of transgender health, 22(3):281-293.

Background: Transgender women may experience gender-dysphoria associated with their voice and the way it is perceived. Previous studies have shown that specific acoustic measures are associated with the perception of voice-femininity and with voice-related quality-of-life, yet results are inconsistent.

Aims: This study aimed to examine the associations between specific voice measures of transgender women, voice-related quality-of-life, and the perception of voice-femininity by listeners and by the speakers themselves.

Methods: Thirty Hebrew speaking transgender women were recorded. They had also rated their voice-femininity and completed the Hebrew version of the TVQMtF questionnaire. Recordings were analyzed to extract mean fundamental frequency (F0), formant frequencies (F1, F2, F3), and vocal-range (calculated in Hz. and in semitones). Recordings were also rated on a voice-gender 7-point scale, by 20 naïve cisgender listeners.

Results: Significant correlations were found between both F0 and F1 and listeners' as well as speakers' evaluation of voice-femininity. TVQMtF scores were significantly correlated with F0 and with the lower and upper boundaries of the vocal-range. Voice-femininity ratings were strongly correlated with vocal-range, when calculated in Hz, but not when defined in semitones. Listeners' evaluation and speakers' self-evaluation of voice-femininity were significantly correlated. However, TVQMtF scores were significantly correlated only with the speakers' voice-femininity ratings, but not with those of the listeners.

Conclusion: Higher F0 and F1, which are perceived as more feminine, jointly improved speakers' satisfaction with their voice. Speakers' self-evaluation of voice-femininity does not mirror listeners' judgment, as it is affected by additional factors, related to self-satisfaction and personal experience. Combining listeners' and speakers' voice evaluation with acoustic analysis is valuable by providing a more holistic view on how transgender women feel about their voice and how it is perceived by listeners.
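The contrast between vocal range in Hz and in semitones reported above follows from the two scales themselves: Hz is linear, while semitones are logarithmic (12 per octave). The sketch below is only an illustration of that arithmetic; the function names and the example frequencies are hypothetical and are not taken from the study.

```python
import math

def range_in_hz(f_low, f_high):
    """Vocal range as a plain difference between the boundaries, in Hz."""
    return f_high - f_low

def range_in_semitones(f_low, f_high):
    """Vocal range on a logarithmic scale: 12 semitones per doubling of frequency."""
    return 12.0 * math.log2(f_high / f_low)

# Two hypothetical speakers, each spanning exactly one octave.
lower_voice = (80.0, 160.0)
higher_voice = (160.0, 320.0)

for label, (lo, hi) in (("lower", lower_voice), ("higher", higher_voice)):
    print(label, range_in_hz(lo, hi), "Hz,", range_in_semitones(lo, hi), "semitones")
```

Both hypothetical speakers cover the same range in semitones, but the higher-pitched voice covers twice the range in Hz, so a range expressed in Hz also carries information about overall pitch height.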

RevDate: 2021-08-06
CmpDate: 2021-08-06

Leung Y, Oates J, Chan SP, et al (2021)

Associations Between Speaking Fundamental Frequency, Vowel Formant Frequencies, and Listener Perceptions of Speaker Gender and Vocal Femininity-Masculinity.

Journal of speech, language, and hearing research : JSLHR, 64(7):2600-2622.

Purpose: The aim of the study was to examine associations between speaking fundamental frequency (fos), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity-masculinity.

Method: An exploratory study was undertaken to examine associations between fos, F1-F3, listener perceptions of speaker gender (nominal scale), and vocal femininity-masculinity (visual analog scale). For 379 speakers of Australian English aged 18-60 years, fos mode and F1-F3 (12 monophthongs; total of 36 Fs) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity-masculinity on randomized audio recordings of these speakers.

Results: Model building using principal component analysis suggested the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and fos as predictors) suggested that only F2 and fos predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity-masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and fos. Furthermore, listeners' perceptions of speaker gender also influenced ratings of vocal femininity-masculinity substantially.

Conclusion: Adjusted odds ratios highlighted the substantially larger contribution of F to listener perceptions of speaker gender and vocal femininity-masculinity relative to fos than has previously been reported.

RevDate: 2021-08-09

Easwar V, Boothalingam S, R Flaherty (2021)

Fundamental frequency-dependent changes in vowel-evoked envelope following responses.

Hearing research, 408:108297.

Scalp-recorded envelope following responses (EFRs) provide a non-invasive method to assess the encoding of the fundamental frequency (f0) of voice that is important for speech understanding. It is well-known that EFRs are influenced by voice f0. However, this effect of f0 has not been examined independent of concomitant changes in spectra or neural generators. We evaluated the effect of voice f0 on EFRs while controlling for vowel formant characteristics and potentially avoiding significant changes in dominant neural generators using a small f0 range. EFRs were elicited by a male-spoken vowel /u/ (average f0 = 100.4 Hz) and its lowered f0 version (average f0 = 91.9 Hz) with closely matched formant characteristics. Vowels were presented to each ear of 17 young adults with normal hearing. EFRs were simultaneously recorded between the vertex and the nape, and the vertex and the ipsilateral mastoid, the two most common electrode montages used for EFRs. Our results indicate that when vowel formant characteristics are matched, an increase in f0 by 8.5 Hz reduces EFR amplitude by 25 nV, phase coherence by 0.05 and signal-to-noise ratio by 3.5 dB, on average. The reduction in EFR characteristics was similar across ears of stimulation and the two montages used. These findings will help parse the influence of f0 or stimulus spectra on EFRs when both co-vary.

RevDate: 2021-07-02

Eravci FC, Yildiz BD, Özcan KM, et al (2021)

Acoustic parameter changes after bariatric surgery.

Logopedics, phoniatrics, vocology [Epub ahead of print].

OBJECTIVE: To investigate the acoustic parameter changes after weight loss in bariatric surgery patients.

MATERIALS AND METHODS: This prospective, longitudinal study was conducted with 15 patients with planned bariatric surgery, who were evaluated pre-operatively and at 6 months post-operatively. Fundamental frequency (F0), Formant frequency (F1, F2, F3, and F4), Frequency perturbation (Jitter), Amplitude perturbation (Shimmer) and Noise-to-Harmonics Ratio (NHR) parameters were evaluated for /a/, /e/, /i/, /o/, and /u/ vowels. Changes in the acoustic analysis parameters for each vowel were compared. The study group was separated into two groups according to whether the Mallampati score had not changed (Group 1) or had decreased (Group 2) and changes in the formant frequencies were compared between these groups.

RESULTS: A total of 15 patients with a median age of 40 ± 11 years completed the study. The median weight of the patients was 122 ± 14 kg pre-operatively and 80 ± 15 kg post-operatively. BMI declined from 46 ± 4 to 31 ± 5 kg/m². The Mallampati score decreased by one point in six patients and remained stable in nine. Of the acoustic voice analysis parameters of vowels, in general, fundamental frequency tended to decrease, and shimmer and jitter values tended to increase. Some of the formant frequencies were specifically affected by the weight loss and this showed statistical significance between Group 1 and Group 2.

CONCLUSION: The present study reveals that some specific voice characteristics might be affected by successful weight loss after bariatric surgery.

HIGHLIGHTS: Obesity reduces the size of the pharyngeal lumen at different levels. The supralaryngeal vocal tract size and configuration is a determinative factor in the features of the voice. Changes in the length and shape of the vocal tract, or height and position of the tongue can result in changes especially in formant frequencies in acoustic analysis.

RevDate: 2021-09-02

Yang J (2021)

Vowel development in young Mandarin-English bilingual children.

Phonetica, 78(3):241-272 pii:phon-2021-2006.

This study examined the development of vowel categories in young Mandarin-English bilingual children. The participants included 35 children aged between 3 and 4 years old (15 Mandarin-English bilinguals, six English monolinguals, and 14 Mandarin monolinguals). The bilingual children were divided into two groups: one group had a shorter duration (<1 year) of intensive immersion in English (Bi-low group) and one group had a longer duration (>1 year) of intensive immersion in English (Bi-high group). The participants were recorded producing one list of Mandarin words containing the vowels /a, i, u, y, ɤ/ and/or one list of English words containing the vowels /i, ɪ, e, ɛ, æ, u, ʊ, o, ɑ, ʌ/. Formant frequency values were extracted at five equidistant time locations (the 20-35-50-65-80% point) over the course of vowel duration. Cross-language and within-language comparisons were conducted on the midpoint formant values and formant trajectories. The results showed that children in the Bi-low group produced their English vowels into clusters and showed positional deviations from the monolingual targets. However, they maintained the phonetic features of their native vowel sounds well and mainly used an assimilatory process to organize the vowel systems. Children in the Bi-high group separated their English vowels well. They used both assimilatory and dissimilatory processes to construct and refine the two vowel systems. These bilingual children approximated monolingual English children to a better extent than the children in the Bi-low group. However, when compared to the monolingual peers, they demonstrated observable deviations in both L1 and L2.
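The five equidistant measurement points described above (20, 35, 50, 65, and 80% of vowel duration) can be converted to absolute times once the vowel boundaries are known. The following is a minimal sketch of that step only; the function name, the boundary values, and the idea of passing times in seconds are assumptions for illustration, and the abstract does not say which software performed the extraction.

```python
def sample_times(onset, offset, proportions=(0.20, 0.35, 0.50, 0.65, 0.80)):
    """Return the absolute times at the given proportions of a vowel's duration."""
    if offset <= onset:
        raise ValueError("offset must be later than onset")
    duration = offset - onset
    return [onset + p * duration for p in proportions]

# Hypothetical vowel segmented from 1.00 s to 1.20 s in a recording.
times = sample_times(1.00, 1.20)
midpoint = times[2]  # the 50% point, used for midpoint formant comparisons
print([round(t, 3) for t in times], round(midpoint, 3))
```

Formant values measured at these times give a five-point trajectory per vowel, and the middle value corresponds to the midpoint measure mentioned in the abstract.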

RevDate: 2021-06-12

Lin Y, Cheng L, Wang Q, et al (2021)

Effects of Medical Masks on Voice Assessment During the COVID-19 Pandemic.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00163-6 [Epub ahead of print].

OBJECTIVE: Voice assessment is of great significance to the evaluation of voice quality. Our study aims to explore the effects of medical masks on healthy people in acoustic, aerodynamic and formant parameters during the COVID-19 pandemic. In addition, we also attempted to verify the differences between different sexes and ages.

METHODS: Fifty-three healthy participants (25 males and 28 females) were involved in our study. The acoustic parameters, including fundamental frequency (F0), sound pressure level (SPL), percentage of jitter (%), percentage of shimmer (%), noise to harmonic ratio (NHR) and cepstral peak prominence (CPP), aerodynamic parameter (maximum phonation time, MPT) and formant parameters (formant frequency, F1, F2, F3) without and with wearing medical masks were included. We further investigated the potential differences in the impact on different sexes and ages (≤45 years old and >45 years old).

RESULTS: While wearing medical masks, the SPL significantly increased (71.22±4.25 dB, 72.42±3.96 dB, P = 0.021). Jitter and shimmer significantly decreased (jitter 1.19±0.83, 0.87±0.67 P = 0.005; shimmer 4.49±2.20, 3.66±2.02 P = 0.002), as did F3 (2855±323.34 Hz, 2781.89±353.42 Hz P = 0.004). F0, MPT, F1 and F2 showed increasing trends without statistical significance, and NHR as well as CPP showed little change without and with wearing medical masks. There were no significant differences seen between males and females. Regarding age, a significant difference in MPT was seen (>45-year-old 16.15±6.98 s, 15.38±7.02 s; ≤45-year-old 20.26±6.47 s, 21.44±6.98 s, P = 0.032).

CONCLUSION: Healthy participants showed a significantly higher SPL, a smaller perturbation and an evident decrease in F3 after wearing medical masks. These changes may result from the adjustment of the vocal tract and the filtration function of medical masks, leading to the stability of voices we recorded being overstated. The impacts of medical masks on sex were not evident, while the MPT in the >45-year-old group was influenced more than that in the ≤45-year-old group.

RevDate: 2021-07-12

Madrid AM, Walker KA, Smith SB, et al (2021)

Relationships between click auditory brainstem response and speech frequency following response with development in infants born preterm.

Hearing research, 407:108277.

The speech evoked frequency following response (sFFR) is used to study relationships between neural processing and functional aspects of speech and language that are not captured by click or toneburst evoked auditory brainstem responses (ABR). The sFFR is delayed, deviant, or weak in school age children having a variety of disorders, including autism, dyslexia, reading and language disorders, in relation to their typically developing peers. Much less is known about the developmental characteristics of sFFR, especially in preterm infants, who are at risk of having language delays. In term neonates, phase locking and spectral representation of the fundamental frequency is developed in the early days of life. Spectral representation of higher harmonics and latencies associated with transient portions of the stimulus are still developing in term infants through at least 10 months of age. The goal of this research was to determine whether sFFR could be measured in preterm infants and to characterize its developmental trajectory in the time and frequency domain. Click ABR and sFFR were measured in 28 preterm infants at ages 33 to 64 weeks gestational age. The sFFR could be measured in the majority of infants at 33 weeks gestational age, and the detectability of all sFFR waves was 100% by 64 weeks gestational age. The latency of all waves associated with the transient portion of the response (waves V, A, and O), and most waves (waves D and E) associated with the quasi-steady state decreased with increasing age. The interpeak wave A-O latency did not change with age, indicating that these waves share a neural generator, or the neural generators are developing at the same rate. The spectral amplitude of F0 and the lower frequencies of the first formant increased with age, but that for higher frequencies of the first formant and higher harmonics did not. 
The results suggest that the sFFR can be reliably recorded in preterm infants, including those cared for in the neonatal intensive care unit. These findings support that in preterm infants, F0 amplitude continues to develop within the first 6 months of life and develops before efficient representation of higher frequency harmonics. Further research is needed to determine if the sFFR in preterm infants is predictive of long-term language or learning disorders.

RevDate: 2021-05-28

Andrade PA, Frič M, Z Otčenášek (2021)

Assessment of Changes in Laryngeal Configuration and Voice Parameters Among Different Frequencies of Neuromuscular Electrical Stimulation (NMES) and Cumulative Effects of NMES in a Normophonic Subject: A Pilot Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00114-4 [Epub ahead of print].

INTRODUCTION: Neuromuscular electrical stimulation (NMES) is a complementary resource to voice therapy that can be used for the treatment of hypofunctional voice disorders. Although positive clinical studies have been reported, neutral and even potentially harmful effects of NMES are also described in the literature. Furthermore, in the studies examined by the authors, the use of different methods of NMES have been identified, which further contributes to the inconsistent results found among studies. Moreover, limited rationale is provided for the chosen NMES parameters such as electrode placement, frequency of NMES and length of treatment. The aims of this pilot study were to investigate the a) impact of different frequencies of NMES on glottal configuration and vocal fold vibration patterns and b) changes in laryngeal configuration and vocal output across 12 minutes of NMES.

METHOD: Three experiments were carried out looking at changes in laryngeal configuration and voice output using different imaging techniques (fibreoptic nasolaryngoscopy and high-speed video), acoustical analysis (F0, formant analysis, SPL, CPPS and LHSR values), electroglottography (EGG) and Relative Fundamental Frequency (RFF) analyses. Glottal parameters and acoustical measures were recorded before, during, and after stimulation. Data was collected at rest and during phonation.

RESULTS: Overall the results showed global changes in laryngeal configuration from normal to hyperfunctional (ie, increased RFF, SPL, CQ, and stiffness). Changes were more pronounced for lower frequencies of NMES and were significant within less than three minutes of application.

CONCLUSION: NMES is an effective resource for the activation of intrinsic laryngeal muscles, producing significant levels of adduction within a few minutes of application. Lower NMES frequencies produced greater muscle activation when compared to higher frequencies.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Souza PE, Ellis G, Marks K, et al (2021)

Does the Speech Cue Profile Affect Response to Amplitude Envelope Distortion?.

Journal of speech, language, and hearing research : JSLHR, 64(6):2053-2069.

Purpose: A broad area of interest to our group is to understand the consequences of the "cue profile" (a measure of how well a listener can utilize audible temporal and/or spectral cues) for listening scenarios in which a subset of cues is distorted. The study goal was to determine if listeners whose cue profile indicated that they primarily used temporal cues for recognition would respond differently to speech-envelope distortion than listeners who utilized both spectral and temporal cues.

Method: Twenty-five adults with sensorineural hearing loss participated in the study. The listener's cue profile was measured by analyzing identification patterns for a set of synthetic syllables in which envelope rise time and formant transitions were varied. A linear discriminant analysis quantified the relative contributions of spectral and temporal cues to identification patterns. Low-context sentences in noise were processed with time compression, wide-dynamic range compression, or a combination of time compression and wide-dynamic range compression to create a range of speech-envelope distortions. An acoustic metric, a modified version of the Spectral Correlation Index, was calculated to quantify envelope distortion.

Results: A binomial generalized linear mixed-effects model indicated that envelope distortion, the cue profile, the interaction between envelope distortion and the cue profile, and the pure-tone average were significant predictors of sentence recognition.

Conclusions: The listeners with good perception of spectro-temporal contrasts were more resilient to the detrimental effects of envelope compression than listeners who used temporal cues to a greater extent. The cue profile may provide information about individual listening that can direct choice of hearing aid parameters, especially those parameters that affect the speech envelope.

RevDate: 2021-07-27
CmpDate: 2021-07-27

Stilp CE, AA Assgari (2021)

Contributions of natural signal statistics to spectral context effects in consonant categorization.

Attention, perception & psychophysics, 83(6):2694-2708.

Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Dromey C, Richins M, T Low (2021)

Kinematic and Acoustic Changes to Vowels and Diphthongs in Bite Block Speech.

Journal of speech, language, and hearing research : JSLHR, 64(6):1794-1801.

Purpose: We examined the effect of bite block insertion (BBI) on lingual movements and formant frequencies in corner vowel and diphthong production in a sentence context.

Method: Twenty young adults produced the corner vowels (/u/, /ɑ/, /æ/, /i/) and the diphthong /ɑɪ/ in sentence contexts before and after BBI. An electromagnetic articulograph measured the movements of the tongue back, middle, and front.

Results: There were significant decreases in the acoustic vowel articulation index and vowel space area following BBI. The kinematic vowel articulation index decreased significantly for the back and middle of the tongue but not for the front. There were no significant acoustic changes post-BBI for the diphthong, other than a longer transition duration. Diphthong kinematic changes after BBI included smaller movements for the back and middle of the tongue, but not the front.

Conclusions: BBI led to a smaller acoustic working space for the corner vowels. The adjustments made by the front of the tongue were sufficient to compensate for the BBI perturbation in the diphthong, resulting in unchanged formant trajectories. The back and middle of the tongue were likely biomechanically restricted in their displacement by the fixation of the jaw, whereas the tongue front showed greater movement flexibility.
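A vowel space area of the kind reported above is often computed as the area of the polygon whose vertices are the corner vowels in the F1-F2 plane. The sketch below uses the shoelace formula for that polygon; it is an assumption that the study computed the area this way, and all formant values are hypothetical numbers chosen only to show a smaller area after a perturbation.

```python
def vowel_space_area(corners):
    """Polygon area (Hz^2) from (F1, F2) vertices listed in order around the perimeter."""
    total = 0.0
    n = len(corners)
    for k in range(n):
        f1_a, f2_a = corners[k]
        f1_b, f2_b = corners[(k + 1) % n]  # wrap around to close the polygon
        total += f1_a * f2_b - f1_b * f2_a
    return abs(total) / 2.0

# Hypothetical corner vowels as (F1, F2) in Hz, ordered /i/, /ae/, /a/, /u/.
before = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
# Hypothetical, more centralized productions of the same vowels.
after = [(350, 2100), (650, 1700), (700, 1200), (400, 1000)]

print(vowel_space_area(before), vowel_space_area(after))
```

The vertices must be listed in perimeter order; listing them in an order that makes the polygon cross itself would give a misleading area.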

RevDate: 2021-05-12

Onosson S, J Stewart (2021)

The Effects of Language Contact on Non-Native Vowel Sequences in Lexical Borrowings: The Case of Media Lengua.

Language and speech [Epub ahead of print].

Media Lengua (ML), a mixed language derived from Quichua and Spanish, exhibits a phonological system that largely conforms to that of Quichua acoustically. Yet, it incorporates a large number of vowel sequences from Spanish which do not occur in the Quichua system. This includes the use of mid-vowels, which are phonetically realized in ML as largely overlapping with the high-vowels in acoustic space. We analyze and compare production of vowel sequences by speakers of ML, Quichua, and Spanish through the use of generalized additive mixed models to determine statistically significant differences between vowel formant trajectories. Our results indicate that Spanish-derived ML vowel sequences frequently differ significantly from their Spanish counterparts, largely occupying a more central region of the vowel space and frequently exhibiting markedly reduced trajectories over time. In contrast, we find only one case where an ML vowel sequence differs significantly from its Quichua counterpart, and even in this case the difference from Spanish is substantially greater. Our findings show how the vowel system of ML successfully integrates novel vowel sequence patterns from Spanish into what is essentially Quichua phonology by markedly adapting their production, while still maintaining contrasts which are not expressed in Quichua.

RevDate: 2021-08-18

Xiao Y, Wang T, Deng W, et al (2021)

Data mining of an acoustic biomarker in tongue cancers and its clinical validation.

Cancer medicine, 10(11):3822-3835.

The promise of speech disorders as biomarkers in clinical examination has been identified in a broad spectrum of neurodegenerative diseases. However, to the best of our knowledge, a validated acoustic marker with established discriminative and evaluative properties has not yet been developed for oral tongue cancers. Here we cross-sectionally collected a screening dataset that included acoustic parameters extracted from 3 sustained vowels /ɑ/, /i/, /u/ and binary perceptual outcomes from 12 consonant-vowel syllables. We used a support vector machine with linear kernel function within this dataset to identify the formant centralization ratio (FCR) as a dominant predictor of different perceptual outcomes across gender and syllable. The Acoustic analysis, Perceptual evaluation and Quality of Life assessment (APeQoL) was used to validate the FCR in 33 patients with primary resectable oral tongue cancers. Measurements were taken before (pre-op) and four to six weeks after (post-op) surgery. The speech handicap index (SHI), a speech-specific questionnaire, was also administered at these time points. Pre-op correlation analysis within the APeQoL revealed overall consistency and a strong correlation between FCR and SHI scores. FCRs also increased significantly with increasing T classification pre-operatively, especially for women. Longitudinally, the main effects of T classification, the extent of resection, and their interaction effects with time (pre-op vs. post-op) on FCRs were all significant. For pre-operative FCR, after merging the two datasets, a cut-off value of 0.970 produced an AUC of 0.861 (95% confidence interval: 0.785-0.938) for T3-4 patients. In sum, this study determined that FCR is an acoustic marker with the potential to detect disease and related speech function in oral tongue cancers. These are preliminary findings that need to be replicated in longitudinal studies and/or larger cohorts.
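The abstract does not spell out how the formant centralization ratio is computed. The sketch below assumes the commonly cited definition from the dysarthria literature, FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), which rises as vowels move toward the center of the vowel space. The function name and every formant value are hypothetical illustrations, not data from the study.

```python
def formant_centralization_ratio(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """FCR under the assumed definition: larger values mean more centralized vowels."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# Hypothetical formants (Hz) for well-separated sustained /a/, /i/, /u/.
distinct = formant_centralization_ratio(
    f1_a=750, f2_a=1300, f1_i=300, f2_i=2300, f1_u=350, f2_u=900
)

# Hypothetical formants (Hz) for more centralized productions of the same vowels.
centralized = formant_centralization_ratio(
    f1_a=650, f2_a=1300, f1_i=400, f2_i=2000, f1_u=420, f2_u=1100
)

print(round(distinct, 3), round(centralized, 3))
```

Under this definition the terms in the numerator grow and those in the denominator shrink when vowels centralize, which is consistent with the abstract's report that FCR increased with increasing T classification.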

RevDate: 2021-07-05
CmpDate: 2021-07-05

Chiu YF, Neel A, T Loux (2021)

Exploring the Acoustic Perceptual Relationship of Speech in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR, 64(5):1560-1570.

Purpose: Auditory perceptual judgments are commonly used to diagnose dysarthria and assess treatment progress. The purpose of the study was to examine the acoustic underpinnings of perceptual speech abnormalities in individuals with Parkinson's disease (PD).

Method: Auditory perceptual judgments were obtained from sentences produced by 13 speakers with PD and five healthy older adults. Twenty young listeners rated overall ease of understanding, articulatory precision, voice quality, and prosodic adequacy on a visual analog scale. Acoustic measures associated with the speech subsystems of articulation, phonation, and prosody were obtained, including second formant transitions, articulation rate, cepstral and spectral measures of voice, and pitch variations. Regression analyses were performed to assess the relationships between perceptual judgments and acoustic variables.

Results: Perceptual impressions of Parkinsonian speech were related to combinations of several acoustic variables. Approximately 36%-49% of the variance in the perceptual ratings were explained by the acoustic measures indicating a modest acoustic perceptual relationship.

Conclusions: The relationships between perceptual ratings and acoustic signals in Parkinsonian speech are multifactorial and involve a variety of acoustic features simultaneously. The modest acoustic perceptual relationships, however, suggest that future work is needed to further examine the acoustic bases of perceptual judgments in dysarthria.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Parrell B, Ivry RB, Nagarajan SS, et al (2021)

Intact Correction for Self-Produced Vowel Formant Variability in Individuals With Cerebellar Ataxia Regardless of Auditory Feedback Availability.

Journal of speech, language, and hearing research : JSLHR, 64(6S):2234-2247.

Purpose Individuals with cerebellar ataxia (CA) caused by cerebellar degeneration exhibit larger reactive compensatory responses to unexpected auditory feedback perturbations than neurobiologically typical speakers, suggesting they may rely more on feedback control during speech. We test this hypothesis by examining variability in unaltered speech. Previous studies of typical speakers have demonstrated a reduction in formant variability (centering) observed during the initial phase of vowel production from vowel onset to vowel midpoint. Centering is hypothesized to reflect feedback-based corrections for self-produced variability and thus may provide a behavioral assay of feedback control in unperturbed speech in the same manner as the compensatory response does for feedback perturbations. Method To comprehensively compare centering in individuals with CA and controls, we examine centering in two vowels (/i/ and /ɛ/) under two contexts (isolated words and connected speech). As a control, we examine speech produced both with and without noise to mask auditory feedback. Results Individuals with CA do not show increased centering compared to age-matched controls, regardless of vowel, context, or masking. Contrary to previous results in neurobiologically typical speakers, centering was not affected by the presence of masking noise in either group. Conclusions The similar magnitude of centering seen with and without masking noise questions whether centering is driven by auditory feedback. However, if centering is at least partially driven by auditory/somatosensory feedback, these results indicate that the larger compensatory response to altered auditory feedback observed in individuals with CA may not reflect typical motor control processes during normal, unaltered speech production.
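The abstract above describes centering as a reduction in formant variability between vowel onset and vowel midpoint. One possible way to quantify this is sketched below; it is an assumption for illustration, not necessarily the authors' exact procedure. Each production is treated as an (F1, F2) point, variability is the mean Euclidean distance of the points from their median, and centering is the drop in that distance from onset to midpoint. All names and values are hypothetical.

```python
from math import hypot
from statistics import median


def mean_distance_to_median(points):
    """Mean Euclidean distance of (F1, F2) points from their median point."""
    med_f1 = median(f1 for f1, _ in points)
    med_f2 = median(f2 for _, f2 in points)
    return sum(hypot(f1 - med_f1, f2 - med_f2) for f1, f2 in points) / len(points)


def centering(onset_points, midpoint_points):
    """Positive values mean productions are less variable at midpoint than at onset."""
    return mean_distance_to_median(onset_points) - mean_distance_to_median(midpoint_points)


# Hypothetical (F1, F2) measurements in Hz for three repetitions of one vowel.
onset = [(500, 1800), (600, 1800), (700, 1800)]
midpoint = [(580, 1800), (600, 1800), (620, 1800)]
centering_value = centering(onset, midpoint)
print(round(centering_value, 2))
```

Under this sketch, a value near zero in both the masked and unmasked conditions would correspond to no measurable centering, while similar positive values in both conditions would correspond to the pattern the abstract reports.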

RevDate: 2021-04-17

Lã FMB, Silva LS, S Granqvist (2021)

Long-Term Average Spectrum Characteristics of Portuguese Fado-Canção from Coimbra.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00104-1 [Epub ahead of print].

Descriptions of acoustical characteristics of Fado, a Portuguese urban style sung in Lisbon and Oporto, are scarce, particularly concerning Fado-Canção, a related style sung in Coimbra. The present study aims at describing long-term average spectrum (LTAS) parameters of 16 professional singers while singing and reading the lyrics of a typical Fado-Canção. LTAS parameters were investigated in terms of: (1) equivalent sound level (Leq); (2) spectral differences between 3 frequency bands 0-2, 2-5, and 5-8 kHz; and (3) quantification of spectral prominence between 2 and 4 kHz, calculated as the level difference between the peak in this frequency region and a reference trendline between 1 and 5 kHz, henceforth Formant Cluster Prominence (FCP). Given that Fado-Canção, besides Fado and traditional styles, originated also from classical singing, and that previous studies on Fado suggest the absence of a singer's formant cluster, the averaged LTAS for all Fado-Canção singers was further compared to the LTAS of two world-touring opera baritones singing an operatic aria and a lied. Results show that Fado-Canção is commonly sung with a Leq of 86.4 dB and a FCP of about 10 dB, values significantly higher when compared to reading. The FCP in Fado-Canção, although smaller than for the two classical opera singers' examples (14.8 and 20 dB, respectively), suggests that the style preserved some of its original lyrical influence. However, because younger singers present higher energy in the 5-8 kHz region relative to the remaining frequency bands as compared to older singers, it seems that Fado-Canção may be drifting towards non-classical vocal practices. FCP seems to be a promising straightforward method to quantify the degree of formant clustering around the region of the singer's formant in LTAS, allowing comparisons between different singers and singing styles.
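The abstract above defines Formant Cluster Prominence as the level difference between the LTAS peak in the 2-4 kHz region and a reference trendline between 1 and 5 kHz. A minimal sketch of that calculation follows. How the trendline is constructed is not stated in the abstract, so the straight line joining the LTAS levels at 1 kHz and 5 kHz used here is an assumption, as are the function names and the example spectrum.

```python
def level_at(freqs_hz, levels_db, target_hz):
    """Linearly interpolate the LTAS level (dB) at target_hz.

    freqs_hz must be sorted in ascending order and must span target_hz.
    """
    pairs = list(zip(freqs_hz, levels_db))
    for (f_lo, l_lo), (f_hi, l_hi) in zip(pairs, pairs[1:]):
        if f_lo <= target_hz <= f_hi:
            if f_hi == f_lo:
                return float(l_lo)
            return l_lo + (l_hi - l_lo) * (target_hz - f_lo) / (f_hi - f_lo)
    raise ValueError("target frequency is outside the LTAS range")


def formant_cluster_prominence(freqs_hz, levels_db):
    """Peak level in 2-4 kHz minus the trendline level at the peak frequency.

    Assumption: the trendline is the straight line (in dB over Hz)
    through the LTAS levels at 1 kHz and 5 kHz.
    """
    level_1k = level_at(freqs_hz, levels_db, 1000)
    level_5k = level_at(freqs_hz, levels_db, 5000)
    in_band = [(lvl, f) for f, lvl in zip(freqs_hz, levels_db) if 2000 <= f <= 4000]
    if not in_band:
        raise ValueError("no LTAS bins between 2 and 4 kHz")
    peak_level, peak_freq = max(in_band)
    trend_level = level_1k + (level_5k - level_1k) * (peak_freq - 1000) / 4000
    return peak_level - trend_level


# Hypothetical coarse LTAS (Hz, dB), for illustration only.
ltas_freqs = [1000, 2000, 3000, 4000, 5000]
ltas_levels = [60, 50, 55, 40, 30]
fcp_db = formant_cluster_prominence(ltas_freqs, ltas_levels)
print(fcp_db)
```

A single number per recording of this kind is what allows the comparison the abstract makes between singers and singing styles.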

RevDate: 2021-08-18
CmpDate: 2021-08-18

Loni DY, S Subbaraman (2021)

Genetically related singers-acoustic feature analysis and impact on singer identification.

Journal of applied genetics, 62(3):459-467.

Studies relating music with genetics have been one of the fascinating fields of research. In this study, we have attempted to answer a curious question: how acoustically close are genetically related singers? The present study has investigated this perception using two genetically different relations: three female sibling singers and a father-son singer relation. These are famous Indian playback singers, and the acoustic features are extracted using the songs of Bollywood films. Three different sets of self-developed a cappella databases are used for the experimentation. Positive correlations among the major musical aptitudes (pitch, vibrato, formant, and harmonic spectral envelope) for both singer relationships revealed the genetic impact on the acoustic features. Also, the investigation of the timbre spectral feature proved it to be a significant acoustic feature that differentiates similar voices. With Spearman's correlation coefficient, we conclude that strong acoustical association was observed between the acoustic features of genetically related singers, especially the female sibling singers. This was further validated by correlating these singers with genetically unrelated singers. A human perception test performed using cover songs indicated the genetic impact in voice similarity, while the automatic singer identification system discriminated singers more accurately than the human listeners.

RevDate: 2021-04-10

Hsieh IH, WT Yeh (2021)

The Interaction Between Timescale and Pitch Contour at Pre-attentive Processing of Frequency-Modulated Sweeps.

Frontiers in psychology, 12:637289.

Speech comprehension across languages depends on encoding the pitch variations in frequency-modulated (FM) sweeps at different timescales and frequency ranges. While timescale and spectral contour of FM sweeps play important roles in differentiating acoustic speech units, relatively little work has been done to understand the interaction between the two acoustic dimensions at early cortical processing. An auditory oddball paradigm was employed to examine the interaction of timescale and pitch contour at pre-attentive processing of FM sweeps. Event-related potentials to frequency sweeps that vary in linguistically relevant pitch contour (fundamental frequency F0 vs. first formant frequency F1) and timescale (local vs. global) in Mandarin Chinese were recorded. Mismatch negativities (MMNs) were elicited by all types of sweep deviants. For local timescale, FM sweeps with F0 contours yielded larger MMN amplitudes than F1 contours. A reversed MMN amplitude pattern was obtained with respect to F0/F1 contours for global timescale stimuli. An interhemispheric asymmetry of MMN topography was observed corresponding to local and global-timescale contours. Difference waveforms for falling, but not rising, frequency sweep contours elicited right hemispheric dominance. Results showed that timescale and pitch contour interact with each other in pre-attentive auditory processing of FM sweeps. Findings suggest that FM sweeps, a type of non-speech signal, are processed at an early stage with reference to their linguistic function. That the dynamic interaction between timescale and spectral pattern is processed during early cortical processing of non-speech frequency sweep signals may be critical to facilitate speech encoding at a later stage.

RevDate: 2021-09-21

Wright E, Grawunder S, Ndayishimiye E, et al (2021)

Chest beats as an honest signal of body size in male mountain gorillas (Gorilla beringei beringei).

Scientific reports, 11(1):6879.

Acoustic signals that reliably indicate body size, which usually determines competitive ability, are of particular interest for understanding how animals assess rivals and choose mates. Whereas body size tends to be negatively associated with formant dispersion in animal vocalizations, non-vocal signals have received little attention. Among the most emblematic sounds in the animal kingdom is the chest beat of gorillas, a non-vocal signal that is thought to be important in intra and inter-sexual competition, yet it is unclear whether it reliably indicates body size. We examined the relationship among body size (back breadth), peak frequency, and three temporal characteristics of the chest beat: duration, number of beats and beat rate from sound recordings of wild adult male mountain gorillas. Using linear mixed models, we found that larger males had significantly lower peak frequencies than smaller ones, but we found no consistent relationship between body size and the temporal characteristics measured. Taken together with earlier findings of positive correlations among male body size, dominance rank and reproductive success, we conclude that the gorilla chest beat is an honest signal of competitive ability. These results emphasize the potential of non-vocal signals to convey important information in mammal communication.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Jekiel M, K Malarski (2021)

Musical Hearing and Musical Experience in Second Language English Vowel Acquisition.

Journal of speech, language, and hearing research : JSLHR, 64(5):1666-1682.

Purpose Former studies suggested that music perception can help produce certain accentual features in the first and second language (L2), such as intonational contours. What was missing in many of these studies was the identification of the exact relationship between specific music perception skills and the production of different accentual features in a foreign language. Our aim was to verify whether empirically tested musical hearing skills can be related to the acquisition of English vowels by learners of English as an L2 before and after a formal accent training course. Method Fifty adult Polish speakers of L2 English were tested before and after a two-semester accent training in order to observe the effect of musical hearing on the acquisition of English vowels. Their L2 English vowel formant contours produced in consonant-vowel-consonant context were compared with the target General British vowels produced by their pronunciation teachers. We juxtaposed these results with their musical hearing test scores and self-reported musical experience to observe a possible relationship between successful L2 vowel acquisition and musical aptitude. Results Preexisting rhythmic memory was reported as a significant predictor before training, while musical experience was reported as a significant factor in the production of more native-like L2 vowels after training. We also observed that not all vowels were equally acquired or affected by musical hearing or musical experience. The strongest estimate we observed was the closeness to model before training, suggesting that learners who already managed to acquire some features of a native-like accent were also more successful after training. Conclusions Our results are revealing in two aspects. First, the learners' former proficiency in L2 pronunciation is the most robust predictor in acquiring a native-like accent. Second, there is a potential relationship between rhythmic memory and L2 vowel acquisition before training, as well as years of musical experience after training, suggesting that specific musical skills and music practice can be an asset in learning a foreign language accent.

RevDate: 2021-06-01

Michell CT, T Nyman (2021)

Microbiomes of willow-galling sawflies: effects of host plant, gall type, and phylogeny on community structure and function.

Genome, 64(6):615-626.

While free-living herbivorous insects are thought to harbor microbial communities composed of transient bacteria derived from their diet, recent studies indicate that insects that induce galls on plants may be involved in more intimate host-microbe relationships. We used 16S rDNA metabarcoding to survey larval microbiomes of 20 nematine sawfly species that induce bud or leaf galls on 13 Salix species. The 391 amplicon sequence variants (ASVs) detected represented 69 bacterial genera in six phyla. Multi-variate statistical analyses showed that the structure of larval microbiomes is influenced by willow host species as well as by gall type. Nevertheless, a "core" microbiome composed of 58 ASVs is shared widely across the focal galler species. Within the core community, the presence of many abundant, related ASVs representing multiple distantly related bacterial taxa is reflected as a statistically significant effect of bacterial phylogeny on galler-microbe associations. Members of the core community have a variety of inferred functions, including degradation of phenolic compounds, nutrient supplementation, and production of plant hormones. Hence, our results support suggestions of intimate and diverse interactions between galling insects and microbes and add to a growing body of evidence that microbes may play a role in the induction of insect galls on plants.

RevDate: 2021-06-25
CmpDate: 2021-06-25

Zhang K, Sjerps MJ, G Peng (2021)

Integral perception, but separate processing: The perceptual normalization of lexical tones and vowels.

Neuropsychologia, 156:107839.

In tonal languages, speech variability arises in both lexical tone (i.e., suprasegmentally) and vowel quality (segmentally). Listeners can use surrounding speech context to overcome variability in both speech cues, a process known as extrinsic normalization. Although vowels are the main carriers of tones, it is still unknown whether the combined percept (lexical tone and vowel quality) is normalized integrally or in partly separate processes. Here we used electroencephalography (EEG) to investigate the time course of lexical tone normalization and vowel normalization to answer this question. Cantonese adults listened to synthesized three-syllable stimuli in which the identity of a target syllable - ambiguous between high vs. mid-tone (Tone condition) or between /o/ vs. /u/ (Vowel condition) - was dependent on either the tone range (Tone condition) or the formant range (Vowel condition) of the first two syllables. It was observed that the ambiguous tone was more often interpreted as a high-level tone when the context had a relatively low pitch than when it had a high pitch (Tone condition). Similarly, the ambiguous vowel was more often interpreted as /o/ when the context had a relatively low formant range than when it had a relatively high formant range (Vowel condition). These findings show the typical pattern of extrinsic tone and vowel normalization. Importantly, the EEG results of participants showing the contrastive normalization effect demonstrated that the effects of vowel normalization could already be observed within the N2 time window (190-350 ms), while the first reliable effect of lexical tone normalization on cortical processing was observable only from the P3 time window (220-500 ms) onwards. The ERP patterns demonstrate that the contrastive perceptual normalization of lexical tones and that of vowels occur at least in partially separate time windows. This suggests that the extrinsic normalization can operate at the level of phonemes and tonemes separately instead of operating on the whole syllable at once.

RevDate: 2021-08-27

Smith ML, MB Winn (2021)

Individual Variability in Recalibrating to Spectrally Shifted Speech: Implications for Cochlear Implants.

Ear and hearing, 42(5):1412-1427.

OBJECTIVES: Cochlear implant (CI) recipients are at a severe disadvantage compared with normal-hearing listeners in distinguishing consonants that differ by place of articulation because the key relevant spectral differences are degraded by the implant. One component of that degradation is the upward shifting of spectral energy that occurs with a shallow insertion depth of a CI. The present study aimed to systematically measure the effects of spectral shifting on word recognition and phoneme categorization by specifically controlling the amount of shifting and using stimuli whose identification specifically depends on perceiving frequency cues. We hypothesized that listeners would be biased toward perceiving phonemes that contain higher-frequency components because of the upward frequency shift and that intelligibility would decrease as spectral shifting increased.

DESIGN: Normal-hearing listeners (n = 15) heard sine wave-vocoded speech with simulated upward frequency shifts of 0, 2, 4, and 6 mm of cochlear space to simulate shallow CI insertion depth. Stimuli included monosyllabic words and /b/-/d/ and /ʃ/-/s/ continua that varied systematically by formant frequency transitions or frication noise spectral peaks, respectively. Recalibration to spectral shifting was operationally defined as shifting perceptual acoustic-phonetic mapping commensurate with the spectral shift; in other words, adjusting frequency expectations for both phonemes upward so that there is still a perceptual distinction, rather than hearing all upward-shifted phonemes as the higher-frequency member of the pair.

RESULTS: For moderate amounts of spectral shifting, group data suggested a general "halfway" recalibration to spectral shifting, but individual data suggested a notably different conclusion: half of the listeners were able to recalibrate fully, while the other half of the listeners were utterly unable to categorize shifted speech with any reliability. There were no participants who demonstrated a pattern intermediate to these two extremes. Intelligibility of words decreased with greater amounts of spectral shifting, also showing loose clusters of better- and poorer-performing listeners. Phonetic analysis of word errors revealed certain cues were more susceptible to being compromised due to a frequency shift (place and manner of articulation), while voicing was robust to spectral shifting.

CONCLUSIONS: Shifting the frequency spectrum of speech has systematic effects that are in line with known properties of speech acoustics, but the ensuing difficulties cannot be predicted based on tonotopic mismatch alone. Difficulties are subject to substantial individual differences in the capacity to adjust acoustic-phonetic mapping. These results help to explain why speech recognition in CI listeners cannot be fully predicted by peripheral factors like electrode placement and spectral resolution; even among listeners with functionally equivalent auditory input, there is an additional factor of simply being able or unable to flexibly adjust acoustic-phonetic mapping. This individual variability could motivate precise treatment approaches guided by an individual's relative reliance on wideband frequency representation (even if it is mismatched) or limited frequency coverage whose tonotopy is preserved.

RevDate: 2021-08-12
CmpDate: 2021-08-12

Chen F, Zhang H, Ding H, et al (2021)

Neural coding of formant-exaggerated speech and nonspeech in children with and without autism spectrum disorders.

Autism research : official journal of the International Society for Autism Research, 14(7):1357-1374.

The presence of vowel exaggeration in infant-directed speech (IDS) may adapt to the age-appropriate demands in speech and language acquisition. Previous studies have provided behavioral evidence of atypical auditory processing towards IDS in children with autism spectrum disorders (ASD), while the underlying neurophysiological mechanisms remain unknown. This event-related potential (ERP) study investigated the neural coding of formant-exaggerated speech and nonspeech in 24 4- to 11-year-old children with ASD and 24 typically-developing (TD) peers. The EEG data were recorded using an alternating block design, in which each stimulus type (exaggerated/non-exaggerated sound) was presented with equal probability. ERP waveform analysis revealed an enhanced P1 for vowel formant exaggeration in the TD group but not in the ASD group. This speech-specific atypical processing in ASD was not found for the nonspeech stimuli which showed similar P1 enhancement in both ASD and TD groups. Moreover, the time-frequency analysis indicated that children with ASD showed differences in neural synchronization in the delta-theta bands for processing acoustic formant changes embedded in nonspeech. Collectively, the results add substantiating neurophysiological evidence (i.e., a lack of neural enhancement effect of vowel exaggeration) for atypical auditory processing of IDS in children with ASD, which may exert a negative effect on phonetic encoding and language learning. LAY SUMMARY: Atypical responses to motherese might act as a potential early marker of risk for children with ASD. This study investigated the neural responses to such socially relevant stimuli in the ASD brain, and the results suggested a lack of neural enhancement responding to the motherese even in individuals without intellectual disability.

RevDate: 2021-03-29

Oren L, Rollins M, Gutmark E, et al (2021)

How Face Masks Affect Acoustic and Auditory Perceptual Characteristics of the Singing Voice.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00091-6 [Epub ahead of print].

Wearing a face mask has been accepted as one of the most effective ways for slowing the spread of COVID-19. Yet information regarding the degree to which masks affect acoustics and perception associated with voice performers is scarce. This study examines these effects with common face masks, namely a neck gaiter, disposable surgical mask, and N95 mask, as well as a novel material that could be used as a mask (acoustic foam). A recorded excerpt from the "Star-Spangled Banner" was played through a miniature speaker placed inside the mouth of a masked manikin. Experienced listeners were asked to rate perceptual qualities of these singing stimuli by blindly comparing them with the same recording captured without a mask. Acoustic analysis showed that face masks affected the sound by enhancing or suppressing different frequency bands compared to no mask. Acoustic energy around the singer's formant was reduced when using surgical and N95 masks, which matches observations that these masks are more detrimental to the perception of the singing voice than a neck gaiter or acoustic foam. This suggests that singers can benefit from masks designed for minimal impact on auditory perception of the singing voice while maintaining reasonable filtering efficiency.

RevDate: 2021-03-28

Havel M, Sundberg J, Traser L, et al (2021)

Effects of Nasalization on Vocal Tract Response Curve.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00065-5 [Epub ahead of print].

BACKGROUND: Earlier studies have shown that nasalization affects the radiated spectrum by modifying the vocal tract transfer function in a complex manner.

METHODS: Here we study this phenomenon by measuring sine-sweep response of 3-D models of the vowels /u, a, æ, i/, derived from volumetric MR imaging, coupled by means of tubes of different lengths and diameters to a 3-D model of a nasal tract.

RESULTS: The coupling introduced a dip into the vocal tract transfer function. The dip frequency was close to the main resonance of the nasal tract, a result in agreement with the Fujimura & Lindqvist in vivo sweep tone measurements [Fujimura & Lindqvist, 1972]. With increasing size of the coupling tube the depth of the dip increased and the first formant peak either changed in frequency or was split by the dip. Only marginal effects were observed of the paranasal sinuses. For certain coupling tube sizes, the spectrum balance was changed, boosting the formant peaks in the 2 - 4 kHz range.

CONCLUSION: A velopharyngeal opening introduces a dip in the transfer function at the main resonance of the nasal tract. Its depth increases with the area of the opening and its frequency rises in some vowels.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Coughler C, Hamel EM, Cardy JO, et al (2021)

Compensation to Altered Auditory Feedback in Children With Developmental Language Disorder and Typical Development.

Journal of speech, language, and hearing research : JSLHR, 64(6S):2363-2376.

Purpose Developmental language disorder (DLD), an unexplained problem using and understanding spoken language, has been hypothesized to have an underlying auditory processing component. Auditory feedback plays a key role in speech motor control. The current study examined whether auditory feedback is used to regulate speech production in a similar way by children with DLD and their typically developing (TD) peers. Method Participants aged 6-11 years completed tasks measuring hearing, language, first formant (F1) discrimination thresholds, partial vowel space, and responses to altered auditory feedback with F1 perturbation. Results Children with DLD tended to compensate more than TD children for the positive F1 manipulation and compensated less than TD children in the negative shift condition. Conclusion Our findings suggest that children with DLD make atypical use of auditory feedback.

RevDate: 2021-09-17

Arenillas-Alcón S, Costa-Faidella J, Ribas-Prats T, et al (2021)

Neural encoding of voice pitch and formant structure at birth as revealed by frequency-following responses.

Scientific reports, 11(1):6660.

Detailed neural encoding of voice pitch and formant structure plays a crucial role in speech perception, and is of key importance for an appropriate acquisition of the phonetic repertoire in infants from birth. However, the extent to which newborns are capable of extracting pitch and formant structure information from the temporal envelope and the temporal fine structure of speech sounds, respectively, remains unclear. Here, we recorded the frequency-following response (FFR) elicited by a novel two-vowel, rising-pitch-ending stimulus to simultaneously characterize voice pitch and formant structure encoding accuracy in a sample of neonates and adults. Data revealed that newborns tracked changes in voice pitch reliably and no differently than adults, but exhibited weaker signatures of formant structure encoding, particularly at higher formant frequency ranges. Thus, our results indicate a well-developed encoding of voice pitch at birth, while formant structure representation is maturing in a frequency-dependent manner. Furthermore, we demonstrate the feasibility of assessing voice pitch and formant structure encoding within clinical evaluation times in a hospital setting, and suggest the possibility of using this novel stimulus as a tool for longitudinal developmental studies of the auditory system.

RevDate: 2021-09-15
CmpDate: 2021-09-15

Emrani E, Ghaemi H, Labafchi A, et al (2021)

The Effect of Bimaxillary Orthognathic Surgery on Voice Characteristics in Skeletal Class 3 Deformity Patients: An Evaluation Using Acoustic Analysis.

The Journal of craniofacial surgery, 32(6):2129-2133.

ABSTRACT: The aim of this study was to analyze the effects of bimaxillary orthognathic surgery on the acoustic voice characteristics of skeletal class 3 patients. All healthy nonsyndromic patients with Class 3 deformity who were eligible for bimaxillary orthognathic surgery were included in this before and after quasi-experimental study. This experiment's main intervention was mandibular setback surgery by bilateral sagittal split osteotomy plus maxillary advancement using LeFort 1 osteotomy. Age, sex, and intraoperative jaw movements were recorded. Acoustic analysis of voice samples (vowels /a/ and /i/) was performed with Praat software as outcome variables. The fundamental frequency (F0) and the formant frequencies (F1, F2, and F3) of these vowels were extracted 1 week preoperatively (T0), 1 and 6 months (T1, T2) postoperatively by a speech therapist. The significance level was set at 0.05 using SPSS 19. The study sample comprised 20 patients including 11 women (55%) and 9 men (45%) with a mean age of 31.95 ± 4.72 years. The average mandibular setback and maxillary advancement were 3.30 ± 0.86 and 2.85 ± 0.74 mm, respectively. The fundamental frequency (F0) and the first, second, and third formants (F1, F2, F3) of vowels /i/ and /a/ were significantly decreased over time intervals, postoperatively (P < 0.05). The findings revealed that bimaxillary orthognathic surgery (maxillary advancement and mandibular setback with bilateral sagittal split osteotomy) might reduce the acoustic formant parameters of voice to the normal frequency ranges, in patients with class 3 skeletal deformities. More clinical trials with greater sample sizes and long-term follow-ups are suggested in the future.

RevDate: 2021-05-24
CmpDate: 2021-05-24

König A, Riviere K, Linz N, et al (2021)

Measuring Stress in Health Professionals Over the Phone Using Automatic Speech Analysis During the COVID-19 Pandemic: Observational Pilot Study.

Journal of medical Internet research, 23(4):e24191.

BACKGROUND: During the COVID-19 pandemic, health professionals have been directly confronted with the suffering of patients and their families. By making them main actors in the management of this health crisis, they have been exposed to various psychosocial risks (stress, trauma, fatigue, etc). Paradoxically, stress-related symptoms are often underreported in this vulnerable population but are potentially detectable through passive monitoring of changes in speech behavior.

OBJECTIVE: This study aims to investigate the use of rapid and remote measures of stress levels in health professionals working during the COVID-19 outbreak. This was done through the analysis of participants' speech behavior during a short phone call conversation and, in particular, via positive, negative, and neutral storytelling tasks.

METHODS: Speech samples from 89 health care professionals were collected over the phone during positive, negative, and neutral storytelling tasks; various voice features were extracted and compared with classical stress measures via standard questionnaires. Additionally, a regression analysis was performed.

RESULTS: Certain speech characteristics correlated with stress levels in both genders; mainly, spectral (ie, formant) features, such as the mel-frequency cepstral coefficient, and prosodic characteristics, such as the fundamental frequency, appeared to be sensitive to stress. Overall, for both male and female participants, using vocal features from the positive tasks for regression yielded the most accurate prediction results of stress scores (mean absolute error 5.31).

CONCLUSIONS: Automatic speech analysis could help with early detection of subtle signs of stress in vulnerable populations over the phone. By combining the use of this technology with timely intervention strategies, it could contribute to the prevention of burnout and the development of comorbidities, such as depression or anxiety.

RevDate: 2021-03-20

Strycharczuk P, López-Ibáñez M, Brown G, et al (2020)

General Northern English. Exploring Regional Variation in the North of England With Machine Learning.

Frontiers in artificial intelligence, 3:48.

In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as reduction of accent variation across the North and emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly-educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on first two formant measurements of full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence in the prediction. We do find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Niziolek CA, B Parrell (2021)

Responses to Auditory Feedback Manipulations in Speech May Be Affected by Previous Exposure to Auditory Errors.

Journal of speech, language, and hearing research : JSLHR, 64(6S):2169-2181.

Purpose Speakers use auditory feedback to guide their speech output, although individuals differ in the magnitude of their compensatory response to perceived errors in feedback. Little is known about the factors that contribute to the compensatory response or how fixed or flexible they are within an individual. Here, we test whether manipulating the perceived reliability of auditory feedback modulates speakers' compensation to auditory perturbations, as predicted by optimal models of sensorimotor control. Method Forty participants produced monosyllabic words in two separate sessions, which differed in the auditory feedback given during an initial exposure phase. In the veridical session exposure phase, feedback was normal. In the noisy session exposure phase, small, random formant perturbations were applied, reducing reliability of auditory feedback. In each session, a subsequent test phase introduced larger unpredictable formant perturbations. We assessed whether the magnitude of within-trial compensation for these larger perturbations differed across the two sessions. Results Compensatory responses to downward (though not upward) formant perturbations were larger in the veridical session than the noisy session. However, in post hoc testing, we found the magnitude of this effect is highly dependent on the choice of analysis procedures. Compensation magnitude was not predicted by other production measures, such as formant variability, and was not reliably correlated across sessions. Conclusions Our results, though mixed, provide tentative support that the feedback control system monitors the reliability of sensory feedback. These results must be interpreted cautiously given the potentially limited stability of auditory feedback compensation measures across analysis choices and across sessions. Supplemental Material https://doi.org/10.23641/asha.14167136.

RevDate: 2021-03-10

Riedinger M, Nagels A, Werth A, et al (2021)

Asymmetries in Accessing Vowel Representations Are Driven by Phonological and Acoustic Properties: Neural and Behavioral Evidence From Natural German Minimal Pairs.

Frontiers in human neuroscience, 15:612345.

In vowel discrimination, commonly found discrimination patterns are directional asymmetries where discrimination is faster (or easier) if differing vowels are presented in a certain sequence compared to the reversed sequence. Different models of speech sound processing try to account for these asymmetries based on either phonetic or phonological properties. In this study, we tested and compared two of those often-discussed models, namely the Featurally Underspecified Lexicon (FUL) model (Lahiri and Reetz, 2002) and the Natural Referent Vowel (NRV) framework (Polka and Bohn, 2011). While most studies presented isolated vowels, we investigated a large stimulus set of German vowels in a more naturalistic setting within minimal pairs. We conducted a mismatch negativity (MMN) study in a passive and a reaction time study in an active oddball paradigm. In both data sets, we found directional asymmetries that can be explained by either phonological or phonetic theories. While behaviorally, the vowel discrimination was based on phonological properties, both tested models failed to explain the found neural patterns comprehensively. Therefore, we additionally examined the influence of a variety of articulatory, acoustical, and lexical factors (e.g., formant structure, intensity, duration, and frequency of occurrence) but also the influence of factors beyond the well-known (perceived loudness of vowels, degree of openness) in depth via multiple regression analyses. The analyses revealed that the perceptual factor of perceived loudness has a greater impact than considered in the literature and should be given greater consideration when analyzing preattentive natural vowel processing.

RevDate: 2021-07-07
CmpDate: 2021-06-30

Kim KS, L Max (2021)

Speech auditory-motor adaptation to formant-shifted feedback lacks an explicit component: Reduced adaptation in adults who stutter reflects limitations in implicit sensorimotor learning.

The European journal of neuroscience, 53(9):3093-3108.

The neural mechanisms underlying stuttering remain poorly understood. A large body of work has focused on sensorimotor integration difficulties in individuals who stutter, including recently the capacity for sensorimotor learning. Typically, sensorimotor learning is assessed with adaptation paradigms in which one or more sensory feedback modalities are experimentally perturbed in real time. Our own previous work on speech with perturbed auditory feedback revealed substantial auditory-motor learning limitations in both children and adults who stutter (AWS). It remains unknown, however, which subprocesses of sensorimotor learning are impaired. Indeed, new insights from research on upper limb motor control indicate that sensorimotor learning involves at least two distinct components: (a) an explicit component that includes intentional strategy use and presumably is driven by target error and (b) an implicit component that updates an internal model without awareness of the learner and presumably is driven by sensory prediction error. Here, we attempted to dissociate these components for speech auditory-motor learning in AWS versus adults who do not stutter (AWNS). Our formant-shift auditory-motor adaptation results replicated previous findings that such sensorimotor learning is limited in AWS. Novel findings are that neither control nor stuttering participants reported any awareness of changing their productions in response to the auditory perturbation and that neither group showed systematic drift in auditory target judgments made throughout the adaptation task. These results indicate that speech auditory-motor adaptation to formant-shifted feedback relies exclusively on implicit learning processes. Thus, limited adaptation in AWS reflects poor implicit sensorimotor learning.

RevDate: 2021-03-05

Stefanich S, J Cabrelli (2021)

The Effects of L1 English Constraints on the Acquisition of the L2 Spanish Alveopalatal Nasal.

Frontiers in psychology, 12:640354.

This study examines whether L1 English/L2 Spanish learners at different proficiency levels acquire a novel L2 phoneme, the Spanish palatal nasal /ɲ/. While alveolar /n/ is part of the Spanish and English inventories, /ɲ/, which consists of a tautosyllabic palatal nasal+glide element, is not. This crosslinguistic disparity presents potential difficulty for L1 English speakers due to L1 segmental and phonotactic constraints; the closest English approximation is the heterosyllabic sequence /nj/ (e.g., "canyon" /kænjn/ ['khæn.jn], cf. Spanish cañón "canyon" /kaɲon/ [ka.'ɲon]). With these crosslinguistic differences in mind, we ask: (1a) Do L1 English learners of L2 Spanish produce acoustically distinct Spanish /n/ and /ɲ/ and (1b) Does the distinction of /n/ and /ɲ/ vary by proficiency? In the case that learners distinguish /n/ and /ɲ/, the second question investigates the acoustic quality of /ɲ/ to determine (2a) if learners' L2 representation patterns with that of an L1 Spanish representation or if learners rely on an L1 representation (here, English /nj/) and (2b) if the acoustic quality of L2 Spanish /ɲ/ varies as a function of proficiency. Beginner (n = 9) and advanced (n = 8) L1 English/L2 Spanish speakers and a comparison group of 10 L1 Spanish/L2 English speakers completed delayed repetition tasks in which disyllabic nonce words were produced in a carrier phrase. English critical items contained an intervocalic heterosyllabic /nj/ sequence (e.g., ['phan.jə]); Spanish critical items consisted of items with either intervocalic onset /ɲ/ (e.g., ['xa.ɲa]) or /n/ ['xa.na]. We measured duration and formant contours of the following vocalic portion as acoustic indices of the /n/~/ɲ/ and /ɲ/~/nj/ distinctions. Results show that, while L2 Spanish learners produce an acoustically distinct /n/~/ɲ/ contrast even at a low level of proficiency, the beginners produce an intermediate /ɲ/ that falls acoustically between their English /nj/ and the L1 Spanish /ɲ/ while the advanced learners' Spanish /ɲ/ and English /nj/ appear to be in the process of equivalence classification. We discuss these outcomes as they relate to the robustness of L1 phonological constraints in late L2 acquisition coupled with the role of perceptual cues, functional load, and questions of intelligibility.

RevDate: 2021-08-02
CmpDate: 2021-08-02

Tabas A, K von Kriegstein (2021)

Neural modelling of the encoding of fast frequency modulation.

PLoS computational biology, 17(3):e1008787.

Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To-date computational models use feedforward mechanisms to explain FM encoding. However, from neuroanatomy we know that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representation. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants as well as previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency encoding and direction encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Levy ES, Chang YM, Hwang K, et al (2021)

Perceptual and Acoustic Effects of Dual-Focus Speech Treatment in Children With Dysarthria.

Journal of speech, language, and hearing research : JSLHR, 64(6S):2301-2316.

Purpose Children with dysarthria secondary to cerebral palsy may experience reduced speech intelligibility and diminished communicative participation. However, minimal research has been conducted examining the outcomes of behavioral speech treatments in this population. This study examined the effect of Speech Intelligibility Treatment (SIT), a dual-focus speech treatment targeting increased articulatory excursion and vocal intensity, on intelligibility of narrative speech, speech acoustics, and communicative participation in children with dysarthria. Method American English-speaking children with dysarthria (n = 17) received SIT in a 3-week summer camplike setting at Columbia University. SIT follows motor-learning principles to train the child-friendly, dual-focus strategy, "Speak with your big mouth and strong voice." Children produced a story narrative at baseline, immediate posttreatment (POST), and at 6-week follow-up (FUP). Outcomes were examined via blinded listener ratings of ease of understanding (n = 108 adult listeners), acoustic analyses, and questionnaires focused on communicative participation. Results SIT resulted in significant increases in ease of understanding at POST, that were maintained at FUP. There were no significant changes to vocal intensity, speech rate, or vowel spectral characteristics, with the exception of an increase in second formant difference between vowels following SIT. Significantly enhanced communicative participation was evident at POST and FUP. Considerable variability in response to SIT was observed between children. Conclusions Dual-focus treatment shows promise for improving intelligibility and communicative participation in children with dysarthria, although responses to treatment vary considerably across children. Possible mechanisms underlying the intelligibility gains, enhanced communicative participation, and variability in treatment effects are discussed.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Howson PJ, MA Redford (2021)

The Acquisition of Articulatory Timing for Liquids: Evidence From Child and Adult Speech.

Journal of speech, language, and hearing research : JSLHR, 64(3):734-753.

Purpose Liquids are among the last sounds to be acquired by English-speaking children. The current study considers their acquisition from an articulatory timing perspective by investigating anticipatory posturing for /l/ versus /ɹ/ in child and adult speech. Method In Experiment 1, twelve 5-year-old, twelve 8-year-old, and 11 college-aged speakers produced carrier phrases with penultimate stress on monosyllabic words that had /l/, /ɹ/, or /d/ (control) as singleton onsets and /æ/ or /u/ as the vowel. Short-domain anticipatory effects were acoustically investigated based on schwa formant values extracted from the preceding determiner (= the) and dynamic formant values across the /ə#LV/ sequence. In Experiment 2, long-domain effects were perceptually indexed using a previously validated forward-gated audiovisual speech prediction task. Results Experiment 1 results indicated that all speakers distinguished /l/ from /ɹ/ along F3. Adults distinguished /l/ from /ɹ/ with a lower F2. Older children produced subtler versions of the adult pattern; their anticipatory posturing was also more influenced by the following vowel. Younger children did not distinguish /l/ from /ɹ/ along F2, but both liquids were distinguished from /d/ in the domains investigated. Experiment 2 results indicated that /ɹ/ was identified earlier than /l/ in gated adult speech; both liquids were identified equally early in 5-year-olds' speech. Conclusions The results are interpreted to suggest a pattern of early tongue-body retraction for liquids in /ə#LV/ sequences in children's speech. More generally, it is suggested that children must learn to inhibit the influence of vowels on liquid articulation to achieve an adultlike contrast between /l/ and /ɹ/ in running speech.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Raharjo I, Kothare H, Nagarajan SS, et al (2021)

Speech compensation responses and sensorimotor adaptation to formant feedback perturbations.

The Journal of the Acoustical Society of America, 149(2):1147.

Control of speech formants is important for the production of distinguishable speech sounds and is achieved with both feedback and learned feedforward control. However, it is unclear whether the learning of feedforward control involves the mechanisms of feedback control. Speakers have been shown to compensate for unpredictable transient mid-utterance perturbations of pitch and loudness feedback, demonstrating online feedback control of these speech features. To determine whether similar feedback control mechanisms exist in the production of formants, responses to unpredictable vowel formant feedback perturbations were examined. Results showed similar within-trial compensatory responses to formant perturbations that were presented at utterance onset and mid-utterance. The relationship between online feedback compensation to unpredictable formant perturbations and sensorimotor adaptation to consistent formant perturbations was further examined. Within-trial online compensation responses were not correlated with across-trial sensorimotor adaptation. A detailed analysis of within-trial time course dynamics across trials during sensorimotor adaptation revealed that across-trial sensorimotor adaptation responses did not result from an incorporation of within-trial compensation response. These findings suggest that online feedback compensation and sensorimotor adaptation are governed by distinct neural mechanisms. These findings have important implications for models of speech motor control in terms of how feedback and feedforward control mechanisms are implemented.

RevDate: 2021-07-05
CmpDate: 2021-07-05

Carignan C (2021)

A practical method of estimating the time-varying degree of vowel nasalization from acoustic features.

The Journal of the Acoustical Society of America, 149(2):911.

This paper presents a simple and easy-to-use method of creating a time-varying signal of the degree of nasalization in vowels, generated from acoustic features measured in oral and nasalized vowel contexts. The method is presented for separate models constructed using two sets of acoustic features: (1) an uninformed set of 13 Mel-frequency cepstral coefficients (MFCCs) and (2) a combination of the 13 MFCCs and a phonetically informed set of 20 acoustic features of vowel nasality derived from previous research. Both models are compared against two traditional approaches to estimating vowel nasalization from acoustics: A1-P0 and A1-P1, as well as their formant-compensated counterparts. Data include productions from six speakers of different language backgrounds, producing 11 different qualities within the vowel quadrilateral. The results generated from each of the methods are compared against nasometric measurements, representing an objective "ground truth" of the degree of nasalization. The results suggest that the proposed method is more robust than conventional acoustic approaches, generating signals which correlate strongly with nasometric measures across all vowel qualities and all speakers and accurately approximate the time-varying change in the degree of nasalization. Finally, an experimental example is provided to help researchers implement the method in their own study designs.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Chung H, G Weismer (2021)

Formant Trajectory Patterns of American English /l/ Produced by Adults and Children.

Journal of speech, language, and hearing research : JSLHR, 64(3):809-822.

Purpose Most acoustic and articulatory studies on /l/ have focused on either duration, formant frequencies, or tongue shape during the constriction interval. Only a limited set of data exists for the transition characteristics of /l/ to and from surrounding vowels. The aim of this study was to examine second formant (F2) transition characteristics of /l/ produced by young children and adults. This was to better understand articulatory behaviors in the production of /l/ and potential clinical applications of these data to typical and delayed /l/ development. Method Participants included 17 children with typically developing speech between the ages of 2 and 5 years, and 10 female adult speakers of Southern American English. Each subject produced single words containing pre- and postvocalic /l/ in two vowel contexts (/i, ɪ/ and /ɔ, ɑ/). F2 transitions, out of and into /l/ constriction intervals from the adjacent vowels, were analyzed for perceptually acceptable /l/ productions. The F2 transition extent, duration, and rate, as well as F2 loci data, were compared across age groups by vowel context for both pre- and postvocalic /l/. Results F2 transitions of adults' /l/ showed a great similarity across and within speakers. Those of young children showed greater variability, but became increasingly similar to those of adults with age. The F2 loci data seemed consistent with greater coarticulation among children than adults. This conclusion, however, must be regarded as preliminary due to the possible influence of different vocal tract size across ages and variability in the data. Conclusions The results suggest that adult patterns can serve as a reliable reference to which children's /l/ productions can be evaluated. The articulatory configurations associated with the /l/ constriction interval and the vocal tract movements into and out of that interval may provide insight into the underlying difficulties related to misarticulated /l/.
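The three F2 transition measures named in the abstract above (extent, duration, and rate) are related by simple arithmetic. The following is a minimal sketch only; the function name, the choice of milliseconds, and the use of an absolute value for extent are illustrative assumptions and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class F2Transition:
    extent_hz: float        # size of the F2 change across the transition
    duration_ms: float      # time taken by the transition
    rate_hz_per_ms: float   # extent divided by duration


def f2_transition(t_start_ms: float, f2_start_hz: float,
                  t_end_ms: float, f2_end_hz: float) -> F2Transition:
    """Summarise one F2 transition from its two endpoints (hypothetical helper)."""
    duration = t_end_ms - t_start_ms
    if duration <= 0:
        raise ValueError("transition end must come after its start")
    # Assumption: extent is reported as an unsigned magnitude.
    extent = abs(f2_end_hz - f2_start_hz)
    return F2Transition(extent, duration, extent / duration)


# Illustrative values only, not data from the study.
example = f2_transition(0.0, 1200.0, 50.0, 1800.0)
print(example)
```

Whether a study reports signed or unsigned extent, and where it places the transition endpoints, are analysis decisions that this sketch does not settle.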

RevDate: 2021-05-06

Ng ML, HK Woo (2021)

Effect of total laryngectomy on vowel production: An acoustic study of vowels produced by alaryngeal speakers of Cantonese.

International journal of speech-language pathology [Epub ahead of print].

Purpose: To investigate the effect of total laryngectomy on vowel production, the present study examined the change in vowel articulation associated with different types of alaryngeal speech in comparison with laryngeal speech using novel derived formant metrics. Method: Six metrics derived from the first two formants (F1 and F2) including the First and Second Formant Range Ratios (F1RR and F2RR), triangular and pentagonal Vowel Space Area (tVSA and pVSA), Formant Centralisation Ratio (FCR) and Average Vowel Spacing (AVS) were measured from vowels (/i, y, ɛ, a, ɔ, œ, u/) produced by oesophageal (ES), tracheoesophageal (TE), electrolaryngeal (EL), pneumatic artificial laryngeal (PA) speakers, as well as laryngeal speakers. Result: Data revealed a general reduction in articulatory range and a tendency of vowel centralisation in Cantonese alaryngeal speakers. Significant articulatory difference was found for PA and EL compared with ES, TE, and laryngeal speakers. Conclusion: The discrepant results among alaryngeal speakers may be related to the difference in new sound source (external vs internal). Sensitivity and correlation analyses confirmed the use of the matrix of derived formant metrics provided a more comprehensive profile of the articulatory pattern in the alaryngeal population.
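Two of the derived metrics named above, tVSA and FCR, can be computed from the F1 and F2 of the corner vowels /i/, /a/, and /u/. The sketch below uses the commonly cited definitions (the shoelace formula for the triangle area, and the formant centralisation ratio as usually attributed to Sapir and colleagues); whether the study used exactly these definitions is an assumption, and the function names and formant values are hypothetical.

```python
from typing import Dict, Tuple

Formants = Tuple[float, float]  # (F1, F2) in Hz


def triangular_vsa(v: Dict[str, Formants]) -> float:
    """Area (Hz^2) of the /i/-/a/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = v["i"], v["a"], v["u"]
    return abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)) / 2.0


def formant_centralisation_ratio(v: Dict[str, Formants]) -> float:
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); larger values mean more centralisation."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = v["i"], v["a"], v["u"]
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)


# Illustrative formant values only, not data from the study.
peripheral = {"i": (300.0, 2300.0), "a": (800.0, 1300.0), "u": (350.0, 800.0)}
centralised = {"i": (400.0, 2000.0), "a": (650.0, 1400.0), "u": (420.0, 1100.0)}

for label, vowels in (("peripheral", peripheral), ("centralised", centralised)):
    print(label, triangular_vsa(vowels), round(formant_centralisation_ratio(vowels), 3))
```

With these made-up values the centralised set gives a smaller triangle and a larger FCR, which is the direction of change the abstract describes for alaryngeal speakers.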

RevDate: 2021-02-22

Maryn Y, Wuyts FL, A Zarowski (2021)

Are Acoustic Markers of Voice and Speech Signals Affected by Nose-and-Mouth-Covering Respiratory Protective Masks?

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

BACKGROUND: Worldwide use of nose-and-mouth-covering respiratory protective masks (RPMs) has become ubiquitous during the COVID-19 pandemic. Consequences of wearing RPMs, especially regarding perception and production of spoken communication, are gradually emerging. The present study explored how three prevalent RPMs affect various speech and voice sound properties.

METHODS: Pre-recorded sustained [a] vowels and read sentences from 47 subjects were played by a speech production model ('Voice Emitted by Spare Parts', or 'VESPA') in four conditions: without RPM (C1), with disposable surgical mask (C2), with FFP2 mask (C3), and with transparent plastic mask (C4). Differences between C1 and masked conditions were assessed with Dunnett's t test in 26 speech sound properties related to voice production (fundamental frequency, sound intensity level), voice quality (jitter percent, shimmer percent, harmonics-to-noise ratio, smoothed cepstral peak prominence, Acoustic Voice Quality Index), articulation and resonance (first and second formant frequencies, first and second formant bandwidths, spectral center of gravity, spectral standard deviation, spectral skewness, spectral kurtosis, spectral slope, and spectral energy in ten 1-kHz bands from 0 to 10 kHz).

RESULTS: C2, C3, and C4 significantly affected 10, 15, and 19 of the acoustic speech markers, respectively. Furthermore, absolute differences between unmasked and masked conditions were largest for C4 and smallest for C2.

CONCLUSIONS: All RPMs influenced speech sound properties to a greater or lesser degree. However, this influence was least for surgical RPMs and most for plastic RPMs. Surgical RPMs are therefore preferred when spoken communication is a priority alongside respiratory protection.

RevDate: 2021-08-11
CmpDate: 2021-08-11

Cavalcanti JC, Eriksson A, PA Barbosa (2021)

Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison.

PloS one, 16(2):e0246645.

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants F1-F4 were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs while being directly recorded through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. As for comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the middle points of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant when following a psychoacoustic criterion. The results revealed consistent patterns regarding the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying a greater speaker-discriminatory power compared to low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. As for non-genetically related speakers, both F3 and F4 displayed a similar high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed a higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found even more explanatory in terms of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found phonetically identical.
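The abstract above reports formant values in both Hertz and Bark but does not say which conversion was used. As a rough illustration, the sketch below applies Traunmüller's (1990) approximation, z = 26.81 f / (1960 + f) - 0.53, and omits its corrections for very low and very high values. The choice of formula and the function names are assumptions, not the study's stated method.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert Hz to Bark with Traunmueller's approximation (edge corrections omitted)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_distance(f_a_hz: float, f_b_hz: float) -> float:
    """Absolute difference in Bark between two formant estimates (hypothetical helper)."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz))


# Illustrative values only: the same 100 Hz gap is perceptually larger
# near F1 than near F4 when expressed on the Bark scale.
print(round(bark_distance(500.0, 600.0), 3))
print(round(bark_distance(3500.0, 3600.0), 3))
```

This is why a fixed difference in Hertz between two speakers can matter more for low formants than for high ones once a psychoacoustic scale is applied.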

RevDate: 2021-02-16

Lau HYC, RC Scherer (2021)

Objective Measures of Two Musical Interpretations of an Excerpt From Berlioz's "La mort d'Ophélie".

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00011-4 [Epub ahead of print].

OBJECTIVE/HYPOTHESIS: This study aimed to determine objective production differences relative to two emotional interpretations in performing an excerpt from a classical art song. The null hypothesis was proposed.

METHODS: The first author recorded an excerpt from an art song. The excerpt was sung with two contrasting musical interpretations: an "empathetic legato" approach, and a "sarcastic" approach characterized by emphatic attacks. Microphone, airflow, and electroglottography signals were digitized. The vowels were analyzed in terms of intensity, long term average spectra, fundamental frequency (fo), airflow vibrato rate and extent, vowel onset slope, intensity comparison of harmonic frequencies, and glottal measures based on electroglottograph waveforms. Four consonant tokens were analyzed relative to airflow, voice onset time, and production duration.

RESULTS & CONCLUSIONS: The emphatic performance had faster vowel onset, increased glottal adduction, increased intensity of harmonics in 2-3 kHz, increased intensity in the fourth and fifth formants, inferred subglottal pressure increase, increased airflow for /f/, and greater aspiration airflow for /p, t/. Vibrato extents for intensity, fo, and airflow were wider in the emphatic approach. Findings revealed larger EGGW25 and peak-to-peak amplitude values of the electroglottography waveform, suggesting greater vocal fold contact area and longer glottal closure for the emphatic approach. Long-term average spectrum analyses of the entire production displayed minor variation across all formant frequencies, suggesting an insignificant change in vocal tract shaping between the two approaches. This single-case objective study emphasizes the reality of physiological, aerodynamic, and acoustic production differences in the interpretive and pedagogical aspects of art song performance.

RevDate: 2021-07-28
CmpDate: 2021-07-28

Easwar V, Bridgwater E, D Purcell (2021)

The Influence of Vowel Identity, Vowel Production Variability, and Consonant Environment on Envelope Following Responses.

Ear and hearing, 42(3):662-672 pii:00003446-202105000-00017.

OBJECTIVES: The vowel-evoked envelope following response (EFR) is a useful tool for studying brainstem processing of speech in natural consonant-vowel productions. Previous work, however, demonstrates that the amplitude of EFRs is highly variable across vowels. To clarify factors contributing to the variability observed, the objectives of the present study were to evaluate: (1) the influence of vowel identity and the consonant context surrounding each vowel on EFR amplitude and (2) the effect of variations in repeated productions of a vowel on EFR amplitude while controlling for the consonant context.

DESIGN: In Experiment 1, EFRs were recorded in response to seven English vowels (/ij/, /Ι/, /ej/, /ε/, /æ/, /u/, and /[symbol not rendered]/) embedded in each of four consonant contexts (/hVd/, /sVt/, /zVf/, and /[symbol not rendered]Vv/). In Experiment 2, EFRs were recorded in response to four different variants of one of the four possible vowels (/ij/, /ε/, /æ/, or /[symbol not rendered]/), embedded in the same consonant-vowel-consonant environments used in Experiment 1. All vowels were edited to minimize formant transitions before embedding in a consonant context. Different talkers were used for the two experiments. Data from a total of 30 and 64 (16 listeners/vowel) young adults with normal hearing were included in Experiments 1 and 2, respectively. EFRs were recorded using a single-channel electrode montage between the vertex and nape of the neck while stimuli were presented monaurally.

RESULTS: In Experiment 1, vowel identity had a significant effect on EFR amplitude with the vowel /æ/ eliciting the highest amplitude EFRs (170 nV, on average), and the vowel /ej/ eliciting the lowest amplitude EFRs (106 nV, on average). The consonant context surrounding each vowel stimulus had no statistically significant effect on EFR amplitude. Similarly in Experiment 2, consonant context did not influence the amplitude of EFRs elicited by the vowel variants. Vowel identity significantly altered EFR amplitude with /ε/ eliciting the highest amplitude EFRs (104 nV, on average). Significant, albeit small, differences (<21 nV, on average) in EFR amplitude were evident between some variants of /ε/ and /u/.

CONCLUSION: Based on a comprehensive set of naturally produced vowel samples in carefully controlled consonant contexts, the present study provides additional evidence for the sensitivity of EFRs to vowel identity and variations in vowel production. The surrounding consonant context (after removal of formant transitions) has no measurable effect on EFRs, irrespective of vowel identity and variant. The sensitivity of EFRs to nuances in vowel acoustics emphasizes the need for adequate control and evaluation of stimuli proposed for clinical and research purposes.

RevDate: 2021-02-14

Hodges-Simeon CR, Grail GPO, Albert G, et al (2021)

Testosterone therapy masculinizes speech and gender presentation in transgender men.

Scientific reports, 11(1):3494.

Voice is one of the most noticeably dimorphic traits in humans and plays a central role in gender presentation. Transgender males seeking to align internal identity and external gender expression frequently undergo testosterone (T) therapy to masculinize their voices and other traits. We aimed to determine the importance of changes in vocal masculinity for transgender men and to determine the effectiveness of T therapy at masculinizing three speech parameters: fundamental frequency (i.e., pitch) mean and variation (fo and fo-SD) and estimated vocal tract length (VTL) derived from formant frequencies. Thirty transgender men aged 20 to 40 rated their satisfaction with traits prior to and after T therapy and contributed speech samples and salivary T. Similar-aged cisgender men and women contributed speech samples for comparison. We show that transmen viewed voice change as critical to transition success compared to other masculine traits. However, T therapy may not be sufficient to fully masculinize speech: while fo and fo-SD were largely indistinguishable from cismen, VTL was intermediate between cismen and ciswomen. fo was correlated with salivary T, and VTL associated with T therapy duration. This argues for additional approaches, such as behavior therapy and/or longer duration of hormone therapy, to improve speech transition.
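As a rough illustration of how a vocal tract length can be estimated from formant frequencies, the sketch below assumes a uniform tube closed at the glottis and open at the lips, whose i-th resonance is F_i = (2i - 1)c / 4L. This is a textbook approximation and not necessarily the estimator used in the study above; the function name, the speed-of-sound value, and the example formants are illustrative assumptions.

```python
def estimate_vtl_cm(formants_hz, speed_of_sound_cm_s=35000.0):
    """Estimate vocal tract length (cm) from formants F1..Fn.

    Assumes a uniform quarter-wavelength tube: F_i = (2i - 1) * c / (4 * L).
    Each formant gives one length estimate; the estimates are averaged.
    """
    if not formants_hz:
        raise ValueError("need at least one formant frequency")
    estimates = [
        (2 * i - 1) * speed_of_sound_cm_s / (4.0 * f)
        for i, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical formant sets (Hz), not data from the study.
neutral_tube = estimate_vtl_cm([500.0, 1500.0, 2500.0, 3500.0])
lower_formants = estimate_vtl_cm([450.0, 1350.0, 2250.0, 3150.0])
```

Under this model, lower formants imply a longer apparent tract, which is why formant-derived VTL can serve as a masculinity index alongside fundamental frequency.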

RevDate: 2021-06-21
CmpDate: 2021-06-21

Yang J, L Xu (2021)

Vowel Production in Prelingually Deafened Mandarin-Speaking Children With Cochlear Implants.

Journal of speech, language, and hearing research : JSLHR, 64(2):664-682.

PURPOSE: The purpose of this study was to characterize the acoustic profile and to evaluate the intelligibility of vowel productions in prelingually deafened, Mandarin-speaking children with cochlear implants (CIs).

METHOD: Twenty-five children with CIs and 20 age-matched children with normal hearing (NH) were recorded producing a list of Mandarin disyllabic and trisyllabic words containing 20 Mandarin vowels [a, i, u, y, ɤ, ɿ, ʅ, ai, ei, ia, ie, ye, ua, uo, au, ou, iau, iou, uai, uei] located in the first consonant-vowel syllable. The children with CIs were all prelingually deafened and received unilateral implantation before 7 years of age with an average length of CI use of 4.54 years. In the acoustic analysis, the first two formants (F1 and F2) were extracted at seven equidistant time locations for the tested vowels. The durational and spectral features were compared between the CI and NH groups. In the vowel intelligibility task, the extracted vowel portions in both NH and CI children were presented to six Mandarin-speaking, NH adult listeners for identification.

RESULTS: The acoustic analysis revealed that the children with CIs deviated from the NH controls in the acoustic features for both single vowels and compound vowels. The acoustic deviations were reflected in longer duration, more scattered vowel categories, smaller vowel space area, and distinct formant trajectories in the children with CIs in comparison to NH controls. The vowel intelligibility results showed that the recognition accuracy of the vowels produced by the children with CIs was significantly lower than that of the NH children. The confusion pattern of vowel recognition in the children with CIs generally followed that in the NH children.

CONCLUSION: Our data suggested that the prelingually deafened children with CIs, with a relatively long duration of CI experience, still showed measurable acoustic deviations and lower intelligibility in vowel productions in comparison to the NH children.

RevDate: 2021-03-25

Carl M, M Icht (2021)

Acoustic vowel analysis and speech intelligibility in young adult Hebrew speakers: Developmental dysarthria versus typical development.

International journal of language & communication disorders, 56(2):283-298.

BACKGROUND: Developmental dysarthria is a motor speech impairment commonly characterized by varying levels of reduced speech intelligibility. The relationship between intelligibility deficits and acoustic vowel space among these individuals has long been noted in the literature, with evidence of vowel centralization (e.g., in English and Mandarin). However, the degree to which this centralization occurs and the intelligibility-acoustic relationship is maintained in different vowel systems has yet to be studied thoroughly. In comparison with American English, the Hebrew vowel system is significantly smaller, with a potentially smaller vowel space area, a factor that may impact upon the comparisons of the acoustic vowel space and its correlation with speech intelligibility. Data on vowel space and speech intelligibility are particularly limited for Hebrew speakers with motor speech disorders.

AIMS: To determine the nature and degree of vowel space centralization in Hebrew-speaking adolescents and young adults with dysarthria, in comparison with typically developing (TD) peers, and to correlate these findings with speech intelligibility scores.

METHODS & PROCEDURES: Adolescents and young adults with developmental dysarthria (secondary to cerebral palsy (CP) and other motor deficits, n = 17) and their TD peers (n = 17) were recorded producing Hebrew corner vowels within single words. For intelligibility assessments, naïve listeners transcribed those words produced by speakers with CP, and intelligibility scores were calculated.

OUTCOMES & RESULTS: Acoustic analysis of vowel formants (F1, F2) revealed a centralization of vowel space among speakers with CP for all acoustic metrics of vowel formants, and mainly for the formant centralization ratio (FCR), in comparison with TD peers. Intelligibility scores were correlated strongly with the FCR metric for speakers with CP.

CONCLUSIONS & IMPLICATIONS: The main results, vowel space centralization for speakers with CP in comparison with TD peers, echo previous cross-linguistic results. The correlation of acoustic results with speech intelligibility carries clinical implications. Taken together, the results contribute to better characterization of the speech production deficit in Hebrew speakers with motor speech disorders. Furthermore, they may guide clinical decision-making and intervention planning to improve speech intelligibility.

WHAT THIS PAPER ADDS:

What is already known on the subject: Speech production and intelligibility deficits among individuals with developmental dysarthria (e.g., secondary to CP) are well documented. These deficits have also been correlated with centralization of the acoustic vowel space, although primarily in English speakers. Little is known about the acoustic characteristics of vowels in Hebrew speakers with motor speech disorders, and whether correlations with speech intelligibility are maintained.

What this paper adds to existing knowledge: This study is the first to describe the acoustic characteristics of vowel space in Hebrew-speaking adolescents and young adults with developmental dysarthria. The results demonstrate a centralization of the acoustic vowel space in comparison with TD peers for all measures, as found in other languages. Correlations between acoustic measures and speech intelligibility scores were also documented. We discuss these results within the context of cross-linguistic comparisons.

What are the potential or actual clinical implications of this work? The results confirm the use of objective acoustic measures in the assessment of individuals with motor speech disorders, providing such data for Hebrew-speaking adolescents and young adults. These measures can be used to determine the nature and severity of the speech deficit across languages, may guide intervention planning, and can measure the effectiveness of intelligibility-based treatment programmes.
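For readers unfamiliar with the formant centralization ratio, the sketch below uses the definition commonly attributed to Sapir and colleagues, FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), computed from the corner vowels /i/, /a/, /u/. The abstract does not spell out the formula, so treating this as the metric used is an assumption, and all formant values are hypothetical.

```python
def formant_centralization_ratio(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """FCR from corner-vowel formants in Hz.

    The numerator holds formants that rise when vowels centralize and the
    denominator holds formants that fall, so a higher FCR indicates a more
    centralized (less distinct) vowel space.
    """
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)


# Hypothetical values (Hz), chosen only to show the direction of the effect.
peripheral = formant_centralization_ratio(
    f1_i=300.0, f2_i=2300.0, f1_a=750.0, f2_a=1200.0, f1_u=320.0, f2_u=800.0
)
centralized = formant_centralization_ratio(
    f1_i=400.0, f2_i=1900.0, f1_a=650.0, f2_a=1300.0, f1_u=400.0, f2_u=1100.0
)
```

Because the ratio is dimensionless, it is less sensitive to between-speaker differences in overall formant scale than a raw vowel space area in Hz squared.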

RevDate: 2021-05-26
CmpDate: 2021-05-26

Bakst S, CA Niziolek (2021)

Effects of syllable stress in adaptation to altered auditory feedback in vowels.

The Journal of the Acoustical Society of America, 149(1):708.

Unstressed syllables in English most commonly contain the vowel quality [ə] (schwa), which is cross-linguistically described as having a variable target. The present study examines whether speakers are sensitive to whether their auditory feedback matches their target when producing unstressed syllables. When speakers hear themselves producing formant-altered speech, they will change their motor plans so that their altered feedback is a better match to the target. If schwa has no target, then feedback mismatches in unstressed syllables may not drive a change in production. In this experiment, participants spoke disyllabic words with initial or final stress where the auditory feedback of F1 was raised (Experiment 1) or lowered (Experiment 2) by 100 mels. Both stressed and unstressed syllables showed adaptive changes in F1. In Experiment 1, initial-stress words showed larger adaptive decreases in F1 than final-stress words, but in Experiment 2, stressed syllables overall showed greater adaptive increases in F1 than unstressed syllables in all words, regardless of which syllable contained the primary stress. These results suggest that speakers are sensitive to feedback mismatches in both stressed and unstressed syllables, but that stress and metrical foot type may mediate the corrective response.
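A shift expressed in mels corresponds to a different number of hertz depending on the starting frequency. The sketch below illustrates this with one widely used mel formula, m = 2595 log10(1 + f/700); the abstract does not say which mel conversion was applied, so the formula choice, function names, and the example F1 value are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert hertz to mels (O'Shaughnessy-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def shift_formant(f_hz, shift_mels):
    """Return the frequency in Hz after moving f_hz by shift_mels on the mel scale."""
    return mel_to_hz(hz_to_mel(f_hz) + shift_mels)


# Hypothetical produced F1 of 500 Hz, shifted up and down by 100 mels.
raised = shift_formant(500.0, 100.0)
lowered = shift_formant(500.0, -100.0)
```

With this formula an upward shift of 100 mels spans more hertz than a downward shift of the same size, because the mel scale compresses higher frequencies.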

RevDate: 2021-01-26

Hakanpää T, Waaramaa T, AM Laukkanen (2021)

Training the Vocal Expression of Emotions in Singing: Effects of Including Acoustic Research-Based Elements in the Regular Singing Training of Acting Students.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00002-3 [Epub ahead of print].

OBJECTIVES: This study examines the effects of including acoustic research-based elements of the vocal expression of emotions in the singing lessons of acting students during a seven-week teaching period. This information may be useful in improving the training of interpretation in singing.

STUDY DESIGN: Experimental comparative study.

METHODS: Six acting students participated in seven weeks of extra training concerning voice quality in the expression of emotions in singing. Song samples were recorded before and after the training. A control group of six acting students were recorded twice within a seven-week period, during which they participated in ordinary training. All participants sang on the vowel [a:] and on a longer phrase expressing anger, sadness, joy, tenderness, and neutral states. The vowel and phrase samples were evaluated by 34 listeners for the perceived emotion. Additionally, the vowel samples were analyzed for formant frequencies (F1-F4), sound pressure level (SPL), spectral structure (Alpha ratio = SPL 1500-5000 Hz - SPL 50-1500 Hz), harmonic-to-noise ratio (HNR), and perturbation (jitter, shimmer).

RESULTS: The number of correctly perceived expressions improved in the test group's vowel samples, while no significant change was observed in the control group. The overall recognition was higher for the phrases than for the vowel samples. Of the acoustic parameters, F1 and SPL significantly differentiated emotions in both groups, and HNR specifically differentiated emotions in the test group. The Alpha ratio was found to statistically significantly differentiate emotion expression after training.

CONCLUSIONS: The expression of emotion in the singing voice improved after seven weeks of voice quality training. The F1, SPL, Alpha ratio, and HNR differentiated emotional expression. The variation in acoustic parameters became wider after training. Similar changes were not observed after seven weeks of ordinary voice training.
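The Alpha ratio defined in the methods above (level in 1500-5000 Hz minus level in 50-1500 Hz) can be sketched from a power spectrum as follows. This is a minimal illustration that takes precomputed spectrum bins as input; the function names and the toy spectrum are hypothetical, and the levels are relative decibels rather than calibrated SPL, which does not affect the difference.

```python
import math


def band_level_db(freqs_hz, powers, lo_hz, hi_hz):
    """Relative level (dB) of the summed power in the band [lo_hz, hi_hz)."""
    total = sum(p for f, p in zip(freqs_hz, powers) if lo_hz <= f < hi_hz)
    if total <= 0.0:
        raise ValueError("no energy in the requested band")
    return 10.0 * math.log10(total)


def alpha_ratio_db(freqs_hz, powers):
    """High-band level (1500-5000 Hz) minus low-band level (50-1500 Hz), in dB."""
    high = band_level_db(freqs_hz, powers, 1500.0, 5000.0)
    low = band_level_db(freqs_hz, powers, 50.0, 1500.0)
    return high - low


# Toy spectrum: three low-band bins with 10x the power of three high-band bins.
freqs = [100.0, 500.0, 1000.0, 2000.0, 3000.0, 4000.0]
powers = [1.0, 1.0, 1.0, 0.1, 0.1, 0.1]
ratio = alpha_ratio_db(freqs, powers)
```

A less negative Alpha ratio means relatively more energy above 1500 Hz, which is typically associated with a brighter or more pressed voice quality.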

RevDate: 2021-04-21

Mendoza Ramos V, Paulyn C, Van den Steen L, et al (2021)

Effect of boost articulation therapy (BArT) on intelligibility in adults with dysarthria.

International journal of language & communication disorders, 56(2):271-282.

BACKGROUND: The articulatory accuracy of patients with dysarthria is one of the most affected speech dimensions with a high impact on speech intelligibility. Behavioural treatments of articulation can either involve direct or indirect approaches. The latter have been thoroughly investigated and are generally appreciated for their almost immediate effects on articulation and intelligibility. The number of studies on (short-term) direct articulation therapy is limited.

AIMS: To investigate the effects of short-term, boost articulation therapy (BArT) on speech intelligibility in patients with chronic or progressive dysarthria and the effect of severity of dysarthria on the outcome.

METHODS & PROCEDURES: The study consists of a two-group pre-/post-test design to assess speech intelligibility at phoneme and sentence level and during spontaneous speech, automatic speech and reading a phonetically balanced text. A total of 17 subjects with mild to severe dysarthria participated in the study and were randomly assigned to either a patient-tailored, intensive articulatory drill programme or an intensive minimal pair training. Both training programmes were based on the principles of motor learning. Each training programme consisted of five sessions of 45 min completed within one week.

OUTCOMES & RESULTS: Following treatment, a statistically significant increase of mean group intelligibility was shown at phoneme and sentence level, and in automatic sequences. This was supported by an acoustic analysis that revealed a reduction in formant centralization ratio. Within specific groups of severity, large and moderate positive effect sizes with Cohen's d were demonstrated.

CONCLUSIONS & IMPLICATIONS: BArT successfully improves speech intelligibility in patients with chronic or progressive dysarthria at different levels of the impairment.

WHAT THIS PAPER ADDS:

What is already known on the subject: Behavioural treatment of articulation in patients with dysarthria mainly involves indirect strategies, which have shown positive effects on speech intelligibility. However, there is limited evidence on the short-term effects of direct articulation therapy at the segmental level of speech. This study investigates the effectiveness of BArT on speech intelligibility in patients with chronic or progressive dysarthria at all severity levels.

What this paper adds to existing knowledge: The intensive and direct articulatory therapy programmes developed and applied in this study are intended to reduce the impairment instead of compensating for it. This approach results in a significant improvement of speech intelligibility at different dysarthria severity levels in a short period of time, while helping to exploit and develop all available residual motor skills in persons with dysarthria.

What are the potential or actual clinical implications of this work? The improvements in intelligibility demonstrate the effectiveness of BArT at the segmental level of speech. This makes it a suitable approach to consider in the treatment of patients with chronic or progressive dysarthria.

RevDate: 2021-08-17
CmpDate: 2021-08-17

Aung T, Goetz S, Adams J, et al (2021)

Low fundamental and formant frequencies predict fighting ability among male mixed martial arts fighters.

Scientific reports, 11(1):905.

Human voice pitch is highly sexually dimorphic and eminently quantifiable, making it an ideal phenotype for studying the influence of sexual selection. In both traditional and industrial populations, lower pitch in men predicts mating success, reproductive success, and social status and shapes social perceptions, especially those related to physical formidability. Due to practical and ethical constraints however, scant evidence tests the central question of whether male voice pitch and other acoustic measures indicate actual fighting ability in humans. To address this, we examined pitch, pitch variability, and formant position of 475 mixed martial arts (MMA) fighters from an elite fighting league, with each fighter's acoustic measures assessed from multiple voice recordings extracted from audio or video interviews available online (YouTube, Google Video, podcasts), totaling 1312 voice recording samples. In four regression models each predicting a separate measure of fighting ability (win percentages, number of fights, Elo ratings, and retirement status), no acoustic measure significantly predicted fighting ability above and beyond covariates. However, after fight statistics, fight history, height, weight, and age were used to extract underlying dimensions of fighting ability via factor analysis, pitch and formant position negatively predicted "Fighting Experience" and "Size" factor scores in a multivariate regression model, explaining 3-8% of the variance. Our findings suggest that lower male pitch and formants may be valid cues of some components of fighting ability in men.

RevDate: 2021-09-14

Bodaghi D, Jiang W, Xue Q, et al (2021)

Effect of Supraglottal Acoustics on Fluid-Structure Interaction During Human Voice Production.

Journal of biomechanical engineering, 143(4):.

A hydrodynamic/acoustic splitting method was used to examine the effect of supraglottal acoustics on fluid-structure interactions during human voice production in a two-dimensional computational model. The accuracy of the method in simulating compressible flows in typical human airway conditions was verified by comparing it to full compressible flow simulations. The method was coupled with a three-mass model of vocal fold lateral motion to simulate fluid-structure interactions during human voice production. By separating the acoustic perturbation components of the airflow, the method allows isolation of the role of supraglottal acoustics in fluid-structure interactions. The results showed that an acoustic resonance between a higher harmonic of the sound source and the first formant of the supraglottal tract occurred during normal human phonation when the fundamental frequency was much lower than the formants. The resonance resulted in acoustic pressure perturbation at the glottis which was of the same order as the incompressible flow pressure and found to affect vocal fold vibrations and glottal flow rate waveform. Specifically, the acoustic perturbation delayed the opening of the glottis, reduced the vertical phase difference of vocal fold vibrations, decreased flow rate and maximum flow deceleration rate (MFDR) at the glottal exit; yet, they had little effect on glottal opening. The results imply that the sound generation in the glottis and acoustic resonance in the supraglottal tract are coupled processes during human voice production and computer modeling of vocal fold vibrations needs to include supraglottal acoustics for accurate predictions.

RevDate: 2021-01-11

Feng M, DM Howard (2021)

The Dynamic Effect of the Valleculae on Singing Voice - An Exploratory Study Using 3D Printed Vocal Tracts.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30459-8 [Epub ahead of print].

BACKGROUND AND OBJECTIVES: The valleculae can be seen as a pair of side branches of the human vocal tract like the piriform fossae. While the acoustic properties of the piriform fossae have been explored in detail, there is little evidence of full exploration of the acoustic properties of the valleculae. A recent investigation (Vampola, Horáček, & Švec, 2015), using a finite element model of a single vowel /a/, suggests that the valleculae created two antiresonances and two resonances in the high frequency region (above 4kHz) along with those produced by the piriform sinuses. In the current study, we investigate, in multiple vowels, the acoustic influences of the valleculae in singing voice, using 3-D printed vocal tracts.

METHOD: MRI data were collected from an operatic tenor singing English vowels /a/, /u/, /i/. The images of each vowel were segmented and edited to create a pair of tracts, where one was the original and one had the valleculae digitally removed. The printed tracts were then placed atop a vocal tract organ loudspeaker, excited by white noise. Recordings were made with a microphone placed in front of the mouths of the tracts, to measure their frequency responses.

RESULTS: Dimensional changes were observed in the valleculae of different vowels, with the long-term average spectra of the recordings illustrating clear differences between the frequency responses of the va-nova (valleculae - no valleculae) pairs, which vary across vowels.

CONCLUSION: The experiment demonstrates the dynamic nature of the shapes of the valleculae in the human vocal tract and its acoustic consequences. It provides evidence that the valleculae have similar acoustic properties to the piriform fossae but with larger variations, and in some cases can influence acoustically the frequency region below 4kHz. The results suggest that large volume valleculae have the potential to impede to some extent the acoustic effect of the singer's formant cluster and small valleculae may do the reverse. Since the volume of the valleculae is observed to be largely dependent on tongue movement and also with changes to the uttered vowel, it can be assumed that the high frequency energy, including that within the singer's formant region, could be vowel dependent. Strategies to control valleculae volumes are likely to be highly relevant to voice pedagogy practice as well as singing performance.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Lovcevic I, Kalashnikova M, D Burnham (2020)

Acoustic features of infant-directed speech to infants with hearing loss.

The Journal of the Acoustical Society of America, 148(6):3399.

This study investigated the effects of hearing loss and hearing experience on the acoustic features of infant-directed speech (IDS) to infants with hearing loss (HL) compared to controls with normal hearing (NH) matched by either chronological or hearing age (experiment 1) and across development in infants with hearing loss as well as the relation between IDS features and infants' developing lexical abilities (experiment 2). Both experiments included detailed acoustic analyses of mothers' productions of the three corner vowels /a, i, u/ and utterance-level pitch in IDS and in adult-directed speech. Experiment 1 demonstrated that IDS to infants with HL was acoustically more variable than IDS to hearing-age matched infants with NH. Experiment 2 yielded no changes in IDS features over development; however, the results did show a positive relationship between formant distances in mothers' speech and infants' concurrent receptive vocabulary size, as well as between vowel hyperarticulation and infants' expressive vocabulary. These findings suggest that despite infants' HL and thus diminished access to speech input, infants with HL are exposed to IDS with generally similar acoustic qualities as are infants with NH. However, some differences persist, indicating that infants with HL might receive less intelligible speech.

RevDate: 2021-03-15
CmpDate: 2021-03-15

Nault DR, KG Munhall (2020)

Individual variability in auditory feedback processing: Responses to real-time formant perturbations and their relation to perceptual acuity.

The Journal of the Acoustical Society of America, 148(6):3709.

In this study, both between-subject and within-subject variability in speech perception and speech production were examined in the same set of speakers. Perceptual acuity was determined using an ABX auditory discrimination task, whereby speakers made judgments between pairs of syllables on a /ɛ/ to /æ/ acoustic continuum. Auditory feedback perturbations of the first two formants were implemented in a production task to obtain measures of compensation, normal speech production variability, and vowel spacing. Speakers repeated the word "head" 120 times under varying feedback conditions, with the final Hold phase involving the strongest perturbations of +240 Hz in F1 and -300 Hz in F2. Multiple regression analyses were conducted to determine whether individual differences in compensatory behavior in the Hold phase could be predicted by perceptual acuity, speech production variability, and vowel spacing. Perceptual acuity significantly predicted formant changes in F1, but not in F2. These results are discussed in consideration of the importance of using larger sample sizes in the field and developing new methods to explore feedback processing at the individual participant level. The potential positive role of variability in speech motor control is also considered.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Kothare H, Raharjo I, Ramanarayanan V, et al (2020)

Sensorimotor adaptation of speech depends on the direction of auditory feedback alteration.

The Journal of the Acoustical Society of America, 148(6):3682.

A hallmark feature of speech motor control is its ability to learn to anticipate and compensate for persistent feedback alterations, a process referred to as sensorimotor adaptation. Because this process involves adjusting articulation to counter the perceived effects of altering acoustic feedback, there are a number of factors that affect it, including the complex relationship between acoustics and articulation and non-uniformities of speech perception. As a consequence, sensorimotor adaptation is hypothesised to vary as a function of the direction of the applied auditory feedback alteration in vowel formant space. This hypothesis was tested in two experiments where auditory feedback was altered in real time, shifting the frequency values of the first and second formants (F1 and F2) of participants' speech. Shifts were designed on a subject-by-subject basis and sensorimotor adaptation was quantified with respect to the direction of applied shift, normalised for individual speakers. Adaptation was indeed found to depend on the direction of the applied shift in vowel formant space, independent of shift magnitude. These findings have implications for models of sensorimotor adaptation of speech.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Houle N, SV Levi (2020)

Acoustic differences between voiced and whispered speech in gender diverse speakers.

The Journal of the Acoustical Society of America, 148(6):4002.

Whispered speech is a naturally produced mode of communication that lacks a fundamental frequency. Several other acoustic differences exist between whispered and voiced speech, such as speaking rate (measured as segment duration) and formant frequencies. Previous research has shown that listeners are less accurate at identifying linguistic information (e.g., identifying a speech sound) and speaker information (e.g., reporting speaker gender) from whispered speech. To further explore differences between voiced and whispered speech, acoustic differences were examined across three datasets (hVd, sVd, and ʃVd) and three speaker groups (ciswomen, transwomen, cismen). Consistent with previous studies, vowel duration was generally longer in whispered speech and formant frequencies were shifted higher, although the magnitude of these differences depended on vowel and gender. Despite the increase in duration, the acoustic vowel space area (measured either with a vowel quadrilateral or with a convex hull) was smaller in the whispered speech, suggesting that larger vowel space areas are not an automatic consequence of a lengthened articulation. Overall, these findings are consistent with previous literature showing acoustic differences between voiced and whispered speech beyond the articulatory change of eliminating fundamental frequency.
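The two vowel space area measures mentioned above, a vowel quadrilateral and a convex hull, can both be computed with the shoelace formula once the vertices are ordered. The sketch below is a generic illustration, not the authors' analysis code; the (F1, F2) points are hypothetical and the function names are assumptions.

```python
def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in order around the polygon."""
    twice_area = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical (F1, F2) means in Hz for four corner vowels, listed in order
# around the quadrilateral, plus one central vowel token.
corners = [(300.0, 2300.0), (700.0, 1800.0), (750.0, 1100.0), (320.0, 800.0)]
tokens = corners + [(500.0, 1500.0)]

quadrilateral_area = polygon_area(corners)
hull_area = polygon_area(convex_hull(tokens))
```

The quadrilateral uses only designated corner vowels, while the hull is driven by whichever tokens lie on the periphery, so the two measures can diverge when non-corner vowels are produced at the edge of the space.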

RevDate: 2021-07-28
CmpDate: 2021-07-28

Ananthakrishnan S, Grinstead L, D Yurjevich (2020)

Human Frequency Following Responses to Filtered Speech.

Ear and hearing, 42(1):87-105 pii:00003446-202101000-00009.

OBJECTIVES: There is increasing interest in using the frequency following response (FFR) to describe the effects of varying different aspects of hearing aid signal processing on brainstem neural representation of speech. To this end, recent studies have examined the effects of filtering on brainstem neural representation of the speech fundamental frequency (f0) in listeners with normal hearing sensitivity by measuring FFRs to low- and high-pass filtered signals. However, the stimuli used in these studies do not reflect the entire range of typical cutoff frequencies used in frequency-specific gain adjustments during hearing aid fitting. Further, there has been limited discussion on the effect of filtering on brainstem neural representation of formant-related harmonics. Here, the effects of filtering on brainstem neural representation of speech fundamental frequency (f0) and harmonics related to first formant frequency (F1) were assessed by recording envelope and spectral FFRs to a vowel low-, high-, and band-pass filtered at cutoff frequencies ranging from 0.125 to 8 kHz.

DESIGN: FFRs were measured to a synthetically generated vowel stimulus /u/ presented in a full bandwidth and low-pass (experiment 1), high-pass (experiment 2), and band-pass (experiment 3) filtered conditions. In experiment 1, FFRs were measured to a synthetically generated vowel stimulus /u/ presented in a full bandwidth condition as well as 11 low-pass filtered conditions (low-pass cutoff frequencies: 0.125, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, and 8 kHz) in 19 adult listeners with normal hearing sensitivity. In experiment 2, FFRs were measured to the same synthetically generated vowel stimulus /u/ presented in a full bandwidth condition as well as 10 high-pass filtered conditions (high-pass cutoff frequencies: 0.125, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, and 6 kHz) in 7 adult listeners with normal hearing sensitivity. In experiment 3, in addition to the full bandwidth condition, FFRs were measured to vowel /u/ low-pass filtered at 2 kHz, band-pass filtered between 2-4 kHz and 4-6 kHz in 10 adult listeners with normal hearing sensitivity. A Fast Fourier Transform analysis was conducted to measure the strength of f0 and the F1-related harmonic relative to the noise floor in the brainstem neural responses obtained to the full bandwidth and filtered stimulus conditions.

RESULTS: Brainstem neural representation of f0 was reduced when the low-pass filter cutoff frequency was between 0.25 and 0.5 kHz; no differences in f0 strength were noted between conditions when the low-pass filter cutoff condition was at or greater than 0.75 kHz. While envelope FFR f0 strength was reduced when the stimulus was high-pass filtered at 6 kHz, there was no effect of high-pass filtering on brainstem neural representation of f0 when the high-pass filter cutoff frequency ranged from 0.125 to 4 kHz. There was a weakly significant global effect of band-pass filtering on brainstem neural phase-locking to f0. A trends analysis indicated that mean f0 magnitude in the brainstem neural response was greater when the stimulus was band-pass filtered between 2 and 4 kHz as compared to when the stimulus was band-pass filtered between 4 and 6 kHz, low-pass filtered at 2 kHz or presented in the full bandwidth condition. Last, neural phase-locking to f0 was reduced or absent in envelope FFRs measured to filtered stimuli that lacked spectral energy above 0.125 kHz or below 6 kHz. Similarly, little to no energy was seen at F1 in spectral FFRs obtained to low-, high-, or band-pass filtered stimuli that did not contain energy in the F1 region. For stimulus conditions that contained energy at F1, the strength of the peak at F1 in the spectral FFR varied little with low-, high-, or band-pass filtering.

CONCLUSIONS: Energy at f0 in envelope FFRs may arise due to neural phase-locking to low-, mid-, or high-frequency stimulus components, provided the stimulus envelope is modulated by at least two interacting harmonics. Stronger neural responses at f0 are measured when filtering results in stimulus bandwidths that preserve stimulus energy at F1 and F2. In addition, results suggest that unresolved harmonics may favorably influence f0 strength in the neural response. Lastly, brainstem neural representation of the F1-related harmonic measured in spectral FFRs obtained to filtered stimuli is related to the presence or absence of stimulus energy at F1. These findings add to the existing literature exploring the viability of the FFR as an objective technique to evaluate hearing aid fitting where stimulus bandwidth is altered by design due to frequency-specific gain applied by amplification algorithms.

RevDate: 2021-03-23

Parrell B, CA Niziolek (2021)

Increased speech contrast induced by sensorimotor adaptation to a nonuniform auditory perturbation.

Journal of neurophysiology, 125(2):638-647.

When auditory feedback is perturbed in a consistent way, speakers learn to adjust their speech to compensate, a process known as sensorimotor adaptation. Although this paradigm has been highly informative for our understanding of the role of sensory feedback in speech motor control, its ability to induce behaviorally relevant changes in speech that affect communication effectiveness remains unclear. Because reduced vowel contrast contributes to intelligibility deficits in many neurogenic speech disorders, we examine human speakers' ability to adapt to a nonuniform perturbation field that was designed to affect vowel distinctiveness, applying a shift that depended on the vowel being produced. Twenty-five participants were exposed to this "vowel centralization" feedback perturbation in which the first two formant frequencies were shifted toward the center of each participant's vowel space, making vowels less distinct from one another. Speakers adapted to this nonuniform shift, learning to produce corner vowels with increased vowel space area and vowel contrast to partially overcome the perceived centralization. The increase in vowel contrast occurred without a concomitant increase in duration and persisted after the feedback shift was removed, including after a 10-min silent period. These findings establish the validity of a sensorimotor adaptation paradigm to increase vowel contrast, showing that complex, nonuniform alterations to sensory feedback can successfully drive changes relevant to intelligible communication. NEW & NOTEWORTHY: To date, the speech motor learning evoked in sensorimotor adaptation studies has had little ecological consequences for communication. By inducing complex, nonuniform acoustic errors, we show that adaptation can be leveraged to cause an increase in speech sound contrast, a change that has the capacity to improve intelligibility. This study is relevant for models of sensorimotor integration across motor domains, showing that complex alterations to sensory feedback can successfully drive changes relevant to ecological behavior.

RevDate: 2021-01-15

Pisanski K, P Sorokowski (2021)

Human Stress Detection: Cortisol Levels in Stressed Speakers Predict Voice-Based Judgments of Stress.

Perception, 50(1):80-87.

Despite recent evidence of a positive relationship between cortisol levels and voice pitch in stressed speakers, the extent to which human listeners can reliably judge stress from the voice remains unknown. Here, we tested whether voice-based judgments of stress co-vary with the free cortisol levels and vocal parameters of speakers recorded in a real-life stressful situation (oral examination) and baseline (2 weeks prior). Hormone and acoustic analyses indicated elevated salivary cortisol levels and corresponding changes in voice pitch, vocal tract resonances (formants), and speed of speech during stress. In turn, listeners' stress ratings correlated significantly with speakers' cortisol levels. Higher pitched voices were consistently perceived as more stressed; however, the influence of formant frequencies, vocal perturbation and noise parameters on stress ratings varied across contexts, suggesting that listeners utilize different strategies when assessing calm versus stressed speech. These results indicate that nonverbal vocal cues can convey honest information about a speaker's underlying physiological level of stress that listeners can, to some extent, detect and utilize, while underscoring the necessity to control for individual differences in the biological stress response.

RevDate: 2020-12-09

Albuquerque L, Oliveira C, Teixeira A, et al (2020)

A Comprehensive Analysis of Age and Gender Effects in European Portuguese Oral Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30412-4 [Epub ahead of print].

The knowledge about the age effects in speech acoustics is still disperse and incomplete. This study extends the analyses of the effects of age and gender on acoustics of European Portuguese (EP) oral vowels, in order to complement initial studies with limited sets of acoustic parameters, and to further investigate unclear or inconsistent results. A database of EP vowels produced by a group of 113 adults, aged between 35 and 97, was used. Duration, fundamental frequency (f0), formant frequencies (F1 to F3), and a selection of vowel space metrics (F1 and F2 range ratios, vowel articulation index [VAI] and formant centralization ratio [FCR]) were analyzed. To avoid the arguable division into age groups, the analyses considered age as a continuous variable. The most relevant age-related results included: vowel duration increase in both genders; a general tendency to formant frequencies decrease for females; changes that were consistent with vowel centralization for males, confirmed by the vowel space acoustic indexes; and no evidence of F3 decrease with age, in both genders. This study has contributed to knowledge on aging speech, providing new information for an additional language. The results corroborated that acoustic characteristics of speech change with age and present different patterns between genders.
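
The vowel space metrics named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used corner-vowel definitions of FCR and VAI (the abstract itself does not give formulas), and the formant values below are hypothetical, not data from the study.

```python
# Sketch of two vowel space metrics, assuming the commonly used corner-vowel
# definitions; the abstract does not state the formulas it used.

def formant_centralization_ratio(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """FCR: rises as the corner vowels /i/, /a/, /u/ move toward the center."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)


def vowel_articulation_index(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """VAI: the reciprocal of FCR, so it falls with centralization."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical formant values in Hz (illustrative only, not study data).
distinct = dict(f1_i=300, f2_i=2300, f1_a=750, f2_a=1300, f1_u=320, f2_u=900)
centralized = dict(f1_i=380, f2_i=2000, f1_a=650, f2_a=1350, f1_u=400, f2_u=1100)

for label, vowels in (("distinct", distinct), ("centralized", centralized)):
    fcr = formant_centralization_ratio(**vowels)
    vai = vowel_articulation_index(**vowels)
    print(label, round(fcr, 3), round(vai, 3))
```

Under these definitions the two indexes carry the same information, so reporting both mainly serves comparison with earlier studies that used one or the other.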

RevDate: 2020-12-10

Van Soom M, B de Boer (2020)

Detrending the Waveforms of Steady-State Vowels.

Entropy (Basel, Switzerland), 22(3):.

Steady-state vowels are vowels that are uttered with a momentarily fixed vocal tract configuration and with steady vibration of the vocal folds. In this steady-state, the vowel waveform appears as a quasi-periodic string of elementary units called pitch periods. Humans perceive this quasi-periodic regularity as a definite pitch. Likewise, so-called pitch-synchronous methods exploit this regularity by using the duration of the pitch periods as a natural time scale for their analysis. In this work, we present a simple pitch-synchronous method using a Bayesian approach for estimating formants that slightly generalizes the basic approach of modeling the pitch periods as a superposition of decaying sinusoids, one for each vowel formant, by explicitly taking into account the additional low-frequency content in the waveform which arises not from formants but rather from the glottal pulse. We model this low-frequency content in the time domain as a polynomial trend function that is added to the decaying sinusoids. The problem then reduces to a rather familiar one in macroeconomics: estimate the cycles (our decaying sinusoids) independently from the trend (our polynomial trend function); in other words, detrend the waveform of steady-state waveforms. We show how to do this efficiently.
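
The signal model described in this abstract (a polynomial trend plus one decaying sinusoid per formant) can be sketched as follows. This is only the forward model under stated assumptions, not the authors' Bayesian estimation procedure: the decay rate is tied to a bandwidth via exp(-pi * B * t), which is a standard resonance convention assumed here, and all coefficients, frequencies, and bandwidths are hypothetical.

```python
import math

# Forward model only: one pitch period = polynomial trend + decaying sinusoids.
# All numeric values are hypothetical; the paper's estimation step is not shown.


def trend(t, coeffs):
    """Low-frequency polynomial trend: coeffs[k] multiplies t**k."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def formant_part(t, formants):
    """Sum of decaying sinusoids, one per (freq_hz, bandwidth_hz, a_cos, a_sin)."""
    total = 0.0
    for freq_hz, bandwidth_hz, a_cos, a_sin in formants:
        decay = math.exp(-math.pi * bandwidth_hz * t)  # assumed decay convention
        phase = 2.0 * math.pi * freq_hz * t
        total += decay * (a_cos * math.cos(phase) + a_sin * math.sin(phase))
    return total


def pitch_period(t, coeffs, formants):
    """Model waveform at time t (seconds) within a single pitch period."""
    return trend(t, coeffs) + formant_part(t, formants)


coeffs = [0.2, -30.0, 1500.0]                                 # hypothetical trend
formants = [(700.0, 80.0, 1.0, 0.0), (1200.0, 100.0, 0.5, 0.2)]  # hypothetical
sample_rate = 16000
times = [n / sample_rate for n in range(128)]
samples = [pitch_period(t, coeffs, formants) for t in times]

# With a known trend, detrending is a subtraction; in practice the trend and the
# sinusoids would have to be estimated jointly from the waveform.
detrended = [x - trend(t, coeffs) for t, x in zip(times, samples)]
```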

RevDate: 2021-08-03
CmpDate: 2021-03-29

Schild C, Aung T, Kordsmeyer TL, et al (2020)

Linking human male vocal parameters to perceptions, body morphology, strength and hormonal profiles in contexts of sexual selection.

Scientific reports, 10(1):21296.

Sexual selection appears to have shaped the acoustic signals of diverse species, including humans. Deep, resonant vocalizations in particular may function in attracting mates and/or intimidating same-sex competitors. Evidence for these adaptive functions in human males derives predominantly from perception studies in which vocal acoustic parameters were manipulated using specialist software. This approach affords tight experimental control but provides little ecological validity, especially when the target acoustic parameters vary naturally with other parameters. Furthermore, such experimental studies provide no information about what acoustic variables indicate about the speaker, that is, why attention to vocal cues may be favored in intrasexual and intersexual contexts. Using voice recordings with high ecological validity from 160 male speakers and biomarkers of condition, including baseline cortisol and testosterone levels, body morphology and strength, we tested a series of pre-registered hypotheses relating to both perceptions and underlying condition of the speaker. We found negative curvilinear and negative linear relationships between male fundamental frequency (fo) and female perceptions of attractiveness and male perceptions of dominance. In addition, cortisol and testosterone negatively interacted in predicting fo, and strength and measures of body size negatively predicted formant frequencies (Pf). Meta-analyses of the present results and those from two previous samples confirmed that fo negatively predicted testosterone only among men with lower cortisol levels. This research offers empirical evidence of possible evolutionary functions for attention to men's vocal characteristics in contexts of sexual selection.

RevDate: 2020-12-03

Leung Y, Oates J, Papp V, et al (2020)

Formant Frequencies of Adult Speakers of Australian English and Effects of Sex, Age, Geographical Location, and Vowel Quality.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30370-2 [Epub ahead of print].

AIMS: The primary aim of this study was to provide normative formant frequency (F) values for male and female speakers of Australian English. The secondary aim was to examine the effects of speaker sex, age, vowel quality, and geographical location on F.

METHOD: The first three monophthong formant frequencies (F1, F2, and F3) for 244 female and 135 male speakers aged 18-60 years from a recent large-scale corpus of Australian English were analysed on a passage reading task.

RESULTS: Mixed effects linear regression models suggested that speaker sex, speaker age, and vowel quality significantly predicted F1, F2, and F3 (P = 0.000). Effect sizes suggested that speaker sex and vowel quality contributed most to the variations in F1, F2, and F3 whereas speaker age and geographical location contributed a smaller amount.

CONCLUSION: Both clinicians and researchers are provided with normative F data for 18-60 year-old speakers of Australian English. Such data have increased internal and external validity relative to previous literature. F normative data for speakers of Australian English should be considered with reference to speaker sex and vowel but it may not be practically necessary to adjust for speaker age and geographical location.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Tabain M, Kochetov A, R Beare (2020)

An ultrasound and formant study of manner contrasts at four coronal places of articulation.

The Journal of the Acoustical Society of America, 148(5):3195.

This study examines consonant manner of articulation at four coronal places of articulation, using ultrasound and formant analyses of the Australian language Arrernte. Stop, nasal, and lateral articulations are examined at the dental, alveolar, retroflex, and alveo-palatal places of articulation: /t̪ n̪ l̪ / vs /t n l/ vs /ʈɳɭ/ vs /c ɲ ʎ/. Ultrasound data clearly show a more retracted tongue root for the lateral, and a more advanced tongue root for the nasal, as compared to the stop. However, the magnitude of the differences is much greater for the stop∼lateral contrast than for the stop∼nasal contrast. Acoustic results show clear effects on F1 in the adjacent vowels, in particular the preceding vowel, with F1 lower adjacent to nasals and higher adjacent to laterals, as compared to stops. Correlations between the articulatory and acoustic data are particularly strong for this formant. However, the retroflex place of articulation shows effects according to manner for higher formants as well, suggesting that a better understanding of retroflex acoustics for different manners of articulation is required. The study also suggests that articulatory symmetry and gestural economy are affected by the size of the phonemic inventory.

RevDate: 2021-06-22
CmpDate: 2021-06-21

Vampola T, Horáček J, Radolf V, et al (2020)

Influence of nasal cavities on voice quality: Computer simulations and experiments.

The Journal of the Acoustical Society of America, 148(5):3218.

Nasal cavities are known to introduce antiresonances (dips) in the sound spectrum reducing the acoustic power of the voice. In this study, a three-dimensional (3D) finite element (FE) model of the vocal tract (VT) of one female subject was created for vowels [a:] and [i:] without and with a detailed model of nasal cavities based on CT (Computer Tomography) images. The 3D FE models were then used for analyzing the resonances, antiresonances and the acoustic pressure response spectra of the VT. The computed results were compared with the measurements of a VT model for the vowel [a:], obtained from the FE model by 3D printing. The nasality affects mainly the lowest formant frequency and decreases its peak level. The results confirm the main effect of nasalization, i.e., that sound pressure level decreases in the frequency region of the formants F1-F2 and emphasizes the frequency region of the formants F3-F5 around the singer's formant cluster. Additionally, many internal local resonances in the nasal and paranasal cavities were found in the 3D FE model. Their effect on the acoustic output was found to be minimal, but accelerometer measurements on the walls of the 3D-printed model suggested they could contribute to structure vibrations.

RevDate: 2020-11-04

Ishikawa K, J Webster (2020)

The Formant Bandwidth as a Measure of Vowel Intelligibility in Dysphonic Speech.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30403-3 [Epub ahead of print].

OBJECTIVE: The current paper examined the impact of dysphonia on the bandwidth of the first two formants of vowels, and the relationship between the formant bandwidth and vowel intelligibility.

METHODS: Speaker participants of the study were 10 adult females with healthy voice and 10 adult females with dysphonic voice. Eleven vowels in American English were recorded in /h/-vowel-/d/ format. The vowels were presented to 10 native speakers of American English with normal hearing, who were asked to select a vowel they heard from a list of /h/-vowel-/d/ words. The vowels were acoustically analyzed to measure the bandwidth of the first and second formants (B1 and B2). Separate Wilcoxon rank sum tests were conducted for each vowel for normal and dysphonic speech because the differences in B1 and B2 were found to not be normally distributed. Spearman correlation tests were conducted to evaluate the association between the difference in formant bandwidths and vowel intelligibility between the healthy and dysphonic speakers.

RESULTS: B1 was significantly greater in dysphonic vowels for seven of the eleven vowels, and lesser for only one of the vowels. There was no statistically significant difference in B2 between the normal and dysphonic vowels, except for the vowel /i/. The difference in B1 between normal and dysphonic vowels strongly predicted the intelligibility difference.

CONCLUSION: Dysphonia significantly affects B1, and the difference in B1 may serve as an acoustic marker for the intelligibility reduction in dysphonic vowels. This acoustic-perceptual relationship should be confirmed by a larger-scale study in the future.

RevDate: 2020-11-04

Burckardt ES, Hillman RE, Murton O, et al (2020)

The Impact of Tonsillectomy on the Adult Singing Voice: Acoustic and Aerodynamic Measures.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30373-8 [Epub ahead of print].

OBJECTIVE: Singers undergoing tonsillectomy are understandably concerned about possible sequelae to their voice. The surgical risks of laryngeal damage from intubation and upper airway scarring are valid reasons for singers to carefully consider their options for treatment of tonsil-related symptoms. No prior studies have statistically assessed objective voice outcomes in a group of adult singers undergoing tonsillectomy. This study determined the impact of tonsillectomy on the adult singing voice by determining if there were statistically significant changes in preoperative versus postoperative acoustic, aerodynamic, and Voice-Related Quality of Life (VRQOL) measures.

STUDY DESIGN: Prospective cohort study.

SETTING: Tertiary Referral Academic Hospital.

SUBJECTS: Thirty singers undergoing tonsillectomy from 2012 to 2019.

METHODS: Acoustic recordings were obtained with Computerized Speech Lab (CSL) (Pentax CSL 4500) and analyzed with the Multidimensional Voice Program (MDVP) (Pentax MDVP) and Praat Acoustic Analysis Software. Estimates of aerodynamic vocal efficiency were obtained and analyzed using the Phonatory Aerodynamic System (Pentax PAS 6600). Preoperative VRQOL scores were recorded, and singers were instructed to refrain from singing for 3 weeks following tonsillectomy. Repeat acoustic and aerodynamic measures as well as VRQOL scores were obtained at the first postoperative visit.

RESULTS: Average postoperative acoustic (jitter, shimmer, HNR) and aerodynamic (sound pressure level divided by subglottal pressure) parameters related to laryngeal phonatory function did not differ significantly from preoperative measures. The only statistically significant change in postoperative measures of resonance was a decrease in the 3rd formant (F3) for the /a/ vowel. Average postoperative VRQOL scores (89, SD 12.2) improved significantly from preoperative VRQOL scores (79.8, SD 18.7) (P = 0.007).

CONCLUSIONS: Tonsillectomy does not appear to alter laryngeal voice production in adult singers as measured by standard acoustic and aerodynamic parameters. The observed decrease in F3 for the /a/ vowel is hypothetically related to increasing the pharyngeal cross-sectional area by removing tonsillar tissue, but this would not be expected to appreciably impact the perceptual characteristics of the vowel. Singers' self-assessment (VRQOL) improved after tonsillectomy.

RevDate: 2021-06-14
CmpDate: 2021-06-14

Roberts B, RJ Summers (2020)

Informational masking of speech depends on masker spectro-temporal variation but not on its coherence.

The Journal of the Acoustical Society of America, 148(4):2416.

The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100-200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to constant at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not its spectro-temporal coherence.

RevDate: 2021-06-14
CmpDate: 2021-06-14

Nenadić F, Coulter P, Nearey TM, et al (2020)

Perception of vowels with missing formant peaks.

The Journal of the Acoustical Society of America, 148(4):1911.

Although the first two or three formant frequencies are considered essential cues for vowel identification, certain limitations of this approach have been noted. Alternative explanations have suggested listeners rely on other aspects of the gross spectral shape. A study conducted by Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149] offered strong support for the latter, as attenuation of individual formant peaks left vowel identification largely unaffected. In the present study, these experiments are replicated in two dialects of English. Although the results were similar to those of Ito, Tsuchida, and Yano [(2001). J. Acoust. Soc. Am. 110, 1141-1149], quantitative analyses showed that when a formant is suppressed, participant response entropy increases due to increased listener uncertainty. In a subsequent experiment, using synthesized vowels with changing formant frequencies, suppressing individual formant peaks led to reliable changes in identification of certain vowels but not in others. These findings indicate that listeners can identify vowels with missing formant peaks. However, such formant-peak suppression may lead to decreased certainty in identification of steady-state vowels or even changes in vowel identification in certain dynamically specified vowels.

RevDate: 2021-07-28
CmpDate: 2021-07-28

Easwar V, Birstler J, Harrison A, et al (2020)

The Accuracy of Envelope Following Responses in Predicting Speech Audibility.

Ear and hearing, 41(6):1732-1746.

OBJECTIVES: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs.

DESIGN: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susaʃi/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susaʃi/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck when /susaʃi/ was presented monaurally for 450 sweeps. The performance of the three statistical indicators, F-test, Hotelling's T², and phase coherence, was compared against behaviorally determined audibility (estimated SL, SL ≥0 dB = audible) using area under the receiver operating characteristics (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli) and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value.

RESULTS: The AUROCs of the 3 statistical indicators were similar; however, at the type I error rate of 5%, the sensitivities of Hotelling's T² (68.4%) and phase coherence (68.8%) were significantly higher than the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than the Hotelling's T² (88.4%). When analyzed using Hotelling's T² as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) compared with most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid- and high-frequencies, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increases in stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged between 13.4 and 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli.

CONCLUSIONS: EFR-based inference of speech audibility requires consideration of the statistical indicator used, phoneme, stimulus frequency, and stimulus level.
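
The four accuracy measures defined in this abstract can be sketched from a 2x2 table of audibility against EFR detection. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
# Accuracy measures as defined in the abstract, computed from hypothetical counts.
#   tp: audible stimuli with a detected EFR
#   fn: audible stimuli with no detected EFR
#   tn: inaudible stimuli with no detected EFR
#   fp: inaudible stimuli with a detected EFR


def detection_accuracy(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # audible speech with a detectable EFR
        "specificity": tn / (tn + fp),  # inaudible speech with an undetectable EFR
        "ppv": tp / (tp + fp),          # detected EFRs elicited by audible stimuli
        "npv": tn / (tn + fn),          # undetected EFRs elicited by inaudible stimuli
    }


# Hypothetical counts (illustrative only).
measures = detection_accuracy(tp=80, fn=20, tn=45, fp=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the detector, whereas PPV and NPV also shift with the proportion of audible stimuli in the test set, which is one reason the abstract reports them separately.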

RevDate: 2020-12-26

Rakerd B, Hunter EJ, P Lapine (2019)

Resonance Effects and the Vocalization of Speech.

Perspectives of the ASHA special interest groups, 4(6):1637-1643.

Studies of the respiratory and laryngeal actions required for phonation are central to our understanding of both voice and voice disorders. The purpose of the present article is to highlight complementary insights about voice that have come from the study of vocal tract resonance effects.

RevDate: 2021-05-10

Jeanneteau M, Hanna N, Almeida A, et al (2020)

Using visual feedback to tune the second vocal tract resonance for singing in the high soprano range.

Logopedics, phoniatrics, vocology [Epub ahead of print].

PURPOSE: Over a range roughly C5-C6, sopranos usually tune their first vocal tract resonance (R1) to the fundamental frequency (fo) of the note sung: R1:fo tuning. Those who sing well above C6 usually adjust their second vocal tract resonance (R2) and use R2:fo tuning. This study investigated these questions: Can singers quickly learn R2:fo tuning when given suitable feedback? Can they subsequently use this tuning without feedback? And finally, if so, does this assist their singing in the high range?

METHODS: New computer software for the technique of resonance estimation by broadband excitation at the lips was used to provide real-time visual feedback on fo and vocal tract resonances. Eight sopranos participated. In a one-hour session, they practised adjusting R2 whilst miming (i.e. without phonating), and then during singing.

RESULTS: Six sopranos learned to tune R2 over a range of several semi-tones, when feedback was present. This achievement did not immediately extend their singing range. When the feedback was removed, two sopranos spontaneously used R2:fo tuning at the top of their range above C6.

CONCLUSIONS: With only one hour of training, singers can learn to adjust their vocal tract shape for R2:fo tuning when provided with visual feedback. One additional participant, who spent considerable time with the software, acquired greater skill at R2:fo tuning and was able to extend her singing range. A simple version of the hardware used can be assembled using basic equipment and the software is available online.
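
To put the C5-C6 range in this abstract into hertz, the following minimal sketch converts note numbers to fundamental frequencies. It assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbering (C5 = 72, C6 = 84); neither assumption comes from the abstract.

```python
# Note-to-frequency conversion, assuming equal temperament and A4 = 440 Hz.


def midi_to_hz(midi_note, a4_hz=440.0):
    """Fundamental frequency of a MIDI note number (A4 is note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


C5, C6 = 72, 84  # MIDI note numbers (assumed numbering convention)

for name, note in (("C5", C5), ("C6", C6)):
    print(name, round(midi_to_hz(note), 1), "Hz")
```

Under R1:fo tuning as described above, the first vocal tract resonance would be placed near these fundamental frequencies across that octave.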

RevDate: 2021-05-10

Ayres A, Winckler PB, Jacinto-Scudeiro LA, et al (2020)

Speech characteristics in individuals with myasthenia gravis: a case control study.

Logopedics, phoniatrics, vocology [Epub ahead of print].

INTRODUCTION: Myasthenia Gravis (MG) is an autoimmune disease. The characteristic symptoms of the disease are muscle weakness and fatigue. These symptoms affect the oral muscles, causing dysarthria in about 60% of patients as the disease progresses.

PURPOSE: To describe the speech pattern of patients with MG and compare it with that of healthy controls (HC).

MATERIAL AND METHODS: Case-control study. Participants were divided into an MG group (MGG) with 38 patients diagnosed with MG and an HC group with 18 individuals matched for age and sex. The MGG was evaluated with clinical and motor scales and answered self-perception questionnaires. Speech assessment of both groups included recording of speech tasks and acoustic and auditory-perceptual analysis.

RESULTS: In the MGG, 68.24% of the patients were female, with an average age of 50.21 years (±16.47), 14.18 years (±9.52) of disease duration and a motor scale score of 11.19 points (±8.79). The auditory-perceptual analysis verified that 47.36% (n = 18) of participants in the MGG presented mild dysarthria and 10.52% (n = 4) moderate dysarthria, with a high percentage of alterations in phonation (95.2%) and breathing (52.63%). The acoustic analysis verified a change in phonation, with significantly higher shimmer values in the MGG compared to the HC, and in articulation, with a significant difference between the groups for the first formant of the /iu/ (p < .001). No correlation was found between the diagnosis of speech disorder and the dysarthria self-perception questionnaire.

CONCLUSION: We found mild dysarthria in MG patients, with changes in the motor bases of phonation and breathing and no correlation with severity or disease duration.

RevDate: 2021-05-14
CmpDate: 2021-05-14

Kim KS, Daliri A, Flanagan JR, et al (2020)

Dissociated Development of Speech and Limb Sensorimotor Learning in Stuttering: Speech Auditory-motor Learning is Impaired in Both Children and Adults Who Stutter.

Neuroscience, 451:1-21.

Stuttering is a neurodevelopmental disorder of speech fluency. Various experimental paradigms have demonstrated that affected individuals show limitations in sensorimotor control and learning. However, controversy exists regarding two core aspects of this perspective. First, it has been claimed that sensorimotor learning limitations are detectable only in adults who stutter (after years of coping with the disorder) but not during childhood close to the onset of stuttering. Second, it remains unclear whether stuttering individuals' sensorimotor learning limitations affect only speech movements or also unrelated effector systems involved in nonspeech movements. We report data from separate experiments investigating speech auditory-motor learning (N = 60) and limb visuomotor learning (N = 84) in both children and adults who stutter versus matched nonstuttering individuals. Both children and adults who stutter showed statistically significant limitations in speech auditory-motor adaptation with formant-shifted feedback. This limitation was more profound in children than in adults and in younger children versus older children. Between-group differences in the adaptation of reach movements performed with rotated visual feedback were subtle but statistically significant for adults. In children, even the nonstuttering groups showed limited visuomotor adaptation just like their stuttering peers. We conclude that sensorimotor learning is impaired in individuals who stutter, and that the ability for speech auditory-motor learning, which was already adult-like in 3- to 6-year-old typically developing children, is severely compromised in young children near the onset of stuttering. Thus, motor learning limitations may play an important role in the fundamental mechanisms contributing to the onset of this speech disorder.

RevDate: 2021-06-21
CmpDate: 2021-06-21

Lester-Smith RA, Daliri A, Enos N, et al (2020)

The Relation of Articulatory and Vocal Auditory-Motor Control in Typical Speakers.

Journal of speech, language, and hearing research : JSLHR, 63(11):3628-3642.

Purpose: The purpose of this study was to explore the relationship between feedback and feedforward control of articulation and voice by measuring reflexive and adaptive responses to first formant (F1) and fundamental frequency (fo) perturbations. In addition, perception of F1 and fo perturbation was estimated using passive (listening) and active (speaking) just noticeable difference paradigms to assess the relation of auditory acuity to reflexive and adaptive responses. Method: Twenty healthy women produced single words and sustained vowels while the F1 or fo of their auditory feedback was suddenly and unpredictably perturbed to assess reflexive responses or gradually and predictably perturbed to assess adaptive responses. Results: Typical speakers' reflexive responses to sudden perturbation of F1 were related to their adaptive responses to gradual perturbation of F1. Specifically, speakers with larger reflexive responses to sudden perturbation of F1 had larger adaptive responses to gradual perturbation of F1. Furthermore, their reflexive responses to sudden perturbation of F1 were associated with their passive auditory acuity to F1 such that speakers with better auditory acuity to F1 produced larger reflexive responses to sudden perturbations of F1. Typical speakers' adaptive responses to gradual perturbation of F1 were not associated with their auditory acuity to F1. Speakers' reflexive and adaptive responses to perturbation of fo were not related, nor were their responses related to either measure of auditory acuity to fo. Conclusion: These findings indicate that there may be disparate feedback and feedforward control mechanisms for articulatory and vocal error correction based on auditory feedback.

RevDate: 2020-10-18

Pawelec ŁP, Graja K, Lipowicz A (2020)

Vocal Indicators of Size, Shape and Body Composition in Polish Men.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30352-0 [Epub ahead of print].

OBJECTIVES: From a human evolution perspective, identifying a link between physique and vocal quality could demonstrate dual signaling in terms of the health and biological condition of an individual. In this regard, this study investigates the relationship between men's body size, shape, and composition, and their vocal characteristics.

MATERIALS AND METHODS: Eleven anthropometric measurements, using seven indices, were carried out with 80 adult Polish male participants, while the speech analysis adopted a voice recording procedure that involved phonetically recording vowels /ɑː/, /ɛː/, /iː/, /ɔː/, /uː/ to define the voice acoustic components used in Praat software.

RESULTS: A relationship between voice parameters and body size/shape/composition was found. The analysis indicated that the formants and their derivatives were useful parameters for prediction of height, weight, neck, shoulder, waist, and hip circumferences. Fundamental frequency (F0) was negatively correlated with neck circumference at Adam's apple level and body height. Moreover, an association between neck circumference and F0 was observed for the first time in this paper. The association between waist circumference and formant component showed a net effect. In addition, the formant parameters showed significant correlations with body shape, indicating a lower vocal timbre in men with a larger relative waist circumference.

DISCUSSION: Men with lower vocal pitch had wider necks, probably a result of larynx size. Furthermore, a greater waist circumference, presumably resulting from abdominal fat distribution in men, correlated with a lower vocal timbre. While these results are inconclusive, they highlight new directions for further research.


RJR Experience and Expertise


Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.


Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.


Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences, and at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.


Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.


While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.


Robbins is well known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.


Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.


Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.


963 Red Tail Lane
Bellingham, WA 98226


E-mail: RJR8222@gmail.com
