
Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 30 Jun 2022 at 01:44

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics, and hence no corresponding peak is visible. Because formants are a product of resonance, resonance is affected by the shape and material of the resonating structure, and all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
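The "spectral maximum" sense of the term, and the harmonic-spacing problem described above, can be made concrete with a minimal Python sketch. The formant centres and bandwidths below are invented illustrative values, not taken from any study on this page; note that with a 120 Hz source the first picked peak lands on the 720 Hz harmonic rather than on the true 700 Hz resonance.

```python
# Invented illustration: harmonic amplitudes shaped by two Lorentzian
# resonances (the "formants"), then picked as local spectral maxima.
f0 = 120.0                                   # source fundamental, Hz
formants = [(700.0, 80.0), (1200.0, 90.0)]   # (centre frequency, bandwidth), Hz

harmonic_freqs = [f0 * k for k in range(1, 40)]
amps = [sum(1.0 / (1.0 + ((fk - fc) / bw) ** 2) for fc, bw in formants)
        for fk in harmonic_freqs]

# Formant estimates: harmonics that are local amplitude maxima
peaks = [harmonic_freqs[i] for i in range(1, len(amps) - 1)
         if amps[i - 1] < amps[i] > amps[i + 1]]
print(peaks)  # peaks fall on harmonics (720 Hz), not exactly on the 700 Hz resonance
```

With a higher-pitched source the harmonics spread further apart, and a resonance lying between them may produce no pickable peak at all, which is exactly the measurement problem noted above for female and child voices.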

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2022-06-24

Groll MD, Dahl KL, Cádiz MD, et al (2022)

Resynthesis of Transmasculine Voices to Assess Gender Perception as a Function of Testosterone Therapy.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The goal of this study was to use speech resynthesis to investigate the effects of changes to individual acoustic features on speech-based gender perception of transmasculine voice samples following the onset of hormone replacement therapy (HRT) with exogenous testosterone. We hypothesized that mean fundamental frequency (fo) would have the largest effect on gender perception of any single acoustic feature.

METHOD: Mean fo, fo contour, and formant frequencies were calculated for three pairs of transmasculine speech samples before and after HRT onset. Sixteen speech samples with unique combinations of these acoustic features from each pair of speech samples were resynthesized. Twenty young adult listeners evaluated each synthesized speech sample for gender perception and synthetic quality. Two analyses of variance were used to investigate the effects of acoustic features on gender perception and synthetic quality.

RESULTS: Of the three acoustic features, mean fo was the only single feature that had a statistically significant effect on gender perception. Differences between the speech samples before and after HRT onset that were not captured by changes in fo and formant frequencies also had a statistically significant effect on gender perception.

CONCLUSION: In these transmasculine voice samples, mean fo was the most important acoustic feature for voice masculinization as a result of HRT. Future investigations in a larger number of transmasculine speakers, and of the effects of behavioral therapy-based changes in concert with HRT, are warranted.

RevDate: 2022-06-23

Ham J, Yoo HJ, Kim J, et al (2022)

Vowel speech recognition from rat electroencephalography using long short-term memory neural network.

PloS one, 17(6):e0270405 pii:PONE-D-21-40838.

Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition. In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats underwent microelectrode implantation surgery to record EEG signals from the bilateral anterior auditory fields. Five vowel speech stimuli (/a/, /e/, /i/, /o/, and /u/), which have highly different formant frequencies, were chosen. EEG recorded under randomly presented vowel stimuli was minimally preprocessed and normalized by a z-score transformation to be used as input for the classification of speech recognition. The BiLSTM network showed the best performance among the classifiers, achieving an overall accuracy, F1-score, and Cohen's κ of 75.18%, 0.75, and 0.68, respectively, using a 10-fold cross-validation approach. These results indicate that LSTM layers can effectively model sequential data, such as EEG; hence, informative features can be derived through a BiLSTM trained with end-to-end learning, without any additional hand-crafted feature extraction.
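The accuracy, F1-score, and Cohen's κ reported here are standard classification metrics. As a reminder of how the chance-corrected κ is defined, here is a minimal Python sketch on invented five-vowel labels (toy data, unrelated to the study's results):

```python
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)
from collections import Counter

def cohens_kappa(y_true, y_pred):
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of marginal label frequencies, summed over classes
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented toy labels over the five vowel classes
y_true = ["a", "a", "e", "i", "o", "u", "u", "e"]
y_pred = ["a", "e", "e", "i", "o", "u", "a", "e"]
print(round(cohens_kappa(y_true, y_pred), 3))  # → 0.68
```

κ discounts the agreement a classifier would reach by guessing according to the label frequencies, which is why it is a stricter summary than raw accuracy for multi-class problems like this five-vowel task.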

RevDate: 2022-06-22

Pravitharangul N, Miyamoto JJ, Yoshizawa H, et al (2022)

Vowel sound production and its association with cephalometric characteristics in skeletal Class III subjects.

European journal of orthodontics pii:6613233 [Epub ahead of print].

BACKGROUND: This study aimed to evaluate differences in vowel production using acoustic analysis in skeletal Class III and Class I Japanese participants and to identify the correlation between vowel sounds and cephalometric variables in skeletal Class III subjects.

MATERIALS AND METHODS: Japanese males with skeletal Class III (ANB < 0°) and Class I skeletal anatomy (0.62° < ANB < 5.94°) were recruited (n = 18/group). Acoustic analysis of vowel sounds and cephalometric analysis of lateral cephalograms were performed. For sound analysis, an isolated Japanese vowel (/a/,/i/,/u/,/e/,/o/) pattern was recorded. Praat software was used to extract acoustic parameters such as fundamental frequency (F0) and the first four formants (F1, F2, F3, and F4). The formant graph area was calculated. Cephalometric values were obtained using ImageJ. Correlations between acoustic and cephalometric variables in skeletal Class III subjects were then investigated.
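One common way to obtain a "formant graph area" of the kind computed here is the area of the polygon spanned by each vowel's (F1, F2) point. The sketch below uses rough, invented formant values and the shoelace formula; it is an assumed illustration, not the study's own procedure.

```python
def polygon_area(points):
    """Shoelace formula for the area of a simple polygon given as (x, y) pairs."""
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1]
            - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(s) / 2.0

# Invented (F1, F2) values in Hz for /a/, /e/, /i/, /u/, /o/,
# ordered around the perimeter of the vowel space
vowel_points = [(850, 1600), (500, 2100), (300, 2300), (320, 800), (500, 700)]
print(polygon_area(vowel_points))  # vowel space area in Hz^2
```

A smaller polygon area generally indicates a more centralized vowel space, which is why such an area measure is a useful single-number summary when comparing groups of speakers.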

RESULTS: Skeletal Class III subjects exhibited significantly higher /o/ F2 and lower /o/ F4 values. Mandibular length, SNB, and overjet of Class III subjects were moderately negatively correlated with acoustic variables.

LIMITATIONS: This study did not take into account vertical skeletal patterns and tissue movements during sound production.

CONCLUSION: Skeletal Class III males produced a different /o/ (a back, rounded vowel), possibly owing to their anatomical positions or adaptive changes. Vowel production was moderately associated with cephalometric characteristics of Class III subjects. Thus, changes in speech may be expected after orthognathic surgery. A multidisciplinary team approach that includes the input of a speech pathologist would be useful.

RevDate: 2022-06-21

Kabakoff H, Gritsyk O, Harel D, et al (2022)

Characterizing sensorimotor profiles in children with residual speech sound disorder: a pilot study.

Journal of communication disorders, 99:106230 pii:S0021-9924(22)00049-1 [Epub ahead of print].

PURPOSE: Children with speech errors who have reduced motor skill may be more likely to develop residual errors associated with lifelong challenges. Drawing on models of speech production that highlight the role of somatosensory acuity in updating motor plans, this pilot study explored the relationship between motor skill and speech accuracy, and between somatosensory acuity and motor skill in children. Understanding the connections among sensorimotor measures and speech outcomes may offer insight into how somatosensation and motor skill cooperate during speech production, which could inform treatment decisions for this population.

METHOD: Twenty-five children (ages 9-14) produced syllables in an /ɹ/ stimulability task before and after an ultrasound biofeedback treatment program targeting rhotics. We first tested whether motor skill (as measured by two ultrasound-based metrics of tongue shape complexity) predicted acoustically measured accuracy (the normalized difference between the second and third formant frequencies). We then tested whether somatosensory acuity (as measured by an oral stereognosis task) predicted motor skill, while controlling for auditory acuity.
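The acoustic accuracy measure described here, a normalized difference between F2 and F3, can be sketched as below. Dividing the F3-F2 difference by the mean of the two formants is an assumed normalization for illustration; the study's exact formula is not reproduced here.

```python
# Assumed illustration: normalized F3-F2 distance as an /r/ accuracy proxy.
def normalized_f3_f2(f2_hz, f3_hz):
    return (f3_hz - f2_hz) / ((f3_hz + f2_hz) / 2.0)

# Invented formant values: an accurate /r/ has F3 dipping close to F2,
# shrinking the metric, while a derhotacized /r/ keeps F3 high and distant.
derhotacized = normalized_f3_f2(1200.0, 2900.0)
accurate = normalized_f3_f2(1600.0, 1900.0)
print(round(derhotacized, 3), round(accurate, 3))
```

Normalizing by the formants' own magnitudes makes the measure comparable across children with different vocal tract lengths, which matters in a 9-14-year-old sample.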

RESULTS: One measure of tongue shape complexity was a significant predictor of accuracy, such that higher tongue shape complexity was associated with lower accuracy at pre-treatment but higher accuracy at post-treatment. Based on the same measure, children with better somatosensory acuity produced /ɹ/ tongue shapes that were more complex, but this relationship was only present at post-treatment.

CONCLUSION: The predicted relationships among somatosensory acuity, motor skill, and acoustically measured /ɹ/ production accuracy were observed after treatment, but unexpectedly did not hold before treatment. The surprising finding that greater tongue shape complexity was associated with lower accuracy at pre-treatment highlights the importance of evaluating tongue shape patterns (e.g., using ultrasound) prior to treatment, and has the potential to suggest that children with high tongue shape complexity at pre-treatment may be good candidates for ultrasound-based treatment.

RevDate: 2022-06-21

González-Alvarez J, R Sos-Peña (2022)

Perceiving Body Height From Connected Speech: Higher Fundamental Frequency Is Associated With the Speaker's Height.

Perceptual and motor skills [Epub ahead of print].

To a certain degree, human listeners can perceive a speaker's body size from their voice. The speaker's voice pitch or fundamental frequency (Fo) and the vocal formant frequencies are the voice parameters that have been most intensively studied in past body size perception research (particularly for body height). Artificially lowering the Fo of isolated vowels from male speakers improved listeners' accuracy of binary (i.e., tall vs not tall) body height perceptions. This has been explained by the theory that a denser harmonic spectrum provided by a low pitch improved the perceptual resolution of formants that aid formant-based size assessments. In the present study, we extended this research using connected speech (i.e., words and sentences) pronounced by speakers of both sexes. Unexpectedly, we found that raising Fo, not lowering it, increased the participants' perceptual performance in two binary discrimination tasks of body size. We explain our new finding in the temporal domain by the dynamic and time-varying acoustic properties of connected speech. Increased Fo might increase the sampling density of sound wave acoustic cycles and provide more detailed information, such as higher resolution, on the envelope shape.

RevDate: 2022-06-17

Sugiyama Y (2022)

Identification of Minimal Pairs of Japanese Pitch Accent in Noise-Vocoded Speech.

Frontiers in psychology, 13:887761.

The perception of lexical pitch accent in Japanese was assessed using noise-excited vocoder speech, which contained no fundamental frequency (fo) or its harmonics. While prosodic information such as lexical stress in English and lexical tone in Mandarin Chinese is known to be encoded in multiple acoustic dimensions, such multidimensionality is less understood for lexical pitch accent in Japanese. In the present study, listeners were tested under four conditions to investigate the contribution of non-fo properties to the perception of Japanese pitch accent: noise-vocoded speech stimuli consisting of 10 3-ERB_N-wide bands or 15 2-ERB_N-wide bands, created from a male and a female speaker. Listeners were able to identify minimal pairs of final-accented and unaccented words at rates better than chance in all conditions, indicating the presence of secondary cues to Japanese pitch accent. Subsequent analyses investigated whether the listeners' ability to distinguish minimal pairs was correlated with duration, intensity or formant information. These analyses found no strong or consistent correlation, suggesting that listeners used different cues depending on the information available in the stimuli. Furthermore, comparison of the current results with equivalent studies in English and Mandarin Chinese suggests that, although lexical prosodic information exists in multiple acoustic dimensions in Japanese, the primary cue is more salient than in other languages.
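Equal-width bands on the ERB_N scale can be derived from the Glasberg and Moore ERB-number formula. The sketch below is illustrative: the 80 Hz lower edge, and hence the exact band edges, are assumptions, not details taken from the study.

```python
import math

def erb_number(f_hz):
    """Glasberg & Moore ERB-number scale: 21.4 * log10(4.37 * f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_to_hz(e):
    """Inverse of erb_number: map an ERB-number back to frequency in Hz."""
    return 1000.0 * (10 ** (e / 21.4) - 1.0) / 4.37

start = erb_number(80.0)  # assumed lowest band edge at 80 Hz
edges_10 = [erb_to_hz(start + 3.0 * k) for k in range(11)]  # 10 bands, 3 ERB_N each
edges_15 = [erb_to_hz(start + 2.0 * k) for k in range(16)]  # 15 bands, 2 ERB_N each

# Both filterbanks span the same 30-ERB_N range, so their outer edges coincide
print(round(edges_10[0], 1), round(edges_10[-1], 1))
```

This is why the two conditions are comparable: 10 x 3 ERB_N and 15 x 2 ERB_N cover the identical frequency range, differing only in spectral resolution.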

RevDate: 2022-06-14

Preisig B, Riecke L, A Hervais-Adelman (2022)

Speech sound categorization: The contribution of non-auditory and auditory cortical regions.

NeuroImage pii:S1053-8119(22)00494-3 [Epub ahead of print].

Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (the third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.

RevDate: 2022-06-13

Sayyahi F, V Boulenger (2022)

A temporal-based therapy for children with inconsistent phonological disorder: A case-series.

Clinical linguistics & phonetics [Epub ahead of print].

Deficits in temporal auditory processing, and in particular higher gap detection thresholds have been reported in children with inconsistent phonological disorder (IPD). Here we hypothesized that providing these children with extra time for phoneme identification may in turn enhance their phonological planning abilities for production, and accordingly improve not only consistency but also accuracy of their speech. We designed and tested a new temporal-based therapy, inspired by Core Vocabulary Therapy and called it T-CVT, where we digitally lengthened formant transitions between phonemes of words used for therapy. This allowed to target both temporal auditory processing and word phonological planning. Four preschool Persian native children with IPD received T-CVT for eight weeks. We measured changes in speech consistency (% inconsistency) and accuracy (percentage of consonants correct PCC) to assess the effects of the intervention. Therapy significantly improved both consistency and accuracy of word production in the four children: % inconsistency decreased from 59% on average before therapy to 2% post-T-CVT, and PCC increased from 61% to 92% on average. Consistency and accuracy were furthermore maintained or even still improved at three-month follow-up (2% inconsistency and 99% PCC). Results in a nonword repetition task showed the generalization of these effects to non-treated material: % inconsistency for nonwords decreased from 67% to 10% post-therapy, and PCC increased from 63% to 90%. These preliminary findings support the efficacy of the T-CVT intervention for children with IPD who show temporal auditory processing deficits as reflected by higher gap detection thresholds.
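The core signal operation T-CVT relies on, digitally lengthening a formant transition, can be caricatured as a simple time-stretch by linear interpolation. This sketch is an assumption for illustration only; the study's actual processing would need a pitch-preserving stretch rather than plain resampling.

```python
def stretch(segment, factor):
    """Linearly interpolate a sampled segment to roughly factor times its length."""
    n_out = max(2, int(round(len(segment) * factor)))
    out = []
    for j in range(n_out):
        pos = j * (len(segment) - 1) / (n_out - 1)  # fractional position in input
        i = int(pos)
        frac = pos - i
        nxt = min(i + 1, len(segment) - 1)
        out.append(segment[i] * (1 - frac) + segment[nxt] * frac)
    return out

# A toy "formant transition" (e.g. F2 rising 1200 -> 1800 Hz over 5 frames),
# lengthened by 50% to give the listener more time for phoneme identification
transition = [1200, 1350, 1500, 1650, 1800]
print(stretch(transition, 1.5))
```

The endpoints of the transition are preserved, so only the rate of spectral change slows, which is the property the therapy's rationale depends on.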

RevDate: 2022-06-08

Di Dona G, Scaltritti M, S Sulpizio (2022)

Formant-invariant voice and pitch representations are pre-attentively formed from constantly varying speech and non-speech stimuli.

The European journal of neuroscience [Epub ahead of print].

The present study investigated whether listeners can form abstract voice representations while ignoring constantly changing phonological information and if they can use the resulting information to facilitate voice-change detection. Further, the study aimed at understanding whether the use of abstraction is restricted to the speech domain, or can be deployed also in non-speech contexts. We ran an EEG experiment including one passive and one active oddball task, each featuring a speech and a rotated-speech condition. In the speech condition, participants heard constantly changing vowels uttered by a male speaker (standard stimuli) which were infrequently replaced by vowels uttered by a female speaker with higher pitch (deviant stimuli). In the rotated-speech condition, participants heard rotated vowels, in which the natural formant structure of speech was disrupted. In the passive task, the Mismatch Negativity was elicited after the presentation of the deviant voice in both conditions, indicating that listeners could successfully group together different stimuli into a formant-invariant voice representation. In the active task, participants showed shorter RTs, higher accuracy and a larger P3b in the speech condition with respect to the rotated-speech condition. Results showed that whereas at a pre-attentive level the cognitive system can track pitch regularities while presumably ignoring constantly changing formant information both in speech and in rotated-speech, at an attentive level the use of such information is facilitated for speech. This facilitation was also testified by a stronger synchronization in the theta band (4-7 Hz), potentially pointing towards differences in encoding/retrieval processes.

RevDate: 2022-06-06

Hampsey E, Meszaros M, Skirrow C, et al (2022)

Protocol for Rhapsody: a longitudinal observational study examining the feasibility of speech phenotyping for remote assessment of neurodegenerative and psychiatric disorders.

BMJ open, 12(6):e061193 pii:bmjopen-2022-061193.

INTRODUCTION: Neurodegenerative and psychiatric disorders (NPDs) confer a huge health burden, which is set to increase as populations age. New, remotely delivered diagnostic assessments that can detect early stage NPDs by profiling speech could enable earlier intervention and fewer missed diagnoses. The feasibility of collecting speech data remotely in those with NPDs should be established.

METHODS AND ANALYSIS: The present study will assess the feasibility of obtaining speech data, collected remotely using a smartphone app, from individuals across three NPD cohorts: neurodegenerative cognitive diseases (n=50), other neurodegenerative diseases (n=50) and affective disorders (n=50), in addition to matched controls (n=75). Participants will complete audio-recorded speech tasks and both general and cohort-specific symptom scales. The battery of speech tasks will serve several purposes, such as measuring various elements of executive control (eg, attention and short-term memory), as well as measures of voice quality. Participants will then remotely self-administer speech tasks and follow-up symptom scales over a 4-week period. The primary objective is to assess the feasibility of remote collection of continuous narrative speech across a wide range of NPDs using self-administered speech tasks. Additionally, the study will evaluate whether acoustic and linguistic patterns can predict diagnostic group, as measured by the sensitivity, specificity, Cohen's kappa and area under the receiver operating characteristic curve of the binary classifiers distinguishing each diagnostic group from each other. Acoustic features analysed include mel-frequency cepstral coefficients, formant frequencies, intensity and loudness, whereas text-based features such as number of words, noun and pronoun rate and idea density will also be used.
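The binary-classifier metrics named in the protocol (sensitivity, specificity, AUC) can be made concrete with a small sketch. The scores and labels are invented toy data, and `roc_auc` here uses the rank-statistic (Mann-Whitney) formulation of the area under the ROC curve.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the probability a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data: 1 = patient, 0 = control, scores from a classifier
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
print(sens_spec(y_true, y_pred), roc_auc(y_true, scores))
```

Unlike sensitivity and specificity, the AUC is threshold-free, which is why protocols typically report it alongside the thresholded metrics.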

ETHICS AND DISSEMINATION: The study received ethical approval from the Health Research Authority and Health and Care Research Wales (REC reference: 21/PR/0070). Results will be disseminated through open access publication in academic journals, relevant conferences and other publicly accessible channels. Results will be made available to participants on request.


RevDate: 2022-06-06

Coughler C, Quinn de Launay KL, Purcell DW, et al (2022)

Pediatric Responses to Fundamental and Formant Frequency Altered Auditory Feedback: A Scoping Review.

Frontiers in human neuroscience, 16:858863.

Purpose: The ability to hear ourselves speak has been shown to play an important role in the development and maintenance of fluent and coherent speech. Despite this, little is known about the developing speech motor control system throughout childhood, in particular if and how vocal and articulatory control may differ throughout development. A scoping review was undertaken to identify and describe the full range of studies investigating responses to frequency altered auditory feedback in pediatric populations and their contributions to our understanding of the development of auditory feedback control and sensorimotor learning in childhood and adolescence.

Method: Relevant studies were identified through a comprehensive search strategy of six academic databases for studies that included (a) real-time perturbation of frequency in auditory input, (b) an analysis of immediate effects on speech, and (c) participants aged 18 years or younger.

Results: Twenty-three articles met inclusion criteria. Across studies, there was a wide variety of designs, outcomes and measures used. Manipulations included fundamental frequency (9 studies), formant frequency (12), frequency centroid of fricatives (1), and both fundamental and formant frequencies (1). Study designs included contrasts across childhood, between children and adults, and between typical, pediatric clinical and adult populations. Measures primarily explored acoustic properties of speech responses (latency, magnitude, and variability). Some studies additionally examined the association of these acoustic responses with clinical measures (e.g., stuttering severity and reading ability), and neural measures using electrophysiology and magnetic resonance imaging.

Conclusion: Findings indicated that children above 4 years generally compensated in the opposite direction of the manipulation, although in several cases not as effectively as adults. Overall, results varied greatly due to the broad range of manipulations and designs used, making generalization challenging. Differences found between age groups in the features of the compensatory vocal responses, latency of responses, vocal variability and perceptual abilities suggest that maturational changes may be occurring in the speech motor control system, affecting the extent to which auditory feedback is used to modify internal sensorimotor representations. Varied findings suggest vocal control develops prior to articulatory control. Future studies with multiple outcome measures, manipulations, and more expansive age ranges are needed to elucidate findings.

RevDate: 2022-05-31

Wang X, T Wang (2022)

Voice Recognition and Evaluation of Vocal Music Based on Neural Network.

Computational intelligence and neuroscience, 2022:3466987.

Artistic voice is the artistic life of professional voice users. In selecting and cultivating artistic performing talent, the evaluation of the voice occupies a very important position, so an appropriate evaluation of the artistic voice is crucial. With the development of art education, scientifically evaluating artistic voice training methods and fairly selecting artistic voice talent are urgent needs for the objective evaluation of the artistic voice. Current evaluation methods for artistic voices are time-consuming, laborious, and highly subjective. In the objective evaluation of the artistic voice, the selection of acoustic evaluation parameters is very important. This study attempts to extract the average energy, average frequency error, and average range error of the singing voice using speech analysis technology as objective evaluation parameters, to evaluate the singing quality of the artistic voice with a neural network method, and to compare the results with the subjective evaluations of senior professional teachers. Voice analysis technology is used to extract the first formant, third formant, fundamental frequency, range, fundamental frequency perturbation, first formant perturbation, third formant perturbation, and average energy of the singing voice as acoustic parameters. Using a BP neural network, the quality of singing was evaluated objectively and compared with the subjective evaluations of senior vocal teachers. The results show that the BP neural network method, using these evaluation parameters, can accurately and objectively evaluate the quality of the singing voice, which is helpful in scientifically guiding the selection and training of artistic voice talent.

RevDate: 2022-05-13

Tomaschek F, M Ramscar (2022)

Understanding the Phonetic Characteristics of Speech Under Uncertainty-Implications of the Representation of Linguistic Knowledge in Learning and Processing.

Frontiers in psychology, 13:754395.

The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs, by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word-final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families that have shown that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs (as our theoretical analysis suggests they ought to), their uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than in models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production and should be treated as such in modeling. It is also consistent with the idea that having IFS serve as inputs to a learning network maintains the distinction between those parts of the network that represent message and those that represent signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.
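Paradigmatic uncertainty of the kind invoked here is typically quantified as the Shannon entropy of the probability distribution over a form's possible IFS. A minimal sketch with an invented four-function distribution:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4                # four equally likely IFS: maximal uncertainty
skewed = [0.85, 0.05, 0.05, 0.05]   # one dominant IFS: low uncertainty
print(entropy(uniform), entropy(skewed))  # higher entropy means more uncertainty
```

On the study's account, forms whose context leaves low IFS entropy show enhanced formants, so a measure like this is the quantity being correlated with the [ɐ] trajectories.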

RevDate: 2022-05-09

Haiduk F, WT Fitch (2022)

Understanding Design Features of Music and Language: The Choric/Dialogic Distinction.

Frontiers in psychology, 13:786899.

Music and spoken language share certain characteristics: both consist of sequences of acoustic elements that are combined combinatorially, and these elements partition the same continuous acoustic dimensions (frequency, formant space and duration). However, the resulting categories differ sharply: scale tones and note durations of small integer ratios appear in music, while speech uses phonemes, lexical tone, and non-isochronous durations. Why did music and language diverge into the two systems we have today, differing in these specific features? We propose a framework based on information theory and a reverse-engineering perspective, suggesting that design features of music and language are a response to their differential deployment along three different continuous dimensions. These include the familiar propositional-aesthetic ('goal') and repetitive-novel ('novelty') dimensions, and a dialogic-choric ('interactivity') dimension that is our focus here. Specifically, we hypothesize that music exhibits specializations enhancing coherent production by several individuals concurrently (the 'choric' context). In contrast, language is specialized for exchange in tightly coordinated turn-taking ('dialogic' contexts). We examine the evidence for our framework, both from humans and non-human animals, and conclude that many proposed design features of music and language follow naturally from their use in distinct dialogic and choric communicative contexts. Furthermore, the hybrid nature of intermediate systems like poetry, chant, or solo lament follows from their deployment in the less typical interactive context.

RevDate: 2022-05-06

Hall A, Kawai K, Graber K, et al (2021)

Acoustic analysis of surgeons' voices to assess change in the stress response during surgical in situ simulation.

BMJ simulation & technology enhanced learning, 7(6):471-477 pii:bmjstel-2020-000727.

Introduction: Stress may serve as an adjunct (challenge) or hindrance (threat) to the learning process. Determining the effect of an individual's response to situational demands in either a real or simulated situation may enable optimisation of the learning environment. Studies of acoustic analysis suggest that mean fundamental frequency and formant frequencies of voice vary with an individual's response during stressful events. This hypothesis is reviewed within the otolaryngology (ORL) simulation environment to assess whether acoustic analysis could be used as a tool to determine participants' stress response and cognitive load in medical simulation. Such an assessment could lead to optimisation of the learning environment.

Methodology: ORL simulation scenarios were performed to teach the participants teamwork and refine clinical skills. Each was performed in an actual operating room (OR) environment (in situ) with a multidisciplinary team consisting of ORL surgeons, OR nurses and anaesthesiologists. Ten of the scenarios were led by an ORL attending and ten were led by an ORL fellow. The vocal communication of each of the 20 individual leaders was analysed using a long-term pitch analysis PRAAT software (autocorrelation method) to obtain mean fundamental frequency (F0) and first four formant frequencies (F1, F2, F3 and F4). In reviewing individual scenarios, each leader's voice was analysed during a non-stressful environment (WHO sign-out procedure) and compared with their voice during a stressful portion of the scenario (responding to deteriorating oxygen saturations in the manikin).
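The long-term pitch analysis mentioned here (Praat's autocorrelation method) rests on finding the lag that maximizes a signal's autocorrelation. Below is a bare-bones sketch on a synthetic 200 Hz tone, not the scenario recordings; real pitch trackers add windowing, voicing decisions and octave-error handling on top of this idea.

```python
import math

fs = 8000                                    # sample rate, Hz
# Synthetic "voice": a pure 200 Hz tone, 50 ms long
x = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(int(0.05 * fs))]

def autocorr(sig, lag):
    """Unnormalized autocorrelation of sig at a given integer lag."""
    return sum(sig[i] * sig[i + lag] for i in range(len(sig) - lag))

lo, hi = fs // 500, fs // 75                 # lag range covering 75-500 Hz
best_lag = max(range(lo, hi), key=lambda lag: autocorr(x, lag))
f0_estimate = fs / best_lag
print(f0_estimate)  # → 200.0
```

The lag of the strongest autocorrelation peak equals one period of the voice, so its reciprocal (times the sample rate) is the fundamental frequency that the study compares between baseline and stressful portions of each scenario.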

Results: The mean unstressed F0 for the male voice was 161.4 Hz and for the female voice was 217.9 Hz. The mean fundamental frequency of speech in the ORL fellow (lead surgeon) group increased by 34.5 Hz between the scenario's baseline and stressful portions. This was significantly different from the mean change of -0.5 Hz noted in the attending group (p=0.01). No changes were seen in F1, F2, F3 or F4.

Conclusions: This study demonstrates a method of acoustic analysis of the voices of participants taking part in medical simulations. It suggests acoustic analysis of participants may offer a simple, non-invasive, non-intrusive adjunct in evaluating and titrating the stress response during simulation.

RevDate: 2022-05-02

Jarollahi F, Valadbeigi A, Jalaei B, et al (2022)

Comparing Sound-Field Speech-Auditory Brainstem Response Components between Cochlear Implant Users with Different Speech Recognition in Noise Scores.

Iranian journal of child neurology, 16(2):93-105.

Objectives: Many studies have suggested that cochlear implant (CI) users vary in terms of speech recognition in noise. Studies in this field attribute this variety partly to subcortical auditory processing. Studying speech-Auditory Brainstem Response (speech-ABR) provides good information about speech processing; thus, this work was designed to compare speech-ABR components between two groups of CI users with good and poor speech recognition in noise scores.

Materials & Methods: The present study was conducted on two groups of CI users aged 8-10 years old. The first group (CI-good) consisted of 15 children with prelingual CI who had good speech recognition in noise performance. The second group (CI-poor) was matched with the first group, but they had poor speech recognition in noise performance. The speech-ABR test in a sound-field presentation was performed for all the participants.

Results: The speech-ABR response showed more delay in C, D, E, F, and O latencies in CI-poor than in CI-good users (P < 0.05), while no significant difference was observed in the initial waves V (t = -0.293, p = 0.771) and A (t = -1.051, p = 0.307). Analysis in the spectral domain showed a weaker representation of the fundamental frequency as well as the first formant and high-frequency components of the speech stimuli in the CI users with poor auditory performance.

Conclusions: Results revealed that CI users who showed poor auditory performance in noise had deficits in encoding the periodic portion of speech signals at the brainstem level. This study could also serve as physiological evidence for poorer pitch processing in CI users with poor speech recognition in noise performance.

RevDate: 2022-04-22

Houle N, Goudelias D, Lerario MP, et al (2022)

Effect of Anchor Term on Auditory-Perceptual Ratings of Feminine and Masculine Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

BACKGROUND: Studies investigating auditory perception of gender expression vary greatly in the specific terms applied to gender expression in rating scales.

PURPOSE: This study examined the effects of different anchor terms on listeners' auditory perceptions of gender expression in phonated and whispered speech. Additionally, token and speaker cues were examined to identify predictors of the auditory-perceptual ratings.

METHOD: Inexperienced listeners (n = 105) completed an online rating study in which they were asked to use one of five visual analog scales (VASs) to rate cis men, cis women, and transfeminine speakers in both phonated and whispered speech. The VASs varied by anchor term (very female/very male, feminine/masculine, feminine female/masculine male, very feminine/not at all feminine, and not at all masculine/very masculine).

RESULTS: Linear mixed-effects models revealed significant two-way interactions of gender expression by anchor term and gender expression by condition. In general, the feminine female/masculine male scale resulted in the most extreme ratings (closest to the end points), and the feminine/masculine scale resulted in the most central ratings. As expected, for all speakers, whispered speech was rated more centrally than phonated speech. Additionally, ratings of phonated speech were predicted by mean fundamental frequency (fo) within each speaker group and by smoothed cepstral peak prominence in cisgender speakers. In contrast, ratings of whispered speech, which lacks an fo, were predicted by indicators of vocal tract resonance (second formant and speaker height).

CONCLUSIONS: The current results indicate that differences in the terms applied to rating scales limit generalization of results across studies. Identifying the patterns across listener ratings of gender expression provides a rationale for researchers and clinicians when making choices about terms. Additionally, beyond fo and vocal tract resonance, predictors of listener ratings vary based on the anchor terms used to describe gender expression.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19617564.

RevDate: 2022-04-14

Kırbac A, Turkyılmaz MD, S Yağcıoglu (2022)

Gender Effects on Binaural Speech Auditory Brainstem Response.

The journal of international advanced otology, 18(2):125-130.

BACKGROUND: The speech auditory brainstem response is a tool that provides direct information on how speech sound is temporally and spectrally coded by the auditory brainstem. Speech auditory brainstem response is influenced by many variables, but the effect of gender is unclear, particularly in the binaural recording. Studies on speech auditory brainstem response evoked by binaural stimulation are limited, but gender studies are even more limited and contradictory. This study aimed at examining the effect of gender on speech auditory brainstem response in adults.

METHODS: Time- and frequency-domain analyses of speech auditory brainstem response recordings of 30 healthy participants (15 women and 15 men) aged 18-35 years with normal hearing and no musical education were obtained. For each adult, speech auditory brainstem response was recorded with the syllable /da/ presented binaurally. Peaks of time (V, A, C, D, E, F, and O) and frequency (fundamental frequency, first formant frequency, and high frequency) domains of speech auditory brainstem response were compared between men and women.

RESULTS: V, A, and F peak latencies of women were significantly shorter than those of men (P < .05). However, no difference was found between women and men in the peak amplitudes of either the time domain (P > .05) or the frequency domain (P > .05).

CONCLUSION: Gender differences in binaural speech auditory brainstem response are significant in adults, particularly in the time domain. When speech stimuli are used for auditory brainstem responses, normative data specific to gender are required. Preliminary normative data from this study could serve as a reference for future studies on binaural speech auditory brainstem response among Turkish adults.

RevDate: 2022-04-13

Cangokce Yasar O, Ozturk S, Kemal O, et al (2021)

Effects of Subthalamic Nucleus Deep Brain Stimulation Surgery on Voice and Formant Frequencies of Vowels in Turkish.

Turkish neurosurgery [Epub ahead of print].

AIM: This study aimed to investigate the effects of deep brain stimulation (DBS) of the subthalamic nucleus (STN) on acoustic characteristics of voice production in Turkish patients with Parkinson's disease (PD).

MATERIAL AND METHODS: This study recruited 20 patients diagnosed with PD. Voice samples were recorded under the "stimulation on" and "stimulation off" conditions of STN-DBS. Acoustic recordings of the patients were made during the production of vowels /a/, /o/, and /i/ and repetition of the syllables /pa/-/ta/-/ka/. Acoustic analyses were performed using Praat.

RESULTS: A significant difference in the parameters was observed among groups for vowels. A positive significant difference was observed between preoperative med-on and postoperative med-on/stim-on groups for /a/ and the postoperative med-on/stim-on and postoperative med-on/stim-off groups for /o/ and /i/ for frequency perturbation (jitter) and noise-to-harmonics ratio. No significant difference was noted between the preoperative med-on and postoperative med-on/stim-off groups for any vowels.

CONCLUSION: STN-DBS surgery has an acute positive effect on voice. Studies on formant frequency analysis in STN-DBS may be expanded with both articulation and intelligibility tests to enable us to combine patient abilities in various perspectives and to obtain precise results.

RevDate: 2022-04-11

Quatieri TF, Talkar T, JS Palmer (2020)

A Framework for Biomarkers of COVID-19 Based on Coordination of Speech-Production Subsystems.

IEEE open journal of engineering in medicine and biology, 1:203-206.

Goal: We propose a speech modeling and signal-processing framework to detect and track COVID-19 through asymptomatic and symptomatic stages. Methods: The approach is based on complexity of neuromotor coordination across speech subsystems involved in respiration, phonation and articulation, motivated by the distinct nature of COVID-19 involving lower (i.e., bronchial, diaphragm, lower tracheal) versus upper (i.e., laryngeal, pharyngeal, oral and nasal) respiratory tract inflammation, as well as by the growing evidence of the virus' neurological manifestations. Preliminary results: An exploratory study with audio interviews of five subjects provides Cohen's d effect sizes between pre-COVID-19 (pre-exposure) and post-COVID-19 (after positive diagnosis but presumed asymptomatic) using: coordination of respiration (as measured through acoustic waveform amplitude) and laryngeal motion (fundamental frequency and cepstral peak prominence), and coordination of laryngeal and articulatory (formant center frequencies) motion. Conclusions: While there is a strong subject-dependence, the group-level morphology of effect sizes indicates a reduced complexity of subsystem coordination. Validation is needed with larger, more controlled datasets, and to address confounding influences such as different recording conditions, unbalanced data quantities, and changes in underlying vocal status from pre- to post-COVID-19 recordings.
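The exploratory analysis above reports Cohen's d effect sizes between pre- and post-exposure feature distributions. A minimal pooled-standard-deviation sketch of that statistic (the feature values below are invented for illustration, not study data):

```python
import math

def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# Hypothetical coordination-feature values for one speaker,
# pre-exposure vs. post-diagnosis (invented numbers).
pre = [0.82, 0.79, 0.85, 0.81, 0.84]
post = [0.74, 0.70, 0.77, 0.72, 0.75]
d = cohens_d(pre, post)    # large positive effect
```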

RevDate: 2022-04-08

Dahl KL, François FA, Buckley DP, et al (2022)

Voice and Speech Changes in Transmasculine Individuals Following Circumlaryngeal Massage and Laryngeal Reposturing.

American journal of speech-language pathology [Epub ahead of print].

PURPOSE: The purpose of this study was to measure the short-term effects of circumlaryngeal massage and laryngeal reposturing on acoustic and perceptual characteristics of voice in transmasculine individuals.

METHOD: Fifteen transmasculine individuals underwent one session of sequential circumlaryngeal massage and laryngeal reposturing with a speech-language pathologist. Voice recordings were collected at three time points: baseline, postmassage, and postreposturing. Fundamental frequency (fo), formant frequencies, and relative fundamental frequency (RFF; an acoustic correlate of laryngeal tension) were measured. Estimates of vocal tract length (VTL) were derived from formant frequencies. Twelve listeners rated the perceived masculinity of participants' voices at each time point. Repeated-measures analyses of variance measured the effect of time point on fo, estimated VTL, RFF, and perceived voice masculinity. Significant effects were evaluated with post hoc Tukey's tests.

RESULTS: Between baseline and the end of the session, fo decreased, VTL increased, and participant voices were perceived as more masculine, all with statistically significant differences. RFF did not differ significantly at any time point. Outcomes were highly variable at the individual level.

CONCLUSION: Circumlaryngeal massage and laryngeal reposturing have short-term effects on select acoustic (fo, estimated VTL) and perceptual characteristics (listener-assigned voice masculinity) of voice in transmasculine individuals.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19529299.

RevDate: 2022-04-04

Zhang G, Shao J, Zhang C, et al (2022)

The Perception of Lexical Tone and Intonation in Whispered Speech by Mandarin-Speaking Congenital Amusics.

Journal of speech, language, and hearing research : JSLHR, 65(4):1331-1348.

PURPOSE: A fundamental feature of human speech is variation, including the manner of phonation, as exemplified in the case of whispered speech. In this study, we employed whispered speech to examine an unresolved issue about congenital amusia, a neurodevelopmental disorder of musical pitch processing, which also affects speech pitch processing such as lexical tone and intonation perception. The controversy concerns whether amusia is a pitch-processing disorder or can affect speech processing beyond pitch.

METHOD: We examined lexical tone and intonation recognition in 19 Mandarin-speaking amusics and 19 matched controls in phonated and whispered speech, where fundamental frequency (fo) information is either present or absent.

RESULTS: The results revealed that the performance of congenital amusics was inferior to that of controls in lexical tone identification in both phonated and whispered speech. These impairments were also detected in identifying intonation (statements/questions) in phonated and whispered modes. Across the experiments, regression models revealed that fo and non-fo (duration, intensity, and formant frequency) acoustic cues predicted tone and intonation recognition in phonated speech, whereas non-fo cues predicted tone and intonation recognition in whispered speech. There were significant differences between amusics and controls in the use of both fo and non-fo cues.

CONCLUSION: The results provided the first evidence that the impairments of amusics in lexical tone and intonation identification prevail into whispered speech and support the hypothesis that the deficits of amusia extend beyond pitch processing.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19302275.

RevDate: 2022-04-01

Carl M, Levy ES, M Icht (2022)

Speech treatment for Hebrew-speaking adolescents and young adults with developmental dysarthria: A comparison of mSIT and Beatalk.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Individuals with developmental dysarthria typically demonstrate reduced functioning of one or more of the speech subsystems, which negatively impacts speech intelligibility and communication within social contexts. A few treatment approaches are available for improving speech production and intelligibility among individuals with developmental dysarthria. However, these approaches have only limited application and research findings among adolescents and young adults.

AIMS: To determine and compare the effectiveness of two treatment approaches, the modified Speech Intelligibility Treatment (mSIT) and the Beatalk technique, on speech production and intelligibility among Hebrew-speaking adolescents and young adults with developmental dysarthria.

METHODS & PROCEDURES: Two matched groups of adolescents and young adults with developmental dysarthria participated in the study. Each received one of the two treatments, mSIT or Beatalk, over the course of 9 weeks. Measures of speech intelligibility, articulatory accuracy, voice and vowel acoustics were assessed both pre- and post-treatment.

OUTCOMES & RESULTS: Both the mSIT and Beatalk groups demonstrated gains in at least some of the outcome measures. Participants in the mSIT group exhibited improvement in speech intelligibility and voice measures, while participants in the Beatalk group demonstrated increased articulatory accuracy and gains in voice measures from pre- to post-treatment. Significant increases were noted post-treatment for first formant values for select vowels.

CONCLUSIONS & IMPLICATIONS: Results of this preliminary study are promising for both treatment approaches. The differentiated results indicate their distinct applications to speech intelligibility deficits. The current findings also hold clinical significance for treatment among adolescents and young adults with motor speech disorders and for application to languages other than English.

WHAT THIS PAPER ADDS: What is already known on the subject Developmental dysarthria (e.g., secondary to cerebral palsy) is a motor speech disorder that negatively impacts speech intelligibility, and thus communication participation. Select treatment approaches are available with the aim of improving speech intelligibility in individuals with developmental dysarthria; however, these approaches are limited in number and have only seldomly been applied specifically to adolescents and young adults. What this paper adds to existing knowledge The current study presents preliminary data regarding two treatment approaches, the mSIT and Beatalk technique, administered to Hebrew-speaking adolescents and young adults with developmental dysarthria in a group setting. Results demonstrate the initial effectiveness of the treatment approaches, with different gains noted for each approach across speech and voice domains. What are the potential or actual clinical implications of this work? The findings add to the existing literature on potential treatment approaches aiming to improve speech production and intelligibility among individuals with developmental dysarthria. The presented approaches also show promise for group-based treatments as well as the potential for improvement among adolescents and young adults with motor speech disorders.

RevDate: 2022-03-30

Sen A, Thakkar H, Vincent V, et al (2022)

Endothelial colony forming cells' tetrahydrobiopterin level in coronary artery disease patients and its association with circulating endothelial progenitor cells.

Canadian journal of physiology and pharmacology [Epub ahead of print].

Endothelial colony forming cells (ECFCs) participate in neovascularization. Endothelial nitric oxide synthase (eNOS)-derived NO· helps in the homing of endothelial progenitor cells (EPCs) at the site of vascular injury. The enzyme cofactor tetrahydrobiopterin (BH4) stabilizes the catalytic active state of eNOS. The association of intracellular ECFC biopterins and the ratio of reduced to oxidized biopterin (BH4:BH2) with circulatory EPCs and ECFC functionality has not been studied. We investigated ECFC biopterin levels and their association with circulatory EPCs, as well as ECFC proliferative potential in terms of day of appearance in culture. Circulatory EPCs were enumerated by flow cytometry in 53 coronary artery disease (CAD) patients and 42 controls. ECFCs were cultured, characterized, and biopterin levels assessed by high-performance liquid chromatography. The appearance of ECFC colonies and their number were recorded. Circulatory EPCs were significantly lower in CAD, and ECFCs appeared in 56% and 33% of CAD and control subjects, respectively. Intracellular BH4 and BH4:BH2 were significantly reduced in CAD. BH4:BH2 was positively correlated with circulatory EPCs (p = 0.01) and negatively with day of appearance of ECFCs (p = 0.04). Circulatory EPCs negatively correlated with ECFC appearance (p = 0.02). These findings suggest a role of biopterins in maintaining circulatory EPCs and the functional integrity of ECFCs.

RevDate: 2022-03-28

Ho GY, Kansy IK, Klavacs KA, et al (2022)

Effect of FFP2/3 masks on voice range profile measurement and voice acoustics in routine voice diagnostics.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000524299 [Epub ahead of print].

INTRODUCTION: Voice diagnostics including voice range profile measurement (VRP) and acoustic voice analysis is essential in laryngology and phoniatrics. Due to Covid-19 pandemic, wearing of filtering face masks (FFP2/3) is recommended when high risk aerosol generating procedures like singing and speaking are being performed. Goal of this study was to compare VRP parameters when performed without and with FFP2/3 masks. Further, formant analysis for sustained vowels, singer's formant and analysis of reading standard text samples were performed without/with FFP2/3 masks.

METHODS: 20 subjects (6 male and 14 female) were enrolled in this study, with an average age of 36±16 y (mean ± SD). 14 subjects were rated as euphonic/not hoarse and 6 as mildly hoarse. All subjects underwent the VRP measurements, vowel and text recordings without/with FFP2/3 mask using the software DiVAS by XION medical (Berlin, Germany). Voice range of the singing voice, equivalent of voice extension measure (eVEM), fundamental frequency (F0), and sound pressure level (SPL) of soft speaking and shouting were calculated and analyzed. Maximum phonation time (MPT) and jitter-% were included for Dysphonia Severity Index (DSI) measurement. Analyses of the singer's formant were performed. Spectral analyses of the sustained vowels /a:/, /i:/ and /u:/ (first formant F1 and second formant F2), intensity of the long-term average spectrum (LTAS) and the alpha-ratio (α-ratio) were calculated using the freeware Praat.
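For reference, the Dysphonia Severity Index combines MPT, highest F0, lowest intensity and jitter into one weighted score. A sketch using the standard weights published by Wuyts et al. (2000); the input values below are illustrative, not data from this study:

```python
def dysphonia_severity_index(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """DSI = 0.13*MPT + 0.0053*F0-High - 0.26*I-Low - 1.18*Jitter(%) + 12.4
    (weights from Wuyts et al., 2000)."""
    return (0.13 * mpt_s + 0.0053 * f0_high_hz
            - 0.26 * i_low_db - 1.18 * jitter_pct + 12.4)

# Illustrative values typical of an unimpaired voice (not study data);
# scores around +5 indicate a normal voice, strongly negative scores dysphonia.
dsi = dysphonia_severity_index(mpt_s=25.0, f0_high_hz=600.0,
                               i_low_db=50.0, jitter_pct=0.2)
```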

RESULTS: For all subjects the mean values of routine voice parameters without/with mask were analyzed: no significant differences were found in the results of singing voice range, eVEM, or SPL and frequency of soft speaking/shouting, except for a significantly lower mean SPL of shouting with the FFP2/3 mask, in particular that of the female subjects (p=0.002). Results of MPT, jitter and DSI without/with FFP2/3 mask showed no significant differences. Further mean values analyzed without/with mask were: the ratio of singer's formant to loud singing, which was lower with the FFP2/3 mask (p=0.001), and F1 and F2 of /a:/, /i:/ and /u:/, with no significant differences in the results except F2 of /i:/, which was lower with the FFP2/3 mask (p=0.005). With the exceptions mentioned, the t-test revealed no significant differences for any of the routine parameters tested in the recordings without and with a FFP2/3 mask.

CONCLUSION: It can be concluded that VRP measurements including DSI performed with FFP2/3 masks provide reliable data in clinical routine with respect to voice condition/constitution. Spectral analyses of sustained vowels, text and the singer's formant are affected by wearing FFP2/3 masks.

RevDate: 2022-03-28

Chauvette L, Fournier P, A Sharp (2022)

The frequency-following response to assess the neural representation of spectral speech cues in older adults.

Hearing research, 418:108486 pii:S0378-5955(22)00057-0 [Epub ahead of print].

Older adults often present difficulties understanding speech that cannot be explained by age-related changes in sound audibility. Psychoacoustic and electrophysiologic studies have linked these suprathreshold difficulties to age-related deficits in the auditory processing of temporal and spectral sound information. These studies suggest the existence of an age-related temporal processing deficit in the central auditory system, but the existence of such deficit in the spectral domain remains understudied. The FFR is an electrophysiological evoked response that assesses the ability of the neural auditory system to reproduce the spectral and temporal patterns of a sound. The main goal of this short review is to investigate if the FFR can identify and measure spectral processing deficits in the elderly compared to younger adults (for both, without hearing loss or competing noise). Furthermore, we want to determine what stimuli and analyses have been used in the literature to assess the neural encoding of spectral cues in older adults. Almost all reviewed articles showed an age-related decline in the auditory processing of spectral acoustic information. Even when using different speech and non-speech stimuli, studies reported an age-related decline at the fundamental frequency, at the first formant, and at other harmonic components using different metrics, such as the response's amplitude, inter-trial phase coherence, signal-to-response correlation, and signal-to-noise ratio. These results suggest that older adults may present age-related spectral processing difficulties, but further FFR studies are needed to clarify the effect of advancing age on the neural encoding of spectral speech cues. Spectral processing research on aging would benefit from using a broader variety of stimuli and from rigorously controlling for hearing thresholds even in the absence of disabling hearing loss. Advances in the understanding of the effect of age on FFR measures of spectral encoding could lead to the development of new clinical tools, with possible applications in the field of hearing aid fitting.

RevDate: 2022-03-22

Zaltz Y, L Kishon-Rabin (2022)

Difficulties Experienced by Older Listeners in Utilizing Voice Cues for Speaker Discrimination.

Frontiers in psychology, 13:797422.

Human listeners are assumed to apply different strategies to improve speech recognition in background noise. Young listeners with normal hearing (NH), e.g., have been shown to follow the voice of a particular speaker based on the fundamental (F0) and formant frequencies, which are both influenced by the gender, age, and size of the speaker. However, the auditory and cognitive processes that underlie the extraction and discrimination of these voice cues across speakers may be subject to age-related decline. The present study aimed to examine the utilization of F0 and formant cues for voice discrimination (VD) in older adults with hearing expected for their age. Difference limens (DLs) for VD were estimated in 15 healthy older adults (65-78 years old) and 35 young adults (18-35 years old) using only F0 cues, only formant frequency cues, and a combination of F0 + formant frequencies. A three-alternative forced-choice paradigm with an adaptive-tracking threshold-seeking procedure was used. The Wechsler backward digit span test was used as a measure of auditory working memory. The Trail Making Test (TMT) was used to provide cognitive information reflecting a combined effect of processing speed, mental flexibility, and executive control abilities. The results showed that (a) the mean VD thresholds of the older adults were poorer than those of the young adults for all voice cues, although larger variability was observed among the older listeners; (b) both age groups found the formant cues more beneficial for VD than the F0 cues, and the combined (F0 + formant) cues resulted in better thresholds than each cue separately; (c) significant associations were found for the older adults in the combined F0 + formant condition between VD and TMT scores, and between VD and hearing sensitivity, supporting the notion that an age-related decline in both top-down and bottom-up mechanisms may hamper the ability of older adults to discriminate between voices. The present findings suggest that older listeners may have difficulty following the voice of a specific speaker and thus in using this as a strategy for listening amid noise. This may contribute to understanding their reported difficulty listening in adverse conditions.

RevDate: 2022-03-15

Paulino CEB, Silva HJD, Gomes AOC, et al (2022)

Relationship Between Oropharyngeal Geometry and Vocal Parameters in Subjects With Parkinson's Disease.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00021-2 [Epub ahead of print].

OBJECTIVE: To verify whether the dimensions of different segments of the oropharyngeal cavity have different proportions between Parkinson's disease patients and vocally healthy subjects and investigate whether the measurements of these subjects' oropharyngeal geometry associate with their acoustic measurements of voice.

METHOD: Quantitative, descriptive, cross-sectional, and retrospective study with secondary data, approved by the Human Research Ethics Committee under no. 4.325.029. We used vocal samples and data on the oropharyngeal geometry of 40 subjects: 20 with Parkinson's disease stages I to III and 20 who formed the control group, matched for sex and age. Each group had 10 males and 10 females, with a mean age of 61 years (±6.0). Formant (F1, F2, and F3) and cepstral measures of the sustained vowel /ε/ were extracted using Praat software and entered into the database. The data were descriptively analyzed, with statistics generated in R. The proportions of oropharyngeal geometry measurements were summarized by mean values and coefficients of variation. Pearson's linear correlation test was applied to relate voice parameters to oropharyngeal geometry, considering P < 0.05, and a linear regression test was used to model F2.

RESULTS: The Parkinson's disease group showed a linear relationship between oral cavity length and F1 in males (P = 0.04) and between glottal area and F2 in females (P = 0.00); linear relationships were established according to age in both groups, and a regression model for F2 was estimated (R2 = 0.61). There was no difference between pathological and healthy voices; there was a difference in the proportional relationship of oropharyngeal geometry between the groups.

CONCLUSION: The proportional relationship of oropharyngeal geometry differs between the Parkinson's disease group and the control group, as well as the relationship between oropharyngeal geometry and formant and cepstral values of voice according to the subjects' sex and age.

RevDate: 2022-03-11

Jüchter C, Beutelmann R, GM Klump (2022)

Speech sound discrimination by Mongolian gerbils.

Hearing research, 418:108472 pii:S0378-5955(22)00043-0 [Epub ahead of print].

The present study establishes the Mongolian gerbil (Meriones unguiculatus) as a model for investigating the perception of human speech sounds. We report data on the discrimination of logatomes (CVCs - consonant-vowel-consonant combinations with outer consonants /b/, /d/, /s/ and /t/ and central vowels /a/, /aː/, /ɛ/, /eː/, /ɪ/, /iː/, /ɔ/, /oː/, /ʊ/ and /uː/, VCVs - vowel-consonant-vowel combinations with outer vowels /a/, /ɪ/ and /ʊ/ and central consonants /b/, /d/, /f/, /g/, /k/, /l/, /m/, /n/, /p/, /s/, /t/ and /v/) by gerbils. Four gerbils were trained to perform an oddball target detection paradigm in which they were required to discriminate a deviant CVC or VCV in a sequence of CVC or VCV standards, respectively. The experiments were performed with an ICRA-1 noise masker with speech-like spectral properties, and logatomes of multiple speakers were presented at various signal-to-noise ratios. Response latencies were measured to generate perceptual maps employing multidimensional scaling, which visualize the gerbils' internal maps of the sounds. The dimensions of the perceptual maps were correlated to multiple phonetic features of the speech sounds for evaluating which features of vowels and consonants are most important for the discrimination. The perceptual representation of vowels and consonants in gerbils was similar to that of humans, although gerbils needed higher signal-to-noise ratios for the discrimination of speech sounds than humans. The gerbils' discrimination of vowels depended on differences in the frequencies of the first and second formant determined by tongue height and position. Consonants were discriminated based on differences in combinations of their articulatory features. The similarities in the perception of logatomes by gerbils and humans renders the gerbil a suitable model for human speech sound discrimination.
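The perceptual maps described in this entry come from multidimensional scaling of latency-derived dissimilarities. A minimal classical (Torgerson) MDS sketch with an invented dissimilarity matrix (the study's own pipeline and data are not reproduced here):

```python
import numpy as np

def classical_mds(dissim, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in k dims."""
    n = dissim.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (dissim ** 2) @ j           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:k]           # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Invented perceptual dissimilarities among four vowel logatomes,
# e.g. derived from normalized response latencies.
dissim = np.array([[0.0, 1.0, 2.0, 2.2],
                   [1.0, 0.0, 1.5, 2.0],
                   [2.0, 1.5, 0.0, 0.8],
                   [2.2, 2.0, 0.8, 0.0]])
coords = classical_mds(dissim, k=2)            # one 2-D map point per sound
```

Sounds that are hard to discriminate (long latencies, small dissimilarities) land close together in the resulting map; correlating each map dimension with phonetic features then indicates which cues drive the discrimination.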

RevDate: 2022-03-08

Tamura T, Tanaka Y, Watanabe Y, et al (2022)

Relationships between maximum tongue pressure and second formant transition in speakers with different types of dysarthria.

PloS one, 17(3):e0264995 pii:PONE-D-21-32058.

The effects of muscle weakness on speech are not yet fully known. We investigated the relationships between maximum tongue pressure and the second formant transition in adults with different types of dysarthria, focusing on the slope of the second formant transition because it reflects tongue velocity during articulation. Sixty-three Japanese speakers with dysarthria (median age, 68 years; interquartile range, 58-77 years; 44 men and 19 women) admitted to acute and convalescent hospitals were included. Thirty neurologically normal speakers aged 19-85 years (median age, 22 years; interquartile range, 21.0-23.8 years; 14 men and 16 women) were also included. The relationship between maximum tongue pressure and speech function was evaluated using correlation analysis in the dysarthria group. Speech intelligibility, the oral diadochokinesis rate, and the second formant slope were used as indices of impaired speech. More than half of the speakers had mild to moderate dysarthria. Speakers with dysarthria showed significantly lower maximum tongue pressure, speech intelligibility, oral diadochokinesis rate, and second formant slope than neurologically normal speakers. Only the second formant slope was significantly correlated with the maximum tongue pressure (r = 0.368, p = 0.003). The relationship between the second formant slope and maximum tongue pressure showed a similar correlation in subgroup analyses divided by sex. The oral diadochokinesis rate, which is related to the speed of articulation, is affected by voice on/off, mandibular opening/closing, and range of motion, whereas the second formant slope is less affected by these factors. These results suggest that maximum isometric tongue strength is associated with tongue movement speed during articulation.

RevDate: 2022-03-07

Georgiou GP (2022)

Acoustic markers of vowels produced with different types of face masks.

Applied acoustics. Acoustique applique. Angewandte Akustik, 191:108691.

The wide spread of SARS-CoV-2 led to the extensive use of face masks in public places. Although masks offer significant protection from infectious droplets, they also impact verbal communication by altering the speech signal. The present study examines how two types of face masks affect the speech properties of vowels. Twenty speakers were recorded producing their native vowels in a /pVs/ context, maintaining a normal speaking rate. Speakers were asked to produce the vowels in three conditions: (a) with a surgical mask, (b) with a cotton mask, and (c) without a mask. The speakers' output was analyzed through the Praat speech acoustics software. We fitted three linear mixed-effects models to investigate the mask-wearing effects on the first formant (F1), second formant (F2), and duration of vowels. The results demonstrated that F1 and duration of vowels remained intact in the masked conditions compared to the unmasked condition, while F2 was altered for three out of five vowels (/e a u/) in the surgical mask and two out of five vowels (/e a/) in the cotton mask. So, both types of masks altered the speech signal to some extent, and they mostly affected the same vowel qualities. It is concluded that some acoustic properties are more sensitive than others to speech signal modification when speech is filtered through masks, while various sounds are affected in different ways. The findings may have significant implications for second/foreign language instructors who teach pronunciation and for speech therapists who teach sounds to individuals with language disorders.

RevDate: 2022-03-04

Anikin A, Pisanski K, D Reby (2022)

Static and dynamic formant scaling conveys body size and aggression.

Royal Society open science, 9(1):211496 pii:rsos211496.

When producing intimidating aggressive vocalizations, humans and other animals often extend their vocal tracts to lower their voice resonance frequencies (formants) and thus sound big. Is acoustic size exaggeration more effective when the vocal tract is extended before, or during, the vocalization, and how do listeners interpret within-call changes in apparent vocal tract length? We compared perceptual effects of static and dynamic formant scaling in aggressive human speech and nonverbal vocalizations. Acoustic manipulations corresponded to elongating or shortening the vocal tract either around (Experiment 1) or from (Experiment 2) its resting position. Gradual formant scaling that preserved average frequencies conveyed the impression of smaller size and greater aggression, regardless of the direction of change. Vocal tract shortening from the original length conveyed smaller size and less aggression, whereas vocal tract elongation conveyed larger size and more aggression, and these effects were stronger for static than for dynamic scaling. Listeners familiarized with the speaker's natural voice were less often 'fooled' by formant manipulations when judging speaker size, but paid more attention to formants when judging aggressive intent. Thus, within-call vocal tract scaling conveys emotion, but a better way to sound large and intimidating is to keep the vocal tract consistently extended.
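The formant lowering described above is usually interpreted through apparent vocal tract length (VTL). As a hedged illustration only (the quarter-wave uniform-tube approximation, the speed-of-sound constant, and the numeric formant values below are mine, not the paper's), scaling all formants by a factor k changes the apparent VTL by 1/k:

```python
SPEED_OF_SOUND_CM_S = 35_000  # approx. speed of sound in warm, humid air (cm/s)

def apparent_vtl_cm(formants_hz):
    """Estimate apparent vocal tract length (cm) from formant frequencies,
    assuming a uniform tube closed at the glottis: F_n = (2n - 1) * c / (4L)."""
    estimates = [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)

# Scaling every formant down by 20% (k = 0.8) makes the apparent tract
# 1 / 0.8 = 1.25 times longer -- the "sounding big" direction.
neutral = [500, 1500, 2500]          # idealized uniform-tube formants (Hz)
lowered = [f * 0.8 for f in neutral]
print(apparent_vtl_cm(neutral))      # 17.5
print(apparent_vtl_cm(lowered))      # 21.875
```

Averaging the per-formant estimates is one simple convention; regression-based estimators are also common in this literature.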

RevDate: 2022-03-03

Haider CL, Suess N, Hauswald A, et al (2022)

Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker.

NeuroImage pii:S1053-8119(22)00173-2 [Epub ahead of print].

Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. But it remains unclear which levels of speech processing are affected, and under which circumstances, by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model was trained on the clear AV speech and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher-level features of speech segmentation (phoneme and word onsets) was especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.

RevDate: 2022-03-02

Hoyer P, Riedler M, Unterhofer C, et al (2022)

Vocal Tract and Subglottal Impedance in High Performance Singing: A Case Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00015-7 [Epub ahead of print].

OBJECTIVES/HYPOTHESIS: The respiratory process is important in vocal training, and in professional singing the airflow is highly important. It is hypothesized that subglottal resonances are important to the singing voice in high performance singing.

STUDY DESIGN: Single subject, prospective.

METHOD: A professional soprano singer shaped her vocal tract to form the vowels [a], [e], [i], [o], and [u] at the pitch d4. We measured phonated vowels and the vocal tract impedance spectra with a deterministic noise supplied by an iPhone buzzer in the range of 200 to 4,000 Hz at closed glottis, during exhalation and during inhalation while maintaining the shape of the vocal tract.

RESULTS: Measurements of the phonated vowels before and after the different glottal adjustments were highly reproducible. Vocal tract resonances and those arising during respiration are reported. The impedance spectra show vowel-dependent resonances with closed and open glottis. The formants of the vocal spectra are explained by including both the vocal tract and the subglottal resonances.

CONCLUSION: The findings indicate that subglottal resonances influence the first formant as well as the singer's formant cluster in high-performance singing. The instrumental setup used for the impedance measurement allows a simple and lightweight procedure for measuring vocal tract and subglottal resonances.

RevDate: 2022-03-02

Saba JN, JHL Hansen (2022)

The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners.

The Journal of the Acoustical Society of America, 151(2):1007.

Natural compensation of speech production in challenging listening environments is referred to as the Lombard effect (LE). The resulting acoustic differences between neutral and Lombard speech have been shown to provide intelligibility benefits for normal hearing (NH) and cochlear implant (CI) listeners alike. Motivated by this outcome, three LE perturbation approaches consisting of pitch, duration, formant, intensity, and spectral contour modifications were designed specifically for CI listeners to combat speech-in-noise performance deficits. Experiment 1 analyzed the effects of loudness, quality, and distortion of the approaches on speech intelligibility with and without formant-shifting. Significant improvements of +9.4% were observed in CI listeners without the formant-shifting approach at +5 dB signal-to-noise ratio (SNR) large-crowd-noise (LCN) when loudness was controlled; however, performance was significantly lower for NH listeners. Experiment 2 evaluated the non-formant-shifting approach with additional spectral contour and high pass filtering to reduce spectral smearing and decrease distortion observed in Experiment 1. This resulted in significant intelligibility benefits of +30.2% for NH and +21.2% for CI listeners at 0 and +5 dB SNR LCN, respectively. These results suggest that LE perturbation may be useful as front-end speech modification approaches to improve intelligibility for CI users in noise.

RevDate: 2022-02-15

Nguyen DD, Chacon A, Payten C, et al (2022)

Acoustic characteristics of fricatives, amplitude of formants and clarity of speech produced without and with a medical mask.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Previous research has found that high-frequency energy of speech signals decreased while wearing face masks. However, no study has examined the specific spectral characteristics of fricative consonants and vowels and the perception of clarity of speech in mask wearing.

AIMS: To investigate acoustic-phonetic characteristics of fricative consonants and vowels and auditory perceptual rating of clarity of speech produced with and without wearing a face mask.

METHODS & PROCEDURES: A total of 16 healthcare workers read the Rainbow Passage using modal phonation in three conditions: without a face mask, with a standard surgical mask and with a KN95 mask (China GB2626-2006, a medical respirator with higher barrier level than the standard surgical mask). Speech samples were acoustically analysed for root mean square (RMS) amplitude (ARMS) and spectral moments of four fricatives /f/, /s/, /ʃ/ and /z/; and amplitude of the first three formants (A1, A2 and A3) measured from the reading passage and extracted vowels. Auditory perception of speech clarity was performed. Data were compared across mask and non-mask conditions using linear mixed models.

OUTCOMES & RESULTS: The ARMS of all included fricatives was significantly lower in surgical mask and KN95 mask compared with non-mask condition. Centre of gravity of /f/ decreased in both surgical and KN95 mask while other spectral moments did not show systematic significant linear trends across mask conditions. None of the formant amplitude measures was statistically different across conditions. Speech clarity was significantly poorer in both surgical and KN95 mask conditions.

CONCLUSIONS & IMPLICATIONS: Speech produced while wearing either a surgical mask or a KN95 mask was associated with decreased fricative amplitude and poorer speech clarity.

WHAT THIS PAPER ADDS: What is already known on the subject Previous studies have shown that the overall spectral levels in high-frequency ranges and intelligibility are decreased for speech produced with a face mask. It is unclear how different types of speech signals, that is, fricatives and vowels, are affected in speech produced while wearing either a medical surgical mask or a KN95 mask. It is also unclear whether ratings of speech clarity are similar for speech produced with these face masks. What this paper adds to existing knowledge Speech data collected in a real-world, clinical, non-laboratory-controlled setting showed differences in the amplitude of fricatives and speech clarity ratings between non-mask and mask-wearing conditions. Formant amplitude did not show significant differences in mask-wearing conditions compared with non-mask. What are the potential or actual clinical implications of this work? Wearing a surgical mask or a KN95 mask had different effects on consonants and vowels. The findings of this study suggest that these masks only affected fricative consonants and did not affect vowel production. The poorer speech clarity in these mask-wearing conditions has important implications for speech perception in communication between clinical staff, between medical officers and patients in clinics, and between people in everyday situations. The impact of these masks on speech perception may be more pronounced in people with hearing impairment and communication disorders. In voice evaluation and/or therapy sessions, the effects of wearing a medical mask can occur bidirectionally for both the clinician and the patient. The patient may find it more challenging to understand the speech conveyed by the clinician, while the clinician may not perceptually assess the patient's speech and voice accurately.
Given the significant correlation between clarity ratings and fricative amplitude, improving fricative signals would be useful to improve speech clarity while wearing these medical face masks.

RevDate: 2022-02-10

Gábor A, Kaszás N, Faragó T, et al (2022)

The acoustic bases of human voice identity processing in dogs.

Animal cognition [Epub ahead of print].

Speech carries identity-diagnostic acoustic cues that help individuals recognize each other during vocal-social interactions. In humans, fundamental frequency, formant dispersion and harmonics-to-noise ratio serve as characteristics along which speakers can be reliably separated. The ability to infer a speaker's identity is also adaptive for members of other species (like companion animals) for whom humans (as owners) are relevant. The acoustic bases of speaker recognition in non-humans are unknown. Here, we tested whether dogs can recognize their owner's voice and whether they rely on the same acoustic parameters for such recognition as humans use to discriminate speakers. Stimuli were pre-recorded sentences spoken by the owner and control persons, played through loudspeakers placed behind two non-transparent screens (with each screen hiding a person). We investigated the association between acoustic distance of speakers (examined along several dimensions relevant in intraspecific voice identification) and dogs' behavior. Dogs chose their owner's voice more often than those of control persons, suggesting that they can identify it. Choosing success and time spent looking in the direction of the owner's voice were positively associated, showing that looking time is an index of the ease of choice. Acoustic distance of speakers in mean fundamental frequency and jitter were positively associated with looking time, indicating that the shorter the acoustic distance between speakers with regard to these parameters, the harder the decision. So, dogs use these cues to discriminate their owner's voice from unfamiliar voices. These findings reveal that dogs use some but probably not all acoustic parameters that humans use to identify speakers. Although dogs can detect fine changes in speech, their perceptual system may not be fully attuned to identity-diagnostic cues in the human voice.
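Of the parameters listed above, formant dispersion has a particularly compact definition: the mean spacing between adjacent formants, which reduces to (F_N - F_1) / (N - 1). A minimal sketch, with illustrative formant values rather than data from the study:

```python
def formant_dispersion(formants_hz):
    """Formant dispersion (as commonly defined after Fitch): mean spacing
    between adjacent formants, i.e. (F_N - F_1) / (N - 1)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    f = sorted(formants_hz)
    return (f[-1] - f[0]) / (len(f) - 1)

# Evenly spaced formants of an idealized uniform tract -> 1000 Hz dispersion.
print(formant_dispersion([500, 1500, 2500, 3500]))  # 1000.0
```

Because dispersion scales inversely with vocal tract length, it is one of the cues along which speakers can in principle be separated.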

RevDate: 2022-02-07

Rishiq D, Harkrider AW, Springer C, et al (2022)

Effects of Spectral Shaping on Speech Auditory Brainstem Responses to Stop Consonant-Vowel Syllables.

Journal of the American Academy of Audiology [Epub ahead of print].

BACKGROUND: Spectral shaping is employed by hearing aids to make consonantal information, such as formant transitions, audible for listeners with hearing loss. How manipulations of the stimuli, such as spectral shaping, may alter encoding in the auditory brainstem has not been thoroughly studied.

PURPOSE: To determine how spectral shaping of synthetic consonant-vowel (CV) syllables, varying in their second formant (F2) onset frequency, may affect encoding of the syllables in the auditory brainstem.

RESEARCH DESIGN: We employed a repeated measure design.

STUDY SAMPLE: Sixteen young adults (mean = 20.94 years, 6 males) and 11 older adults (mean = 58.60 years, 4 males) participated in this study.

DATA COLLECTION AND ANALYSIS: Speech-evoked auditory brainstem responses (speech-ABRs) were obtained from each participant using three CV exemplars selected from synthetic stimuli generated for a /ba-da-ga/ continuum. Brainstem responses were also recorded to corresponding three CV exemplars that were spectrally shaped to decrease low-frequency information and provide gain for middle and high frequencies according to a Desired Sensation Level function. In total, six grand average waveforms [3 phonemes (/ba/, /da/, /ga/) X 2 shaping conditions (unshaped, shaped)] were produced for each participant. Peak latencies and amplitudes, referenced to pre-stimulus baseline, were identified for 15 speech-ABR peaks. Peaks were marked manually using the program cursor on each individual waveform. Repeated-measures ANOVAs were used to determine the effects of shaping on the latencies and amplitudes of the speech-ABR peaks.

RESULTS: Shaping effects produced changes within participants in ABR latencies and amplitudes involving onset and major peaks of the speech-ABR waveform for certain phonemes. Specifically, data from onset peaks showed that shaping decreased latency for /ga/ in older listeners, and decreased amplitude onset for /ba/ in younger listeners. Shaping also increased the amplitudes of major peaks for /ga/ stimuli in both groups.

CONCLUSIONS: Encoding of speech in the ABR waveform may be more complex and multidimensional than a simple demarcation of source and filter information. These results suggest a more complex subcortical encoding of vocal tract filter information in the ABR waveform, which may also be influenced by cue intensity and age.

RevDate: 2022-02-04

Easwar V, Boothalingam S, E Wilson (2022)

Sensitivity of Vowel-Evoked Envelope Following Responses to Spectra and Level of Preceding Phoneme Context.

Ear and hearing pii:00003446-900000000-98357 [Epub ahead of print].

OBJECTIVE: Vowel-evoked envelope following responses (EFRs) could be a useful noninvasive tool for evaluating neural activity phase-locked to the fundamental frequency of voice (f0). Vowel-evoked EFRs are often elicited by vowels in consonant-vowel syllables or words. Considering neural activity is susceptible to temporal masking, EFR characteristics elicited by the same vowel may vary with the features of the preceding phoneme. To this end, the objective of the present study was to evaluate the influence of the spectral and level characteristics of the preceding phoneme context on vowel-evoked EFRs.

DESIGN: EFRs were elicited by a male-spoken /i/ (stimulus; duration = 350 msec), modified to elicit two EFRs, one from the region of the first formant (F1) and one from the second and higher formants (F2+). The stimulus, presented at 65 dB SPL, was preceded by one of the four contexts: /ʃ/, /m/, /i/ or a silent gap of duration equal to that of the stimulus. The level of the context phonemes was either 50 or 80 dB SPL, 15 dB lower and higher than the level of the stimulus /i/. In a control condition, EFRs to the stimulus /i/ were elicited in isolation without any preceding phoneme contexts. The stimulus and the contexts were presented monaurally to a randomly chosen test ear in 21 young adults with normal hearing. EFRs were recorded using single-channel electroencephalogram between the vertex and the nape.

RESULTS: A repeated measures analysis of variance indicated a significant three-way interaction between context type (/ʃ/, /i/, /m/, silent gap), level (50, 80 dB SPL), and EFR-eliciting formant (F1, F2+). Post hoc analyses indicated no influence of the preceding phoneme context on F1-elicited EFRs. Relative to a silent gap as the preceding context, F2+-elicited EFRs were attenuated by /ʃ/ and /m/ presented at 50 and 80 dB SPL, as well as by /i/ presented at 80 dB SPL. The average attenuation ranged from 14.9 to 27.9 nV. When the context phonemes were presented at matched levels of 50 or 80 dB SPL, F2+-elicited EFRs were most often attenuated when preceded by /ʃ/. At 80 dB SPL, relative to the silent preceding gap, the average attenuation was 15.7 nV, and at 50 dB SPL, relative to the preceding context phoneme /i/, the average attenuation was 17.2 nV.

CONCLUSION: EFRs elicited by the second and higher formants of /i/ are sensitive to the spectral and level characteristics of the preceding phoneme context. Such sensitivity, measured as an attenuation in the present study, may influence the comparison of EFRs elicited by the same vowel in different consonant-vowel syllables or words. However, the degree of attenuation with realistic context levels exceeded the minimum measurable change only 12% of the time. Although the impact of the preceding context is statistically significant, it is likely to be clinically insignificant a majority of the time.

RevDate: 2022-02-03

Chiu C, Weng Y, BW Chen (2021)

Tongue Postures and Tongue Centers: A Study of Acoustic-Articulatory Correspondences Across Different Head Angles.

Frontiers in psychology, 12:768754.

Recent research on body and head positions has shown that postural changes may induce varying degrees of change in acoustic speech signals and articulatory gestures. While the preservation of formant profiles across different postures is suitably accounted for by the two-tube model and perturbation theory, it remains unclear whether it results from the accommodation of tongue postures. Specifically, whether the tongue accommodates the changes in head angle to maintain the target acoustics is yet to be determined. The present study examines vowel acoustics and their correspondence with the articulatory maneuvers of the tongue, including both tongue postures and movements of the tongue center, across different head angles. The results show that vowel acoustics, including pitch and formants, are largely unaffected by upward or downward tilting of the head. These preserved acoustics may be attributed to the lingual gestures that compensate for the effects of gravity. Our results also reveal that the tongue postures in response to head movements appear to be vowel-dependent, and the tongue center may serve as an underlying drive that covaries with the head angle changes. These results imply a close relationship between vowel acoustics and tongue postures, as well as a target-oriented strategy for different head angles.

RevDate: 2022-02-02

Merritt B, T Bent (2022)

Revisiting the acoustics of speaker gender perception: A gender expansive perspective.

The Journal of the Acoustical Society of America, 151(1):484.

Examinations of speaker gender perception have primarily focused on the roles of fundamental frequency (fo) and formant frequencies from structured speech tasks using cisgender speakers. Yet, there is evidence to suggest that fo and formants do not fully account for listeners' perceptual judgements of gender, particularly from connected speech. This study investigated the perceptual importance of fo, formant frequencies, articulation, and intonation in listeners' judgements of gender identity and masculinity/femininity from spontaneous speech from cisgender male and female speakers as well as transfeminine and transmasculine speakers. Stimuli were spontaneous speech samples from 12 speakers who are cisgender (6 female and 6 male) and 12 speakers who are transgender (6 transfeminine and 6 transmasculine). Listeners performed a two-alternative forced choice (2AFC) gender identification task and masculinity/femininity rating task in two experiments that manipulated which acoustic cues were available. Experiment 1 confirmed that fo and formant frequency manipulations were insufficient to alter listener judgements across all speakers. Experiment 2 demonstrated that articulatory cues had greater weighting than intonation cues on the listeners' judgements when the fo and formant frequencies were in a gender ambiguous range. These findings counter the assumptions that fo and formant manipulations are sufficient to effectively alter perceived speaker gender.

RevDate: 2022-02-01

Kim Y, Chung H, A Thompson (2022)

Acoustic and Articulatory Characteristics of English Semivowels /ɹ, l, w/ Produced by Adult Second-Language Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study presents the results of acoustic and kinematic analyses of word-initial semivowels (/ɹ, l, w/) produced by second-language (L2) speakers of English whose native language is Korean. In addition, the relationship of acoustic and kinematic measures to the ratings of foreign accent was examined by correlation analyses.

METHOD: Eleven L2 speakers and 10 native speakers (first language [L1]) of English read The Caterpillar passage. Acoustic and kinematic data were simultaneously recorded using an electromagnetic articulography system. In addition to speaking rate, two acoustic measures (ratio of third-formant [F3] frequency to second-formant [F2] frequency and duration of steady states of F2) and two kinematic measures (lip aperture and duration of lingual maximum hold) were obtained from individual target sounds. To examine the degree of contrast among the three sounds, acoustic and kinematic Euclidean distances were computed on the F2-F3 and x-y planes, respectively.

RESULTS: Compared with L1 speakers, L2 speakers exhibited a significantly slower speaking rate. For the three semivowels, L2 speakers showed a reduced F3/F2 ratio during constriction, increased lip aperture, and reduced acoustic Euclidean distances among semivowels. Additionally, perceptual ratings of foreign accent were significantly correlated with three measures: duration of steady F2, acoustic Euclidean distance, and kinematic Euclidean distance.

CONCLUSIONS: The findings provide acoustic and kinematic evidence for challenges that L2 speakers experience in the production of English semivowels, especially /ɹ/ and /w/. The robust and consistent finding of reduced contrasts among semivowels, together with their correlations with perceptual accent ratings, suggests using sound contrasts as a potentially effective approach in accent modification paradigms.
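The acoustic Euclidean distance used in this study is a plain distance between sounds on the F2-F3 plane. A minimal sketch of the computation; the (F2, F3) values below are hypothetical placeholders, not the study's data:

```python
from itertools import combinations
from math import hypot

def pairwise_distances(points):
    """Euclidean distance between every pair of sounds on the F2-F3 plane.

    `points` maps a sound label to its (F2, F3) coordinates in Hz.
    """
    return {
        (a, b): hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])
        for a, b in combinations(points, 2)
    }

# Hypothetical (F2, F3) means in Hz -- placeholders, not values from the study.
semivowels = {"r": (1100, 1700), "l": (1100, 2600), "w": (800, 2300)}
for pair, dist in pairwise_distances(semivowels).items():
    print(pair, round(dist, 1))
```

Smaller pairwise distances correspond to the reduced semivowel contrasts the authors report for L2 speakers; the kinematic distance on the x-y plane is computed the same way.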

RevDate: 2022-01-30

Takemoto N, Sanuki T, Esaki S, et al (2022)

Rabbit model with vocal fold hyperadduction.

Auris, nasus, larynx pii:S0385-8146(22)00026-8 [Epub ahead of print].

OBJECTIVE: Adductor spasmodic dysphonia (AdSD) is caused by hyperadduction of the vocal folds during phonation, resulting in a strained voice. Animal models have not yet been used to elucidate this intractable disease because AdSD has a complex pathology without a definitive origin. As a first step, we established an animal model with vocal fold hyperadduction and evaluated its validity by assessing laryngeal function.

METHODS: In this experimental animal study, three adult Japanese 20-week-old rabbits were used. The models were created using a combination of cricothyroid approximation, forced airflow, and electrical stimulation of the recurrent laryngeal nerves (RLNs). Cricothyroid approximation was added to produce a glottal slit. Thereafter, both RLNs were electrically stimulated to induce vocal fold hyperadduction. Finally, the left RLN was transected to relieve hyperadduction. The sound, endoscopic images, and subglottal pressure were recorded, and acoustic analysis was performed.

RESULTS: Subglottal pressure increased significantly, and the strained sound was produced after the electrical stimulation of the RLNs. After transecting the left RLN, the subglottal pressure decreased significantly, and the strained sound decreased. Acoustic analysis revealed an elevation of the standard deviation of F0 (SDF0) and the degree of voice breaks (DVB) through stimulation of the RLNs, and a reduction of SDF0 and DVB through RLN transection. Formant bands in the sound spectrogram were interrupted by the stimulation and reappeared after the RLN transection.

CONCLUSION: This study developed a rabbit model with vocal fold hyperadduction. The subglottal pressure and acoustic analysis of this model resembled the characteristics of patients with AdSD. This model could help elucidate the pathology of the larynx caused by hyperadduction, and could be used to evaluate and compare treatments for strained phonation.

RevDate: 2022-01-28

Heeringa AN, C Köppl (2022)

Auditory nerve fiber discrimination and representation of naturally-spoken vowels in noise.

eNeuro pii:ENEURO.0474-21.2021 [Epub ahead of print].

To understand how vowels are encoded by auditory nerve fibers, a number of representation schemes have been suggested that extract the vowel's formant frequencies from auditory nerve-fiber spiking patterns. The current study aims to apply and compare these schemes for auditory nerve-fiber responses to naturally-spoken vowels in a speech-shaped background noise. Responses to three vowels were evaluated; based on behavioral experiments in the same species, two of these were perceptually difficult to discriminate from each other (/e/ vs. /i/) and one was perceptually easy to discriminate from the other two (/a:/). Single-unit auditory nerve fibers were recorded from ketamine/xylazine-anesthetized Mongolian gerbils of either sex (n = 8). First, single-unit discrimination between the three vowels was studied. Compared to the perceptually easy discriminations, the average spike timing-based discrimination values were significantly lower for the perceptually difficult vowel discrimination. This was not true for an average rate-based discrimination metric, the rate d-prime. Consistently, spike timing-based representation schemes, plotting the temporal responses of all recorded units as a function of their best frequency, i.e. dominant component schemes, average localized interval rate, and fluctuation profiles, revealed representation of the vowel's formant frequencies, whereas no such representation was apparent in the rate-based excitation pattern. Making use of perceptual discrimination data, this study reveals that discrimination difficulties of naturally-spoken vowels in speech-shaped noise originate peripherally and can be studied in the spike timing patterns of single auditory nerve fibers.

Significance statement: Understanding speech in noisy environments is an everyday challenge. This study investigates how single auditory nerve fibers, recorded in the Mongolian gerbil, discriminate and represent naturally-spoken vowels in a noisy background approximating real-life situations. Neural discrimination metrics were compared to the known behavioral performance by the same species, comparing easy to difficult vowel discriminations. A spike-timing-based discrimination metric agreed well with perceptual performance, while mean discharge rate was a poor predictor. Furthermore, only spike-timing-based, but not the rate-based, representation schemes revealed peaks at the formant frequencies, which are paramount for perceptual vowel identification and discrimination. This study reveals that vowel discrimination difficulties in noise originate peripherally and can be studied in the spike-timing patterns of single auditory nerve fibers.

RevDate: 2022-01-25

Yüksel M (2022)

Reliability and Efficiency of Pitch-Shifting Plug-Ins in Voice and Hearing Research.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Auditory feedback perturbation with voice pitch manipulation has been widely used in previous studies. There are several hardware and software tools for such manipulations, but audio plug-ins developed for music, movies, and radio applications that operate in digital audio workstations may be extremely beneficial and are easy to use, accessible, and cost effective. However, it is unknown whether these plug-ins can perform similarly to tools that have been described in previous literature. Hence, this study aimed to evaluate the reliability and efficiency of these plug-ins.

METHOD: Six different plug-ins were used at +1 and -1 semitone (st) pitch shifting, with formant correction on and off, to pitch shift the sustained /ɑ/ voice recording sample of 12 healthy participants (six cisgender males and six cisgender females). Pitch-shifting accuracy, formant shifting amount, intensity changes, and total latency values were reported.

RESULTS: Some variability was observed between different plug-ins and pitch-shift settings. One plug-in performed comparably to well-known hardware and software units in all four measured aspects, with 1-cent pitch-shifting accuracy, low latency values, negligible intensity difference, and preserved formants. The other plug-ins performed similarly in some respects.

CONCLUSIONS: Audio plug-ins may be used effectively in pitch-shifting applications. Researchers and clinicians can access these plug-ins easily and test whether the features also fit their aims.
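The reported "1-cent pitch-shifting accuracy" can be made concrete: a shift of n semitones corresponds to a frequency ratio of 2^(n/12), and accuracy is the deviation in cents (1200 times the base-2 log of the measured-to-target ratio). A minimal sketch, with function names and example values of my own, not taken from the study:

```python
import math

def semitone_ratio(n_st):
    # Shifting by n semitones multiplies frequency by 2**(n/12).
    return 2.0 ** (n_st / 12.0)

def cents_error(f_measured, f_target):
    # Deviation between two frequencies in cents (100 cents = 1 semitone).
    return 1200.0 * math.log2(f_measured / f_target)

# A nominal +1 st shift applied to a 220 Hz sustained tone:
target = 220.0 * semitone_ratio(1)   # ~233.08 Hz
print(round(cents_error(233.3, target), 1))
```

A plug-in meeting the 1-cent criterion would keep `cents_error` within ±1 across the recording.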

RevDate: 2022-01-21

Suess N, Hauswald A, Reisinger P, et al (2022)

Cortical Tracking of Formant Modulations Derived from Silently Presented Lip Movements and Its Decline with Age.

Cerebral cortex (New York, N.Y. : 1991) pii:6513733 [Epub ahead of print].

The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuo-phonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation of these more fine-grained acoustic details and assessed how it changes as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age, in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging especially affects the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.

RevDate: 2022-01-17

Almaghrabi SA, Thewlis D, Thwaites S, et al (2022)

The reproducibility of bio-acoustic features is associated with sample duration, speech task and gender.

IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society, PP: [Epub ahead of print].

Bio-acoustic properties of speech show evolving value in analyzing psychiatric illnesses. Obtaining a sufficient speech sample length to quantify these properties is essential, but the impact of sample duration on the stability of bio-acoustic features has not been systematically explored. We aimed to evaluate bio-acoustic features' reproducibility against changes in speech durations and tasks. We extracted source, spectral, formant, and prosodic features in 185 English-speaking adults (98 w, 87 m) for reading-a-story and counting tasks. We compared features at 25% of the total sample duration of the reading task to those obtained from non-overlapping randomly selected sub-samples shortened to 75%, 50%, and 25% of total duration using intraclass correlation coefficients. We also compared the features extracted from entire recordings to those measured at 25% of the duration and features obtained from 50% of the duration. Further, we compared features extracted from reading-a-story to counting tasks. Our results show that the number of reproducible features (out of 125) decreased stepwise with duration reduction. Spectral shape, pitch, and formants reached excellent reproducibility. Mel-frequency cepstral coefficients (MFCCs), loudness, and zero-crossing rate achieved excellent reproducibility only at a longer duration. Reproducibility of source, MFCC derivatives, and voicing probability (VP) was poor. Significant gender differences existed in jitter, MFCC first-derivative, spectral skewness, pitch, VP, and formants. Around 97% of features in both genders were not reproducible across speech tasks, in part due to the short counting task duration. In conclusion, bio-acoustic features are less reproducible in shorter samples and are affected by gender.
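The intraclass correlation coefficients used above to quantify reproducibility across sample durations can be sketched as follows. This is a minimal one-way random-effects ICC(1,1); the study's exact ICC variant is not stated in the abstract, and the function name and data are illustrative:

```python
def icc_1_1(ratings):
    # One-way random-effects ICC(1,1): (MSB - MSW) / (MSB + (k-1) * MSW),
    # where MSB/MSW are between- and within-subject mean squares.
    # ratings: list of per-subject lists, one measurement per condition.
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    subj_means = [sum(r) / k for r in ratings]
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum((x - m) ** 2
              for r, m in zip(ratings, subj_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfectly repeatable measurements across two durations -> ICC = 1.0
print(icc_1_1([[4.1, 4.1], [5.0, 5.0], [6.2, 6.2]]))
```

Features whose values drift when the sample is shortened push MSW up and the ICC down, which is the pattern the study reports for MFCC derivatives and voicing probability.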

RevDate: 2022-01-10

Gaines JL, Kim KS, Parrell B, et al (2021)

Discrete constriction locations describe a comprehensive range of vocal tract shapes in the Maeda model.

JASA express letters, 1(12):124402.

The Maeda model was used to generate a large set of vocoid-producing vocal tract configurations. The resulting dataset (a) produced a comprehensive range of formant frequencies and (b) displayed discrete tongue body constriction locations (palatal, velar/uvular, and lower pharyngeal). The discrete parameterization of constriction location across the vowel space suggests this is likely a fundamental characteristic of the human vocal tract, and not limited to any specific set of vowel contrasts. These findings suggest that in addition to established articulatory-acoustic constraints, fundamental biomechanical constraints of the vocal tract may also explain such discreteness.

RevDate: 2022-01-06

Cheng FY, Xu C, Gold L, et al (2021)

Rapid Enhancement of Subcortical Neural Responses to Sine-Wave Speech.

Frontiers in neuroscience, 15:747303.

The efferent auditory nervous system may be a potent force in shaping how the brain responds to behaviorally significant sounds. Previous human experiments using the frequency following response (FFR) have shown efferent-induced modulation of subcortical auditory function online and over short- and long-term time scales; however, a contemporary understanding of FFR generation presents new questions about whether previous effects were constrained solely to the auditory subcortex. The present experiment used sine-wave speech (SWS), an acoustically-sparse stimulus in which dynamic pure tones represent speech formant contours, to evoke FFRSWS. Due to the higher stimulus frequencies used in SWS, this approach biased neural responses toward brainstem generators and allowed for three stimuli (/bɔ/, /bu/, and /bo/) to be used to evoke FFRSWS before and after listeners in a training group were made aware that they were hearing a degraded speech stimulus. All SWS stimuli were rapidly perceived as speech when presented with a SWS carrier phrase, and average token identification reached ceiling performance during a perceptual training phase. Compared to a control group which remained naïve throughout the experiment, training group FFRSWS amplitudes were enhanced post-training for each stimulus. Further, linear support vector machine classification of training group FFRSWS significantly improved post-training compared to the control group, indicating that training-induced neural enhancements were sufficient to bolster machine learning classification accuracy. These results suggest that the efferent auditory system may rapidly modulate auditory brainstem representation of sounds depending on their context and perception as non-speech or speech.

RevDate: 2022-01-04

Meykadeh A, Golfam A, Nasrabadi AM, et al (2021)

First Event-Related Potentials Evidence of Auditory Morphosyntactic Processing in a Subject-Object-Verb Nominative-Accusative Language (Farsi).

Frontiers in psychology, 12:698165.

While most studies on neural signals of online language processing have focused on a few (usually Western) subject-verb-object (SVO) languages, corresponding knowledge on subject-object-verb (SOV) languages is scarce. Here we studied Farsi, a language with canonical SOV word order. Because we were interested in the consequences of second-language acquisition, we compared monolingual native Farsi speakers and equally proficient bilinguals who had learned Farsi only after entering primary school. We analyzed event-related potentials (ERPs) to correct and morphosyntactically incorrect sentence-final syllables in a sentence correctness judgment task. Incorrect syllables elicited a late posterior positivity at 500-700 ms after the final syllable, resembling the P600 component, as previously observed for syntactic violations at sentence-middle positions in SVO languages. There was no sign of a left anterior negativity (LAN) preceding the P600. Additionally, we provide evidence for a real-time discrimination of phonological categories associated with morphosyntactic manipulations (between 35 and 135 ms), manifesting the instantaneous neural response to unexpected perturbations. The L2 Farsi speakers were indistinguishable from L1 speakers in terms of performance and neural signals of syntactic violations, indicating that exposure to a second language at school entry may result in native-like performance and neural correlates. In non-native (but not native) speakers, verbal working memory capacity correlated with the late posterior positivity and performance accuracy. Hence, this first ERP study of morphosyntactic violations in a spoken SOV nominative-accusative language demonstrates ERP effects in response to morphosyntactic violations and the involvement of executive functions in non-native speakers in computations of subject-verb agreement.

RevDate: 2021-12-30

Yamada Y, Shinkawa K, Nemoto M, et al (2021)

Automatic Assessment of Loneliness in Older Adults Using Speech Analysis on Responses to Daily Life Questions.

Frontiers in psychiatry, 12:712251.

Loneliness is a perceived state of social and emotional isolation that has been associated with a wide range of adverse health effects in older adults. Automatically assessing loneliness by passively monitoring daily behaviors could potentially contribute to early detection and intervention for mitigating loneliness. Speech data have been successfully used for inferring changes in emotional states and mental health conditions, but their association with loneliness in older adults remains unexplored. In this study, we developed a tablet-based application and collected speech responses of 57 older adults to daily life questions regarding, for example, one's feelings and future travel plans. From audio data of these speech responses, we automatically extracted speech features characterizing acoustic, prosodic, and linguistic aspects, and investigated their associations with self-rated scores on the UCLA Loneliness Scale. We found that with increasing loneliness scores, speech responses tended to have fewer inflections, longer pauses, reduced second formant frequencies, reduced variances of the speech spectrum, more filler words, and fewer positive words. The cross-validation results showed that regression and binary-classification models using speech features could estimate loneliness scores with an R² of 0.57 and detect individuals with high loneliness scores with 95.6% accuracy, respectively. Our study provides the first empirical results suggesting the possibility of using speech data that can be collected in everyday life for the automatic assessment of loneliness in older adults, which could help develop monitoring technologies for early detection and intervention.

RevDate: 2021-12-20

Zheng Z, Li K, Feng G, et al (2021)

Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition.

Frontiers in neuroscience, 15:744959.

Objectives: Mandarin-speaking users of cochlear implants (CI) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition, to provide information for speech coding schemes specific to Mandarin. Design: Eleven normal-hearing subjects were studied using acoustic temporal E cues that were extracted from 30 continuous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and their relative weights were calculated using the least-squares approach. Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43-84.82%, 76.27-95.24%, and 96.58%, respectively; for consonant recognition, 35.49-63.77%, 67.75-78.87%, and 87.87%; for lexical tone recognition, 60.80-97.15%, 73.16-96.87%, and 96.73%. From frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition, 0.10, 0.16, 0.18, 0.23, and 0.33; in lexical tone recognition, 0.38, 0.18, 0.14, 0.16, and 0.14. Conclusion: The region that contributed most to vowel recognition was Region 2 (502-1,022 Hz), which contains first formant (F1) information; Region 5 (3,856-7,562 Hz) contributed most to consonant recognition; and Region 1 (80-502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.
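The least-squares weighting idea used above (regressing percent-correct scores on which frequency regions were present) can be sketched with a dependency-free toy of two regions and three conditions; the design matrix and scores below are illustrative, not the study's data:

```python
# Toy least-squares estimate of relative region weights (2 regions, 3 conditions).
X = [(1, 0), (0, 1), (1, 1)]   # which regions were present in each condition
y = [0.55, 0.70, 0.95]         # percent-correct / 100 for each condition

# Normal equations (X^T X) w = X^T y, solved via Cramer's rule for 2 unknowns.
a = sum(x1 * x1 for x1, _ in X)
b = sum(x1 * x2 for x1, x2 in X)
d = sum(x2 * x2 for _, x2 in X)
p = sum(x1 * yi for (x1, _), yi in zip(X, y))
q = sum(x2 * yi for (_, x2), yi in zip(X, y))
det = a * d - b * b
w1 = (p * d - b * q) / det
w2 = (a * q - b * p) / det
rel = (w1 / (w1 + w2), w2 / (w1 + w2))   # relative weights summing to 1
print(rel)
```

With five regions the same normal equations are solved for five unknowns, yielding weight profiles like the 0.17/0.31/0.22/0.18/0.12 pattern reported for vowels.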

RevDate: 2021-12-13

Cap H, Deleporte P, Joachim J, et al (2008)

Male vocal behavior and phylogeny in deer.

Cladistics : the international journal of the Willi Hennig Society, 24(6):917-931.

The phylogenetic relationships among 11 species of the Cervidae family were inferred from an analysis of male vocalizations. Eighteen characters, including call types (e.g. antipredator barks, mating loudcalls) and acoustic characteristics (call composition, fundamental frequency and formant frequencies), were used for phylogeny inference. The resulting topology and the phylogenetic consistency of behavioral characters were compared with those of current molecular phylogenies of Cervidae and with separate and simultaneous parsimony analyses of molecular and behavioral data. Our results indicate that male vocalizations constitute plausible phylogenetic characters in this taxon. Evolutionary scenarios for the vocal characters are discussed in relation with associated behaviors.

RevDate: 2021-12-03

Sundberg J, Lindblom B, AM Hefele (2021)

Voice source, formant frequencies and vocal tract shape in overtone singing. A case study.

Logopedics, phoniatrics, vocology [Epub ahead of print].

Purpose: In overtone singing a singer produces two pitches simultaneously: a low-pitched, continuous drone plus a melody played on the higher, flutelike and strongly enhanced overtones of the drone. The purpose of this study was to analyse the underlying acoustical, phonatory and articulatory phenomena. Methods: The voice source was analyzed by inverse filtering the sound, the articulation from a dynamic MRI video of the vocal tract profile, and the lip opening from a frontal-view video recording. Vocal tract cross-distances were measured in the MR recording and converted to area functions, the formant frequencies of which were computed. Results: Inverse filtering revealed that the overtone enhancement resulted from a close clustering of formants 2 and 3. The MRI material showed that for low enhanced overtone frequencies (FE) the tongue tip was raised and strongly retracted, while for high FE the tongue tip was less retracted but formed a longer constriction. Thus, the tongue configuration changed from an apical/anterior to a dorsal/posterior articulation. The formant frequencies derived from the area functions matched almost perfectly those used for the inverse filtering. Further, analyses of the area functions revealed that the second formant frequency was strongly dependent on the back cavity, and the third on the front cavity, which acted like a Helmholtz resonator, tuned by the tongue tip position and lip opening. Conclusions: This type of overtone singing can be fully explained by the well-established source-filter theory of voice production, as recently found by Bergevin et al. [1] for another type of overtone singing.
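The Helmholtz-resonator behavior of the front cavity described above can be sketched with the standard formula f = (c / 2π) · sqrt(A / (V · L)), where A is the lip-opening area, L the effective neck length, and V the cavity volume. The numbers below are illustrative assumptions, not measurements from the study:

```python
import math

def helmholtz_freq(neck_area_m2, neck_len_m, cavity_vol_m3, c=350.0):
    # Classic Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (V * L)).
    return (c / (2 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_vol_m3 * neck_len_m))

# Illustrative front-cavity values: 0.5 cm^2 lip opening,
# 1 cm effective neck length, 20 cm^3 cavity volume.
f3 = helmholtz_freq(0.5e-4, 0.01, 20e-6)
print(round(f3), "Hz")
```

The formula makes the article's tuning claim explicit: widening the lip opening (larger A) raises the resonance, while retracting the tongue tip to enlarge the cavity (larger V) lowers it.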

RevDate: 2021-12-02

Roberts B, Summers RJ, PJ Bailey (2021)

Mandatory dichotic integration of second-formant information: Contralateral sine bleats have predictable effects on consonant place judgments.

The Journal of the Acoustical Society of America, 150(5):3693.

Speech-on-speech informational masking arises because the interferer disrupts target processing (e.g., capacity limitations) or corrupts it (e.g., intrusions into the target percept); the latter should produce predictable errors. Listeners identified the consonant in monaural buzz-excited three-formant analogues of approximant-vowel syllables, forming a place of articulation series (/w/-/l/-/j/). There were two 11-member series; the vowel was either high-front or low-back. Series members shared formant-amplitude contours, fundamental frequency, and F1+F3 frequency contours; they were distinguished solely by the F2 frequency contour before the steady portion. Targets were always presented in the left ear. For each series, F2 frequency and amplitude contours were also used to generate interferers with altered source properties-sine-wave analogues of F2 (sine bleats) matched to their buzz-excited counterparts. Accompanying each series member with a fixed mismatched sine bleat in the contralateral ear produced systematic and predictable effects on category judgments; these effects were usually largest for bleats involving the fastest rate or greatest extent of frequency change. Judgments of isolated sine bleats using the three place labels were often unsystematic or arbitrary. These results indicate that informational masking by interferers involved corruption of target processing as a result of mandatory dichotic integration of F2 information, despite the grouping cues disfavoring this integration.

RevDate: 2021-12-02

Lodermeyer A, Bagheri E, Kniesburges S, et al (2021)

The mechanisms of harmonic sound generation during phonation: A multi-modal measurement-based approach.

The Journal of the Acoustical Society of America, 150(5):3485.

Sound generation during voiced speech remains an open research topic because the underlying process within the human larynx is hardly accessible for direct measurements. In the present study, harmonic sound generation during phonation was investigated with a model that replicates the fully coupled fluid-structure-acoustic interaction (FSAI). The FSAI was captured using a multi-modal approach by measuring the flow and acoustic source fields based on particle image velocimetry, as well as the surface velocity of the vocal folds based on laser vibrometry and high-speed imaging. Strong harmonic sources were localized near the glottis, as well as further downstream, during the presence of the supraglottal jet. The strongest harmonic content of the vocal fold surface motion was verified for the area near the glottis, which directly interacts with the glottal jet flow. Also, the acoustic back-coupling of the formant frequencies onto the harmonic oscillation of the vocal folds was verified. These findings verify that harmonic sound generation is the result of a strong interrelation between the vocal fold motion, modulated flow field, and vocal tract geometry.

RevDate: 2021-12-02

Barreda S, PF Assmann (2021)

Perception of gender in children's voices.

The Journal of the Acoustical Society of America, 150(5):3949.

To investigate the perception of gender from children's voices, adult listeners were presented with /hVd/ syllables, in isolation and in sentence context, produced by children between 5 and 18 years. Half the listeners were informed of the age of the talker during trials, while the other half were not. Correct gender identifications increased with talker age; however, performance was above chance even for age groups where the cues most often associated with gender differentiation (i.e., average fundamental frequency and formant frequencies) were not consistently different between boys and girls. The results of acoustic models suggest that cues were used in an age-dependent manner, whether listeners were explicitly told the age of the talker or not. Overall, results are consistent with the hypothesis that talker age and gender are estimated jointly in the process of speech perception. Furthermore, results show that the gender of individual talkers can be identified accurately well before reliable anatomical differences arise in the vocal tracts of females and males. In general, results support the notion that the transmission of gender information from voice depends substantially on gender-dependent patterns of articulation, rather than following deterministically from anatomical differences between male and female talkers.

RevDate: 2021-11-27

Hedwig D, Poole J, P Granli (2021)

Does Social Complexity Drive Vocal Complexity? Insights from the Two African Elephant Species.

Animals : an open access journal from MDPI, 11(11): pii:ani11113071.

The social complexity hypothesis (SCH) for communication states that the range and frequency of social interactions drive the evolution of complex communication systems. Surprisingly, few studies have empirically tested the SCH for vocal communication systems. Filling this gap is important because a co-evolutionary runaway process between social and vocal complexity may have shaped the most intricate communication system, human language. We here propose the African elephant Loxodonta spec. as an excellent study system to investigate the relationships between social and vocal complexity. We review how the distinct differences in social complexity between the two species of African elephants, the forest elephant L. cyclotis and the savanna elephant L. africana, relate to repertoire size and structure, as well as complex communication skills in the two species, such as call combination or intentional formant modulation including the trunk. Our findings suggest that Loxodonta may contradict the SCH, as well as other factors put forth to explain patterns of vocal complexity across species. We propose that life history traits, a factor that has gained little attention as a driver of vocal complexity, and the extensive parental care associated with a uniquely low and slow reproductive rate, may have led to the emergence of pronounced vocal complexity in the forest elephant despite its less complex social system compared to the savanna elephant. Conclusions must be drawn cautiously, however. A better understanding of vocal complexity in the genus Loxodonta will depend on continuing advancements in remote data collection technologies to overcome the challenges of observing forest elephants in their dense rainforest habitat, as well as on the availability of directly comparable data and methods quantifying both structural and contextual variability in the production of rumbles and other vocalizations in both species of African elephants.

RevDate: 2021-11-23

Du X, Zhang X, Wang Y, et al (2021)

Highly sensitive detection of plant growth regulators by using terahertz time-domain spectroscopy combined with metamaterials.

Optics express, 29(22):36535-36545.

The rapid and sensitive detection of plant-growth-regulator (PGR) residue is essential for ensuring food safety for consumers. However, current approaches to detecting PGR residue have many disadvantages. In this paper, we demonstrate a highly sensitive PGR detection method using terahertz time-domain spectroscopy combined with metamaterials. We propose a double-formant metamaterial resonator: a split-ring structure composed of a titanium-gold nanostructure on a polyimide-film substrate. The terahertz spectral response and electric field distribution of the metamaterials under different analyte thicknesses and refractive indices were also investigated. The simulation results showed that the theoretical sensitivities of resonance peaks 1 and 2 of the refractive index sensor based on our designed metamaterial resonator approach 780 and 720 gigahertz per refractive index unit (GHz/RIU), respectively. In experiments, a rapid solution-analysis platform based on the double-formant metamaterial resonator was set up, and PGR residues in aqueous solution were directly and rapidly detected through terahertz time-domain spectroscopy. The results showed that the metamaterials can successfully detect butylhydrazine and N-N diglycine at concentrations as low as 0.05 mg/L. This study paves a new way for sensitive, rapid, low-cost detection of PGRs. It also means that the double-formant metamaterial resonator has significant potential for other applications in terahertz sensing.

RevDate: 2021-11-22

Li P, Ross CF, ZX Luo (2021)

Morphological disparity and evolutionary transformations in the primate hyoid apparatus.

Journal of human evolution, 162:103094 pii:S0047-2484(21)00146-9 [Epub ahead of print].

The hyoid apparatus plays an integral role in swallowing, respiration, and vocalization in mammals. Most placental mammals have a rod-shaped basihyal connected to the basicranium via both soft tissues and a mobile bony chain, the anterior cornu, whereas anthropoid primates have broad, shield-like or even cup-shaped basihyals suspended from the basicranium by soft tissues only. How the unique anthropoid hyoid morphology evolved is unknown, and the hyoid morphology of nonanthropoid primates is poorly documented. Here we use phylogenetic comparative methods and linear morphometrics to address knowledge gaps in hyoid evolution among primates and their euarchontan outgroups. We find that dermopterans have variable reduction of cornu elements. Cynocephalus volans is sexually dimorphic in hyoid morphology. Tupaia and all lemuroids except Daubentonia have a fully ossified anterior cornu connecting a rod-shaped basihyal to the basicranium; this is the ancestral mammalian pattern that is also characteristic of the last common ancestor of Primates. Haplorhines exhibit a reduced anterior cornu, and anthropoids underwent a further increase in basihyal aspect ratio values and in relative basihyal volume. Convergent with haplorhines, lorisoid strepsirrhines independently evolved a broad basihyal and reduced anterior cornua. While a reduced anterior cornu is hypothesized to facilitate vocal tract lengthening and lower formant frequencies in some mammals, our results suggest vocalization adaptations alone are unlikely to drive the iterative reduction of anterior cornua within Primates. Our new data on euarchontan hyoid evolution provide an anatomical basis for further exploring the form-function relationships of the hyoid across different behaviors, including vocalization, chewing, and swallowing.

RevDate: 2021-11-20

Xu L, Luo J, Xie D, et al (2021)

Reverberation Degrades Pitch Perception but Not Mandarin Tone and Vowel Recognition of Cochlear Implant Users.

Ear and hearing pii:00003446-900000000-98400 [Epub ahead of print].

OBJECTIVES: The primary goal of this study was to investigate the effects of reverberation on Mandarin tone and vowel recognition of cochlear implant (CI) users and normal-hearing (NH) listeners. To understand the performance of Mandarin tone recognition, this study also measured participants' pitch perception and the availability of temporal envelope cues in reverberation.

DESIGN: Fifteen CI users and nine NH listeners, all Mandarin speakers, were asked to recognize Mandarin single-vowels produced in four lexical tones and rank harmonic complex tones in pitch with different reverberation times (RTs) from 0 to 1 second. Virtual acoustic techniques were used to simulate rooms with different degrees of reverberation. Vowel duration and correlation between amplitude envelope and fundamental frequency (F0) contour were analyzed for different tones as a function of the RT.

RESULTS: Vowel durations of different tones significantly increased with longer RTs. Amplitude-F0 correlation remained similar for the falling Tone 4 but greatly decreased for the other tones in reverberation. NH listeners had robust pitch-ranking, tone recognition, and vowel recognition performance as the RT increased. Reverberation significantly degraded CI users' pitch-ranking thresholds but did not significantly affect the overall scores of tone and vowel recognition with CIs. Detailed analyses of tone confusion matrices showed that CI users reduced the flat Tone-1 responses but increased the falling Tone-4 responses in reverberation, possibly due to the falling amplitude envelope of late reflections after the original vowel segment. CI users' tone recognition scores were not correlated with their pitch-ranking thresholds.

CONCLUSIONS: NH listeners can reliably recognize Mandarin tones in reverberation using salient pitch cues from spectral and temporal fine structures. However, CI users have poorer pitch perception using F0-related amplitude modulations that are reduced in reverberation. Reverberation distorts speech amplitude envelopes, which affect the distribution of tone responses but not the accuracy of tone recognition with CIs. Recognition of vowels with stationary formant trajectories is not affected by reverberation for both NH listeners and CI users, regardless of the available spectral resolution. Future studies should test how the relatively stable vowel and tone recognition may contribute to sentence recognition in reverberation of Mandarin-speaking CI users.

RevDate: 2021-11-15

Melchor J, Vergara J, Figueroa T, et al (2021)

Formant-Based Recognition of Words and Other Naturalistic Sounds in Rhesus Monkeys.

Frontiers in neuroscience, 15:728686.

In social animals, identifying sounds is critical for communication. In humans, the acoustic parameters involved in speech recognition, such as the formant frequencies derived from the resonance of the supralaryngeal vocal tract, have been well documented. However, how formants contribute to recognizing learned sounds in non-human primates remains unclear. To determine this, we trained two rhesus monkeys to discriminate target and non-target sounds presented in sequences of 1-3 sounds. After training, we performed three experiments: (1) we tested the monkeys' accuracy and reaction times during the discrimination of various acoustic categories; (2) their ability to discriminate morphing sounds; and (3) their ability to identify sounds consisting of formant 1 (F1), formant 2 (F2), or F1 and F2 (F1F2) pass filters. Our results indicate that macaques can learn diverse sounds and can discriminate morphs and formant-filtered (F1 and F2) sounds, suggesting that information from a few acoustic parameters suffices for recognizing complex sounds. We anticipate that future neurophysiological experiments in this paradigm may help elucidate how formants contribute to the recognition of sounds.

RevDate: 2021-11-15

Cartei V, Reby D, Garnham A, et al (2022)

Peer audience effects on children's vocal masculinity and femininity.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 377(1841):20200397.

Existing evidence suggests that children from around the age of 8 years strategically alter their public image in accordance with known values and preferences of peers, through the self-descriptive information they convey. However, an important but neglected aspect of this 'self-presentation' is the medium through which such information is communicated: the voice itself. The present study explored peer audience effects on children's vocal productions. Fifty-six children (26 females, aged 8-10 years) were presented with vignettes where a fictional child, matched to the participant's age and sex, is trying to make friends with a group of same-sex peers with stereotypically masculine or feminine interests (rugby and ballet, respectively). Participants were asked to impersonate the child in that situation and, as the child, to read out loud masculine, feminine and gender-neutral self-descriptive statements to these hypothetical audiences. They also had to decide which of those self-descriptive statements would be most helpful for making friends. In line with previous research, boys and girls preferentially selected masculine or feminine self-descriptive statements depending on the audience interests. Crucially, acoustic analyses of fundamental frequency and formant frequency spacing revealed that children also spontaneously altered their vocal productions: they feminized their voices when speaking to members of the ballet club, while they masculinized their voices when speaking to members of the rugby club. Both sexes also feminized their voices when uttering feminine sentences, compared to when uttering masculine and gender-neutral sentences. Implications for the hitherto neglected role of acoustic qualities of children's vocal behaviour in peer interactions are discussed. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.

RevDate: 2021-11-15

Pisanski K, Anikin A, D Reby (2022)

Vocal size exaggeration may have contributed to the origins of vocalic complexity.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 377(1841):20200401.

Vocal tract elongation, which uniformly lowers vocal tract resonances (formant frequencies) in animal vocalizations, has evolved independently in several vertebrate groups as a means for vocalizers to exaggerate their apparent body size. Here, we propose that smaller speech-like articulatory movements that alter only individual formants can serve a similar yet less energetically costly size-exaggerating function. To test this, we examine whether uneven formant spacing alters the perceived body size of vocalizers in synthesized human vowels and animal calls. Among six synthetic vowel patterns, those characterized by the lowest first and second formant (the vowel /u/ as in 'boot') are consistently perceived as produced by the largest vocalizer. Crucially, lowering only one or two formants in animal-like calls also conveys the impression of a larger body size, and lowering the second and third formants simultaneously exaggerates perceived size to a similar extent as rescaling all formants. As the articulatory movements required for individual formant shifts are minor compared to full vocal tract extension, they represent a rapid and energetically efficient mechanism for acoustic size exaggeration. We suggest that, by favouring the evolution of uneven formant patterns in vocal communication, this deceptive strategy may have contributed to the origins of the phonemic diversification required for articulated speech. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
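The size-exaggeration account rests on the inverse relation between vocal-tract length and formant frequencies. A minimal sketch of that relation, under a uniform-tube (closed at one end) approximation and illustrative values rather than the authors' data:

```python
# Estimate apparent vocal-tract length (VTL) from measured formants,
# assuming a uniform tube closed at one end: F_i = (2i - 1) * c / (4 * L).
C = 35000.0  # approximate speed of sound in a warm vocal tract, cm/s

def apparent_vtl(formants_hz):
    """Average the per-formant VTL estimates L_i = (2i - 1) * c / (4 * F_i)."""
    estimates = [(2 * i - 1) * C / (4.0 * f)
                 for i, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Lowering formants implies a longer apparent tract (a "larger" vocalizer).
baseline = apparent_vtl([500, 1500, 2500])   # ~17.5 cm, adult-male-like
lowered  = apparent_vtl([450, 1350, 2250])   # all formants scaled down 10%
assert lowered > baseline
```

Lowering only a subset of formants, as the study tests, still pulls the averaged estimate toward a longer apparent tract, which is the perceptual effect the authors exploit.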

RevDate: 2021-11-15

Grawunder S, Uomini N, Samuni L, et al (2022)

Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 377(1841):20200455.

The origins of human speech are obscure; it is still unclear what aspects are unique to our species or shared with our evolutionary cousins, in part due to a lack of a common framework for comparison. We asked what chimpanzee and human vocal production acoustics have in common. We examined visible supra-laryngeal articulators of four major chimpanzee vocalizations (hoos, grunts, barks, screams) and their associated acoustic structures, using techniques from human phonetic and animal communication analysis. Data were collected from wild adult chimpanzees, Taï National Park, Ivory Coast. Both discriminant and principal component classification procedures reliably classified the call types. Discriminating acoustic features include voice quality and formant structure, mirroring phonetic features in human speech. Chimpanzee lip and jaw articulation variables also offered similar discrimination of call types. Formant maps distinguished call types with different vowel-like sounds. Comparing our results with published primate data, humans show less F1-F2 correlation and further expansion of the vowel space, particularly for [i] sounds. Unlike recent studies suggesting monkeys achieve human vowel space, we conclude from our results that supra-laryngeal articulatory capacities show moderate evolutionary change, with vowel space expansion continuing through hominoid evolution. Studies on more primate species will be required to substantiate this. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.

RevDate: 2021-11-10

Davatz GC, Yamasaki R, Hachiya A, et al (2021)

Source and Filter Acoustic Measures of Young, Middle-Aged and Elderly Adults for Application in Vowel Synthesis.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00304-0 [Epub ahead of print].

INTRODUCTION: The voice undergoes important changes throughout life due to anatomical and physiological modifications in the larynx and vocal tract. Understanding the acoustic characteristics of speech from young adulthood to old age may assist in the synthesis of voices representative of men and women of different age groups.

OBJECTIVE: To obtain the fundamental frequency (f0), formant frequencies (F1, F2, F3, F4), and bandwidth (B1, B2, B3, B4) values extracted from the sustained vowel /a/ of young, middle-aged, and elderly adults who are Brazilian Portuguese speakers; to present the application of these parameters in vowel synthesis.

STUDY DESIGN: Prospective study.

METHODS: Acoustic analysis of 162 tokens of the sustained vowel /a/ produced by vocally healthy adults, men and women between 18 and 80 years old, was performed. The adults were divided into three groups: young adults (18 to 44 years old), middle-aged adults (45 to 59 years old), and elderly adults (60 to 80 years old). The f0, F1, F2, F3, F4, B1, B2, B3, and B4 were extracted from the audio signals. Their average values were applied to a source-filter mathematical model to perform vowel synthesis for each age group, for both men and women.

RESULTS: Young women had higher f0 than middle-aged and elderly women. Elderly women had lower F1 than middle-aged women. Young women had higher F2 than elderly women. For men, the source-filter acoustic measures were statistically equivalent among the age groups. Average values of f0, F1, F2, F3, F4, B1, and B2 were higher in women. The spacing of the sound waves in the signals, the positions of the formant frequencies, and the bandwidths visible in the spectra of the synthesized sounds reflect the average values extracted from the volunteers' productions of the sustained vowel /a/ in Brazilian Portuguese.

CONCLUSION: The sustained vowel /a/ produced by women presented different values of f0, F1, and F2 between age groups, which was not observed for men. In addition to f0 and the formant frequencies, the bandwidths also differed between women and men. The synthetic vowels made available represent the acoustic changes found for each sex as a function of age.
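The synthesis described follows the classic source-filter model: a glottal pulse train filtered through one resonator per formant. A minimal sketch with illustrative parameter values (not the study's measured averages):

```python
import numpy as np

FS = 16000  # sample rate, Hz

def formant_resonator(x, freq, bw):
    """Second-order all-pole resonator at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / FS)                 # pole radius from bandwidth
    a1 = 2 * r * np.cos(2 * np.pi * freq / FS)   # feedback coefficients
    a2 = -r * r
    y = np.zeros(len(x) + 2)                     # two leading zeros = initial state
    for n in range(len(x)):
        y[n + 2] = x[n] + a1 * y[n + 1] + a2 * y[n]
    return y[2:]

def synthesize_vowel(f0, formants, bandwidths, dur=0.3):
    """Impulse-train source at f0 Hz, cascaded through formant resonators."""
    n = int(FS * dur)
    source = np.zeros(n)
    source[::int(FS / f0)] = 1.0                 # glottal pulses at the fundamental
    y = source
    for f, bw in zip(formants, bandwidths):
        y = formant_resonator(y, f, bw)
    return y / np.max(np.abs(y))                 # peak-normalize

# An /a/-like vowel with illustrative formant and bandwidth values
vowel = synthesize_vowel(f0=120, formants=[700, 1200, 2600, 3500],
                         bandwidths=[80, 90, 120, 150])
```

Substituting the age- and sex-specific averages of f0, F1-F4, and B1-B4 into a model of this form is how representative vowels for each group can be produced.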

RevDate: 2021-11-04

Rowe HP, Stipancic KL, Lammert AC, et al (2021)

Validation of an Acoustic-Based Framework of Speech Motor Control: Assessing Criterion and Construct Validity Using Kinematic and Perceptual Measures.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose This study investigated the criterion (analytical and clinical) and construct (divergent) validity of a novel, acoustic-based framework composed of five key components of motor control: Coordination, Consistency, Speed, Precision, and Rate. Method Acoustic and kinematic analyses were performed on audio recordings from 22 subjects with amyotrophic lateral sclerosis during a sequential motion rate task. Perceptual analyses were completed by two licensed speech-language pathologists, who rated each subject's speech on the five framework components and their overall severity. Analytical and clinical validity were assessed by comparing performance on the acoustic features to their kinematic correlates and to clinician ratings of the five components, respectively. Divergent validity of the acoustic-based framework was then assessed by comparing performance on each pair of acoustic features to determine whether the features represent distinct articulatory constructs. Bivariate correlations and partial correlations with severity as a covariate were conducted for each comparison. Results Results revealed moderate-to-strong analytical validity for every acoustic feature, both with and without controlling for severity, and moderate-to-strong clinical validity for all acoustic features except Coordination, without controlling for severity. When severity was included as a covariate, the strong associations for Speed and Precision became weak. Divergent validity was supported by weak-to-moderate pairwise associations between all acoustic features except Speed (second-formant [F2] slope of consonant transition) and Precision (between-consonant variability in F2 slope). Conclusions This study demonstrated that the acoustic-based framework has potential as an objective, valid, and clinically useful tool for profiling articulatory deficits in individuals with speech motor disorders. The findings also suggest that compared to clinician ratings, instrumental measures are more sensitive to subtle differences in articulatory function. With further research, this framework could provide more accurate and reliable characterizations of articulatory impairment, which may eventually increase clinical confidence in the diagnosis and treatment of patients with different articulatory phenotypes.

RevDate: 2021-11-03

Abur D, Subaciute A, Daliri A, et al (2021)

Feedback and Feedforward Auditory-Motor Processes for Voice and Articulation in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose Unexpected and sustained manipulations of auditory feedback during speech production result in "reflexive" and "adaptive" responses, which can shed light on feedback and feedforward auditory-motor control processes, respectively. Persons with Parkinson's disease (PwPD) have shown aberrant reflexive and adaptive responses, but responses appear to differ for control of vocal and articulatory features. However, these responses have not been examined for both voice and articulation in the same speakers and with respect to auditory acuity and functional speech outcomes (speech intelligibility and naturalness). Method Here, 28 PwPD on their typical dopaminergic medication schedule and 28 age-, sex-, and hearing-matched controls completed tasks yielding reflexive and adaptive responses as well as auditory acuity for both vocal and articulatory features. Results No group differences were found for any measures of auditory-motor control, conflicting with prior findings in PwPD while off medication. Auditory-motor measures were also compared with listener ratings of speech function: first formant frequency acuity was related to speech intelligibility, whereas adaptive responses to vocal fundamental frequency manipulations were related to speech naturalness. Conclusions These results support that auditory-motor processes for both voice and articulatory features are intact for PwPD receiving medication. This work is also the first to suggest associations between measures of auditory-motor control and speech intelligibility and naturalness.

RevDate: 2021-10-31

Cheung ST, Thompson K, Chen JL, et al (2021)

Response patterns to vowel formant perturbations in children.

The Journal of the Acoustical Society of America, 150(4):2647.

Auditory feedback is an important component of speech motor control, but its precise role in developing speech is less understood. The role of auditory feedback in development was probed by perturbing the speech of children 4-9 years old. The vowel sound /ɛ/ was shifted to /æ/ in real time and presented to participants as their own auditory feedback. Analyses of the resultant formant magnitude changes in the participants' speech indicated that children compensated and adapted by adjusting their formants to oppose the perturbation. Older and younger children responded to perturbation differently in F1 and F2. The compensatory change in F1 was greater for younger children, whereas the increase in F2 was greater for older children. Adaptation aftereffects were observed in both groups. Exploratory directional analyses in the two-dimensional formant space indicated that older children responded more directly and less variably to the perturbation than younger children, shifting their vowels back toward the vowel sound /ɛ/ to oppose the perturbation. Findings support the hypothesis that auditory feedback integration continues to develop between the ages of 4 and 9 years old such that the differences in the adaptive and compensatory responses arise between younger and older children despite receiving the same auditory feedback perturbation.
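The directional analyses described operate in the two-dimensional F1-F2 space: a response "opposes" the perturbation to the extent that the produced formant change projects onto the direction opposite the applied shift. A hypothetical sketch with made-up values, not the study's method or data:

```python
import math

def compensation(baseline, production, shift):
    """Project the produced (F1, F2) change onto the direction opposing
    the applied perturbation `shift`; positive values mean the speaker
    shifted formants against the perturbation (compensated)."""
    d_f1 = production[0] - baseline[0]
    d_f2 = production[1] - baseline[1]
    norm = math.hypot(shift[0], shift[1])
    return -(d_f1 * shift[0] + d_f2 * shift[1]) / norm

# /ɛ/ perturbed toward /æ/ (F1 raised, F2 lowered); the child responds
# by lowering F1 and raising F2, yielding a positive compensation score.
score = compensation(baseline=(600, 1800), production=(580, 1830),
                     shift=(100, -60))
assert score > 0
```

Comparing the magnitude and variability of such projections between age groups is one way the directness of younger versus older children's responses can be quantified.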

RevDate: 2021-10-30

Tang DL, McDaniel A, KE Watkins (2021)

Disruption of speech motor adaptation with repetitive transcranial magnetic stimulation of the articulatory representation in primary motor cortex.

Cortex; a journal devoted to the study of the nervous system and behavior, 145:115-130 pii:S0010-9452(21)00310-5 [Epub ahead of print].

When auditory feedback perturbation is introduced in a predictable way over a number of utterances, speakers learn to compensate by adjusting their own productions, a process known as sensorimotor adaptation. Despite multiple lines of evidence indicating the role of primary motor cortex (M1) in motor learning and memory, whether M1 causally contributes to sensorimotor adaptation in the speech domain remains unclear. Here, we aimed to assay whether temporary disruption of the articulatory representation in left M1 by repetitive transcranial magnetic stimulation (rTMS) impairs speech adaptation. To induce sensorimotor adaptation, the frequencies of first formants (F1) were shifted up and played back to participants when they produced "head", "bed", and "dead" repeatedly (the learning phase). A low-frequency rTMS train (0.6 Hz, subthreshold, 12 min) over either the tongue or the hand representation of M1 (between-subjects design) was applied before participants experienced altered auditory feedback in the learning phase. We found that the group who received rTMS over the hand representation showed the expected compensatory response for the upwards shift in F1 by significantly reducing F1 and increasing the second formant (F2) frequencies in their productions. In contrast, these expected compensatory changes in both F1 and F2 did not occur in the group that received rTMS over the tongue representation. Critically, rTMS (subthreshold) over the tongue representation did not affect vowel production, which was unchanged from baseline. These results provide direct evidence that the articulatory representation in left M1 causally contributes to sensorimotor learning in speech. Furthermore, these results also suggest that M1 is critical to the network supporting a more global adaptation that aims to move the altered speech production closer to a learnt pattern of speech production used to produce another vowel.

RevDate: 2021-10-21

König A, Mallick E, Tröger J, et al (2021)

Measuring neuropsychiatric symptoms in patients with early cognitive decline using speech analysis.

European psychiatry : the journal of the Association of European Psychiatrists, 64(1):e64 pii:S0924933821022367.

BACKGROUND: Certain neuropsychiatric symptoms (NPS), namely apathy, depression, and anxiety demonstrated great value in predicting dementia progression, representing eventually an opportunity window for timely diagnosis and treatment. However, sensitive and objective markers of these symptoms are still missing. Therefore, the present study aims to investigate the association between automatically extracted speech features and NPS in patients with mild neurocognitive disorders.

METHODS: Speech of 141 patients aged 65 or older with neurocognitive disorder was recorded while performing two short narrative speech tasks. NPS were assessed by the neuropsychiatric inventory. Paralinguistic markers relating to prosodic, formant, source, and temporal qualities of speech were automatically extracted, correlated with NPS. Machine learning experiments were carried out to validate the diagnostic power of extracted markers.

RESULTS: Different speech variables are associated with specific NPS: apathy correlates with temporal aspects of speech, and anxiety with voice quality, and these associations were mostly consistent between males and females after correction for cognitive impairment. Machine learning regressors are able to extract information from speech features and perform above baseline in predicting anxiety, apathy, and depression scores.

CONCLUSIONS: Different NPS seem to be characterized by distinct speech features, which are easily extractable automatically from short vocal tasks. These findings support the use of speech analysis for detecting subtypes of NPS in patients with cognitive impairment. This could have great implications for the design of future clinical trials as this cost-effective method could allow more continuous and even remote monitoring of symptoms.

RevDate: 2021-10-15

Lester-Smith RA, Derrick E, CR Larson (2021)

Characterization of Source-Filter Interactions in Vocal Vibrato Using a Neck-Surface Vibration Sensor: A Pilot Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00281-2 [Epub ahead of print].

PURPOSE: Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. The physiological sources of modulation within the speech mechanism and the interactions between the laryngeal source and vocal tract filter in vibrato are not fully understood. Therefore, the purpose of this study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering.

METHOD: Nine classically-trained singers produced sustained vowels with vibrato while simultaneous signals were recorded using a vibration sensor and a microphone. Acoustical analyses were performed to measure the rate and extent of fo and intensity modulation for each trial. Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals.

RESULTS: The rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. Larger differences in the rate of intensity modulation were seen with vowels that typically have smaller differences between the first and second formant frequencies.

CONCLUSIONS: This study demonstrated that the rate of intensity modulation at the source prior to supraglottal vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after supraglottal vocal tract filtering, as measured in microphone signals. The difference in rate varied based on the vowel. These findings provide further support of the resonance-harmonics interaction in vocal vibrato. Further investigation is warranted to determine if differences in the physiological source(s) of vibrato account for inconsistent relationships between the extent of intensity modulation in neck-surface vibration sensor and microphone signals.
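The rate and extent measures analyzed here can be estimated from a sampled fo (or intensity) track via the spectrum of the detrended contour. A minimal sketch on synthetic vibrato, not the authors' analysis code:

```python
import numpy as np

def modulation_rate_extent(contour, contour_fs):
    """Estimate modulation rate (Hz) and extent (+/- in contour units)
    from a sampled track, using the spectrum of the detrended contour."""
    x = np.asarray(contour, dtype=float) - np.mean(contour)
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / contour_fs)
    rate = freqs[np.argmax(spec[1:]) + 1]   # skip the DC bin
    extent = (np.max(contour) - np.min(contour)) / 2.0
    return rate, extent

# Synthetic vibrato: 5.5 Hz modulation, +/-6 Hz around a 220 Hz mean,
# with the fo track sampled at 100 Hz for 2 s.
t = np.arange(0, 2.0, 0.01)
fo = 220 + 6 * np.sin(2 * np.pi * 5.5 * t)
rate, extent = modulation_rate_extent(fo, 100)
```

Running the same estimator on the vibration-sensor and microphone signals' intensity contours, per trial, is the kind of paired comparison the sign tests above would then operate on.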

RevDate: 2021-10-14

Harvey RG, DH Lloyd (1995)

The Distribution of Bacteria (Other than Staphylococci and Propionibacterium acnes) on the Hair, at the Skin Surface and Within the Hair Follicles of Dogs.

Veterinary dermatology, 6(2):79-84.

The distribution of bacteria, other than staphylococci, on the hair shaft, at the skin surface and within the hair follicles of eight dogs is reported. On the hair shafts, Micrococcus spp. and aerobic Gram-negative bacteria were most numerous, with mean counts of 1.12 and 0.84 Log10 (colony forming units + 1) cm-1, respectively. Significantly higher numbers (p < 0.05) of Gram-negative bacteria and Bacillus sp. were found proximally. At the skin surface, Micrococcus spp., aerobic Gram-negative bacteria and Clostridium spp. were the most numerous, with mean counts of 0.62, 1.12 and 0.84 Log10 (colony forming units + 1) cm-2, respectively. Significantly higher numbers (p < 0.05) of Micrococcus spp. were found within the hair follicles than on the skin surface. Streptococci and Bacillus spp. were found on five and four dogs, respectively. Proteus spp., Pseudomonas spp. and Nocardia spp. were occasionally found.

RevDate: 2021-10-13

Lee Y, Park HJ, Bae IH, et al (2021)

Resonance Characteristics in Epiglottic Cyst: Formant Frequency, Vowel Space Area, Vowel Articulatory Index, and Formant Centralization Ratio.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00308-8 [Epub ahead of print].

OBJECTIVES: Resonance characteristics can change due to alterations in the shape of the vocal tract in patients with epiglottic cysts. This study aimed to analyze the resonance characteristics before and after the surgical excision of epiglottic cysts.

METHODS: Twelve male patients with epiglottic cysts were enrolled in this study. We analyzed the first and second formants (F1 and F2) in vowels /a/, /e/, /i/, /o/, and /u/, vowel space area (VSA), vowel articulatory index (VAI), and formant centralization ratio (FCR). We measured these parameters before and after the surgical excision of epiglottic cysts.

RESULTS: There was a significant increase in the F1 values of /a/, VSA, and VAI, and a significant decrease in the value of FCR after the surgery.

CONCLUSION: We confirmed the change in the resonance characteristics in patients with epiglottic cysts. It is considered that further studies on epiglottic cysts and resonance changes are needed in the future.
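The vowel-space metrics used in this study have standard formulations. A minimal sketch using the widely cited three-corner-vowel versions with illustrative formant values (the study itself analyzed five vowels, so its exact formulas may differ):

```python
def vowel_metrics(f1, f2):
    """Compute triangular VSA, VAI, and FCR from corner-vowel formants.

    f1, f2: dicts mapping vowel symbol ('a', 'i', 'u') to frequency in Hz.
    VSA is the area of the /a/-/i/-/u/ triangle in the F1-F2 plane;
    FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); VAI is its reciprocal.
    """
    vsa = 0.5 * abs(f1['i'] * (f2['a'] - f2['u'])
                    + f1['a'] * (f2['u'] - f2['i'])
                    + f1['u'] * (f2['i'] - f2['a']))
    fcr = (f2['u'] + f2['a'] + f1['i'] + f1['u']) / (f2['i'] + f1['a'])
    vai = 1.0 / fcr
    return vsa, vai, fcr

# Illustrative (not study) values, Hz
vsa, vai, fcr = vowel_metrics(f1={'a': 750, 'i': 300, 'u': 350},
                              f2={'a': 1300, 'i': 2300, 'u': 800})
```

Centralized (reduced) vowels shrink VSA and VAI and raise FCR, so the post-surgical increase in VSA and VAI alongside a decreased FCR indicates an expanded, less centralized vowel space.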

RevDate: 2021-10-11

Coto-Solano R, Stanford JN, SK Reddy (2021)

Advances in Completely Automated Vowel Analysis for Sociophonetics: Using End-to-End Speech Recognition Systems With DARLA.

Frontiers in artificial intelligence, 4:662097 pii:662097.

In recent decades, computational approaches to sociophonetic vowel analysis have been steadily increasing, and sociolinguists now frequently use semi-automated systems for phonetic alignment and vowel formant extraction, including FAVE (Forced Alignment and Vowel Extraction, Rosenfelder et al., 2011; Evanini et al., Proceedings of Interspeech, 2009), Penn Aligner (Yuan and Liberman, J. Acoust. Soc. America, 2008, 123, 3878), and DARLA (Dartmouth Linguistic Automation), (Reddy and Stanford, DARLA Dartmouth Linguistic Automation: Online Tools for Linguistic Research, 2015a). Yet these systems still have a major bottleneck: manual transcription. For most modern sociolinguistic vowel alignment and formant extraction, researchers must first create manual transcriptions. This human step is painstaking, time-consuming, and resource intensive. If this manual step could be replaced with completely automated methods, sociolinguists could potentially tap into vast datasets that have previously been unexplored, including legacy recordings that are underutilized due to lack of transcriptions. Moreover, if sociolinguists could quickly and accurately extract phonetic information from the millions of hours of new audio content posted on the Internet every day, a virtual ocean of speech from newly created podcasts, videos, live-streams, and other audio content would now inform research. How close are the current technological tools to achieving such groundbreaking changes for sociolinguistics? Prior work (Reddy et al., Proceedings of the North American Association for Computational Linguistics 2015 Conference, 2015b, 71-75) showed that an HMM-based Automated Speech Recognition system, trained with CMU Sphinx (Lamere et al., 2003), was accurate enough for DARLA to uncover evidence of the US Southern Vowel Shift without any human transcription. Even so, because that automatic speech recognition (ASR) system relied on a small training set, it produced numerous transcription errors. 
Six years have passed since that study, and since that time numerous end-to-end automatic speech recognition (ASR) algorithms have shown considerable improvement in transcription quality. One example of such a system is the RNN/CTC-based DeepSpeech from Mozilla (Hannun et al., 2014). (RNN stands for recurrent neural networks, the learning mechanism for DeepSpeech. CTC stands for connectionist temporal classification, the mechanism to merge phones into words). The present paper combines DeepSpeech with DARLA to push the technological envelope and determine how well contemporary ASR systems can perform in completely automated vowel analyses with sociolinguistic goals. Specifically, we used these techniques on audio recordings from 352 North American English speakers in the International Dialects of English Archive (IDEA), extracting 88,500 tokens of vowels in stressed position from spontaneous, free speech passages. With this large dataset we conducted acoustic sociophonetic analyses of the Southern Vowel Shift and the Northern Cities Chain Shift in the North American IDEA speakers. We compared the results using three different sources of transcriptions: 1) IDEA's manual transcriptions as the baseline "ground truth", 2) the ASR built on CMU Sphinx used by Reddy et al. (Proceedings of the North American Association for Computational Linguistics 2015 Conference, 2015b, 71-75), and 3) the latest publicly available Mozilla DeepSpeech system. We input these three different transcriptions to DARLA, which automatically aligned and extracted the vowel formants from the 352 IDEA speakers. Our quantitative results show that newer ASR systems like DeepSpeech show considerable promise for sociolinguistic applications like DARLA. We found that DeepSpeech's automated transcriptions had significantly lower character error rates than those from the prior Sphinx system (from 46% to 35%).
When we performed the sociolinguistic analysis of the extracted vowel formants from DARLA, we found that the automated transcriptions from DeepSpeech matched the results from the ground truth for the Southern Vowel Shift (SVS): five vowels showed a shift in both transcriptions, and two vowels didn't show a shift in either transcription. The Northern Cities Shift (NCS) was more difficult to detect, but ground truth and DeepSpeech matched for four vowels: One of the vowels showed a clear shift, and three showed no shift in either transcription. Our study therefore shows how technology has made progress toward greater automation in vowel sociophonetics, while also showing what remains to be done. Our statistical modeling provides a quantified view of both the abilities and the limitations of a completely "hands-free" analysis of vowel shifts in a large dataset. Naturally, when comparing a completely automated system against a semi-automated system involving human manual work, there will always be a tradeoff between accuracy on the one hand versus speed and replicability on the other hand [Kendall and Joseph, Towards best practices in sociophonetics (with Marianna DiPaolo), 2014]. The amount of "noise" that can be tolerated for a given study will depend on the particular research goals and researchers' preferences. Nonetheless, our study shows that, for certain large-scale applications and research goals, a completely automated approach using publicly available ASR can produce meaningful sociolinguistic results across large datasets, and these results can be generated quickly, efficiently, and with full replicability.
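The character-error-rate comparison above (46% vs. 35%) rests on Levenshtein edit distance normalized by reference length. A minimal sketch, not the DARLA or DeepSpeech evaluation code:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substitution against an 11-character reference
assert cer("vowel shift", "vowel shaft") == 1 / 11
```

Lower CER on the ASR transcriptions directly improves downstream forced alignment, since fewer mismatched characters means fewer misplaced vowel boundaries at formant-extraction time.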

RevDate: 2021-10-11

Sondhi S, Salhan A, Santoso CA, et al (2021)

Voice processing for COVID-19 scanning and prognostic indicator.

Heliyon, 7(10):e08134.

The COVID-19 pandemic has posed a serious risk of contagion to humans. There is a need for reliable non-contact tests, such as vocal correlates of COVID-19 infection. Thirty-six volunteers of Asian ethnicity, 16 infected subjects (8M, 8F) and 20 non-infected controls (10M, 10F), participated in this study by vocalizing the vowels /a/, /e/, /i/, /o/, /u/. Voice correlates of the 16 COVID-19-positive patients were compared during infection and after recovery with the 20 non-infected controls. Compared to non-infected controls, significantly higher values of energy intensity for /o/ (p = 0.048), formant F1 for /o/ (p = 0.014), and formant F3 for /u/ (p = 0.032) were observed in male patients, while higher values of Jitter (local, abs) for /o/ (p = 0.021) and Jitter (ppq5) for /a/ (p = 0.014) were observed in female patients. However, formant F2 for /u/ (p = 0.018) and mean pitch F0 for /e/, /i/ and /o/ (p = 0.033, 0.036, 0.047) decreased for female patients under infection. Compared to recovered conditions, HNR for /e/ (p = 0.014) was higher in male patients under infection, while Jitter (rap) for /a/ (p = 0.041), Jitter (ppq5) for /a/ (p = 0.032), Shimmer (local, dB) for /i/ (p = 0.024), Shimmer (apq5) for /u/ (p = 0.019), and formant F4 for vowel /o/ (p = 0.022) were higher in female patients under infection. However, HNR for /e/ (p = 0.041) and formant F1 for /o/ (p = 0.002) were lower in female patients compared to their recovered conditions. The obtained results support the hypothesis, since changes in voice parameters were observed in the infected patients that can be correlated to a combination of acoustic measures such as fundamental frequency, formant characteristics, HNR, and voice perturbations like jitter and shimmer for different vowels. Thus, voice analysis can be used for scanning and prognosis of COVID-19 infection. Based on the findings of this study, a mobile application could be developed to analyze the human voice in real time to detect COVID-19 symptoms for remedial measures and necessary action.
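The perturbation measures reported here (Jitter local/abs/rap/ppq5, Shimmer local/apq5) follow standard Praat-style definitions. A minimal sketch of the "local" variants, under assumed cycle-level data extracted from a voiced vowel:

```python
def jitter_local(periods):
    """Mean absolute difference of consecutive glottal periods,
    relative to the mean period (Praat-style Jitter (local))."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amplitudes):
    """Same computation over per-cycle peak amplitudes (Shimmer (local))."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# A perfectly periodic voice has zero jitter; cycle-to-cycle
# perturbation of the kind reported in infected patients raises it.
assert jitter_local([0.008, 0.008, 0.008]) == 0.0
assert jitter_local([0.008, 0.0082, 0.008, 0.0082]) > 0.0
```

The rap, ppq5, and apq5 variants reported in the study smooth over 3- or 5-cycle windows before differencing, but the normalization by the mean is the same.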

RevDate: 2021-09-20

Wang Y, Qiu X, Wang F, et al (2021)

Single-crystal ordered macroporous metal-organic framework as support for molecularly imprinted polymers and their integration in membrane formant for the specific recognition of zearalenone.

Journal of separation science [Epub ahead of print].

Zearalenone is a fungal contaminant that is widely present in grains. Here, a novel molecularly imprinted membrane based on SOM-ZIF-8 was developed for the rapid and highly selective identification of zearalenone in grain samples. The molecularly imprinted membrane was prepared using polyvinylidene fluoride, with cyclododecyl 2,4-dihydroxybenzoate as the template and SOM-ZIF-8 as the carrier. The factors influencing the extraction of zearalenone using this membrane, including solution pH, extraction time, elution solvent, elution time and elution volume, were studied in detail. The optimized conditions were 5 mL of sample solution at pH 6, an extraction time of 45 min, 4 mL of acetonitrile:methanol = 9:1 as the elution solvent, and an elution time of 20 min. The method displayed a good linear range of 12-120 ng·g-1 (R2 = 0.998), with limits of detection and quantification of 1.7 ng·g-1 and 5.5 ng·g-1, respectively. In addition, the membrane was used to selectively identify zearalenone in grain samples, with percent recoveries ranging from 87.9% to 101.0% and relative standard deviations of less than 6.6%. Overall, this study presents a simple and effective chromatographic pretreatment method for detecting zearalenone in food samples.

RevDate: 2021-09-20

Erdur OE, BS Yilmaz (2021)

Voice changes after surgically assisted rapid maxillary expansion.

American journal of orthodontics and dentofacial orthopedics : official publication of the American Association of Orthodontists, its constituent societies, and the American Board of Orthodontics pii:S0889-5406(21)00563-1 [Epub ahead of print].

INTRODUCTION: This study aimed to investigate voice changes in patients who had surgically assisted rapid maxillary expansion (SARME).

METHODS: Nineteen adult patients with maxillary transverse deficiency were asked to pronounce the sounds "[a], [ϵ], [ɯ], [i], [ɔ], [œ] [u], [y]" for 3 seconds. Voice records were taken before the expansion appliance was placed (T0) and 5.8 weeks after removal (T1, after 5.2 months of retention). The same records were taken for the control group (n = 19). The formant frequencies (F0, F1, F2, and F3), shimmer, jitter, and noise-to-harmonics ratio (NHR) parameters were considered with Praat (version 6.0.43).

RESULTS: In the SARME group, significant differences were observed in the F1 of [a] (P = 0.005), F2 of [ϵ] (P = 0.008), and [œ] sounds (P = 0.004). The postexpansion values were lower than those recorded before. In contrast, the F1 of the [y] sound (P = 0.02), F2 of the [u] sound (P = 0.01), the jitter parameter of the [ɯ] and [i] sounds (P = 0.04; P = 0.002), and the NHR value of the [ϵ] sound (P = 0.04) were significantly higher than the baseline values. In the comparison with the control group, significant differences were found in the F0 (P = 0.025) and F1 (P = 0.046) of the [u] sound, the F1 of the [a] sound (P = 0.03), and the F2 of the [ϵ] sound (P = 0.037). Significant differences were also found in the shimmer of [i] (P = 0.017) and [ɔ] (P = 0.002), the jitter of [ϵ] (P = 0.046) and [i] (P = 0.017), and the NHR of [i] (P = 0.012) and [ɔ] (P = 0.009).

CONCLUSION: SARME led to significant differences in some of the acoustics parameters.

RevDate: 2021-09-09

Perlman M, Paul J, G Lupyan (2021)

Vocal communication of magnitude across language, age, and auditory experience.

Journal of experimental psychology. General pii:2021-82980-001 [Epub ahead of print].

Like many other vocalizing vertebrates, humans convey information about their body size through the sound of their voice. Vocalizations of larger animals are typically longer in duration, louder in intensity, and lower in frequency. We investigated people's ability to use voice-size correspondences to communicate about the magnitude of external referents. First, we asked hearing children, as well as deaf children and adolescents, living in China to improvise nonlinguistic vocalizations to distinguish between paired items contrasting in magnitude (e.g., a long vs. short string, a big vs. small ball). Then we played these vocalizations back to adult listeners in the United States and China to assess their ability to correctly guess the intended referents. We find that hearing and deaf producers both signaled greater magnitude items with longer and louder vocalizations and with smaller formant spacing. Only hearing producers systematically used fundamental frequency, communicating greater magnitude with higher fo. The vocalizations of both groups were understandable to Chinese and American listeners, although accuracy was higher with vocalizations from older producers. American listeners relied on the same acoustic properties as Chinese listeners: both groups interpreted vocalizations with longer duration and greater intensity as referring to greater items; neither American nor Chinese listeners consistently used fo or formant spacing as a cue. These findings show that the human ability to use vocalizations to communicate about the magnitude of external referents is highly robust, extending across listeners of disparate linguistic and cultural backgrounds, as well as across age and auditory experience.

RevDate: 2021-09-07

Stansbury AL, VM Janik (2021)

The role of vocal learning in call acquisition of wild grey seal pups.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 376(1836):20200251.

Pinnipeds have been identified as one of the best available models for the study of vocal learning. Experimental evidence for their learning skills is demonstrated with advanced copying skills, particularly in formant structure when copying human speech sounds and melodies. By contrast, almost no data are available on how learning skills are used in their own communication systems. We investigated the impact of playing modified seal sounds in a breeding colony of grey seals (Halichoerus grypus) to study how acoustic input influenced vocal development of eight pups. Sequences of two or three seal pup calls were edited so that the average peak frequency between calls in a sequence changed up or down. We found that seals copied the specific stimuli played to them and that copies became more accurate over time. The differential response of different groups showed that vocal production learning was used to achieve conformity, suggesting that geographical variation in seal calls can be caused by horizontal cultural transmission. While learning of pup calls appears to have few benefits, we suggest that it also affects the development of the adult repertoire, which may facilitate social interactions such as mate choice. This article is part of the theme issue 'Vocal learning in animals and humans'.

RevDate: 2021-09-02

Stehr DA, Hickok G, Ferguson SH, et al (2021)

Examining vocal attractiveness through articulatory working space.

The Journal of the Acoustical Society of America, 150(2):1548.

Robust gender differences exist in the acoustic correlates of clearly articulated speech, with females, on average, producing speech that is acoustically and phonetically more distinct than that of males. This study investigates the relationship between several acoustic correlates of clear speech and subjective ratings of vocal attractiveness. Talkers were recorded producing vowels in /bVd/ context and sentences containing the four corner vowels. Multiple measures of working vowel space were computed from continuously sampled formant trajectories and were combined with measures of speech timing known to co-vary with clear articulation. Partial least squares regression (PLS-R) modeling was used to predict ratings of vocal attractiveness for male and female talkers based on the acoustic measures. PLS components that loaded on size and shape measures of working vowel space (including the quadrilateral vowel space area, convex hull area, and bivariate spread of formants), along with measures of speech timing, were highly successful at predicting attractiveness in female talkers producing /bVd/ words. These findings are consistent with a number of hypotheses regarding human attractiveness judgments, including the role of sexual dimorphism in mate selection, the significance of traits signalling underlying health, and perceptual fluency accounts of preferences.
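The quadrilateral vowel space area mentioned above is conventionally computed from the (F1, F2) coordinates of the four corner vowels using the shoelace formula for polygon area. A minimal sketch follows; the corner-vowel values are invented for illustration and are not data from the study.

```python
def vowel_space_area(corners):
    """Shoelace formula for the area (Hz^2) of a polygon whose vertices
    are (F1, F2) pairs ordered around the perimeter."""
    n = len(corners)
    total = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical corner vowels /i/, /ae/, /a/, /u/, ordered around the quadrilateral.
corners = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
area = vowel_space_area(corners)  # larger area = more expanded working vowel space
```

A more expanded (larger-area) vowel space is one of the clear-speech correlates the study relates to attractiveness ratings.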

RevDate: 2021-09-02

Sahoo S, S Dandapat (2021)

Analyzing the vocal tract characteristics for out-of-breath speech.

The Journal of the Acoustical Society of America, 150(2):1524.

In this work, changes in vocal tract characteristics under the out-of-breath condition are explored. Speaking under the influence of physical exercise is called out-of-breath speech. The change in breathing pattern results in perceptual changes in the produced sound. For the vocal tract, the first four formants show a lowering in their average frequency. The bandwidths BF1 and BF2 widen, whereas the other two narrow; the change in bandwidth is small for the last three. For a given speaker, the change in frequency and bandwidth may not be uniform across formants. Subband analysis is carried out around the formants to compare the variation of the vocal tract with that of the source. A vocal tract adaptive empirical wavelet transform is used to extract formant-specific subbands from speech and source. A support vector machine performs subband-based binary classification between normal and out-of-breath speech. For all speakers, source subbands show an F1-score improvement of 4% over speech subbands. Similarly, a performance improvement of 5% can be seen for both male and female speakers. Furthermore, the amount of misclassification is lower for the source than for speech. These results suggest that physical exercise influences the source more than the vocal tract.

RevDate: 2021-09-01

Dastolfo-Hromack C, Bush A, Chrabaszcz A, et al (2021)

Articulatory Gain Predicts Motor Cortex and Subthalamic Nucleus Activity During Speech.

Cerebral cortex (New York, N.Y. : 1991) pii:6362001 [Epub ahead of print].

Speaking precisely is important for effective verbal communication, and articulatory gain is one component of speech motor control that contributes to achieving this goal. Given that the basal ganglia have been proposed to regulate the speed and size of limb movement, that is, movement gain, we explored the basal ganglia contribution to articulatory gain through local field potentials (LFP) recorded simultaneously from the subthalamic nucleus (STN), precentral gyrus, and postcentral gyrus. During STN deep brain stimulation implantation for Parkinson's disease, participants read aloud consonant-vowel-consonant syllables. Articulatory gain was indirectly assessed using the F2 Ratio, an acoustic measurement of the second formant frequency of /i/ vowels divided by that of /u/ vowels. Mixed effects models demonstrated that the F2 Ratio correlated with alpha and theta activity in the precentral gyrus and STN. No correlations were observed for the postcentral gyrus. Functional connectivity analysis revealed that higher phase locking values for beta activity between the STN and precentral gyrus were correlated with lower F2 Ratios, suggesting that higher beta synchrony impairs articulatory precision. Effects were not related to disease severity. These data suggest that articulatory gain is encoded within the basal ganglia-cortical loop.
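Read literally, the F2 Ratio described above is just the mean second-formant frequency of /i/ tokens divided by that of /u/ tokens, with larger values indicating a greater front-back vowel contrast (higher articulatory gain). A sketch under that reading; the token values are invented.

```python
def f2_ratio(f2_i, f2_u):
    """Mean F2 of /i/ tokens divided by mean F2 of /u/ tokens.
    Larger values indicate greater front-back formant contrast."""
    return (sum(f2_i) / len(f2_i)) / (sum(f2_u) / len(f2_u))

# Hypothetical F2 measurements (Hz) from consonant-vowel-consonant syllables.
ratio = f2_ratio([2250.0, 2300.0, 2280.0], [900.0, 950.0, 920.0])
```

Since /i/ normally has a much higher F2 than /u/, the ratio sits well above 1 for precise articulation and falls toward 1 as the vowels centralize.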

RevDate: 2021-08-17

Aires MM, de Vasconcelos D, Lucena JA, et al (2021)

Effect of Wendler glottoplasty on voice and quality of life of transgender women.

Brazilian journal of otorhinolaryngology pii:S1808-8694(21)00134-8 [Epub ahead of print].

OBJECTIVE: To investigate the effect of Wendler glottoplasty on voice feminization, voice quality and voice-related quality of life.

METHODS: Prospective interventional cohort of transgender women submitted to Wendler glottoplasty. Acoustic analysis of the voice included assessment of fundamental frequency, maximum phonation time, formant frequencies (F1 and F2), frequency range, jitter and shimmer. Voice quality was blindly assessed through the GRBAS scale. Voice-related quality of life was measured using the Trans Woman Voice Questionnaire and the self-perceived femininity of the voice.

RESULTS: A total of 7 patients were included. The mean age was 35.4 years, and the mean postoperative follow-up time was 13.7 months. There was a mean increase of 47.9 ± 46.6 Hz (p = 0.023) in sustained/e/F0 and a mean increase of 24.6 ± 27.5 Hz (p = 0.029) in speaking F0 after glottoplasty. There was no statistical significance in the pre- and postoperative comparison of maximum phonation time, formant frequencies, frequency range, jitter, shimmer, and grade, roughness, breathiness, asthenia, and strain scale. Trans Woman Voice Questionnaire decreased following surgery from 98.3 ± 9.2 to 54.1 ± 25.0 (p = 0.007) and mean self-perceived femininity of the voice increased from 2.8 ± 1.8 to 7.7 ± 2.4 (p = 0.008). One patient (14%) presented a postoperative granuloma and there was 1 (14%) premature suture dehiscence.

CONCLUSION: Glottoplasty is safe and effective for feminizing the voice of transgender women. There was an increase in fundamental frequency, without aggravating other acoustic parameters or voice quality. Voice-related quality of life improved after surgery.

RevDate: 2021-08-16

Chung H (2021)

Acoustic Characteristics of Pre- and Post-vocalic /l/: Patterns from One Southern White Vernacular English.

Language and speech [Epub ahead of print].

This study examined acoustic characteristics of the phoneme /l/ produced by young female and male adult speakers of Southern White Vernacular English (SWVE) from Louisiana. F1, F2, and F2-F1 values extracted at the /l/ midpoint were analyzed by word position (pre- vs. post-vocalic) and vowel contexts (/i, ɪ/ vs. /ɔ, a/). Descriptive analysis showed that SWVE /l/ exhibited characteristics of the dark /l/ variant. The formant patterns of /l/, however, differed significantly by word position and vowel context, with pre-vocalic /l/ showing significantly higher F2-F1 values than post-vocalic /l/, and /l/ in the high front vowel context showing significantly higher F2-F1 values than those in the low back vowel context. Individual variation in the effects of word position and vowel contexts on /l/ pattern was also observed. Overall, the findings of the current study showed a gradient nature of SWVE /l/ variants whose F2-F1 patterns generally fell into the range of the dark /l/ variant, while varying by word position and vowel context.

RevDate: 2021-09-02

Yang L, Fu K, Zhang J, et al (2021)

Non-native acoustic modeling for mispronunciation verification based on language adversarial representation learning.

Neural networks : the official journal of the International Neural Network Society, 142:597-607.

Non-native mispronunciation verification is designed to provide feedback that guides language learners to correct their pronunciation errors in further learning, and it plays an important role in computer-aided pronunciation training (CAPT) systems. Most existing approaches focus on establishing the acoustic model directly from a non-native corpus, and thus suffer from data sparsity because collecting and annotating non-native speech is time-consuming. In this work, to address this problem, we propose a pre-training approach that utilizes speech data of two native languages (the learner's native and target languages) for non-native mispronunciation verification. We set up an unsupervised model to extract knowledge from a large scale of unlabeled raw speech of the target language by making predictions about future observations in the speech signal; the model is then trained with language adversarial training using the learner's native language, aligning the feature distributions of the two languages by confusing a language discriminator. In addition, a sinc filter is incorporated at the first convolutional layer to capture formant-like features. Formants are relevant to the place and manner of articulation, and are therefore useful not only for pronunciation error detection but also for providing instructive feedback. The pre-trained model then serves as the feature extractor in the downstream mispronunciation verification task.
Through experiments on the Japanese part of the BLCU inter-Chinese speech corpus, the results demonstrate that, for the non-native phone recognition and mispronunciation verification tasks: (1) the knowledge learned from the speech of the two native languages with the proposed unsupervised approach is useful for both tasks; (2) the proposed language adversarial representation learning is effective at improving performance; and (3) formant-like features can be incorporated by introducing a sinc filter to further improve mispronunciation verification.

RevDate: 2021-08-13

Leyns C, Corthals P, Cosyns M, et al (2021)

Acoustic and Perceptual Effects of Articulation Exercises in Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(21)00242-3 [Epub ahead of print].

PURPOSE: This study measured the impact of articulation exercises using a cork and articulation exercises for lip spreading on the formant frequencies of vowels and listener perceptions of femininity in transgender women.

METHODS: Thirteen transgender women were recorded before and after the cork exercise and before and after the lip spreading exercise. Speech samples included continuous speech during reading and were analyzed using Praat software. Vowel formant frequencies (F1, F2, F3, F4, F5) and vowel space were determined. A listening experiment was organized using naïve cisgender women and cisgender men rating audio samples of continuous speech. Masculinity/femininity, vocal quality and age were rated, using a visual analogue scale (VAS).

RESULTS: Concerning vowel formant frequencies, F2 /a/ and F5 /u/ significantly increased after the lip spreading exercise, as well as F3 /a/, F3 /u/ and F4 /a/ after the cork exercise. The lip spreading exercise had more impact on the F2 /a/ than the cork exercise. Vowel space did not change after the exercises. The fundamental frequency (fo) increased simultaneously during both exercises. Both articulation exercises were associated with significantly increased listener perceptions of femininity of the voice.

CONCLUSION: Subtle changes in formant frequencies can be observed after performing articulation exercises, but not in every formant frequency or vowel. Cisgender listeners rated the speech of the transgender women more feminine after the exercises. Further research with a more extensive therapy program and listening experiment is needed to examine these preliminary findings.

RevDate: 2021-08-05
CmpDate: 2021-08-05

Yang JJ, Cheng LY, W Xu (2021)

[Study on changes of voice characteristics after adenotonsillectomy or adenoidectomy in children].

Zhonghua er bi yan hou tou jing wai ke za zhi = Chinese journal of otorhinolaryngology head and neck surgery, 56(7):724-729.

Objective: To study voice changes in children after adenotonsillectomy or adenoidectomy and their relationship with vocal tract structure. Methods: Fifty patients were recruited prospectively, aged 4 to 12 years (median age 6), who underwent adenotonsillectomy or adenoidectomy in Beijing Tongren Hospital, Capital Medical University from July 2019 to August 2020. The cases comprised 31 males and 19 females; 36 patients underwent adenotonsillectomy and 14 underwent adenoidectomy alone. Twenty-two children (13 males, 9 females) with grade Ⅰ bilateral tonsils, no adenoid hypertrophy and no snoring were selected as normal controls. Adenoid and tonsil sizes were evaluated, and subjective changes of voice were recorded after surgery. Moreover, voice data including fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio (NHR), maximum phonation time (MPT), formant frequencies (F1-F5) and bandwidths (B1-B5) of the vowels /a/ and /i/ were analyzed before surgery, and 3 days and 1 month after surgery. SPSS 23.0 was used for statistical analysis. Results: Thirty-six patients (72.0%, 36/50) complained of postoperative voice changes. The incidence was inversely correlated with age: in children aged 4-6, 7-9, and 10-12, it was 83.3% (25/30), 63.6% (7/11) and 44.4% (4/9), respectively. Voice changes appeared more common in children who underwent adenotonsillectomy (77.8%, 28/36) than in those who underwent adenoidectomy alone (57.1%, 8/14), but the difference was not statistically significant. After the operation, for the vowel /a/, MPT (Z=2.18, P=0.041) and F2 (t=2.13, P=0.040) increased, while B2 (Z=2.04, P=0.041) and B4 (Z=2.00, P=0.046) decreased. For the vowel /i/, F2 (t=2.035, P=0.050) and F4 (t=4.44, P=0.0001) increased, and B2 (Z=2.36, P=0.019) decreased. Other acoustic parameters were not significantly different from preoperative values.
The F2 (r=-0.392, P=0.032) of the vowel /a/, and the F2 (r=-0.279, P=0.048) and F4 (r=-0.401, P=0.028) of the vowel /i/, were significantly higher after adenotonsillectomy than after adenoidectomy alone. Half of the patients with postoperative voice changes recovered spontaneously within 1 month after surgery. Conclusions: Voice changes in children who underwent adenotonsillectomy or adenoidectomy might be related to changes in formants and bandwidths. The effect of adenotonsillectomy on voice was more significant than that of adenoidectomy alone. Except for MPT, the acoustic parameters did not change significantly after surgery.

RevDate: 2021-08-03

Frey R, Wyman MT, Johnston M, et al (2021)

Roars, groans and moans: Anatomical correlates of vocal diversity in polygynous deer.

Journal of anatomy [Epub ahead of print].

Eurasian deer are characterized by the extraordinary diversity of their vocal repertoires. Male sexual calls range from roars with relatively low fundamental frequency (hereafter fo) in red deer Cervus elaphus, to moans with extremely high fo in sika deer Cervus nippon, and almost infrasonic groans with exceptionally low fo in fallow deer Dama dama. Moreover, while both red and fallow males are capable of lowering their formant frequencies during their calls, sika males appear to lack this ability. Female contact calls are also characterized by relatively less pronounced, yet strong interspecific differences. The aim of this study is to examine the anatomical bases of these inter-specific and inter-sexual differences by identifying if the acoustic variation is reflected in corresponding anatomical variation. To do this, we investigated the vocal anatomy of male and female specimens of each of these three species. Across species and sexes, we find that the observed acoustic variability is indeed related to expected corresponding anatomical differences, based on the source-filter theory of vocal production. At the source level, low fo is associated with larger vocal folds, whereas high fo is associated with smaller vocal folds: sika deer have the smallest vocal folds and male fallow deer the largest. Red and sika deer vocal folds do not appear to be sexually dimorphic, while fallow deer exhibit strong sexual dimorphism (after correcting for body size differences). At the filter level, the variability in formants is related to the configuration of the vocal tract: in fallow and red deer, both sexes have evolved a permanently descended larynx (with a resting position of the larynx much lower in males than in females). Both sexes also have the potential for momentary, call-synchronous vocal tract elongation, again more pronounced in males than in females. 
In contrast, the resting position of the larynx is high in both sexes of sika deer and the potential for further active vocal tract elongation is virtually absent in both sexes. Anatomical evidence suggests an evolutionary reversal in larynx position within sika deer, that is, a secondary larynx ascent. Together, our observations confirm that the observed diversity of vocal behaviour in polygynous deer is supported by strong anatomical differences, highlighting the importance of anatomical specializations in shaping mammalian vocal repertoires. Sexual selection is discussed as a potential evolutionary driver of the observed vocal diversity and sexual dimorphisms.

RevDate: 2021-08-05
CmpDate: 2021-08-05

Strycharczuk P, Ćavar M, S Coretta (2021)

Distance vs time. Acoustic and articulatory consequences of reduced vowel duration in Polish.

The Journal of the Acoustical Society of America, 150(1):592.

This paper presents acoustic and articulatory (ultrasound) data on vowel reduction in Polish. The analysis focuses on the question of whether the change in formant value in unstressed vowels can be explained by duration-driven undershoot alone or whether there is also evidence for additional stress-specific articulatory mechanisms that systematically affect vowel formants. On top of the expected durational differences between the stressed and unstressed conditions, the duration is manipulated by inducing changes in the speech rate. The observed vowel formants are compared to expected formants derived from the articulatory midsagittal tongue data in different conditions. The results show that the acoustic vowel space is reduced in size and raised in unstressed vowels compared to stressed vowels. Most of the spectral reduction can be explained by reduced vowel duration, but there is also an additional systematic effect of F1-lowering in unstressed non-high vowels that does not follow from tongue movement. The proposed interpretation is that spectral vowel reduction in Polish behaves largely as predicted by the undershoot model of vowel reduction, but the effect of undershoot is enhanced for low unstressed vowels, potentially by a stress marking strategy which involves raising the fundamental frequency.

RevDate: 2021-08-03

Petersen EA, Colinot T, Silva F, et al (2021)

The bassoon tonehole lattice: Links between the open and closed holes and the radiated sound spectrum.

The Journal of the Acoustical Society of America, 150(1):398.

The acoustics of the bassoon has been the subject of relatively few studies compared with other woodwind instruments. One reason for this may lie in its complicated resonator geometry, which includes irregularly spaced toneholes with chimney heights ranging from 3 to 31 mm. The current article evaluates the effect of the open and closed tonehole lattice (THL) on the acoustic response of the bassoon resonator. It is shown that this response can be divided into three distinct frequency bands that are determined by the open and closed THL: below 500 Hz, 500-2200 Hz, and above 2200 Hz. The first is caused by the stopband of the open THL, where the low frequency effective length of the instrument is determined by the location of the first open tonehole. The second is due to the passband of the open THL, such that the modes are proportional to the total length of the resonator. The third is due to the closed THL, where part of the acoustical power is trapped within the resonator. It is proposed that these three frequency bands impact the radiated spectrum by introducing a formant in the vicinity of 500 Hz and suppressing radiation above 2200 Hz for most first register fingerings.
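The three frequency bands identified in the abstract above can be captured in a trivial helper; the boundaries (500 Hz and 2200 Hz) are taken directly from the abstract, and the labels paraphrase its description of each region. A sketch only, not the authors' analysis code.

```python
def bassoon_band(freq_hz):
    """Classify a frequency into the three response regions reported
    for the bassoon resonator (band edges from the abstract)."""
    if freq_hz < 500.0:
        # Stopband of the open tonehole lattice: effective length set
        # by the location of the first open tonehole.
        return "open-THL stopband"
    if freq_hz <= 2200.0:
        # Passband: modes proportional to the total resonator length.
        return "open-THL passband"
    # Closed tonehole lattice traps part of the acoustical power.
    return "closed-THL region"
```

The abstract's proposed formant near 500 Hz sits at the stopband/passband boundary, with radiation suppressed in the region above 2200 Hz.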

RevDate: 2021-08-05
CmpDate: 2021-08-05

Uezu Y, Hiroya S, T Mochida (2021)

Articulatory compensation for low-pass filtered formant-altered auditory feedback.

The Journal of the Acoustical Society of America, 150(1):64.

Auditory feedback while speaking plays an important role in stably controlling speech articulation. Its importance has been verified in formant-altered auditory feedback (AAF) experiments where speakers utter while listening to speech with perturbed first (F1) and second (F2) formant frequencies. However, the contribution of the frequency components higher than F2 to the articulatory control under the perturbations of F1 and F2 has not yet been investigated. In this study, a formant-AAF experiment was conducted in which a low-pass filter was applied to speech. The experimental results showed that the deviation in the compensatory response was significantly larger when a low-pass filter with a cutoff frequency of 3 kHz was used compared to that when cutoff frequencies of 4 and 8 kHz were used. It was also found that the deviation in the 3-kHz condition correlated with the fundamental frequency and spectral tilt of the produced speech. Additional simulation results using a neurocomputational model of speech production (SimpleDIVA model) and the experimental data showed that the feedforward learning rate increased as the cutoff frequency decreased. These results suggest that high-frequency components of the auditory feedback would be involved in the determination of corrective motor commands from auditory errors.

RevDate: 2021-07-23

Lynn E, Narayanan SS, AC Lammert (2021)

Dark tone quality and vocal tract shaping in soprano song production: Insights from real-time MRI.

JASA express letters, 1(7):075202.

Tone quality termed "dark" is an aesthetically important property of Western classical voice performance and has been associated with lowered formant frequencies, lowered larynx, and widened pharynx. The present study uses real-time magnetic resonance imaging with synchronous audio recordings to investigate dark tone quality in four professionally trained sopranos with enhanced ecological validity and a relatively complete view of the vocal tract. Findings differ from traditional accounts, indicating that labial narrowing may be the primary driver of dark tone quality across performers, while many other aspects of vocal tract shaping are shown to differ significantly in a performer-specific way.

RevDate: 2021-07-18

Joshi A, Procter T, PA Kulesz (2021)

COVID-19: Acoustic Measures of Voice in Individuals Wearing Different Facemasks.

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

AIM: The global health pandemic caused by the SARS-coronavirus 2 (COVID-19) has led to the adoption of facemasks as a necessary safety precaution. Depending on the level of risk for exposure to the virus, the facemasks that are used can vary. The aim of this study was to examine the effect of different types of facemasks, typically used by healthcare professionals and the public during the COVID-19 pandemic, on measures of voice.

METHODS: Nineteen adults (ten females, nine males) with normal voice quality completed sustained vowel tasks. All tasks were performed for each of six mask conditions: no mask, cloth mask, surgical mask, KN95 mask, and surgical mask over a KN95 mask with and without a face shield. Intensity measurements were obtained at 1 ft and 6 ft distances from the speaker with sound level meters. Tasks were recorded with a 1 ft mouth-to-microphone distance. Acoustic variables of interest were fundamental frequency (F0), formant frequencies (F1, F2) for /a/ and /i/, and smoothed cepstral peak prominence (CPPs) for /a/.

RESULTS: Data were analyzed to compare differences between sex and mask types. There was statistical significance between males and females for intensity measures and all acoustic variables except F2 for /a/ and F1 for /i/. Few pairwise comparisons between masks reached significance even though main effects for mask type were observed. These are further discussed in the article.

CONCLUSION: The masks tested in this study did not have a significant impact on intensity, fundamental frequency, CPPs, first or second formant frequency compared to voice output without a mask. Use of a face shield seemed to affect intensity and CPPs to some extent. Implications of these findings are discussed further in the article.

RevDate: 2021-08-04

Easwar V, Birstler J, Harrison A, et al (2021)

The Influence of Sensation Level on Speech-Evoked Envelope Following Responses.

Ear and hearing pii:00003446-900000000-98474 [Epub ahead of print].

OBJECTIVES: To evaluate sensation level (SL)-dependent characteristics of envelope following responses (EFRs) elicited by band-limited speech dominant in low, mid, and high frequencies.

DESIGN: In 21 young normal-hearing adults, EFRs were elicited by 8 male-spoken speech stimuli: the first formant, and the second and higher formants, of /u/, /a/, and /i/, and the modulated fricatives /ʃ/ and /s/. Stimulus SL was computed from behaviorally measured thresholds.

RESULTS: At 30 dB SL, the amplitude and phase coherence of fricative-elicited EFRs were ~1.5 to 2 times higher than all vowel-elicited EFRs, whereas fewer and smaller differences were found among vowel-elicited EFRs. For all stimuli, EFR amplitude and phase coherence increased by roughly 50% for every 10 dB increase in SL between ~0 and 50 dB.

CONCLUSIONS: Stimulus and frequency dependency in EFRs exist despite accounting for differences in audibility of speech sounds. The growth rate of EFR characteristics with SL is independent of stimulus and its frequency.

RevDate: 2021-07-17

Zealouk O, Satori H, Hamidi M, et al (2021)

Analysis of COVID-19 Resulting Cough Using Formants and Automatic Speech Recognition System.

Journal of voice : official journal of the Voice Foundation [Epub ahead of print].

As part of our contribution to research on the ongoing COVID-19 pandemic, we studied cough changes in infected people using Hidden Markov Model (HMM) speech-recognition classification, formant frequency analysis, and pitch analysis. An HMM-based cough recognition system was implemented with 5 HMM states, 8 Gaussian mixture distributions (GMMs), and the 13 basic Mel-frequency cepstral coefficients (MFCCs) within a 39-dimensional overall feature vector. Formant frequency and pitch values extracted from the coughs of COVID-19-infected and healthy people were compared to corroborate the recognition results. The experimental results show that the recognition rates for infected and non-infected people differ by 6.7%, and that the formant variation between infected and non-infected coughs is clearly observed for F1, F3, and F4, and is lower for F0 and F2.
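A 39-dimensional feature vector built from 13 base MFCCs conventionally means appending delta and delta-delta (velocity and acceleration) coefficients. The paper does not spell out its delta computation, so the sketch below assumes the standard regression formula over neighboring frames; `add_deltas` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def add_deltas(feats, width=2):
    """Stack static, delta, and delta-delta coefficients: (T, C) -> (T, 3C)."""
    def delta(m):
        # Regression-based delta over +/- width frames, edges replicated
        p = np.pad(m, ((width, width), (0, 0)), mode='edge')
        T = len(m)
        num = sum(t * (p[width + t:width + t + T] - p[width - t:width - t + T])
                  for t in range(1, width + 1))
        return num / (2 * sum(t * t for t in range(1, width + 1)))
    d = delta(feats)
    return np.hstack([feats, d, delta(d)])
```

Applied to a (frames x 13) MFCC matrix, this yields the (frames x 39) vectors typically fed to HMM/GMM recognizers.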

RevDate: 2021-07-15
CmpDate: 2021-07-15

Zhang C, Jepson K, Lohfink G, et al (2021)

Comparing acoustic analyses of speech data collected remotely.

The Journal of the Acoustical Society of America, 149(6):3910.

Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless-format phone recordings present a viable option for at least some phonetic studies.
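The F0 tracking compared across devices above can be approximated by a simple autocorrelation pitch estimator. The sketch below is a minimal numpy illustration for a single voiced frame under assumed defaults (75-400 Hz search range); real trackers such as those in Praat and VoiceSauce add voicing decisions, lag interpolation, and octave-error handling:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75, fmax=400):
    """Autocorrelation-based F0 estimate for one voiced frame (Hz)."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    # Autocorrelation at non-negative lags
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    # Search only lags corresponding to plausible pitch periods
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```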

RevDate: 2021-08-08

Diamant N, O Amir (2021)

Examining the voice of Israeli transgender women: Acoustic measures, voice femininity and voice-related quality-of-life.

International journal of transgender health, 22(3):281-293.

Background: Transgender women may experience gender dysphoria associated with their voice and the way it is perceived. Previous studies have shown that specific acoustic measures are associated with the perception of voice-femininity and with voice-related quality-of-life, yet results are inconsistent.

Aims: This study aimed to examine the associations between specific voice measures of transgender women, voice-related quality-of-life, and the perception of voice-femininity by listeners and by the speakers themselves.

Methods: Thirty Hebrew-speaking transgender women were recorded. They also rated their voice-femininity and completed the Hebrew version of the TVQMtF questionnaire. Recordings were analyzed to extract mean fundamental frequency (F0), formant frequencies (F1, F2, F3), and vocal-range (calculated in Hz and in semitones). Recordings were also rated on a 7-point voice-gender scale by 20 naïve cisgender listeners.

Results: Significant correlations were found between both F0 and F1 and listeners' as well as speakers' evaluation of voice-femininity. TVQMtF scores were significantly correlated with F0 and with the lower and upper boundaries of the vocal-range. Voice-femininity ratings were strongly correlated with vocal-range, when calculated in Hz, but not when defined in semitones. Listeners' evaluation and speakers' self-evaluation of voice-femininity were significantly correlated. However, TVQMtF scores were significantly correlated only with the speakers' voice-femininity ratings, but not with those of the listeners.
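The Hz-versus-semitone contrast in the result above matters because semitones are logarithmic: the same Hz span covers more semitones at a low F0 than at a high one. The conversion is the standard 12-per-octave formula (not tied to this study's analysis scripts):

```python
import math

def range_in_semitones(f_low, f_high):
    """Vocal range in semitones between two frequencies in Hz (12 per octave)."""
    return 12 * math.log2(f_high / f_low)
```

For example, a 100-200 Hz range is 12 semitones, while 200-300 Hz, a wider span in Hz, is only about 7 semitones.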

Conclusion: Higher F0 and F1, which are perceived as more feminine, jointly improved speakers' satisfaction with their voice. Speakers' self-evaluation of voice-femininity does not mirror listeners' judgment, as it is affected by additional factors related to self-satisfaction and personal experience. Combining listeners' and speakers' voice evaluations with acoustic analysis is valuable, as it provides a more holistic view of how transgender women feel about their voice and how it is perceived by listeners.

RevDate: 2021-08-06
CmpDate: 2021-08-06

Leung Y, Oates J, Chan SP, et al (2021)

Associations Between Speaking Fundamental Frequency, Vowel Formant Frequencies, and Listener Perceptions of Speaker Gender and Vocal Femininity-Masculinity.

Journal of speech, language, and hearing research : JSLHR, 64(7):2600-2622.

Purpose: The aim of the study was to examine associations between speaking fundamental frequency (fo), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity-masculinity. Method: An exploratory study was undertaken to examine associations between fo, F1-F3, listener perceptions of speaker gender (nominal scale), and vocal femininity-masculinity (visual analog scale). For 379 speakers of Australian English aged 18-60 years, fo mode and F1-F3 (12 monophthongs; a total of 36 Fs) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity-masculinity on randomized audio recordings of these speakers. Results: Model building using principal component analysis suggested the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and fo as predictors) suggested that only F2 and fo predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity-masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and fo. Furthermore, listeners' perceptions of speaker gender also substantially influenced ratings of vocal femininity-masculinity. Conclusion: Adjusted odds ratios highlighted the substantially larger contribution of F, relative to fo, to listener perceptions of speaker gender and vocal femininity-masculinity than has previously been reported.
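Reducing 36 correlated formant measures to a handful of principal components, as described above, can be sketched with an SVD-based PCA in numpy. The authors' exact PCA settings (e.g., scaling, rotation) are not stated, so this is a generic illustration:

```python
import numpy as np

def pca_scores(X, n_components):
    """Project rows of X onto the top principal components.

    Returns the (n_samples, n_components) score matrix and the
    explained-variance ratio of each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()        # variance ratios, descending
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

In a setting like the study's, the seven-column score matrix would then serve as predictors in the downstream structural equation model.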

RevDate: 2021-08-09

Easwar V, Boothalingam S, R Flaherty (2021)

Fundamental frequency-dependent changes in vowel-evoked envelope following responses.

Hearing research, 408:108297.

Scalp-recorded envelope following responses (EFRs) provide a non-invasive method to assess the encoding of the fundamental frequency (f0) of voice that is important for speech understanding. It is well-known that EFRs are influenced by voice f0. However, this effect of f0 has not been examined independent of concomitant changes in spectra or neural generators. We evaluated the effect of voice f0 on EFRs while controlling for vowel formant characteristics and potentially avoiding significant changes in dominant neural generators using a small f0 range. EFRs were elicited by a male-spoken vowel /u/ (average f0 = 100.4 Hz) and its lowered f0 version (average f0 = 91.9 Hz) with closely matched formant characteristics. Vowels were presented to each ear of 17 young adults with normal hearing. EFRs were simultaneously recorded between the vertex and the nape, and between the vertex and the ipsilateral mastoid, the two most common electrode montages used for EFRs. Our results indicate that when vowel formant characteristics are matched, an increase in f0 by 8.5 Hz reduces EFR amplitude by 25 nV, phase coherence by 0.05, and signal-to-noise ratio by 3.5 dB, on average. The reduction in EFR characteristics was similar across ears of stimulation and the two montages used. These findings will help parse the influence of f0 or stimulus spectra on EFRs when both co-vary.

