Query run: 07 Oct 2024 at 01:46 | Hits: 3084

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography (created 07 Oct 2024 at 01:46)

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, "formant" is also sometimes used to refer to the acoustic resonance pattern of a human vocal tract. Because formants are a product of resonance, because resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add both generic ("sounds big") and specific ("that's Towser barking") information to animal and human vocalizations. Discussions of how formants affect the production and interpretation of vocalizations are available in a few YouTube videos, for example: "Formants Explained and Demonstrated", "What are FORMANTS and HARMONICS? VOCAL FORMANTS AND HARMONICS Explained!", and "How Do We Change Our Mouths to Shape Waves? Formants".
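The link between the size of the resonating structure and the "sounds big" cue can be illustrated with the textbook quarter-wavelength model, which treats the vocal tract as a uniform tube closed at one end and open at the other. This is a minimal sketch under that idealization, not a method taken from any paper listed here; the function name `tube_formants`, the speed of sound (350 m/s), and the two tract lengths are illustrative assumptions.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end and open at the other.

    Quarter-wavelength model: F_k = (2k - 1) * c / (4 * L).
    Real vocal tracts are not uniform tubes, so these are rough reference values only.
    """
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_formants + 1)]


# Assumed, illustrative tract lengths: a longer tube yields lower resonances.
shorter = tube_formants(0.14)
longer = tube_formants(0.175)
print("shorter tract:", [round(f) for f in shorter])
print("longer tract: ", [round(f) for f in longer])
```

In this idealization every resonance scales inversely with tube length, which is one reason formant spacing can carry information about body size.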

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations The Papers (from PubMed®)


RevDate: 2024-10-02

Parrell B, Niziolek CA, T Chen (2024)

Sensorimotor adaptation to a non-uniform formant perturbation generalizes to untrained vowels.

Journal of neurophysiology [Epub ahead of print].

When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a non-uniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.

RevDate: 2024-09-25

Huang T, Wang X, Xu T, et al (2024)

Acoustic Analysis of Mandarin-Speaking Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00291-1 [Epub ahead of print].

OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).

METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonics ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.

RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than that of CISW for the vowel /i/, significantly higher than that of CISW for the vowel /u/, and significantly higher than that of CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. F4 was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, and /u/, during passage reading, and in free talking, but was significantly higher than that of CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW compared with both CISW and CISM for the vowels /a/ and /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/ and /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free talking. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.

CONCLUSIONS: TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM without undergoing phonosurgery or voice training. The findings suggest a potential risk for voice fatigue and the development of voice disorders as TW try to modify their vocal characteristics to align with their gender identity.

RevDate: 2024-09-17
CmpDate: 2024-09-17

Kim H, Ratkute V, B Epp (2024)

Monaural and binaural masking release with speech-like stimuli.

JASA express letters, 4(9):.

The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.

RevDate: 2024-09-16

Chen S, Whalen DH, PPK Mok (2024)

What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.

Phonetica [Epub ahead of print].

Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.

RevDate: 2024-09-11

Thompson A, Y Kim (2024)

Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.

METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.

RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.

CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.
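Vowel space area (VSA), one of the acoustic measures named in this abstract, is commonly computed as the area of the polygon whose vertices are corner vowels plotted in F1-F2 space. The abstract does not say how the authors computed it, so the following is only a sketch of one common approach using the shoelace formula. The function name `vowel_space_area` and the example formant values are hypothetical.

```python
def vowel_space_area(corners):
    """Polygon area (Hz^2) via the shoelace formula.

    `corners` is a list of (F1, F2) pairs in Hz, ordered around the polygon
    (clockwise or counterclockwise). The ordering matters; the vowel choice
    is up to the analyst.
    """
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) values for three corner vowels, for illustration only.
example_corners = [(300.0, 2300.0), (750.0, 1300.0), (350.0, 900.0)]
print(vowel_space_area(example_corners))
```

A kinematic analogue would apply the same formula to articulator positions instead of formant frequencies.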

RevDate: 2024-09-05

Subrahmanya A, Ranasinghe KG, Kothare H, et al (2024)

Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.

Frontiers in human neuroscience, 18:1424920.

Past studies have explored formant centering, a corrective behavior of convergence over the duration of an utterance toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's Disease (AD). We found the pitch centering response in healthy elderly was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry with a larger correction for the pitch errors below the target phonation than above the target phonation. These findings indicate that pitch centering is a robust compensation behavior in human speech. Our findings also explore the potential impacts on pitch centering from neurodegenerative processes impacting speech in AD.

RevDate: 2024-09-01

Vampola T, Horáček J, AM Laukkanen (2024)

Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00248-0 [Epub ahead of print].

This study aimed to find the optimal geometrical configuration of the vocal tract (VT) to increase the total acoustic energy output of the human voice in the frequency interval 2-3.5 kHz, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computer tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at the low pharynx/epilarynx cross-sectional area ratio 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO alone and together can support (singing) voice production.

RevDate: 2024-08-31

Figueroa C, Guillén V, Huenupán F, et al (2024)

Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00254-6 [Epub ahead of print].

Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears as a promising solution to the difficulty in assessment.

OBJECTIVE: To analyze variations in acoustic parameters in voice and speech based on vowel types according to different levels of suicidal risk among adolescents in a text reading task.

METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, along with suicidal ideation determination through the Okasha Suicidality Scale and Beck Depression Inventory. Acoustic analysis of recordings was conducted using Praat, a Python program, a Focusrite interface, and a microphone to register voice and speech acoustic parameters such as fundamental frequency, jitter, and formants. Subsequently, data from adolescents with and without suicidal risk were compared.

RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic aspects, especially in females in fundamental frequency (F0), signal-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. In men, differences were found in F0 and HNRdB (P < 0.05).

CONCLUSION: This study demonstrated statistically significant variations in various voice acoustic parameters among adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers for suicidal risk.

RevDate: 2024-08-30
CmpDate: 2024-08-30

Zaltz Y (2024)

The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.

Trends in hearing, 28:23312165241275895.

Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.

RevDate: 2024-08-26

Parrell B, Naber C, Kim OA, et al (2024)

Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.

bioRxiv : the preprint server for biology pii:2024.08.13.607718.

Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.

RevDate: 2024-08-25

Chan RKW, BX Wang (2024)

Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?

Forensic science international, 363:112199 pii:S0379-0738(24)00280-9 [Epub ahead of print].

A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically-relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCCs-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.

RevDate: 2024-08-23
CmpDate: 2024-08-23

Huang L, Yang H, Che Y, et al (2024)

Automatic speech analysis for detecting cognitive decline of older adults.

Frontiers in public health, 12:1417966.

BACKGROUND: Speech analysis has been expected to help as a screening tool for early detection of Alzheimer's disease (AD) and mild cognitive impairment (MCI). Acoustic features and linguistic features are usually used in speech analysis. However, no studies have yet determined which type of features provides better screening effectiveness, especially in the large aging population of China.

OBJECTIVE: Firstly, to compare the screening effectiveness of acoustic features, linguistic features, and their combination using the same dataset. Secondly, to develop a Chinese automated diagnosis model using self-collected natural discourse data obtained from native Chinese speakers.

METHODS: A total of 92 participants from communities in Shanghai completed the MoCA-B and a picture description task based on the Cookie Theft picture under the guidance of trained operators, and were divided into three groups, AD, MCI, and healthy control (HC), based on their MoCA-B score. Acoustic features (Pitches, Jitter, Shimmer, MFCCs, Formants) and linguistic features (part-of-speech, type-token ratio, information words, information units) were extracted. The machine learning (ML) algorithms used in this study included logistic regression, random forest (RF), support vector machines (SVM), Gaussian Naive Bayesian (GNB), and k-Nearest neighbor (kNN). The validation accuracies of the same ML model using acoustic features, linguistic features, and their combination were compared.

RESULTS: The accuracy with linguistic features is generally higher than acoustic features in training. The highest accuracy to differentiate HC and AD is 80.77% achieved by SVM, based on all the features extracted from the speech data, while the highest accuracy to differentiate HC and AD or MCI is 80.43% achieved by RF, based only on linguistic features.

CONCLUSION: Our results suggest the utility and validity of linguistic features in the automated diagnosis of cognitive impairment, and validate the applicability of automated diagnosis for Chinese language data.

RevDate: 2024-08-22

Holmes L, Rieger G, S Paulmann (2024)

The effect of sexual orientation on voice acoustic properties.

Frontiers in psychology, 15:1412372.

INTRODUCTION: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, but many of these studies have methodological limitations including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses.

METHODS: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout.

RESULTS: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex.

DISCUSSION: Recommendations for future "natural voice" investigations are outlined.

RevDate: 2024-08-02

Goncharova M, Jadoul Y, Reichmuth C, et al (2024)

Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.

Annals of the New York Academy of Sciences [Epub ahead of print].

Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures, such as jaw, tongue, lip, and soft palate movements, shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study with an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame-by-frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first formant (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw opening angle when the seal was lying on his back rather than on his belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, where F1 depends more on the jaw position than F2 does. We propose additional in vivo investigations of seals to further test the role of the tongue in formant modulation in mammalian sound production.

RevDate: 2024-08-01

Dorman MF, Natale SC, Stohl JS, et al (2024)

Close approximations to the sound of a cochlear implant.

Frontiers in human neuroscience, 18:1434786.

Cochlear implant (CI) systems differ in terms of electrode design and signal processing. It is likely that patients fit with different implant systems will experience different percepts when presented with speech via their implant. The sound quality of speech can be evaluated by asking single-sided-deaf (SSD) listeners fit with a CI to modify clean signals presented to their typically hearing ear to match the sound quality of signals presented to their CI ear. In this paper, we describe very close matches to CI sound quality, i.e., similarity ratings of 9.5 to 10 on a 10-point scale, by ten patients fit with a 28 mm electrode array and MED-EL signal processing. The modifications required to make close approximations to CI sound quality fell into two groups: one consisted of a restricted frequency bandwidth and spectral smearing, while a second was characterized by a wide bandwidth and no spectral smearing. Both sets of modifications were different from those found for patients with shorter electrode arrays, who chose upshifts in voice pitch and formant frequencies to match CI sound quality. The data from matching-based metrics of CI sound quality document that speech sound quality differs for patients fit with different CIs and among patients fit with the same CI.

RevDate: 2024-07-26

Bonacina S, Krizman J, Farley J, et al (2024)

Persistent post-concussion symptoms include neural auditory processing in young children.

Concussion (London, England), 9(1):CNC114.

AIM: Difficulty understanding speech following concussion is likely caused by auditory processing impairments. We hypothesized that concussion disrupts pitch and phonetic processing of a sound, both of which are cues in understanding a talker.

We obtained frequency following responses to a syllable from 120 concussed and 120 control children. Encoding of the fundamental frequency (F0), a pitch cue, and the first formant (F1), a phonetic cue, was poorer in concussed children. The F0 reduction was greater in the children assessed within 2 weeks of their injuries.

CONCLUSION: Concussions affect auditory processing. Results strengthen evidence of reduced F0 encoding in children with concussion and call for longitudinal study aimed at monitoring the recovery course with respect to the auditory system.

RevDate: 2024-07-19

Li JJ, Daliri A, Kim KS, et al (2024)

Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?

bioRxiv : the preprint server for biology pii:2024.07.13.603344.

Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production and its possible relation to PSAM at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
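The centering measure described in this abstract, the change in inter-trial formant variability from vowel onset to vowel midpoint, can be sketched as follows. This is only one plausible operationalization, assumed for illustration: variability is taken as the mean Euclidean distance of each trial's (F1, F2) point from the across-trial median, and centering as onset variability minus midpoint variability. The function names and the toy data are hypothetical, and the authors' actual computation may differ.

```python
from math import hypot
from statistics import mean, median


def dispersion(points):
    """Mean Euclidean distance of (F1, F2) points from their across-trial median."""
    f1_center = median([p[0] for p in points])
    f2_center = median([p[1] for p in points])
    return mean([hypot(p[0] - f1_center, p[1] - f2_center) for p in points])


def centering(onset_points, midpoint_points):
    """Positive values mean trials are less variable at midpoint than at onset."""
    return dispersion(onset_points) - dispersion(midpoint_points)


# Toy data (Hz), made up for illustration: three trials of one vowel.
onset = [(600.0, 1700.0), (640.0, 1700.0), (560.0, 1700.0)]
midpoint = [(600.0, 1700.0), (620.0, 1700.0), (580.0, 1700.0)]
print(centering(onset, midpoint))
```

A speaker-level value like this could then be correlated with a neural measure such as PSAM, which is the kind of relationship the abstract reports testing.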

RevDate: 2024-07-17

Doyle KA, Harel D, Feeny GT, et al (2024)

Word and Gender Identification in the Speech of Transgender Individuals.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00178-4 [Epub ahead of print].

Listeners use speech to identify both linguistic information, such as the word being produced, and indexical attributes, such as the gender of the speaker. Previous research has shown that these two aspects of speech perception are interrelated. It is important to understand this relationship in the context of gender-affirming voice training (GAVT), where changes in speech production as part of a speaker's gender-affirming care could potentially influence listeners' recognition of the intended utterance. This study conducted a secondary analysis of data from an experiment in which trans women matched shifted targets for the second formant frequency using visual-acoustic biofeedback. Utterances were synthetically altered to feature a gender-ambiguous fundamental frequency and were presented to blinded listeners for rating on a visual analog scale representing the gender spectrum, as well as word identification in a forced-choice task. We found a statistically significant association between the accuracy of word identification and the gender rating of utterances. However, there was no statistically significant difference in word identification accuracy for the formant-shifted conditions relative to an unshifted condition. Overall, these results support previous research in finding that word identification and speaker gender identification are interrelated processes; however, the findings also suggest that a small magnitude of shift in formant frequencies (of the type that might be pursued in a GAVT context) does not have a significant negative impact on the perceptual recoverability of isolated words.

RevDate: 2024-07-10
CmpDate: 2024-07-10

Lorenzoni DC, Henriques JFC, Silva LKD, et al (2024)

Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.

Dental press journal of orthodontics, 29(3):e2423277.

OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech.

MATERIAL AND METHODS: Eligibility criteria for sample selection were: subjects aged 20-40 years with acceptable occlusion who were native speakers of Portuguese. The volunteers (n=21) were divided into four groups, randomized with a 1:1:1:1 allocation ratio. The four groups used, in random order, the four types of retainers full-time for 21 days each, with a washout period of 7 days. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with a mean age of 27.08 years (SD=4.65). Speech evaluation was performed on recordings of vocal excerpts made before, immediately after, and 21 days after the installation of each retainer, with auditory-perceptual analysis and acoustic analysis of formant frequencies F1 and F2 of the vowels. Repeated measures ANOVA and Friedman with Tukey tests were used for statistical comparison.

RESULTS: Speech changes increased immediately after conventional wraparound and thermoplastic retainer installation, and reduced after 21 days, but not to normal levels. However, this increase was statistically significant only for the wraparound with anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at initial time, and the changes remained in conventional, U-shaped and thermoplastic appliances after three weeks.

CONCLUSIONS: The thermoplastic retainer was more harmful to the speech than wraparound appliances. The conventional and U-shaped retainers interfered less in speech. The three-week period was not sufficient for speech adaptation.

RevDate: 2024-07-09

Liu B, Lei J, Wischhoff OP, et al (2024)

Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000540255 [Epub ahead of print].

INTRODUCTION: Benign and malignant vocal fold lesions are growths that occur on the vocal folds. However, the treatments for these two types of lesions differ significantly. Therefore, it is imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aims to determine the important acoustic characteristics specific to benign and malignant vocal fold lesions.

METHODS: The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants, including individuals with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose and Throat Hospital between May 2020 and July 2021. The acoustic variables of the participants were analyzed using Principal Component Analysis to present important acoustic characteristics that are specific to normal vocal folds, benign vocal fold lesions, and malignant vocal fold lesions. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia.

RESULTS: Using the Principal Component Analysis method, the components that accounted for the variation in the data were identified, highlighting acoustic characteristics in the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch were found to be significant only in benign voices, while higher formant frequencies and their variability were identified as contributors to the acoustic variance within the malignant group. The presence of formant dispersion as a weighted factor in Principal Component Analysis was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch and coefficients of variation for formant dispersion were observed to contribute to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia.

CONCLUSION: Experimental evidence demonstrates the utility of the Principal Component Analysis method in the identification of vibrational alterations in the acoustic characteristics of voice affected by lesions. Furthermore, the Principal Component Analysis has highlighted underlying acoustic differences between various conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which will facilitate timely intervention and management of vocal fold conditions.

RevDate: 2024-07-01

Fletcher MD, Akis E, Verschuur CA, et al (2024)

Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion.

Scientific reports, 14(1):15029.

Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.
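
Illustrative note (not part of the cited abstract): one common way to expand an amplitude envelope is a power-law mapping, which leaves the peak unchanged and pushes weaker portions further down, so that spectral peaks such as formants stand out more against the background. The sketch below assumes that approach; the function name, the exponent value, and the per-band application are hypothetical, and the abstract does not state which expansion rule the study used.

```python
def expand_envelope(envelope, exponent=2.0):
    """Power-law expansion of a non-negative amplitude envelope.

    Assumption: exponent > 1 widens the dynamic range; the peak value is
    preserved and every smaller value is attenuated relative to it.
    """
    peak = max(envelope)
    if peak <= 0:
        return list(envelope)  # silent band: nothing to expand
    return [peak * (value / peak) ** exponent for value in envelope]


# Hypothetical envelope samples for one frequency band.
band = [0.0, 0.5, 1.0]
print(expand_envelope(band))
```

In a multi-band vocoder this mapping would be applied to each band's envelope before it modulates the corresponding vibro-tactile tone.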

RevDate: 2024-06-25

Sahoo AK, Sahoo PK, Gupta V, et al (2024)

Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.

Cureus, 16(5):e60873.

BACKGROUND: Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration.

METHODOLOGY: This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. The PRAAT version 6.2 software was utilized for voice analysis.

RESULTS: The study included 41 patients with normal laryngoscopic findings after thyroid surgery, with the majority being female (85.4%) and the average age being 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, the majority reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonatory duration. These trends persisted at the three-month postoperative period, although values approached close to preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater values observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals.

CONCLUSIONS: This study on post-thyroidectomy patients with normal vocal cord movement revealed significant changes in voice parameters postoperatively, with most patients reporting no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.

RevDate: 2024-06-18

Xiu N, Li W, Liu L, et al (2024)

A Study on Voice Measures in Patients with Parkinson's Disease.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00168-1 [Epub ahead of print].

PURPOSE: This research aims to identify acoustic features which can distinguish patients with Parkinson's disease (PD patients) and healthy speakers.

METHODS: Thirty PD patients and 30 healthy speakers were recruited in the experiment, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features like fundamental frequency (F0), Jitter, Shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, voice onset time were analyzed in our experiment. Two-sample independent t test and the nonparametric Mann-Whitney U (MWU) test were carried out alternatively to compare the acoustic measures between the PD patients and healthy speakers. In addition, after figuring out the effective acoustic features for distinguishing PD patients and healthy speakers, we adopted two methods to detect PD patients: (1) Built classifiers based on the effective acoustic features and (2) Trained support vector machine classifiers via the effective acoustic features.

RESULTS: Significant differences were found between the male PD group and the male healthy controls in vowel /i/ (Jitter and Shimmer) and /a/ (Shimmer and HNR). Among female subjects, significant differences were observed in F0 standard deviation (F0 SD) of /u/ between the two groups. Additionally, significant differences between the PD group and healthy controls were also found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. The HNR of vowel /a/ achieved the best classification accuracy compared with the other six acoustic features found above to distinguish PD patients from healthy speakers.

CONCLUSIONS: PD can cause changes in the articulation and phonation of PD patients, wherein increases or decreases occur in some acoustic features. Therefore, the use of acoustic features to detect PD is expected to be a low-cost and large-scale diagnostic method.

RevDate: 2024-06-16

Weirich M, Simpson AP, N Knutti (2024)

Effects of testosterone on speech production and perception: Linking hormone levels in males to vocal cues and female voice attractiveness ratings.

Physiology & behavior pii:S0031-9384(24)00160-4 [Epub ahead of print].

This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. 
While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.

RevDate: 2024-06-14

Iyer R, D Meyer (2022)

Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.

JMIR biomedical engineering, 7(2):e42386 pii:v7i2e42386.

BACKGROUND: In an age when telehealth services are increasingly being used for forward triage, there is a need for accurate suicide risk detection. Vocal characteristics analyzed using artificial intelligence are now proving capable of detecting suicide risk with accuracies superior to traditional survey-based approaches, suggesting an efficient and economical approach to ensuring ongoing patient safety.

OBJECTIVE: This systematic review aimed to identify which vocal characteristics perform best at differentiating between patients with an elevated risk of suicide in comparison with other cohorts and identify the methodological specifications of the systems used to derive each feature and the accuracies of classification that result.

METHODS: A search of MEDLINE via Ovid, Scopus, Computers and Applied Science Complete, CADTH, Web of Science, ProQuest Dissertations and Theses A&I, Australian Policy Online, and Mednar was conducted between 1995 and 2020 and updated in 2021. The inclusion criteria were human participants with no language, age, or setting restrictions applied; randomized controlled studies, observational cohort studies, and theses; studies that used some measure of vocal quality; and individuals assessed as being at high risk of suicide compared with other individuals at lower risk using a validated measure of suicide risk. Risk of bias was assessed using the Risk of Bias in Non-randomized Studies tool. A random-effects model meta-analysis was used wherever mean measures of vocal quality were reported.

RESULTS: The search yielded 1074 unique citations, of which 30 (2.79%) were screened via full text. A total of 21 studies involving 1734 participants met all inclusion criteria. Most studies (15/21, 71%) sourced participants via either the Vanderbilt II database of recordings (8/21, 38%) or the Silverman and Silverman perceptual study recording database (7/21, 33%). Candidate vocal characteristics that performed best at differentiating between high risk of suicide and comparison cohorts included timing patterns of speech (median accuracy 95%), power spectral density sub-bands (median accuracy 90.3%), and mel-frequency cepstral coefficients (median accuracy 80%). A random-effects meta-analysis was used to compare 22 characteristics nested within 14% (3/21) of the studies, which demonstrated significant standardized mean differences for frequencies within the first and second formants (standardized mean difference ranged between -1.07 and -2.56) and jitter values (standardized mean difference=1.47). In 43% (9/21) of the studies, risk of bias was assessed as moderate, whereas in the remaining studies (12/21, 57%), the risk of bias was assessed as high.

CONCLUSIONS: Although several key methodological issues prevailed among the studies reviewed, there is promise in the use of vocal characteristics to detect elevations in suicide risk, particularly in novel settings such as telehealth or conversational agents.

TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42020167413; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020167413.

RevDate: 2024-06-09

Krupić F, Moravcova M, Dervišević E, et al (2024)

When time does not heal all wounds: three decades' experience of immigrants living in Sweden.

Medicinski glasnik : official publication of the Medical Association of Zenica-Doboj Canton, Bosnia and Herzegovina, 21(2): [Epub ahead of print].

AIM: To investigate how immigrants from the Balkan region experienced their current life situation after living in Sweden for 30 years or more.

MATERIALS: The study was designed as a qualitative study using data from interviews with informants from five Balkan countries. The inclusion criteria were informants who were immigrants to Sweden and had lived in Sweden for more than 30 years. Five groups comprising sixteen informants were invited to participate in the study, and they all agreed.

RESULTS: The analysis of the interviews resulted in three main categories: "from someone to no one", "labour market", and "discrimination". All the informants reported that having an education and life experience was worthless, having a life but having to start over, re-educating, applying for many jobs but often not being answered, and finally getting a job for which every informant was educated but being humiliated every day and treated separately as well as being discriminated against.

CONCLUSION: Coming to Sweden with all their problems, having an education and work experience that was equal to zero in Sweden, studying Swedish and re-reading/repeating all their education, looking for a job and not receiving answers to applications, and finally getting a job but being treated differently and discriminated against on a daily basis was experienced by all the informants as terrible. Even though there are enough similar studies in Sweden, it is always good to write more to help prospective immigrants and prospective employers in Sweden.

RevDate: 2024-06-07

Mittapalle KR, P Alku (2024)

Classification of phonation types in singing voice using wavelet scattering network-based features.

JASA express letters, 4(6):.

The automatic classification of phonation types in singing voice is essential for tasks such as identification of singing style. In this study, it is proposed to use wavelet scattering network (WSN)-based features for classification of phonation types in singing voice. WSN, which has a close similarity with auditory physiological models, generates acoustic features that greatly characterize the information related to pitch, formants, and timbre. Hence, the WSN-based features can effectively capture the discriminative information across phonation types in singing voice. The experimental results show that the proposed WSN-based features improved phonation classification accuracy by at least 9% compared to state-of-the-art features.

RevDate: 2024-06-06

Gorina-Careta N, Arenillas-Alcón S, Puertollano M, et al (2024)

Exposure to bilingual or monolingual maternal speech during pregnancy affects the neurophysiological encoding of speech sounds in neonates differently.

Frontiers in human neuroscience, 18:1379660.

INTRODUCTION: Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and to demonstrate an enhanced preference for their mother's voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy differently influence the fetal neural mechanisms underlying speech sound encoding.

METHODS: In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 129 healthy term neonates within 1 to 3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were, respectively, taken as measures of pitch and formant structure neural encoding.

RESULTS: Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies.

DISCUSSION: We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them. Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.

RevDate: 2024-05-31

Wu HY (2024)

Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender.

METHOD: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models.

RESULTS: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception.

CONCLUSIONS: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.

RevDate: 2024-05-24

Neuhaus TJ, Scherer RC, JA Whitfield (2024)

Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00016-X [Epub ahead of print].

OBJECTIVE: To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender.

METHODS: Sound files each containing the vowels /i, æ, ɑ, u/ interspersed by brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values for fundamental frequency and 7 values for implied vocal tract length (and the associated formant frequencies) ranging from male-typical to female-typical, and 3 values for source spectral tilt approximating the voice qualities of breathy, normal, and pressed. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment.

RESULTS: Increasing fundamental frequency and decreasing implied vocal tract length resulted in increased probability of female judgment. Two interactions were identified: An increase in fundamental frequency and also a decrease in source spectral tilt (more negative) resulted in a greater increase in the probability of female judgment when the vocal tract length was relatively short.

CONCLUSIONS: The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and probability of female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. There was no threshold of fundamental frequency or implied vocal tract length that dramatically shifted the perception between male and female.
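
Illustrative note (not part of the cited abstract): the link between implied vocal tract length and formant frequencies, and the size of the 10 × 7 × 3 stimulus set, can be checked with a short sketch. It assumes the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, with resonances at F_n = (2n − 1)c / 4L, and a speed of sound of 35,000 cm/s. The step labels and function names are hypothetical, and the study's synthesizer may have mapped length to formants differently.

```python
from itertools import product

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical index labels for the factor levels named in the abstract.
f0_steps = range(10)                            # 10 fundamental frequency values
vtl_steps = range(7)                            # 7 implied vocal tract lengths
tilt_steps = ("breathy", "normal", "pressed")   # 3 source spectral tilts

stimuli = list(product(f0_steps, vtl_steps, tilt_steps))

print(len(stimuli))          # full factorial count
print(tube_formants(17.5))   # a male-typical tube length
print(tube_formants(14.0))   # a shorter tube raises every formant
```

Under this idealization, shortening the tube scales all formants upward by the same ratio, which is why a single length parameter can stand in for a whole set of formant frequencies.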

RevDate: 2024-05-23
CmpDate: 2024-05-23

Balolia KL, PL Fitzgerald (2024)

Male proboscis monkey cranionasal size and shape is associated with visual and acoustic signalling.

Scientific reports, 14(1):10715.

The large nose adorned by adult male proboscis monkeys is hypothesised to serve as an audiovisual signal of sexual selection. It serves as a visual signal of male quality and social status, and as an acoustic signal, through the expression of loud, low-formant nasalised calls in dense rainforests, where visibility is poor. However, it is unclear how the male proboscis monkey nasal complex, including the internal structure of the nose, plays a role in visual or acoustic signalling. Here, we use cranionasal data to assess whether large noses found in male proboscis monkeys serve visual and/or acoustic signalling functions. Our findings support a visual signalling function for male nasal enlargement through a relatively high degree of nasal aperture sexual size dimorphism, the craniofacial region to which nasal soft tissue attaches. We additionally find nasal aperture size increases beyond dental maturity among male proboscis monkeys, consistent with the visual signalling hypothesis. We show that the cranionasal region has an acoustic signalling role through pronounced nasal cavity sexual shape dimorphism, wherein male nasal cavity shape allows the expression of loud, low-formant nasalised calls. Our findings provide robust support for the male proboscis monkey nasal complex serving both visual and acoustic functions.

RevDate: 2024-05-23

Beach SD, CA Niziolek (2024)

Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task.

Cognitive neuropsychology [Epub ahead of print].

How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.

RevDate: 2024-05-16

Aaen M, C Sadolin (2024)

Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00117-6 [Epub ahead of print].

UNLABELLED: Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.

METHODS: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.

RESULTS: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (<0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (<0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (<0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (<0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy.

CONCLUSION: Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remains an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.
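
Illustrative note: the abstract above reports Fleiss' kappa as its agreement statistic. The following is a minimal sketch of how that statistic is computed from a table of rating counts. The function name and the example counts are hypothetical and are not taken from the study; the sketch also assumes every stimulus is rated by the same number of assessors.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Assumes each item has the same
    number of raters (at least two) and that more than one category is used."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 stimuli, 5 raters, 3 timbre categories.
example_counts = [
    [5, 0, 0],
    [3, 2, 0],
    [0, 4, 1],
    [1, 1, 3],
]
example_kappa = fleiss_kappa(example_counts)
```

A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.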

RevDate: 2024-05-16

Ning LH, TC Hui (2024)

The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study.

METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /ɛ/ for 3 s while their auditory feedback underwent single shifts in one of the three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains.

RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction rather than loudness and pitch.

CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.

RevDate: 2024-05-14

Ekström AG (2024)

Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022).

American journal of primatology [Epub ahead of print].

The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers (both purported challenges to Lieberman's theoretical work) and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position: that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.

RevDate: 2024-05-13

Cao S, Rosenzweig I, Bilotta F, et al (2024)

Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review.

Journal of thoracic disease, 16(4):2654-2667.

BACKGROUND AND OBJECTIVE: Obstructive sleep apnea (OSA) is a common chronic disorder characterized by repeated breathing pauses during sleep caused by upper airway narrowing or collapse. The gold standard for OSA diagnosis is the polysomnography test, which is time consuming, expensive, and invasive. In recent years, more cost-effective approaches for OSA detection based on the predictive value of speech and snoring have emerged. In this paper, we offer a comprehensive summary of current research progress on the applications of speech or snoring sounds for the automatic detection of OSA and discuss the key challenges that need to be overcome for future research into this novel approach.

METHODS: PubMed, IEEE Xplore, and Web of Science databases were searched with related keywords. Literature published between 1989 and 2022 examining the potential of using speech or snoring sounds for automated OSA detection was reviewed.

KEY CONTENT AND FINDINGS: Speech and snoring sounds contain a large amount of information about OSA, and they have been extensively studied in the automatic screening of OSA. By importing features extracted from speech and snoring sounds into artificial intelligence models, clinicians can automatically screen for OSA. Features such as formants, linear prediction cepstral coefficients, and mel-frequency cepstral coefficients, and artificial intelligence algorithms including support vector machines, Gaussian mixture models, and hidden Markov models have been extensively studied for the detection of OSA.

CONCLUSIONS: Due to the significant advantages of noninvasive, low-cost, and contactless data collection, an automatic approach based on speech or snoring sounds seems to be a promising tool for the detection of OSA.

RevDate: 2024-05-08
CmpDate: 2024-05-08

Feng H, L Wang (2024)

Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.

The Journal of the Acoustical Society of America, 155(5):3071-3089.

This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-ɪ/, /u-ʊ/, /ɑ-ʌ/, and /æ-ε/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-ɪ/, /u-ʊ/, and /ɑ-ʌ/ pairs, although they maintained sufficient spectral differences for /æ-ε/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.
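
Illustrative note: the abstract above describes modeling formant trajectories with discrete cosine transform (DCT) coefficients. The following is a minimal sketch of that general idea, not the authors' procedure. The function name, the scaling convention (chosen so that coefficient 0 equals the trajectory mean), and the example F1 values are all assumptions made for illustration.

```python
import numpy as np


def dct_coefficients(trajectory, n_coeffs=3):
    """Return the first n_coeffs DCT-II coefficients of a formant track.

    Scaling is an assumption of this sketch: coefficient 0 is the mean of
    the track, and higher coefficients are scaled by 2/N. Other scaling
    conventions are in use and give proportional values.
    """
    x = np.asarray(trajectory, dtype=float)
    n = x.size
    # Sample positions at the midpoints of N equal intervals on [0, 1].
    t = (np.arange(n) + 0.5) / n
    coeffs = []
    for k in range(n_coeffs):
        basis = np.cos(np.pi * k * t)  # half-cosine basis of order k
        scale = 1.0 if k == 0 else 2.0
        coeffs.append(scale * np.mean(x * basis))
    return np.array(coeffs)


# Hypothetical F1 track (Hz) sampled at 11 equidistant points in a vowel.
f1_track = np.linspace(300.0, 700.0, 11)
f1_dct = dct_coefficients(f1_track)
```

Under this convention the zeroth coefficient summarizes the overall level of the trajectory, the first reflects its overall tilt (negative for a rising track), and the second reflects its curvature.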

RevDate: 2024-05-07
CmpDate: 2024-05-07

Ostrega J, Shiramizu V, Lee AJ, et al (2024)

No evidence that averaging voices influences attractiveness.

Scientific reports, 14(1):10488.

Vocal attractiveness influences important social outcomes. While most research on the acoustic parameters that influence vocal attractiveness has focused on the possible roles of sexually dimorphic characteristics of voices, such as fundamental frequency (i.e., pitch) and formant frequencies (i.e., a correlate of body size), other work has reported that increasing vocal averageness increases attractiveness. Here we investigated the roles these three characteristics play in judgments of the attractiveness of male and female voices. In Study 1, we found that increasing vocal averageness significantly decreased distinctiveness ratings, demonstrating that participants could detect manipulations of vocal averageness in this stimulus set and using this testing paradigm. However, in Study 2, we found no evidence that increasing averageness significantly increased attractiveness ratings of voices. In Study 3, we found that fundamental frequency was negatively correlated with male vocal attractiveness and positively correlated with female vocal attractiveness. By contrast with these results for fundamental frequency, vocal attractiveness and formant frequencies were not significantly correlated. Collectively, our results suggest that averageness may not necessarily significantly increase attractiveness judgments of voices and are consistent with previous work reporting significant associations between attractiveness and voice pitch.

RevDate: 2024-05-04

Leyns C, Adriaansen A, Daelman J, et al (2024)

Long-term Acoustic Effects of Gender-Affirming Voice Training in Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00123-1 [Epub ahead of print].

OBJECTIVES: One role of a speech-language pathologist (SLP) is to help transgender clients in developing a healthy, gender-congruent communication. Transgender women frequently approach SLPs to train their voices to sound more feminine; however, the long-term acoustic effects of the training need to be rigorously examined in effectiveness studies. The aim of this study was to investigate the long-term effects (follow-up 1: 3 months and follow-up 2: 1 year after last session) of gender-affirming voice training for transgender women, in terms of acoustic parameters.

STUDY DESIGN: This study was a randomized sham-controlled trial with a cross-over design.

METHODS: Twenty-six transgender women were included for follow-up 1 and 18 for follow-up 2. All participants received 14 weeks of gender-affirming voice training (4 weeks sham training, 10 weeks of voice feminization training: 5 weeks pitch elevation training and 5 weeks articulation-resonance training), but in a different order. Speech samples were recorded with Praat at four different time points (pre, post, follow-up 1, follow-up 2). Acoustic analysis included fo of sustained vowel /a:/, reading and spontaneous speech. Formant frequencies (F1-F2-F3) of vowels /a/, /i/, and /u/ were determined and vowel space was calculated. A linear mixed model was used to compare the acoustic voice measurements between measurements (pre - post, pre - follow-up 1, pre - follow-up 2, post - follow-up 1, post - follow-up 2, follow-up 1 - follow-up 2).

RESULTS: Most of the fo measurements and formant frequencies that increased immediately after the intervention were stable at both follow-up measurements. The median fo during the sustained vowel, reading and spontaneous speech stayed increased at both follow-ups compared to the pre-measurement. However, a decrease of 16 Hz/1.7 ST (reading) and 12 Hz/1.5 ST (spontaneous speech) was detected between the post-measurement (169 Hz for reading, 144 Hz for spontaneous speech) and 1 year after the last session (153 Hz and 132 Hz, respectively). The lower limit of fo did not change during reading and spontaneous speech, both directly after the intervention and during both follow-ups. F1-2 of vowel /a/ and the vowel space increased after the intervention and at both follow-ups. Individual analyses showed that more aspects should be controlled after the intervention, such as exercises that were performed at home, or the duration of extra gender-affirming voice training sessions.

CONCLUSIONS: After 10 sessions of voice feminization training and follow-up measurements after 3 months and 1 year, stable increases were found for some formant frequencies and fo measurements, but not all of them. More time should be spent on increasing the fifth percentile of fo, as the lower limit of fo also contributes to the perception of more feminine voice.
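
Illustrative note: the abstract above reports fo changes both in hertz and in semitones (ST). The following is a minimal sketch of the standard equal-tempered conversion, 12 times the base-2 logarithm of the frequency ratio, applied to the post-measurement and 1-year values quoted in the abstract. The function name is hypothetical, and the abstract does not state how the authors computed their semitone values.

```python
import math


def semitone_difference(f_from_hz, f_to_hz):
    """Signed interval in semitones when moving from f_from_hz to f_to_hz."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


# Median fo values quoted in the abstract: post-measurement vs. 1 year later.
reading_change = semitone_difference(169.0, 153.0)       # about -1.7 ST
spontaneous_change = semitone_difference(144.0, 132.0)   # about -1.5 ST
```

Because the semitone scale is logarithmic, the 16 Hz drop from 169 Hz and the 12 Hz drop from 144 Hz correspond to similar musical intervals.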

RevDate: 2024-05-02

Kocjančič T, Bořil T, S Hofmann (2024)

Acoustic and Articulatory Visual Feedback in Classroom L2 Vowel Remediation.

Language and speech [Epub ahead of print].

This paper presents L2 vowel remediation in a classroom setting via two real-time visual feedback methods: articulatory ultrasound tongue imaging, which shows tongue shape and position, and a newly developed acoustic formant analyzer, which visualizes a point correlating with the combined effect of tongue position and lip rounding in a vowel quadrilateral. Ten Czech students of the Swedish language participated in the study. Swedish vowel production is difficult for Czech speakers since the languages differ significantly in their vowel systems. The students selected the vowel targets on their own and practiced in two classroom groups, with six students receiving two ultrasound training lessons, followed by one acoustic, and four students receiving two acoustic lessons, followed by one ultrasound. Audio data were collected pre-training, after the two sessions employing the first visual feedback method, and at post-training, allowing measurement of the Euclidean distance among selected groups of vowels and observation of the direction of change within the vowel quadrilateral as a result of practice. Perception tests were performed before and after training, revealing that most learners already perceived the selected vowels correctly before the practice. The study showed that both feedback methods can be successfully applied to L2 classroom learning, and both lead to improvement in the pronunciation of the selected vowels, as well as the Swedish vowel set as a whole. However, ultrasound tongue imaging seems to have an advantage as it resulted in a greater number of improved targets.
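
Illustrative note: the abstract above mentions Euclidean distance among vowels in the vowel quadrilateral. The following is a minimal sketch of such a distance in an F1-F2 plane. The function name and the formant values are hypothetical, and the choice of raw hertz is an assumption: the abstract does not say which frequency scale or normalization was used.

```python
import math


def vowel_distance(vowel_a, vowel_b):
    """Euclidean distance between two vowels given as (F1, F2) pairs."""
    f1_a, f2_a = vowel_a
    f1_b, f2_b = vowel_b
    return math.hypot(f1_a - f1_b, f2_a - f2_b)


# Hypothetical (F1, F2) values in Hz for a learner's vowel and a target vowel.
learner_pre = (420.0, 1900.0)
learner_post = (340.0, 2150.0)
target = (300.0, 2250.0)

distance_pre = vowel_distance(learner_pre, target)
distance_post = vowel_distance(learner_post, target)
```

A smaller distance after training than before would indicate movement toward the target in this two-dimensional space.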

RevDate: 2024-04-24

Saldías O'Hrens M, Castro C, Espinoza VM, et al (2024)

Spectral features related to the auditory perception of twang-like voices.

Logopedics, phoniatrics, vocology [Epub ahead of print].

BACKGROUND: To the best of our knowledge, studies on the relationship between spectral energy distribution and the degree of perceived twang-like voices are still sparse. Through an auditory-perceptual test we aimed to explore the spectral features that may relate to the auditory perception of twang-like voices.

METHODS: Ten judges who were blind to the test's tasks and stimuli rated the amount of twang perceived on seventy-six audio samples. The stimuli consisted of twenty voices recorded from eight CCM singers who sustained the vowel [a:] in different pitches, with and without a twang-like voice. Also, forty filtered and sixteen synthesized-manipulated stimuli were included.

RESULTS AND CONCLUSIONS: Based on the intra-rater reliability scores, four judges were identified as suitable to be included in the analyses. Results showed that the frequency of F1 and F2 correlated strongly with the auditory-perception of twang-like voices (0.90 and 0.74, respectively), whereas F3 showed a moderate negative correlation (-0.52). The frequency difference between F1 and F3 showed a strong negative correlation (-0.82). The mean energy between 1-2 kHz and 2-3 kHz correlated moderately (0.51 and 0.42, respectively). The frequency of F4 and F5, and the energy above 3 kHz showed weak correlations. Since the spectral changes under 2 kHz have been associated with the jaw, lips, and tongue adjustments (i.e. vowel articulation) and a higher vertical laryngeal position might affect the frequency of all formants (including F1 and F2), our results suggest that vowel articulation and the laryngeal height may be relevant when performing twang-like voices.

RevDate: 2024-04-21

Cruz TLB, Frič M, PA Andrade (2024)

A Comparison of Countertenor Singing at Various Professional Levels Using Acoustic, Electroglottographic, and Videofluoroscopic Methods.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00111-5 [Epub ahead of print].

INTRODUCTION: The vocal characteristics of countertenors (CTTs) are poorly understood due to a lack of studies in this field. This study aims to explore differences among CTTs at various professional levels, examining both disparities and congruences in singing styles to better understand the CTT voice.

MATERIALS AND METHODS: Four CTTs (one student, one amateur, and two professionals) sang "La giustizia ha già sull'arco" from Handel's Giulio Cesare, with concurrent videofluoroscopic, electroglottography (EGG), and acoustic data collection. Auditory-perceptual analysis was employed to rate professional level. Acoustic analysis included LH1-LH2, formant cluster prominence, and vibrato analysis. EGG data was analyzed using FonaDyn software, while anatomical modifications were quantified using videofluoroscopic images.

RESULTS: CTTs exhibited EGG contact quotient values surpassing typical levels for inexperienced falsettos. Their vibrato characteristics aligned with expectations for classical singing, whereas the presence of the singer's formant was not observed. Variations in supraglottic adjustments among CTTs underscored the diversity of techniques employed by CTT singers.

CONCLUSIONS: CTTs exhibited vocal techniques that highlighted the influence of individual preferences, professional experience, and stylistic choices in shaping their singing characteristics. The data revealed discernible differences between professional and amateur CTTs, providing insights into the impact of varying levels of experience on vocal expression.

RevDate: 2024-04-17

Torres C, Li W, P Escudero (2024)

Acoustic, phonetic, and phonological features of Drehu vowels.

The Journal of the Acoustical Society of America, 155(4):2612-2626.

This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14-vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two out of seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.

RevDate: 2024-04-02

Wang H, Ali Y, L Max (2024)

Perceptual formant discrimination during speech movement planning.

PloS one, 19(4):e0301514 pii:PONE-D-23-34985.

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.

RevDate: 2024-04-01

Havenhill J (2024)

Articulatory and acoustic dynamics of fronted back vowels in American English.

The Journal of the Acoustical Society of America, 155(4):2285-2301.

Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.

RevDate: 2024-03-26

Singh VP, Sahidullah M, T Kinnunen (2024)

ChildAugment: Data augmentation methods for zero-resource children's speaker verification.

The Journal of the Acoustical Society of America, 155(3):2221-2232.

The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an emphasized channel attention, propagation, and aggregation in time-delay neural network recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, probabilistic linear discriminant analysis (PLDA), and neural PLDA. We also propose a low-complexity weighted cosine score for extremely low-resource children ASV. Our findings on the CSLU kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep learning based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline. For reproducibility, we provide the evaluation protocols and codes here.

RevDate: 2024-03-19

Södersten M, Oates J, Sand A, et al (2024)

Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00023-7 [Epub ahead of print].

OBJECTIVES: To investigate acoustic outcomes of gender-affirming voice training for trans women wanting to develop a female sounding voice and to describe what happens acoustically when male sounding voices become more female sounding.

STUDY DESIGN: Prospective treatment study with repeated measures.

METHODS: N = 74 trans women completed a voice training program of 8-12 sessions and had their voices audio recorded twice before and twice after training. Reference data were obtained from N = 40 cisgender speakers. Fundamental frequency (fo), formant frequencies (F1-F4), sound pressure level (Leq), and level difference between first and second harmonic (L1-L2) were extracted from a reading passage and spontaneous speech. N = 79 naive listeners provided gender-related ratings of participants' audio recordings. A linear mixed-effects model was used to estimate average training effects. Individual level analyses determined how changes in acoustic data were related to listeners' ratings.

RESULTS: Group data showed substantial training effects on fo (average, minimum, and maximum) and formant frequencies. Individual data demonstrated that many participants also increased Leq and some increased L1-L2. Measures that most strongly predicted listener ratings of a female sounding voice were: fo, average formant frequency, and Leq.

CONCLUSIONS: This is the largest prospective study reporting on acoustic outcomes of gender-affirming voice training for trans women. We confirm findings from previous smaller scale studies by demonstrating that listener perceptions of male and female sounding voices are related to acoustic voice features, and that voice training for trans women wanting to sound female is associated with desirable acoustic changes, indicating training effectiveness. Although acoustic measures can be a valuable indicator of training effectiveness, particularly from the perspective of clinicians and researchers, we contend that a combination of outcome measures, including client perspectives, is needed to provide comprehensive evaluation of gender-affirming voice training that is relevant for all stakeholders.

RevDate: 2024-03-19

Dolquist DV, B Munson (2024)

Clinical Focus: The Development and Description of a Palette of Transmasculine Voices.

American journal of speech-language pathology [Epub ahead of print].

PURPOSE: The study of gender and speech has historically excluded studies of transmasculine individuals. Consequently, generalizations about speech and gender are based on cisgender individuals. This lack of representation hinders clinical training and clinical service delivery, particularly by speech-language pathologists providing gender-affirming communication services. This letter describes a new corpus of the speech of American English-speaking transmasculine men, transmasculine nonbinary people, and cisgender men that is open and available to clinicians and researchers.

METHOD: Twenty masculine-presenting native English speakers from the Upper Midwestern United States (including cisgender men, transmasculine men, and transmasculine nonbinary people) were recorded, producing three sets of speech materials: Consensus Auditory-Perceptual Evaluation of Voice sentences, the Rainbow Passage, and a novel set of sentences developed for this project. Acoustic measures were made of vowels (overall formant frequency scaling, vowel-space dispersion, fundamental frequency, breathiness), consonants (voice onset time of word-initial voiceless stops, spectral moments of word-initial /s/), and the entire sentence (rate of speech).

RESULTS: The acoustic measures reveal a wide range for all dependent measures and low correlations among the measures. Results show that many of the voices depart considerably from the norms for men's speech in published studies.

CONCLUSION: This new corpus can be used to illustrate different ways of sounding masculine by speech-language pathologists performing gender-affirming communication services and by higher education teachers as examples of diverse ways of sounding masculine.

RevDate: 2024-03-18

Kim Y, Thompson A, ISB Nip (2024)

Effects of Deep-Brain Stimulation on Speech: Perceptual and Acoustic Data.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study examined speech changes induced by deep-brain stimulation (DBS) in speakers with Parkinson's disease (PD) using a set of auditory-perceptual and acoustic measures.

METHOD: Speech recordings from nine speakers with PD and DBS were compared between DBS-On and DBS-Off conditions using auditory-perceptual and acoustic analyses. Auditory-perceptual ratings included voice quality, articulation precision, prosody, speech intelligibility, and listening effort obtained from 44 listeners. Acoustic measures were made for voicing proportion, second formant frequency slope, vowel dispersion, articulation rate, and range of fundamental frequency and intensity.

RESULTS: No significant changes were found between DBS-On and DBS-Off for the five perceptual ratings. Four of six acoustic measures revealed significant differences between the two conditions. While articulation rate and acoustic vowel dispersion increased, voicing proportion and intensity range decreased from the DBS-Off to DBS-On condition. However, a visual examination of the data indicated that the statistical significance was mostly driven by a small number of participants, while the majority did not show a consistent pattern of such changes.

CONCLUSIONS: Our data, in general, indicate no-to-minimal changes in speech production ensuing from DBS. The findings are discussed with a focus on large interspeaker variability in PD in terms of their speech characteristics and the potential effects of DBS on speech.

RevDate: 2024-03-18

Sabev M, B Andreeva (2024)

The acoustics of Contemporary Standard Bulgarian vowels: A corpus study.

The Journal of the Acoustical Society of America, 155(3):2128-2138.

A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables (the first three formant frequencies, duration, mean f0, and mean intensity) of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.

RevDate: 2024-03-18

Ashokumar M, Schwartz JL, T Ito (2024)

Changes in Speech Production Following Perceptual Training With Orofacial Somatosensory Inputs.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Orofacial somatosensory inputs play an important role in speech motor control and speech learning. Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception, similar perceptual training could also alter speech production. We examined whether the production performance was changed by perceptual training with orofacial somatosensory inputs.

METHOD: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures. Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the first group of participants, somatosensory stimulation was applied as facial skin stretch in backward direction. We recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1, F2, and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with a different vowel continuum (/e/-/i/) for perceptual training.

RESULTS: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds. F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following the /e/-/i/ training. F2 change was significantly correlated with the perceptual shift between the first and second half of the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group displayed no effect on F2 and F3, and just a tendency of F1 increase.

CONCLUSION: The results suggest that somatosensory inputs associated to speech sound inputs can play a role in speech training and learning in both production and perception.

RevDate: 2024-03-14

Saha S, Rattansingh A, Martino R, et al (2024)

A pilot observation using ultrasonography and vowel articulation to investigate the influence of suspected obstructive sleep apnea on upper airway.

Scientific reports, 14(1):6144.

Failure to employ suitable measures before administering full anesthesia to patients with obstructive sleep apnea (OSA) who are undergoing surgery may lead to developing complications after surgery. Therefore, it is very important to screen OSA before performing a surgery, which is currently done by subjective questionnaires such as STOP-Bang, Berlin scores. These questionnaires have 10-36% specificity in detecting sleep apnea, along with no information given on anatomy of upper airway, which is important for intubation. To address these challenges, we performed a pilot study to understand the utility of ultrasonography and vowel articulation in screening OSA. Our objective was to investigate the influence of OSA risk factors in vowel articulation through ultrasonography and acoustic features analysis. To accomplish this, we recruited 18 individuals with no risk of OSA and 13 individuals with high risk of OSA and asked them to utter vowels, such as /a/ (as in "Sah"), /e/ (as in "See"). An expert ultra-sonographer measured the parasagittal anterior-posterior (PAP) and transverse diameter of the upper airway. From the recorded vowel sounds, we extracted 106 features, including power, pitch, formant, and Mel frequency cepstral coefficients (MFCC). We analyzed the variation of the PAP diameters and vowel features from "See: /i/" to "Sah /a/" between control and OSA groups by two-way repeated measures ANOVA. We found that, there was a variation of upper airway diameter from "See" to "Sah" was significantly smaller in OSA group than control group (OSA: ∆12.8 ± 5.3 mm vs. control: ∆22.5 ± 3.9 mm OSA, p < 0.01). Moreover, we found several vowel features showed the exact same or opposite trend as PAP diameter variation, which led us to build a machine learning model to estimate PAP diameter from vowel features. We found a correlation coefficient of 0.75 between the estimated and measured PAP diameter after applying four estimation models and combining their output with a random forest model, which showed the feasibility of using acoustic features of vowel sounds to monitor upper airway diameter. Overall, this study has proven the concept that ultrasonography and vowel sounds analysis may be useful as an easily accessible imaging tool of upper airway.

RevDate: 2024-03-12

Lee H, Cho M, HY Kwon (2024)

Attention-based speech feature transfer between speakers.

Frontiers in artificial intelligence, 7:1259641.

In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.

RevDate: 2024-03-08

Borjigin A, Bakst S, Anderson K, et al (2024)

Discrimination and sensorimotor adaptation of self-produced vowels in cochlear implant users.

The Journal of the Acoustical Society of America, 155(3):1895-1908.

Humans rely on auditory feedback to monitor and adjust their speech for clarity. Cochlear implants (CIs) have helped over a million people restore access to auditory feedback, which significantly improves speech production. However, there is substantial variability in outcomes. This study investigates the extent to which CI users can use their auditory feedback to detect self-produced sensory errors and make adjustments to their speech, given the coarse spectral resolution provided by their implants. First, we used an auditory discrimination task to assess the sensitivity of CI users to small differences in formant frequencies of their self-produced vowels. Then, CI users produced words with altered auditory feedback in order to assess sensorimotor adaptation to auditory error. Almost half of the CI users tested can detect small, within-channel differences in their self-produced vowels, and they can utilize this auditory feedback towards speech adaptation. An acoustic hearing control group showed better sensitivity to the shifts in vowels, even in CI-simulated speech, and elicited more robust speech adaptation behavior than the CI users. Nevertheless, this study confirms that CI users can compensate for sensory errors in their speech and supports the idea that sensitivity to these errors may relate to variability in production.

RevDate: 2024-03-05

Stone TC, ML Erickson (2024)

Experienced and Inexperienced Listeners' Perception of Vocal Strain.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00024-9 [Epub ahead of print].

OBJECTIVE: The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood.

METHODS/DESIGN: This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at 2 pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies mid-way between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual-analog scale.

RESULTS: Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch.

CONCLUSION: This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.

RevDate: 2024-03-05

Umashankar A, Ramamoorthy S, Selvaraj JL, et al (2024)

Comparative Study on the Acoustic Analysis of Voice in Auditory Brainstem Implantees, Cochlear Implantees, and Normal Hearing Children.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 76(1):645-652.

The aim of the study was to compare the acoustic characteristics of voice between Auditory Brainstem Implantees, Cochlear Implantees and normal hearing children. Voice parameters such as fundamental frequency, formant frequencies, perturbation measures, and harmonic to noise ratio were measured in a total of 30 children out of which 10 were Auditory Brainstem Implantees, 10 were Cochlear Implantees and 10 were normal hearing children. Parametric and nonparametric statistics were done to establish the nature of significance between the three groups. Overall deviancies were seen in the implanted group for all acoustic parameters. However abnormal deviations were seen in individuals with Auditory Brainstem Implants indicating the deficit in the feedback loop impacting the voice characteristics. The deviancy in feedback could attribute to the poor performance in ABI and CI. The CI performed comparatively better when compared to the ABI group indicating a slight feedback loop due to the type of Implant. However, there needs to be additional evidence supporting this and there is a need to carry out the same study using a larger sample size and a longitudinal design.

RevDate: 2024-03-04

Cuadros J, Z-Rivera L, Castro C, et al (2023)

DIVA Meets EEG: Model Validation Using Formant-Shift Reflex.

Applied sciences (Basel, Switzerland), 13(13):.

The neurocomputational model 'Directions into Velocities of Articulators' (DIVA) was developed to account for various aspects of normal and disordered speech production and acquisition. The neural substrates of DIVA were established through functional magnetic resonance imaging (fMRI), providing physiological validation of the model. This study introduces DIVA_EEG, an extension of DIVA that utilizes electroencephalography (EEG) to leverage the high temporal resolution and broad availability of EEG over fMRI. For the development of DIVA_EEG, EEG-like signals were derived from original equations describing the activity of the different DIVA maps. Synthetic EEG associated with the utterance of syllables was generated when both unperturbed and perturbed auditory feedback (first formant perturbations) were simulated. The cortical activation maps derived from synthetic EEG closely resembled those of the original DIVA model. To validate DIVA_EEG, the EEG of individuals with typical voices (N = 30) was acquired during an altered auditory feedback paradigm. The resulting empirical brain activity maps significantly overlapped with those predicted by DIVA_EEG. In conjunction with other recent model extensions, DIVA_EEG lays the foundations for constructing a complete neurocomputational framework to tackle vocal and speech disorders, which can guide model-driven personalized interventions.

RevDate: 2024-02-28

Fletcher MD, Akis E, Verschuur CA, et al (2024)

Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.

Scientific reports, 14(1):4889.

Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.

RevDate: 2024-02-21

Maya Lastra N, Rangel Negrín A, Coyohua Fuentes A, et al (2024)

Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.

Primates; journal of primatology [Epub ahead of print].

Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.
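As a rough illustration of why formant spacing can index body size, the sketch below uses the textbook uniform-tube approximation (a tube closed at one end), in which formant i falls at (2i - 1)c/(4L) and the spacing between adjacent formants is c/(2L). This is not taken from the study above: the function names, the assumed speed of sound, and the formant values are all hypothetical, and the authors' own measurement procedure may differ.

```python
# Illustrative sketch only: uniform-tube approximation, not the study's pipeline.
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, humid air (cm/s)


def formant_spacing(formants_hz):
    """Estimate formant spacing as the least-squares slope of F_i against
    (2i - 1) / 2, with the regression line forced through the origin."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vocal_tract_length_cm(spacing_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length implied by a given formant spacing."""
    return c / (2 * spacing_hz)


# Hypothetical formant measurements (Hz) for two playback calls.
low_spacing_call = [500.0, 1500.0, 2500.0, 3500.0]
high_spacing_call = [700.0, 2100.0, 3500.0, 4900.0]

for label, formants in [("low", low_spacing_call), ("high", high_spacing_call)]:
    spacing = formant_spacing(formants)
    length = apparent_vocal_tract_length_cm(spacing)
    print(f"{label} spacing: {spacing:.0f} Hz, apparent length: {length:.1f} cm")
```

Under these assumptions, lower spacing implies a longer apparent vocal tract, which is the sense in which low-spacing playbacks emulate larger callers.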

RevDate: 2024-02-16

Merritt B, Bent T, Kilgore R, et al (2024)

Auditory free classification of gender diverse speakers.

The Journal of the Acoustical Society of America, 155(2):1422-1436.

Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.

RevDate: 2024-02-15

Almurashi W, Al-Tamimi J, G Khattab (2024)

Dynamic specification of vowels in Hijazi Arabic.

Phonetica [Epub ahead of print].

Research on various languages shows that dynamic approaches to vowel acoustics, in particular Vowel-Inherent Spectral Change (VISC), can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.

RevDate: 2024-02-13

Simeone PJ, Green JR, Tager-Flusberg H, et al (2024)

Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.

Autism research : official journal of the International Society for Autism Research [Epub ahead of print].

Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.

RevDate: 2024-02-11

Shadle CH, Fulop SA, Chen WR, et al (2024)

Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.

The Journal of the Acoustical Society of America, 155(2):1253-1263.

The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.

RevDate: 2024-02-06

Saghiri MA, Vakhnovetsky J, Amanabi M, et al (2024)

Exploring the impact of type II diabetes mellitus on voice quality.

European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery [Epub ahead of print].

PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.

METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data was analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and student t tests.

RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formant and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. There was also a significant difference between diabetes status and age, in the case of CPP.

CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.

RevDate: 2024-02-01

Benway NR, Preston JL, Salekin A, et al (2024)

Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.

JASA express letters, 4(2):.

The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
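The two normalization schemes contrasted in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the F3 values, and the normative mean and standard deviation are hypothetical placeholders, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def z_within_utterance(values):
    """Z-standardize values relative to the mean and (sample) standard
    deviation of the same utterance."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def z_against_norm(values, norm_mean, norm_sd):
    """Z-standardize values relative to an external reference, for example
    age-and-sex norms for typical productions."""
    return [(v - norm_mean) / norm_sd for v in values]


# Hypothetical F3 track (Hz) from one utterance.
f3_track = [2000.0, 2200.0, 2400.0]

# Hypothetical normative values, for illustration only.
NORM_MEAN_HZ = 1800.0
NORM_SD_HZ = 200.0

print(z_within_utterance(f3_track))
print(z_against_norm(f3_track, NORM_MEAN_HZ, NORM_SD_HZ))
```

The within-utterance version discards how far the speaker sits from typical values, whereas the norm-referenced version keeps that offset, which may be one reason the two behave differently in a classifier.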

RevDate: 2024-01-23

Hou Y, Li Q, Wang Z, et al (2024)

Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.

Sensors (Basel, Switzerland), 24(2): pii:s24020313.

To improve the classification of pig vocalization using vocal signals and improve recognition accuracy, a pig vocalization classification method based on multi-feature fusion is proposed in this study. With the typical vocalization of pigs in large-scale breeding houses as the research object, short-time energy, frequency centroid, formant frequency and first-order difference, and Mel frequency cepstral coefficient and first-order difference were extracted as the fusion features. These fusion features were improved using principal component analysis. A pig vocalization classification model with a BP neural network optimized based on the genetic algorithm was constructed. The results showed that using the improved features to recognize pig grunting, squealing, and coughing, the average recognition accuracy was 93.2%; the recognition precisions were 87.9%, 98.1%, and 92.7%, respectively, with an average of 92.9%; and the recognition recalls were 92.0%, 99.1%, and 87.4%, respectively, with an average of 92.8%, which indicated that the proposed pig vocalization classification method had good recognition precision and recall, and could provide a reference for pig vocalization information feedback and automatic recognition.
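The averages quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes the reported averages are unweighted (macro) means over the three call types, listed in the order grunting, squealing, coughing; the variable and function names are illustrative.

```python
# Per-class figures as reported in the abstract (percent).
precisions = {"grunting": 87.9, "squealing": 98.1, "coughing": 92.7}
recalls = {"grunting": 92.0, "squealing": 99.1, "coughing": 87.4}


def macro_average(per_class):
    """Unweighted mean over classes (assumed to be how the averages were formed)."""
    return sum(per_class.values()) / len(per_class)


mean_precision = macro_average(precisions)
mean_recall = macro_average(recalls)

print(f"mean precision: {mean_precision:.1f}%")
print(f"mean recall: {mean_recall:.1f}%")
```

Both unweighted means round to the averages stated in the abstract (92.9% and 92.8%).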

RevDate: 2024-01-22

Nagamine T (2024)

Formant dynamics in second language speech: Japanese speakers' production of English liquids.

The Journal of the Acoustical Society of America, 155(1):479-495.

This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2-F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2-F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately.

RevDate: 2024-01-16

Ghaemi H, Grillo R, Alizadeh O, et al (2023)

What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.

World journal of plastic surgery, 12(3):44-56.

BACKGROUND: Regarding the impact of orthognathic surgery on the airway and voice, this study was carried out to investigate the effects of maxillary impaction surgery on patients' voices through acoustic analysis and articulation assessment.

METHODS: This quasi-experimental, before-and-after, double-blind study aimed at examining the effects of maxillary impaction surgery on the voice of orthognathic surgery patients. Before the surgery, a speech therapist conducted acoustic analysis, which included fundamental frequency (F0), Jitter, Shimmer, and the harmonic-to-noise ratio (HNR), as well as first, second, and third formants (F1, F2, and F3). The patient's age, sex, degree of maxillary deformity, and impaction were documented in a checklist. Voice analysis was repeated during follow-up appointments at one and six months after the surgery in a blinded manner. The data were statistically analyzed using SPSS 23, and the significance level was set at 0.05.

RESULTS: Twenty-two patients (18 females, 4 males) were examined, with ages ranging from 18 to 40 years and an average age of 25.54 years. F2, F3, HNR, and Shimmer demonstrated a significant increase over the investigation period compared to the initial phase of the study (P < 0.001 for each). Conversely, the Jitter variable exhibited a significant decrease during the follow-up assessments in comparison to the initial phase of the study (P < 0.001).

CONCLUSION: Following maxillary impaction surgery, improvements in voice quality were observed compared to the preoperative condition. However, further studies with larger samples are needed to confirm the relevancy.

RevDate: 2024-01-12

Hedrick M, K Thornton (2024)

Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.

JASA express letters, 4(1):.

Reaction times for correct vowel identification were measured to determine the effects of intertrial intervals, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years old, participated. Stimuli included three naturally produced syllables (/ba/ /bi/ /bu/) presented whole or segmented to isolate the formant transition or static formant center. Participants identified the vowel presented via loudspeaker by mouse click. Results showed a significant effect of intertrial intervals, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, vowel identification may depend on cue duration, and vowel bias may stem from focal structure.

RevDate: 2024-01-04

Sathe NC, Kain A, LAJ Reiss (2024)

Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.

The Journal of the Acoustical Society of America, 155(1):68-77.

Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequency (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occurs for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically, with ΔF0 varied, to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination.

RevDate: 2024-01-02

Wang L, Liu R, Wang Y, et al (2024)

Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.

Applied psychophysiology and biofeedback [Epub ahead of print].

Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state. There are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions. The use of speech features can improve the objectivity of psychiatric assessments. Therefore, biofeedback based on subjective symptom scales, objective speech, and physiological features to evaluate efficacy provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech features and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features with differences between the biofeedback intervention and wait-list groups, an artificial neural network was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college-going students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the waitlist group (n = 52) and were related to the change in symptoms. The energy parameters and Mel-Frequency Cepstral Coefficients (MFCC) of speech features can predict whether biofeedback intervention effectively improves anxiety and insomnia symptoms and treatment response. 
The accuracy of the classification model built using the artificial neural network (ANN) for treatment response and non-response was approximately 60%. The results of this study provide valuable information about biofeedback in improving the mental health of college-going students. The study identified speech features, such as the energy parameters, and MFCC as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial Registration ClinicalTrials.gov ChiCTR2100045542.

RevDate: 2023-12-29

Anikin A, Barreda S, D Reby (2023)

A practical guide to calculating vocal tract length and scale-invariant formant patterns.

Behavior research methods [Epub ahead of print].

Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
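
The single-tube regression mentioned in this abstract can be illustrated with a short sketch. The code below is not the soundgen `estimateVTL` implementation; it is a minimal Python illustration under stated assumptions: a uniform tube closed at one end, so that F_n = (2n - 1) c / (4 L), a speed of sound of 35400 cm/s, and a least-squares fit through the origin. The function names and the relative-deviation measure are hypothetical choices made for this example.

```python
# Illustrative sketch only; names and constants are assumptions, not soundgen's API.
SPEED_OF_SOUND_CM_S = 35400.0  # assumed speed of sound in warm, humid air


def formant_spacing(formants_hz):
    """Least-squares formant spacing (Hz), regression through the origin.

    Single-tube model: F_n = spacing * (2n - 1) / 2 for n = 1, 2, 3, ...
    """
    xs = [(2 * n - 1) / 2 for n in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vtl_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length (cm) implied by the fitted spacing."""
    return c / (2 * formant_spacing(formants_hz))


def schwa_deviations(formants_hz):
    """Relative deviation of each formant from its equally spaced position.

    Dividing by the expected frequency makes the values independent of
    overall scale (an assumed, simplified stand-in for regression residuals).
    """
    spacing = formant_spacing(formants_hz)
    return [
        f / (spacing * (2 * n - 1) / 2) - 1
        for n, f in enumerate(formants_hz, start=1)
    ]
```

For a neutral vowel whose formants are exactly equally spaced, the deviations are all zero and the apparent length equals the tube length; for other vowels, the deviations describe vowel quality while the spacing carries the size information.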

RevDate: 2023-12-23

Kraxberger F, Näger C, Laudato M, et al (2023)

On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.

Bioengineering (Basel, Switzerland), 10(12): pii:bioengineering10121369.

Sound generation in human phonation and the underlying fluid-structure-acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, with a silicone made vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a pure coupling of the acoustical effects, a numerical simulation model is established based on the computation of the mechanical-acoustic eigenvalue. With varying pipe lengths, the lowest acoustic resonance frequency was adjusted in the experiments and so in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (lowest formant) is far away from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.

RevDate: 2023-12-12

Pah ND, Motin MA, Oliveira GC, et al (2023)

The Change of Vocal Tract Length in People with Parkinson's Disease.

Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference, 2023:1-4.

Hypokinetic dysarthria is one of the early symptoms of Parkinson's disease (PD) and has been proposed for early detection and also for monitoring of the progression of the disease. PD reduces the control of vocal tract muscles such as the tongue and lips and, therefore, the length of the active vocal tract is altered. However, the change in the vocal tract length due to the disease has not been investigated. The aim of this study was to determine the difference in the apparent vocal tract length (AVTL) between people with PD and age-matched healthy controls. The phoneme /a/ from the UCI Parkinson's Disease Classification Dataset and the Italian Parkinson's Voice and Speech Dataset was used, and AVTL was calculated based on the first four formants of the sustained phoneme (F1-F4). The results show a correlation between Parkinson's disease and an increase in vocal tract length. The most sensitive feature was the AVTL calculated using the first formant of sustained phonemes (F1). The other significant finding reported in this article is that the difference is significant and appeared only in the male participants. However, the size of the database is not sufficiently large to identify the possible confounding factors such as the severity and duration of the disease, medication, age, and comorbidity factors. Clinical relevance: The outcomes of this research have the potential to improve the identification of early Parkinsonian dysarthria and monitor PD progression.

RevDate: 2023-12-07

Orekhova EV, Fadeev KA, Goiaeva DE, et al (2023)

Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.

Cortex; a journal devoted to the study of the nervous system and behavior, 171:287-307 pii:S0010-9452(23)00281-2 [Epub ahead of print].

The spectral formant structure and periodicity pitch are the major features that determine the identity of vowels and the characteristics of the speaker. However, very little is known about how the processing of these features in the auditory cortex changes during development. To address this question, we independently manipulated the periodicity and formant structure of vowels while measuring auditory cortex responses using magnetoencephalography (MEG) in children aged 7-12 years and adults. We analyzed the sustained negative shift of source current associated with these vowel properties, which was present in the auditory cortex in both age groups despite differences in the transient components of the auditory response. In adults, the sustained activation associated with formant structure was lateralized to the left hemisphere early in the auditory processing stream requiring neither attention nor semantic mapping. This lateralization was not yet established in children, in whom the right hemisphere contribution to formant processing was strong and decreased during or after puberty. In contrast to the formant structure, periodicity was associated with a greater response in the right hemisphere in both children and adults. These findings suggest that left-lateralization for the automatic processing of vowel formant structure emerges relatively late in ontogenesis and pose a serious challenge to current theories of hemispheric specialization for speech processing.

RevDate: 2023-12-07

Alain C, Göke K, Shen D, et al (2023)

Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.

iScience, 26(12):108457.

Perception of bistable stimuli is influenced by prior context. In some cases, the interpretation matches with how the preceding stimulus was perceived; in others, it tends to be the opposite of the previous stimulus percept. We measured high-density electroencephalography (EEG) while participants were presented with a sequence of vowels that varied in formant transition, promoting the perception of one or two auditory streams followed by an ambiguous bistable sequence. For the bistable sequence, participants were more likely to report hearing the opposite percept of the one heard immediately before. This auditory contrast effect coincided with changes in alpha power localized in the left angular gyrus and left sensorimotor and right sensorimotor/supramarginal areas. The latter correlated with participants' perception. These results suggest that the contrast effect for a bistable sequence of vowels may be related to neural adaptation in posterior auditory areas, which influences participants' perceptual construal level of ambiguous stimuli.

RevDate: 2023-12-05

Shellikeri S, Cho S, Ash S, et al (2023)

Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.

Amyotrophic lateral sclerosis & frontotemporal degeneration [Epub ahead of print].

OBJECTIVE: To evaluate automated digital speech measures, derived from spontaneous speech (picture descriptions), in assessing bulbar motor impairments in patients with ALS-FTD spectrum disorders (ALS-FTSD).

METHODS: Automated vowel algorithms were employed to extract two vowel acoustic measures: vowel space area (VSA), and mean second formant slope (F2 slope). Vowel measures were compared between ALS with and without clinical bulbar symptoms (ALS + bulbar [n = 49, ALSFRS-r bulbar subscore: mean = 9.8, SD = 1.7] vs. ALS-nonbulbar [n = 23]), behavioral variant frontotemporal dementia (bvFTD, n = 25) without a motor syndrome, and healthy controls (HC, n = 32). Correlations with bulbar motor clinical scales, perceived listener effort, and MRI cortical thickness of the orobuccal primary motor cortex (oral PMC) were examined. We compared vowel measures to speaking rate, a conventional metric for assessing bulbar dysfunction.

RESULTS: ALS + bulbar had significantly lower VSA and F2 slope than ALS-nonbulbar (|d|=0.94 and |d|=1.04, respectively), bvFTD (|d|=0.89 and |d|=1.47), and HC (|d|=0.73 and |d|=0.99). These reductions correlated with worse bulbar clinical scores (VSA: R = 0.33, p = 0.043; F2 slope: R = 0.38, p = 0.011), greater listener effort (VSA: R=-0.43, p = 0.041; F2 slope: p > 0.05), and cortical thinning in oral PMC (F2 slope: β = 0.0026, p = 0.017). Vowel measures demonstrated greater sensitivity and specificity for bulbar impairment than speaking rate, while showing independence from cognitive and respiratory impairments.

CONCLUSION: Automatic vowel measures are easily derived from a brief spontaneous speech sample, are sensitive to mild-moderate stage of bulbar disease in ALS-FTSD, and may present better sensitivity to bulbar impairment compared to traditional assessments such as speaking rate.
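
Vowel space area is commonly computed as the area of the polygon formed by corner vowels in the F1-F2 plane. The abstract does not state which vowels or which algorithm the authors used, so the Python sketch below is only an illustration of that general idea, using the shoelace formula; the function name and the input convention (corner points listed in order around the polygon) are assumptions made for this example.

```python
def vowel_space_area(corners):
    """Polygon area (Hz^2) from (F1, F2) corner-vowel points.

    The points must be listed in order around the polygon
    (clockwise or counter-clockwise). Uses the shoelace formula.
    """
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2
```

With three corner vowels the result is a triangular VSA and with four a quadrilateral VSA; a smaller area corresponds to a more centralized, less distinct vowel space.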

RevDate: 2023-11-30

Heeringa AN, Jüchter C, Beutelmann R, et al (2023)

Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.

Frontiers in neuroscience, 17:1238941.

INTRODUCTION: Understanding speech in a noisy environment, as opposed to speech in quiet, becomes increasingly more difficult with increasing age. Using the quiet-aged gerbil, we studied the effects of aging on speech-in-noise processing. Specifically, behavioral vowel discrimination and the encoding of these vowels by single auditory-nerve fibers were compared, to elucidate some of the underlying mechanisms of age-related speech-in-noise perception deficits.

METHODS: Young-adult and quiet-aged Mongolian gerbils, of either sex, were trained to discriminate a deviant naturally-spoken vowel in a sequence of vowel standards against a speech-like background noise. In addition, we recorded responses from single auditory-nerve fibers of young-adult and quiet-aged gerbils while presenting the same speech stimuli.

RESULTS: Behavioral vowel discrimination was not significantly affected by aging. For both young-adult and quiet-aged gerbils, the behavioral discrimination between /eː/ and /iː/ was more difficult to make than /eː/ vs. /aː/ or /iː/ vs. /aː/, as evidenced by longer response times and lower d' values. In young-adults, spike timing-based vowel discrimination agreed with the behavioral vowel discrimination, while in quiet-aged gerbils it did not. Paradoxically, discrimination between vowels based on temporal responses was enhanced in aged gerbils for all vowel comparisons. Representation schemes, based on the spectrum of the inter-spike interval histogram, revealed stronger encoding of both the fundamental and the lower formant frequencies in fibers of quiet-aged gerbils, but no qualitative changes in vowel encoding. Elevated thresholds in combination with a fixed stimulus level, i.e., lower sensation levels of the stimuli for old individuals, can explain the enhanced temporal coding of the vowels in noise.

DISCUSSION: These results suggest that the altered auditory-nerve discrimination metrics in old gerbils may mask age-related deterioration in the central (auditory) system to the extent that behavioral vowel discrimination matches that of the young adults.
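
The d' values mentioned in the results are a standard signal-detection sensitivity index. The abstract does not describe how the authors computed them, so the following Python sketch shows only the textbook definition, d' = z(hit rate) - z(false-alarm rate), with a hypothetical function name.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Textbook sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; rates of exactly
    0 or 1 need a correction before calling, which is not shown here.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)
```

A value of zero means the deviant vowel is not distinguished from the standards, and larger values mean easier discrimination.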

RevDate: 2023-11-29

Mohn JL, Baese-Berk MM, S Jaramillo (2023)

Selectivity to acoustic features of human speech in the auditory cortex of the mouse.

Hearing research, 441:108920 pii:S0378-5955(23)00232-0 [Epub ahead of print].

A better understanding of the neural mechanisms of speech processing can have a major impact in the development of strategies for language learning and in addressing disorders that affect speech comprehension. Technical limitations in research with human subjects hinder a comprehensive exploration of these processes, making animal models essential for advancing the characterization of how neural circuits make speech perception possible. Here, we investigated the mouse as a model organism for studying speech processing and explored whether distinct regions of the mouse auditory cortex are sensitive to specific acoustic features of speech. We found that mice can learn to categorize frequency-shifted human speech sounds based on differences in formant transitions (FT) and voice onset time (VOT). Moreover, neurons across various auditory cortical regions were selective to these speech features, with a higher proportion of speech-selective neurons in the dorso-posterior region. Last, many of these neurons displayed mixed-selectivity for both features, an attribute that was most common in dorsal regions of the auditory cortex. Our results demonstrate that the mouse serves as a valuable model for studying the detailed mechanisms of speech feature encoding and neural plasticity during speech-sound learning.

RevDate: 2023-11-27

Anikin A, Valente D, Pisanski K, et al (2023)

The role of loudness in vocal intimidation.

Journal of experimental psychology. General pii:2024-28586-001 [Epub ahead of print].

Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into a-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered "large" in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness-frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

RevDate: 2023-11-24

Barrientos E, E Cataldo (2023)

Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00322-3 [Epub ahead of print].

This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.
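
For orientation, the conventional LPC baseline against which the abstract compares WLP-HPSV can be sketched as follows. This Python example is not the WLP-HPSV method and not the authors' code; it is a minimal illustration of ordinary autocorrelation-method LPC with the Levinson-Durbin recursion, followed by peak picking on the all-pole spectrum. The function names, the 1 Hz search grid, and the absence of pre-emphasis and windowing are simplifying assumptions.

```python
import cmath
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a real-valued signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1 - k * k
    return a


def lpc_formants(signal, fs, order, step_hz=1.0):
    """Frequencies (Hz) of local maxima of the LPC spectrum 1/|A(e^jw)|."""
    a = levinson_durbin(autocorrelation(signal, order), order)
    n_points = int((fs / 2) / step_hz)
    freqs = [i * step_hz for i in range(n_points + 1)]
    mags = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        a_of_w = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        mags.append(1 / abs(a_of_w))
    return [
        freqs[i]
        for i in range(1, len(freqs) - 1)
        if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    ]
```

Plain LPC of this kind tends to be biased toward harmonics when the fundamental frequency is high, because the harmonics then sample the vocal tract envelope only sparsely. That limitation is the usual motivation for weighted variants such as the one the abstract describes.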

RevDate: 2023-11-22

Punamäki RL, Diab SY, Drosos K, et al (2023)

The role of acoustic features of maternal infant-directed singing in enhancing infant sensorimotor, language and socioemotional development.

Infant behavior & development, 74:101908 pii:S0163-6383(23)00100-5 [Epub ahead of print].

The quality of infant-directed speech (IDS) and infant-directed singing (IDSi) are considered vital to children, but empirical studies on protomusical qualities of the IDSi influencing infant development are rare. The current prospective study examines the role of IDSi acoustic features, such as pitch variability, shape and movement, and vocal amplitude vibration, timbre, and resonance, in associating with infant sensorimotor, language, and socioemotional development at six and 18 months. The sample consists of 236 Palestinian mothers from the Gaza Strip singing to their six-month-olds a song of their own choice. Maternal IDSi was recorded and analyzed by the openSMILE tool to depict main acoustic features of pitch frequencies, variations, and contours, vocal intensity, resonance formants, and power. The results are based on 219 completed maternal IDSi. Mothers reported about their infants' sensorimotor, language-vocalization, and socioemotional skills at six months, and psychologists tested these skills by Bayley Scales for Infant Development at 18 months. Results show that maternal IDSi characterized by wide pitch variability and rich and high vocal amplitude and vibration were associated with infants' optimal sensorimotor, language vocalization, and socioemotional skills at six months, and rich and high vocal amplitude and vibration predicted these optimal developmental skills also at 18 months. High resonance and rhythmicity formants were associated with optimal language and vocalization skills at six months. To conclude, the IDSi is considered important in enhancing newborn and risk infants' wellbeing, and the current findings argue that favorable acoustic singing qualities are crucial for optimal multidomain development across infancy.

RevDate: 2023-11-22

Levin M, Y Zaltz (2023)

Voice Discrimination in Quiet and in Background Noise by Simulated and Real Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Cochlear implant (CI) users demonstrate poor voice discrimination (VD) in quiet conditions based on the speaker's fundamental frequency (fo) and formant frequencies (i.e., vocal-tract length [VTL]). Our purpose was to examine the effect of background noise at levels that allow good speech recognition thresholds (SRTs) on VD via acoustic CI simulations and CI hearing.

METHOD: Forty-eight normal-hearing (NH) listeners who listened via noise-excited (n = 20) or sinewave (n = 28) vocoders and 10 prelingually deaf CI users (i.e., whose hearing loss began before language acquisition) participated in the study. First, the signal-to-noise ratio (SNR) that yields 70.7% correct SRT was assessed using an adaptive sentence-in-noise test. Next, the CI simulation listeners performed 12 adaptive VDs: six in quiet conditions, two with each cue (fo, VTL, fo + VTL), and six amid speech-shaped noise. The CI participants performed six VDs: one with each cue, in quiet and amid noise. SNR at VD testing was 5 dB higher than the individual's SRT in noise (SRTn +5 dB).

RESULTS: Results showed the following: (a) Better VD was achieved via the noise-excited than the sinewave vocoder, with the noise-excited vocoder better mimicking CI VD; (b) background noise had a limited negative effect on VD, only for the CI simulation listeners; and (c) there was a significant association between SNR at testing and VTL VD only for the CI simulation listeners.

CONCLUSIONS: For NH listeners who listen to CI simulations, noise that allows good SRT can nevertheless impede VD, probably because VD depends more on bottom-up sensory processing. Conversely, for prelingually deaf CI users, noise that allows good SRT hardly affects VD, suggesting that they rely strongly on bottom-up processing for both VD and speech recognition.

RevDate: 2023-11-22

Kapsner-Smith MR, Abur D, Eadie TL, et al (2023)

Test-Retest Reliability of Behavioral Assays of Feedforward and Feedback Auditory-Motor Control of Voice and Articulation.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Behavioral assays of feedforward and feedback auditory-motor control of voice and articulation frequently are used to make inferences about underlying neural mechanisms and to study speech development and disorders. However, no studies have examined the test-retest reliability of such measures, which is critical for rigorous study of auditory-motor control. Thus, the purpose of the present study was to assess the reliability of assays of feedforward and feedback control in voice versus articulation domains.

METHOD: Twenty-eight participants (14 cisgender women, 12 cisgender men, one transgender man, one transmasculine/nonbinary) who denied any history of speech, hearing, or neurological impairment were measured for responses to predictable versus unexpected auditory feedback perturbations of vocal (fundamental frequency, fo) and articulatory (first formant, F1) acoustic parameters twice, with 3-6 weeks between sessions. Reliability was measured with intraclass correlations.

RESULTS: Opposite patterns of reliability were observed for fo and F1; fo reflexive responses showed good reliability and fo adaptive responses showed poor reliability, whereas F1 reflexive responses showed poor reliability and F1 adaptive responses showed moderate reliability. However, a criterion-referenced categorical measurement of fo adaptive responses as typical versus atypical showed substantial test-retest agreement.

CONCLUSIONS: Individual responses to some behavioral assays of auditory-motor control of speech should be interpreted with caution, which has implications for several fields of research. Additional research is needed to establish reliable criterion-referenced measures of F1 adaptive responses as well as fo and F1 reflexive responses. Furthermore, the opposite patterns of test-retest reliability observed for voice versus articulation add to growing evidence for differences in underlying neural control mechanisms.

RevDate: 2023-11-21

Zhang W, M Clayards (2023)

Contribution of acoustic cues to prominence ratings for four Mandarin vowels.

The Journal of the Acoustical Society of America, 154(5):3364-3373.

The acoustic cues for prosodic prominence have been explored extensively, but one open question is to what extent they differ by context. This study investigates the extent to which vowel type affects how acoustic cues are related to prominence ratings provided in a corpus of spoken Mandarin. In the corpus, each syllable was rated as either prominent or non-prominent. We predicted prominence ratings using Bayesian mixed-effect regression models for each of four Mandarin vowels (/a, i, ɤ, u/), using fundamental frequency (F0), intensity, duration, the first and second formants, and tone type as predictors. We compared the role of each cue within and across the four models. We found that overall duration was the best predictor of prominence ratings and that formants were the weakest, but the role of each cue differed by vowel. We did not find credible evidence that F0 was relevant for /a/, or that intensity was relevant for /i/. We also found evidence that duration was more important for /ɤ/ than for /i/. The results suggest that vowel type credibly affects prominence ratings, which may reflect differences in the coordination of acoustic cues in prominence marking.

RevDate: 2023-11-17

Jasim M, Nayana VG, Nayaka H, et al (2023)

Effect of Adenotonsillectomy on Spectral and Acoustic Characteristics.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 75(4):3467-3475.

Acoustic and perceptual analyses have been used extensively to assess speech and voice in individuals with voice disorders. These methods provide objective, quantitative and precise information on the speech and voice characteristics in any given disorder, help in monitoring any recovery, deterioration, or improvement in an individual's speech, and differentiate between normal and abnormal speech and voice characteristics. The present study was carried out to investigate changes in spectral characteristics (formant frequency parameters and formant centralization ratios) and voice characteristics (acoustic parameters of voice) in individuals following adenotonsillectomy. A total of 34 participants with a history of adenotonsillar hypertrophy took part in the study. Spectral and acoustic voice parameters were analyzed at three time points: before surgery (T0), 30 days after surgery (T1), and 90 days after surgery (T2). Data were analyzed statistically using SPSS version 28.0.0.0. Descriptive statistics were used to find the mean and standard deviation. Repeated-measures ANOVA was used to compare the pre- and post-experimental measures for spectral and acoustic voice parameters. The derived parameter of acoustic vowel space (formant centralization ratio 3) was compared across the three time points. The results revealed that the acoustic vowel space measure and formant frequency measures were significantly increased across the pre- and post-operative conditions at the three time points. A significant difference was also obtained for the acoustic parameters across the time points. Adenotonsillectomy has proved to be an efficient surgical procedure in treating children with chronic adenotonsillitis. The results indicate an overall improvement in the spectral and acoustic voice parameters, thereby highlighting the need for adenotonsillectomy at the right time and at the right age.

RevDate: 2023-11-16

Noffs G, Cobler-Lichter M, Perera T, et al (2023)

Plug-and-play microphones for recording speech and voice with smart devices.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000535152 [Epub ahead of print].

INTRODUCTION: Smart devices are widely available and capable of quickly recording and uploading speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet, a growing body of literature points to a wide heterogeneity among acoustic metrics in their robustness to variation in recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. Our aim was to assess whether the addition of consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone.

METHODS: Speech was simultaneously recorded by a reference high-quality microphone commonly used in research, and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated using recordings from each microphone to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to expected deviations in speech in various neurological conditions. Each microphone's responses to speech and to silence were characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical people and people with Multiple Sclerosis, using metrics from each tested microphone was compared to that of the reference microphone.

RESULTS: The two consumer-grade plug-and-play microphones favoured high frequencies (mean centre of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal-to-noise ≤ -4.2 dB) when compared to the reference microphone. Between consumer-grade microphones, differences in relative noise were closely related to distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤ 0.06 Hz), f0 instability (f0 CoV, agreement interval ≤ 0.05%) and for tracking of second formant movement (agreement interval ≤ 1.4 Hz/millisecond). Agreement between microphones was poor for other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play than with the reference microphone.

CONCLUSION: Measurements of f0 and F2 slope were robust to variation in recording equipment while other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including remote or self-recording, possibly with small loss in accuracy and statistical power as observed in this study.

RevDate: 2023-11-09

Ribas-Prats T, Cordero G, Lip-Sosa DL, et al (2023)

Developmental Trajectory of the Frequency-Following Response During the First 6 Months of Life.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The aim of the present study is to characterize the maturational changes during the first 6 months of life in the neural encoding of two speech sound features relevant for early language acquisition: the stimulus fundamental frequency (fo), related to stimulus pitch, and the vowel formant composition, particularly F1. The frequency-following response (FFR) was used as a snapshot into the neural encoding of these two stimulus attributes.

METHOD: FFRs to a consonant-vowel stimulus /da/ were retrieved from electroencephalographic recordings in a sample of 80 healthy infants (45 at birth and 35 at the age of 1 month). Thirty-two infants (16 recorded at birth and 16 recorded at 1 month) returned for a second recording at 6 months of age.

RESULTS: Stimulus fo and F1 encoding showed improvements from birth to 6 months of age. Most remarkably, a significant improvement in the F1 neural encoding was observed during the first month of life.

CONCLUSION: Our results highlight the rapid and sustained maturation of the basic neural machinery necessary for the phoneme discrimination ability during the first 6 months of age.

RevDate: 2023-11-09

Mračková M, Mareček R, Mekyska J, et al (2023)

Levodopa may modulate specific speech impairment in Parkinson's disease: an fMRI study.

Journal of neural transmission (Vienna, Austria : 1996) [Epub ahead of print].

Hypokinetic dysarthria (HD) is a difficult-to-treat symptom affecting quality of life in patients with Parkinson's disease (PD). Levodopa may partially alleviate some symptoms of HD in PD, but the neural correlates of these effects are not fully understood. The aim of our study was to identify neural mechanisms by which levodopa affects articulation and prosody in patients with PD. Altogether 20 PD patients participated in a task fMRI study (overt sentence reading). Using a single dose of levodopa after an overnight withdrawal of dopaminergic medication, levodopa-induced BOLD signal changes within the articulatory pathway (in regions of interest; ROIs) were studied. We also correlated levodopa-induced BOLD signal changes with the changes in acoustic parameters of speech. We observed no significant changes in acoustic parameters due to acute levodopa administration. After levodopa administration as compared to the OFF dopaminergic condition, patients showed task-induced BOLD signal decreases in the left ventral thalamus (p = 0.0033). The changes in thalamic activation were associated with changes in pitch variation (R = 0.67, p = 0.006), while the changes in caudate nucleus activation were related to changes in the second formant variability which evaluates precise articulation (R = 0.70, p = 0.003). The results are in line with the notion that levodopa does not have a major impact on HD in PD, but it may induce neural changes within the basal ganglia circuitries that are related to changes in speech prosody and articulation.

RevDate: 2023-11-08

Liu W, Wang Y, C Liang (2023)

Formant and Voice Source Characteristics of Vowels in Chinese National Singing and Bel Canto. A Pilot Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00323-5 [Epub ahead of print].

BACKGROUND: There have been numerous reports on the acoustic characteristics of singers' vowel articulation and phonation, and these studies cover many phonetic dimensions, such as fundamental frequency (F0), intensity, formant frequency, and voice quality.

METHOD: Taking the three representative vowels (/a/, /i/, /u/) in Chinese National Singing and Bel Canto as the research object, the present study investigates the differences and associations in vowel articulation and phonation between Chinese National Singing and Bel Canto using acoustic measures, for example, F0, formant frequency, long-term average spectrum (LTAS).

RESULTS: The relationship between F0 and formant indicates that F1 is proportional to F0, in which the female has a significant variation in vowel /a/. Compared with the male, the formant structure of the female singing voice differs significantly from that of the speech voice. Regarding the relationship between intensity and formant, LTAS shows that the Chinese National Singing tenor and Bel Canto baritone have the singer's formant cluster when singing vowels, while the two sopranos do not.

CONCLUSIONS: The systematic changes of formant frequencies with voice source are observed. (i) F1 of the female vowel /a/ has undergone a significant tuning change in the register transition, reflecting the characteristics of singing genres. (ii) Female singers utilize the intrinsic pitch of vowels when adopting the register transition strategy. This finding can be assumed to facilitate understanding the theory of intrinsic vowel pitch and revise Sundberg's hypothesis that F1 rises with F0. A non-linear relationship exists between F1 and F0, which adds to the non-linear interaction of the formant and vocal source. (iii) The singer's formant is affected by voice classification, gender, and singing genres.

RevDate: 2023-11-07

Keller PE, Lee J, König R, et al (2023)

Sex-related communicative functions of voice spectral energy in human chorusing.

Biology letters, 19(11):20230326.

Music is a human communicative art whose evolutionary origins may lie in capacities that support cooperation and/or competition. A mixed account favouring simultaneous cooperation and competition draws on analogous interactive displays produced by collectively signalling non-human animals (e.g. crickets and frogs). In these displays, rhythmically coordinated calls serve as a beacon whereby groups of males 'cooperatively' attract potential female mates, while the likelihood of each male competitively attracting an actual mate depends on the precedence of his signal. Human behaviour consistent with the mixed account was previously observed in a renowned boys choir, where the basses (the oldest boys with the deepest voices) boosted their acoustic prominence by increasing energy in a high-frequency band of the vocal spectrum when girls were in an otherwise male audience. The current study tested female and male sensitivity and preferences for this subtle vocal modulation in online listening tasks. Results indicate that while female and male listeners are similarly sensitive to enhanced high-spectral energy elicited by the presence of girls in the audience, only female listeners exhibit a reliable preference for it. Findings suggest that human chorusing is a flexible form of social communicative behaviour that allows simultaneous group cohesion and sexually motivated competition.

RevDate: 2023-11-04

Baker CP, Brockmann-Bauser M, Purdy SC, et al (2023)

High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00316-8 [Epub ahead of print].

OBJECTIVES: This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures.

METHOD: Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/, /i/, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation.

RESULTS: Increasing fo resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values, respectively (P < 0.01). Oscillations of the data across the explored fo range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde fo condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05).

CONCLUSION: These novel data offer a controlled demonstration for the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when varying fo, SST, and VE in synthesized tones. Since humans will vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics type effects into account when comparing intersubject or preintervention and postintervention data using these measures.
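
The perturbation measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited Praat-style definitions (jitter local as the mean absolute difference between consecutive glottal periods divided by the mean period, RAP as the mean deviation of each period from its three-point local average, shimmer local as the amplitude analogue of jitter local). The function names and the sample values are hypothetical, and this is not the authors' analysis code.

```python
from statistics import mean


def jitter_local_percent(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period (%)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def rap_percent(periods):
    """Mean deviation of each period from the average of itself and its two neighbours (%)."""
    devs = [
        abs(periods[i] - mean(periods[i - 1:i + 2]))
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * mean(devs) / mean(periods)


def shimmer_local_percent(amplitudes):
    """Same form as jitter (local), applied to per-cycle peak amplitudes (%)."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * mean(diffs) / mean(amplitudes)


if __name__ == "__main__":
    # Hypothetical cycle-by-cycle periods (seconds) and peak amplitudes (arbitrary units).
    example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
    example_amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]
    print(jitter_local_percent(example_periods))
    print(rap_percent(example_periods))
    print(shimmer_local_percent(example_amplitudes))
```

Because each measure is normalised by the mean period or amplitude, the same absolute cycle-to-cycle variation yields a larger percentage at higher fo (shorter periods), which is one way such measures can depend on fo.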

RevDate: 2023-10-31

Song J, Kim M, J Park (2023)

Acoustic correlates of perceived personality from Korean utterances in a formal communicative setting.

PloS one, 18(10):e0293222 pii:PONE-D-23-04761.

The aim of the present study was to find acoustic correlates of perceived personality from the speech produced in a formal communicative setting, that of Korean customer service employees in particular. This work extended previous research on voice personality impressions to a different sociocultural and linguistic context in which speakers are expected to speak politely in a formal register. To use naturally produced speech rather than read speech, we devised a new method that successfully elicited spontaneous speech from speakers who were role-playing as customer service employees, while controlling for the words and sentence structures they used. We then examined a wide range of acoustic properties in the utterances, including voice quality and global acoustic and segmental properties using Principal Component Analysis. Subjects of the personality rating task listened to the utterances and rated perceived personality in terms of the Big-Five personality traits. While replicating some previous findings, we discovered several acoustic variables that exclusively accounted for the personality judgments of female speakers; a more modal voice quality increased perceived conscientiousness and neuroticism, and less dispersed formants reflecting a larger body size increased the perceived levels of extraversion and openness. These biases in personality perception likely reflect gender and occupation-related stereotypes that exist in South Korea. Our findings can also serve as a basis for developing and evaluating synthetic speech for Voice Assistant applications in future studies.

RevDate: 2023-10-31

Ealer C, Niemczak CE, Nicol T, et al (2023)

Auditory neural processing in children living with HIV uncovers underlying central nervous system dysfunction.

AIDS (London, England) pii:00002030-990000000-00380 [Epub ahead of print].

OBJECTIVE: Central nervous system (CNS) damage from HIV infection or treatment can lead to developmental delays and poor educational outcomes in children living with HIV (CLWH). Early markers of central nervous system dysfunction are needed to target interventions and prevent life-long disability. The Frequency Following Response (FFR) is an auditory electrophysiology test that can reflect the health of the central nervous system. In this study, we explore whether the FFR reveals auditory central nervous system dysfunction in CLWH.

STUDY DESIGN: Cross-sectional analysis of an ongoing cohort study. Data were from the child's first visit in the study.

SETTING: The infectious disease center in Dar es Salaam, Tanzania.

METHODS: We collected the FFR from 151 CLWH and 151 HIV-negative children. To evoke the FFR, three speech syllables (/da/, /ba/, /ga/) were played monaurally to the child's right ear. Response measures included neural timing (peak latencies), strength of frequency encoding (fundamental frequency and first formant amplitude), encoding consistency (inter-response consistency), and encoding precision (stimulus-to-response correlation).

RESULTS: CLWH showed smaller first formant amplitudes (p < .0001), weaker inter-response consistencies (p < .0001) and smaller stimulus-to-response correlations (p < .0001) than HIV-negative children. These findings generalized across the three speech stimuli with moderately strong effect sizes (partial η² ranged from 0.061 to 0.094).

CONCLUSION: The FFR shows auditory central nervous system dysfunction in CLWH. Neural encoding of auditory stimuli was less robust, more variable, and less accurate. Since the FFR is a passive and objective test, it may offer an effective way to assess and detect central nervous system function in CLWH.

RevDate: 2023-10-30

Mutlu A, Celik S, MA Kilic (2023)

Effects of Personal Protective Equipment on Speech Acoustics.

Sisli Etfal Hastanesi tip bulteni, 57(3):434-439.

OBJECTIVES: The transmission of severe acute respiratory syndrome coronavirus-2 occurs primarily through droplets, which highlights the importance of protecting the oral, nasal, and conjunctival mucosas using personal protective equipment (PPE). The use of PPE can lead to communication difficulties between healthcare workers and patients. This study aimed to investigate changes in the acoustic parameters of speech sounds when different types of PPE are used.

METHODS: A cross-sectional study was conducted, enrolling 18 healthy male and female participants. They were instructed to produce a sustained [ɑː] vowel for at least 3 s to estimate voice quality. In addition, all Turkish vowels were produced for a minimum of 200 ms. Finally, three Turkish fricative consonants ([f], [s], and [ʃ]) were produced in a consonant/vowel/consonant format with different vowel contexts within a carrier sentence. Recordings were repeated under the following conditions: no PPE, surgical mask, N99 mask, face shield, surgical mask + face shield, and N99 mask + face shield. All recordings were subjected to analysis.

RESULTS: Frequency perturbation parameters did not show significant differences. However, in males, all vowels except [u] in the first formant (F1), except [ɔ] and [u] in the second formant (F2), except [ɛ] and [ɔ] in the third formant (F3), and only [i] in the fourth formant (F4) were significant. In females, all vowels except [i] in F1, except [u] in F2, all vowels in F3, and except [u] and [ɯ] in F4 were significant. Spectral moment values exhibited significance in both groups.

CONCLUSION: The use of different types of PPE resulted in variations in speech acoustic features. These findings may be attributed to the filtering effects of PPE on specific frequencies and the potential chamber effect in front of the face. Understanding the impact of PPE on speech acoustics contributes to addressing communication challenges in healthcare settings.

RevDate: 2023-10-25

Steffman J, W Zhang (2023)

Vowel perception under prominence: Examining the roles of F0, duration, and distributional information.

The Journal of the Acoustical Society of America, 154(4):2594-2608.

This study investigates how prosodic prominence mediates the perception of American English vowels, testing the effects of F0 and duration. In Experiment 1, the perception of four vowel continua varying in duration and formants (high: /i-ɪ/, /u-ʊ/, non-high: /ɛ-æ/, /ʌ-ɑ/) was examined under changes in F0-based prominence. Experiment 2 tested if cue usage varies as the distributional informativity of duration as a cue to prominence is manipulated. Both experiments show that duration is a consistent vowel-intrinsic cue. F0-based prominence affected perception of vowels via compensation for peripheralization of prominent vowels in the vowel space. Longer duration and F0-based prominence further enhanced the perception of formant cues. The distributional manipulation in Experiment 2 exerted a minimal impact. Findings suggest that vowel perception is mediated by prominence in a height-dependent manner which reflects patterns in the speech production literature. Further, duration simultaneously serves as an intrinsic cue and serves a prominence-related function in enhancing perception of formant cues.

RevDate: 2023-10-24

Wang H, Ali Y, L Max (2023)

Perceptual formant discrimination during speech movement planning.

bioRxiv : the preprint server for biology pii:2023.10.11.561423.

Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
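
The curve-fitting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it fits a two-parameter logistic function by a coarse maximum-likelihood grid search and takes the 50% "different" point as the JND. The JND criterion, the grid values, and the response counts are all assumptions, since the abstract does not state them.

```python
import math


def logistic(x, midpoint, slope):
    """Probability of a 'different' response at formant shift x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_logistic(shifts, n_different, n_trials, midpoints, slopes):
    """Grid-search maximum-likelihood fit; returns (midpoint, slope)."""
    best = None
    for m in midpoints:
        for s in slopes:
            log_lik = 0.0
            for x, k, n in zip(shifts, n_different, n_trials):
                # Clamp p away from 0 and 1 so the logarithms stay finite.
                p = min(max(logistic(x, m, s), 1e-9), 1.0 - 1e-9)
                log_lik += k * math.log(p) + (n - k) * math.log(1.0 - p)
            if best is None or log_lik > best[0]:
                best = (log_lik, m, s)
    return best[1], best[2]


def jnd_from_fit(midpoint, slope):
    """Assumed criterion: the shift judged 'different' on 50% of trials."""
    return midpoint


if __name__ == "__main__":
    # Hypothetical data: formant shift (arbitrary units), 'different' counts, trials per shift.
    shifts = [0, 20, 40, 60, 80]
    n_different = [2, 12, 50, 88, 98]
    n_trials = [100, 100, 100, 100, 100]
    m, s = fit_logistic(shifts, n_different, n_trials,
                        midpoints=range(10, 91, 10), slopes=[0.05, 0.1, 0.2])
    print(jnd_from_fit(m, s), s)
```

Under this criterion a larger JND means poorer discrimination, which matches the abstract's reading of a greater JND before speaking as reduced sensitivity.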

RevDate: 2023-10-18

Miller HE, Kearney E, Nieto-Castañón A, et al (2023)

Do Not Cut Off Your Tail: A Mega-Analysis of Responses to Auditory Perturbation Experiments.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The practice of removing "following" responses from speech perturbation analyses is increasingly common, despite no clear evidence as to whether these responses represent a unique response type. This study aimed to determine if the distribution of responses to auditory perturbation paradigms represents a bimodal distribution, consisting of two distinct response types, or a unimodal distribution.

METHOD: This mega-analysis pooled data from 22 previous studies to examine the distribution and magnitude of responses to auditory perturbations across four tasks: adaptive pitch, adaptive formant, reflexive pitch, and reflexive formant. Data included at least 150 unique participants for each task, with studies comprising younger adult, older adult, and Parkinson's disease populations. A Silverman's unimodality test followed by a smoothed bootstrap resampling technique was performed for each task to evaluate the number of modes in each distribution. Wilcoxon signed-ranks tests were also performed for each distribution to confirm significant compensation in response to the perturbation.

RESULTS: Modality analyses were not significant (p > .05) for any group or task, indicating unimodal distributions. Our analyses also confirmed compensatory reflexive responses to pitch and formant perturbations across all groups, as well as adaptive responses to sustained formant perturbations. However, analyses of sustained pitch perturbations only revealed evidence of adaptation in studies with younger adults.

CONCLUSION: The demonstration of a clear unimodal distribution across all tasks suggests that following responses do not represent a distinct response pattern, but rather the tail of a unimodal distribution.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24282676.

RevDate: 2023-10-16

Chu M, Wang J, Fan Z, et al (2023)

A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00274-6 [Epub ahead of print].

Hoarse voice affects the efficiency of communication between people. However, surgical treatment may leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method to achieve hoarse-to-normal voice conversion and personalize voices for patients with hoarseness. The proposed method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network. The proposed method is evaluated on subjective and objective evaluation metrics. According to the findings of the spectrum analysis, the suggested method converts hoarse voice formants more effectively than variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (Generative Adversarial Network-Voice Conversion), and CycleVAE. For the word error rate, the suggested method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. The suggested method outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC in terms of naturalness by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. The suggested method outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC in terms of intelligibility, with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, the proposed method obtains 43.48%, 75.52%, 76.21%, and 108.62% improvements compared to CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the suggested method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of voice conversion methods in improving the speech quality of hoarse voices.

RevDate: 2023-10-14

Santos SS, Christmann MK, CA Cielo (2023)

Spectrographic Vocal Characteristics in Female Teachers: Finger Kazoo Intensive Short-term Vocal Therapy.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00270-9 [Epub ahead of print].

OBJECTIVE: To verify the effects of intensive short-term vocal therapy using the Finger Kazoo technique on the spectrographic vocal measurements of teachers.

METHODS: Controlled and randomized trial. Spectrographic vocal assessment was performed by judges before and after intensive short-term vocal therapy with Finger Kazoo. Sample was composed of 41 female teachers. There were two study groups (with vocal nodules and without structural affection of the vocal folds) and the respective control groups. For the statistical analysis of the data, nonparametric tests were used (Mann-Whitney test and Wilcoxon test).

RESULTS: After intensive short-term vocal therapy with Finger Kazoo, there was improvement in voice spectral parameters, such as improvement in tracing (color intensity and regularity), greater definition of formants and harmonics, increased replacement of harmonics by noise, and a greater number of harmonics, mainly in the group without structural affection of the vocal folds.

CONCLUSION: There was an improvement in the spectrographic vocal parameters, showing greater stability, quality, and projection of the emission, especially in female teachers without structural affection of the vocal folds.


RJR Experience and Expertise

Researcher

Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.

Educator

Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.

Administrator

Robbins has been involved in science administration at both the federal and the institutional levels. At NSF he was a program officer for database activities in the life sciences; at DOE he was a program officer for information infrastructure in the human genome project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.

Technologist

Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.

Publisher

While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.

Speaker

Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July, 2012, he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.

Facilitator

Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.

Designer

Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.


963 Red Tail Lane
Bellingham, WA 98226

206-300-3443

E-mail: RJR8222@gmail.com

Collection of publications by R J Robbins

Reprints and preprints of publications, slide presentations, instructional materials, and data compilations written or prepared by Robert Robbins. Most papers deal with computational biology, genome informatics, using information technology to support biomedical research, and related matters.

Research Gate page for R J Robbins

ResearchGate is a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. According to a study by Nature and an article in Times Higher Education, it is the largest academic social network in terms of active users.

Curriculum Vitae for R J Robbins

short personal version

Curriculum Vitae for R J Robbins

long standard version

RJR Picks from Around the Web (updated 11 MAY 2018)