RJR: Recommended Bibliography (Created: 05 Dec 2024 at 01:47)
Formants: Modulators of Communication
Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, "formant" is also sometimes used to refer to the acoustic resonance pattern of the human vocal tract. Because formants are a product of resonance, resonance is affected by the shape and material of the resonating structure, and all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal and human vocalizations. Discussions of how formants affect the production and interpretation of vocalizations are available in a few YouTube videos. For example: Formants Explained and Demonstrated or What are FORMANTS and HARMONICS? VOCAL FORMANTS AND HARMONICS Explained! or How Do We Change Our Mouths to Shape Waves? Formants
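For readers who want to see what a formant is numerically, below is a minimal Python sketch of the classic linear-predictive-coding (LPC) approach to formant estimation. It assumes a mono recording of a sustained vowel in a hypothetical file vowel.wav; the pre-emphasis constant, frame length, and LPC order are conventional illustrative choices, not fixed standards.

# Minimal LPC formant-estimation sketch; assumes a mono file "vowel.wav".
import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz

fs, x = wavfile.read("vowel.wav")          # hypothetical sustained-vowel recording
x = x.astype(float)
mid = len(x) // 2
frame = x[mid : mid + int(0.03 * fs)]      # one 30 ms analysis frame from the middle
frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
frame *= np.hamming(len(frame))            # window to reduce edge effects

order = int(2 + fs / 1000)                 # rule-of-thumb LPC order
r = np.correlate(frame, frame, "full")[len(frame) - 1 : len(frame) + order]
a = solve_toeplitz(r[:order], r[1:])       # autocorrelation method: solve R a = r
poles = np.roots(np.concatenate(([1.0], -a)))

# Formants are the vocal tract resonances: positive-frequency pole angles, in Hz.
# (Very low values may be spurious; real tools also filter roots by bandwidth.)
freqs = sorted(np.angle(p) * fs / (2 * np.pi) for p in poles if p.imag > 0)
print("estimated formants (Hz):", [round(f) for f in freqs[:4]])

Dedicated tools such as Praat implement the same idea with additional safeguards that this sketch omits.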
Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion
The Papers (from PubMed®)
RevDate: 2024-11-28
CmpDate: 2024-11-28
[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].
Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery, 38(12):1149-1153.
Objective: This study explores the influence of vowel and sound intensity on formants, to provide a reference for selecting sound samples and vocal methods in acoustic assessment. Methods: Thirty-eight healthy subjects (19 male, 19 female) aged 19-24 years were recruited. Formants were analyzed for different vowels (/a/, /(?)/, /i/, and /u/) and different sound intensities (lowest voice, comfortable voice, highest modal voice, and highest falsetto), with pairwise comparisons between groups showing significant differences. Results: ① The first formant of /a/ and /(?)/ was higher than that of /i/ and /u/, and /i/ had the highest second formant; the first formant was lowest for /i/ at the lowest intensity and highest for /a/ at the highest intensity. ② The first formant increased with sound intensity in the chest-voice range, while the second formant decreased significantly on entering the highest falsetto. Conclusion: Formant distributions differ across vowels and sound intensities; that is, vowel and sound intensity influence formants to differing degrees. The extreme values of the first formant give an initial estimate of its maximum normal range, which may help improve acoustic assessment.
Additional Links: PMID-39605265
@article {pmid39605265,
year = {2024},
author = {Xie, B and Li, Z and Wang, H and Kuang, X and Ni, W and Zhong, R and Li, Y},
title = {[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].},
journal = {Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery},
volume = {38},
number = {12},
pages = {1149-1153},
doi = {10.13201/j.issn.2096-7993.2024.12.011},
pmid = {39605265},
issn = {2096-7993},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Acoustics ; Voice Quality ; Phonetics ; Voice/physiology ; Adult ; },
}
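A measurement protocol like the one above can be approximated with off-the-shelf tools. The sketch below uses the praat-parselmouth Python bindings to measure mid-vowel F1 and F2 for each vowel/intensity token; the file-naming scheme and the 5500 Hz formant ceiling are hypothetical choices, not the authors' protocol.

# Sketch of the measurement step: mid-vowel F1/F2 per vowel/intensity token.
# Uses praat-parselmouth; file names like "a_comfort.wav" are hypothetical.
import parselmouth

for vowel in ["a", "i", "u"]:
    for intensity in ["lowest", "comfort", "highest"]:
        snd = parselmouth.Sound(f"{vowel}_{intensity}.wav")
        formant = snd.to_formant_burg(maximum_formant=5500)  # ~5500 Hz suits female voices, ~5000 Hz male
        t_mid = snd.duration / 2                             # measure at the vowel midpoint
        f1 = formant.get_value_at_time(1, t_mid)
        f2 = formant.get_value_at_time(2, t_mid)
        print(f"/{vowel}/ {intensity}: F1={f1:.0f} Hz, F2={f2:.0f} Hz")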
RevDate: 2024-11-26
Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: The objective of the present study is to investigate nasal and oral vowel production in French-speaking children with cochlear implants (CIs) and children with typical hearing (TH). Vowel nasality relies primarily on acoustic cues that may be less effectively transmitted by the implant. The study investigates how children with CIs manage to produce these segments in French, a language with contrastive vowel nasalization.
METHOD: The children performed a task in which they repeated sentences containing a consonant-vowel-consonant-vowel-type pseudoword, the vowel being a nasal or oral vowel from French. Thirteen children with CIs and 25 children with TH completed the task. Among the children with CIs, the level of exposure to Cued Speech (CS) was either occasional (CS-) or intense (CS+). The productions were analyzed through perceptual judgments and acoustic measurements. Different acoustic cues related to nasality were collected: segmental durations, formant values, and predicted values of nasalization. Multiple regression analyses were conducted to examine which acoustic features are associated with perceived nasality in perceptual judgments.
RESULTS: The perceptual judgments of the children's speech productions indicate that children with sustained exposure to CS (CS+) produced the best-identified and most distinct oral/nasal productions. Acoustic measures revealed different production profiles among the groups: children in the CS+ group seem to differentiate between nasal and oral vowels by relying on segmental duration cues and variations in oropharyngeal configurations (associated with formant differences), but less through nasal resonance.
CONCLUSION: The study highlights (a) a benefit of sustained CS practice for CI children for the intelligibility of nasal-oral segments, (b) privileged exploitation of temporal (segmental duration) and salient acoustic cues (oropharyngeal configuration) in the CS+ group, and (c) difficulties among children with CI in distinguishing nasal-oral segments through nasal resonance.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.27744768.
Additional Links: PMID-39589237
@article {pmid39589237,
year = {2024},
author = {Fagniart, S and Delvaux, V and Harmegnies, B and Huberlant, A and Huet, K and Piccaluga, M and Watterman, I and Charlier, B},
title = {Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-22},
doi = {10.1044/2024_JSLHR-24-00083},
pmid = {39589237},
issn = {1558-9102},
}
RevDate: 2024-11-16
Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00363-1 [Epub ahead of print].
OBJECTIVES: In this study, we introduce an intervention based on two techniques: twang and medialization. The hypothesis is that a combination of these two techniques will enable trans women to gain feminine-sounding speech without vocal strain or harm.
METHOD: Five trans women took part in the study. A control group of five cisgender women and five cisgender men were included. A list of 14 monosyllabic words was created, where the vowel /ɑ/ was embedded in various consonant contexts. All participants were asked to read the word list three times, each time presented in a different order. The trans women read the word list before and after intervention. Acoustic analyses of fundamental frequency and the first, second, and third formant frequencies were conducted. For the perceptual analysis, 60 voice samples were selected from the entire material. Fifteen listeners were asked whether they perceived the voice samples as feminine, masculine, or uncertain. The listeners were also asked for gender judgments based on sentences read by the trans women after intervention.
RESULTS: The acoustic analyses revealed an increase in fundamental frequencies and first, second, and third formants after intervention for all five trans women, approaching the values of the female controls. The perceptual judgments showed that the majority of the trans women voice samples were perceived as feminine after intervention.
CONCLUSIONS: Based on the acoustic analyses and the perceptual evaluations, the combination of the twang and medialization techniques appears to enable trans women to achieve feminine attribution. Nevertheless, the study is too small for generalization. A take-home message is that, to gain feminine-sounding speech, it is appropriate to focus primarily on resonance in addition to speaking fundamental frequency.
Additional Links: PMID-39550323
@article {pmid39550323,
year = {2024},
author = {Bøyesen, B and Hide, Ø},
title = {Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.10.020},
pmid = {39550323},
issn = {1873-4588},
}
RevDate: 2024-11-12
CmpDate: 2024-11-08
Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.
Scientific reports, 14(1):27183.
Understanding how speech sounds are decoded into linguistic units has been a central research challenge over the last century. This study follows a reverse-correlation approach to reveal the acoustic cues listeners use to categorize French stop consonants in noise. Compared to previous methods, this approach ensures an unprecedented level of detail with only minimal theoretical assumptions. Thirty-two participants performed a speech-in-noise discrimination task based on natural /aCa/ utterances, with C = /b/, /d/, /g/, /p/, /t/, or /k/. The trial-by-trial analysis of their confusions enabled us to map the spectrotemporal information they relied on for their decisions. In place-of-articulation contrasts, the results confirmed the critical role of formant consonant-vowel transitions, used by all participants, and, to a lesser extent, vowel-consonant transitions and high-frequency release bursts. Similarly, for voicing contrasts, we validated the prominent role of the voicing bar cue, with some participants also using formant transitions and burst cues. This approach revealed that most listeners use a combination of several cues for each task, with significant variability within the participant group. These insights shed new light on decades-old debates regarding the relative importance of cues for phoneme perception and suggest that research on acoustic cues should not overlook individual variability in speech perception.
Additional Links: PMID-39516258
@article {pmid39516258,
year = {2024},
author = {Carranante, G and Cany, C and Farri, P and Giavazzi, M and Varnet, L},
title = {Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {27183},
pmid = {39516258},
issn = {2045-2322},
support = {ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-17-EURE-0017//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; },
mesh = {Humans ; *Speech Perception/physiology ; Female ; Male ; *Noise ; Adult ; *Phonetics ; Young Adult ; Language ; Cues ; Speech Acoustics ; France ; Acoustic Stimulation ; },
}
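The reverse-correlation logic of this study can be illustrated compactly: correlate trial-by-trial noise patterns with a listener's responses to locate the spectrotemporal regions that drive decisions. The sketch below computes a simple classification image on simulated data; all array shapes and names are illustrative stand-ins, not the authors' pipeline.

# Classification-image sketch of reverse correlation on simulated trials.
import numpy as np

n_trials, n_freq, n_time = 5000, 32, 40
rng = np.random.default_rng(0)
noise = rng.normal(size=(n_trials, n_freq, n_time))   # per-trial spectrotemporal noise
responses = rng.integers(0, 2, size=n_trials)         # listener's /b/ vs /d/ answers (stand-in)

# Classification image: mean noise preceding one response minus the other.
# Bins with large |values| mark time-frequency cues driving the decision.
ci = noise[responses == 1].mean(axis=0) - noise[responses == 0].mean(axis=0)
se = noise.std(axis=0) * np.sqrt(1 / (responses == 1).sum() + 1 / (responses == 0).sum())
z = ci / se                                           # rough per-bin z-score
print("most influential (freq, time) bin:", np.unravel_index(np.abs(z).argmax(), z.shape))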
RevDate: 2024-11-08
CmpDate: 2024-11-08
Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.
Journal of medical Internet research, 26:e58466 pii:v26i1e58466.
BACKGROUND: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice.
OBJECTIVE: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers.
METHODS: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty-energy, and hybrid-based frailty-sarcopenia. Participants were asked to pronounce a sustained vowel "/a/" for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters-the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4)-were used for analyzing changes in voice. Logistic regression was used to elucidate the prediction model.
RESULTS: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty-sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty-energy (OR 1.43, 95% CI 1.02-2.01), respectively.
CONCLUSIONS: Vocal biomarkers might be potentially useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype that is associated with insufficient energy or reduced muscle function. The assessment of A3 and A4 involves a complex frailty phenotype.
Additional Links: PMID-39515817
@article {pmid39515817,
year = {2024},
author = {Lin, YC and Yan, HT and Lin, CH and Chang, HH},
title = {Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.},
journal = {Journal of medical Internet research},
volume = {26},
number = {},
pages = {e58466},
doi = {10.2196/58466},
pmid = {39515817},
issn = {1438-8871},
mesh = {Humans ; Aged ; Cross-Sectional Studies ; *Frailty/physiopathology ; Male ; Female ; *Phenotype ; *Biomarkers ; Middle Aged ; Voice/physiology ; Aged, 80 and over ; Taiwan ; Frail Elderly/statistics & numerical data ; Sarcopenia/physiopathology/diagnosis ; },
}
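The statistical summary reported here (odds ratios with 95% CIs from logistic regression) is easy to reproduce in outline. Below is a minimal statsmodels sketch on simulated data; the column names A1 and frail are hypothetical stand-ins for the study's variables.

# Sketch: logistic regression relating a voice parameter to a frailty phenotype,
# summarized as an odds ratio with 95% CI. Data are simulated stand-ins.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({"A1": rng.normal(10, 2, 277),        # stand-in voice parameter
                   "frail": rng.integers(0, 2, 277)})   # stand-in phenotype label

X = sm.add_constant(df[["A1"]])           # intercept + predictor
fit = sm.Logit(df["frail"], X).fit(disp=0)

or_ = np.exp(fit.params["A1"])            # exponentiated coefficient = odds ratio
lo, hi = np.exp(fit.conf_int().loc["A1"]) # 95% CI on the odds-ratio scale
print(f"OR per unit A1: {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")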
RevDate: 2024-11-12
CmpDate: 2024-11-12
Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.
The Journal of the Acoustical Society of America, 156(5):3118-3139.
In this comparative cross-linguistic study we test whether expressive interjections (words like ouch or yay) share similar vowel signatures across the world's languages, and whether these can be traced back to nonlinguistic vocalizations (like screams and cries) expressing the same emotions of pain, disgust, and joy. We analyze vowels in interjections from dictionaries of 131 languages (over 600 tokens) and compare these with nearly 500 vowels based on formant frequency measures from voice recordings of volitional nonlinguistic vocalizations. We show that across the globe, pain interjections feature a-like vowels and wide falling diphthongs ("ai" as in Ayyy! "aw" as in Ouch!), whereas disgust and joy interjections do not show robust vowel regularities that extend geographically. In nonlinguistic vocalizations, all emotions yield distinct vowel signatures: pain prompts open vowels such as [a], disgust schwa-like central vowels, and joy front vowels such as [i]. Our results show that pain is the only affective experience tested with a clear, robust vowel signature that is preserved between nonlinguistic vocalizations and interjections across languages. These results offer empirical evidence for iconicity in some expressive interjections. We consider potential mechanisms and origins, from evolutionary pressures and sound symbolism to colexification, proposing testable hypotheses for future research.
Additional Links: PMID-39531311
@article {pmid39531311,
year = {2024},
author = {Ponsonnet, M and Coupé, C and Pellegrino, F and Garcia Arasco, A and Pisanski, K},
title = {Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {5},
pages = {3118-3139},
doi = {10.1121/10.0032454},
pmid = {39531311},
issn = {1520-8524},
mesh = {Humans ; *Emotions ; Phonetics ; Language ; Speech Acoustics ; Pain/psychology ; Voice Quality ; Happiness ; },
}
RevDate: 2024-11-01
Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.
Infancy : the official journal of the International Society on Infant Studies [Epub ahead of print].
Acoustic variability in the speech input has been shown, in certain contexts, to be beneficial during infants' acquisition of sound contrasts. One approach attributes this result to the potential of variability to make the stability of individual cues visible. Another approach suggests that, instead of highlighting individual cues, variability uncovers stable relations between cues that signal a sound contrast. Here, we investigate the relation between Voice Onset Time (VOT) and the onset of F1 formant frequency, two cues that subserve the voicing contrast in German. First, we verified that German-speaking adults' use of VOT to categorize voiced and voiceless stops is dependent on the value of the F1 onset frequency, in the specific form of a so-called trading relation. Next, we tested whether 6-month-old German-learning infants exhibit differential sensitivity to stimulus continua in which the cues varied to an equal extent, but either adhered to the trading relation established in the adult experiment or adhered to a reversed relation. Our results present evidence that infants prefer listening to speech in which phonetic cues conform to certain cue trading relations over cue relations that are reversed.
Additional Links: PMID-39487102
@article {pmid39487102,
year = {2024},
author = {Hullebus, M and Gafos, A and Boll-Avetisyan, N and Langus, A and Fritzsche, T and Höhle, B},
title = {Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.},
journal = {Infancy : the official journal of the International Society on Infant Studies},
volume = {},
number = {},
pages = {},
doi = {10.1111/infa.12630},
pmid = {39487102},
issn = {1532-7078},
support = {317633480 - SFB 1287//Deutsche Forschungsgemeinschaft/ ; },
}
RevDate: 2024-10-30
Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.
Digital biomarkers, 8(1):159-170.
INTRODUCTION: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking affects voice production, we aimed to develop a gender- and language-specific vocal biomarker of smoking status.
METHODS: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender- and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers.
RESULTS: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65]; 64.41% female; 56.68% English speakers; 50% smokers and 50% never-smokers). We observed differences in voice feature distributions: for women, the fundamental frequency F0, the formant frequencies F1, F2, and F3, and the harmonics-to-noise ratio were lower in smokers than in never-smokers (p < 0.05), while for men no significant disparities were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for female participants, and 0.65 and 0.68, respectively, for male participants.
CONCLUSION: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings.
Additional Links: PMID-39473806
@article {pmid39473806,
year = {2024},
author = {Ayadi, H and Elbéji, A and Despotovic, V and Fagherazzi, G},
title = {Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.},
journal = {Digital biomarkers},
volume = {8},
number = {1},
pages = {159-170},
pmid = {39473806},
issn = {2504-110X},
}
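The classification step described in this entry can be sketched with scikit-learn: train a classifier on acoustic features and report accuracy and AUC on held-out data. Everything below is simulated stand-in data; the feature count, classifier choice, and split are illustrative, not the authors' pipeline.

# Sketch: predict smoking status from voice features; report accuracy and AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(1332, 20))            # stand-in acoustic features (f0, F1-F3, HNR, ...)
y = rng.integers(0, 2, size=1332)          # 0 = never-smoker, 1 = smoker (stand-in labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))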
RevDate: 2024-10-26
Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.
Neuroscience letters pii:S0304-3940(24)00404-X [Epub ahead of print].
Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production, and its possible relation to PSAM, at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
Additional Links: PMID-39461704
@article {pmid39461704,
year = {2024},
author = {Jingwen Li, J and Daliri, A and Kim, KS and Max, L},
title = {Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.},
journal = {Neuroscience letters},
volume = {},
number = {},
pages = {138025},
doi = {10.1016/j.neulet.2024.138025},
pmid = {39461704},
issn = {1872-7972},
}
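The centering measure at the heart of this study has a simple form: inter-trial formant variability at vowel onset minus variability at vowel midpoint. A minimal numpy sketch on simulated F1/F2 values, with all numbers illustrative:

# Sketch of "centering": variability reduction from vowel onset to midpoint.
import numpy as np

rng = np.random.default_rng(4)
# (trials, 2) arrays of [F1, F2] in Hz at vowel onset and at vowel midpoint
onset = rng.normal([700, 1200], [60, 90], size=(100, 2))
mid = rng.normal([700, 1200], [35, 55], size=(100, 2))

def variability(f):  # mean Euclidean distance from the median production
    return np.linalg.norm(f - np.median(f, axis=0), axis=1).mean()

centering = variability(onset) - variability(mid)
print(f"onset {variability(onset):.1f} Hz, midpoint {variability(mid):.1f} Hz, "
      f"centering {centering:.1f} Hz")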
RevDate: 2024-10-24
The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00309-6 [Epub ahead of print].
PURPOSE: The static and dynamic soft tissue changes resulting in hypopnea and/or apnea in the subjects with obstructive sleep apnea (OSA) occur in the upper airway, which also serves as the voice or speech tract. In this study, we looked for the Voice Handicap Index-10 (VHI-10) and Voice-Related Quality of Life (V-RQOL) scores in addition to perturbation and formant values of the vowels in those with snoring and OSA.
METHODS: Epworth Sleepiness Scale (ESS), STOP-Bang scores, Body-Mass Index (BMI), neck circumference (NC), modified Mallampati Index, tonsil size, Apnea-Hypopnea Index, VHI-10 and V-RQOL scores, perturbation and formant values, and fundamental frequency of the voice samples were taken to evaluate.
RESULTS: The data revealed that the VHI-10 and V-RQOL scores, but not the perturbation and formant values, differed significantly between the control and OSA subjects, and that both scores correlated significantly with ESS and NC. In addition, a few significant correlations of BMI and tonsil size with the formant and perturbation values were found.
CONCLUSIONS: Our data reveal that (i) VHI-10 and V-RQOL were good identifiers of OSA, and (ii) perturbation and formant values were related particularly to tonsil size and, to a lesser extent, to BMI. Hence, when using a voice parameter to screen for OSA, VHI-10 and V-RQOL appear to be better choices than the objective voice measures, which may vary with the subjects' tonsil size and BMI.
Additional Links: PMID-39448279
@article {pmid39448279,
year = {2024},
author = {Pekdemir, A and Kemaloğlu, YK and Gölaç, H and İriz, A and Köktürk, O and Mengü, G},
title = {The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.018},
pmid = {39448279},
issn = {1873-4588},
}
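As one concrete example of the "perturbation values" this entry relies on, local jitter is the mean absolute difference between consecutive glottal periods relative to the mean period. A minimal sketch, with the period array standing in for pitch-tracker output:

# Sketch of local jitter from a sequence of glottal periods (simulated stand-in).
import numpy as np

periods = 0.005 + np.random.default_rng(5).normal(0, 2e-5, 200)  # ~200 Hz voice
jitter_local = np.abs(np.diff(periods)).mean() / periods.mean()
print(f"local jitter: {100 * jitter_local:.2f}%")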
RevDate: 2024-10-24
CmpDate: 2024-10-24
Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.
The Journal of the Acoustical Society of America, 156(4):2860-2876.
Language communicators use acoustic-phonetic cues to convey a variety of social information in spoken language, and learning a second language affects speech production in a social setting. It remains unclear how speaking different dialects could affect the acoustic metrics underlying the intended communicative meanings. Nine Chinese Bayannur-Mandarin bidialectics produced single-digit numbers in statements of both Standard Mandarin and the Bayannur dialect with different levels of intended confidence. Fifteen listeners judged the presence of intention and the confidence level. Prosodically unmarked and marked stimuli exhibited significant differences in perceived intention. A higher intended level was perceived as more confident. The acoustic analysis revealed that segmental (third and fourth formants, center of gravity), suprasegmental (mean fundamental frequency, fundamental frequency range, duration), and source features (harmonics-to-noise ratio, cepstral peak prominence) can distinguish between confident and doubtful expressions. Most features also distinguished between dialect and Mandarin productions. Interactions on the fourth formant and mean fundamental frequency suggested that speakers made greater use of acoustic parameters to encode confidence and doubt in the Bayannur dialect than in Mandarin. In machine learning experiments, the above-chance-level overall classification rates for confidence and doubt and the in-group advantage supported the dialect theory.
Additional Links: PMID-39445770
@article {pmid39445770,
year = {2024},
author = {Feng, S and Jiang, X},
title = {Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2860-2876},
doi = {10.1121/10.0032400},
pmid = {39445770},
issn = {1520-8524},
mesh = {Humans ; Male ; Female ; *Speech Acoustics ; Adult ; *Speech Perception ; Young Adult ; Language ; Phonetics ; Intention ; Multilingualism ; East Asian People ; },
}
RevDate: 2024-10-23
The acoustic characteristics of Swedish vowels.
Phonetica [Epub ahead of print].
The Swedish vowel space is relatively densely populated, with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work has generally focused on subsets of the space. The present paper reports static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip-rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.
Additional Links: PMID-39443329
@article {pmid39443329,
year = {2024},
author = {Persson, A},
title = {The acoustic characteristics of Swedish vowels.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {39443329},
issn = {1423-0321},
}
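A common way to summarize vowel-space data like these is the area of the convex hull around per-category mean (F1, F2) points. A minimal scipy sketch, with hypothetical vowel means:

# Sketch: vowel space area as the convex hull of mean (F1, F2) points.
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical mean (F1, F2) values in Hz for a handful of vowel categories
means = np.array([[300, 2200],   # [i]-like
                  [390, 1900],
                  [600, 1700],
                  [750, 1300],   # [a]-like
                  [550, 900],
                  [350, 700]])   # [o]/[u]-like

hull = ConvexHull(means)
print(f"vowel space area: {hull.volume:.0f} Hz^2")  # in 2-D, .volume is the area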
RevDate: 2024-10-22
Analysis of Voice Quality in Children With Smith-Magenis Syndrome.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00319-9 [Epub ahead of print].
UNLABELLED: The production of phonation involves very complex processes, linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, it is possible to find the imprint in the voice signal left by the deterioration of certain cortical areas or part of the neurocognitive mechanisms that are involved in speech. In previous works, the authors determined the relationship between the pathological characteristics of the voice of the speakers with Smith-Magenis syndrome (SMS) and a lower value in the cepstral peak prominence (CPP) with respect to normative speakers. They also described the presence of subharmonics in their voices.
OBJECTIVES: The present study aims to verify whether both characteristics can be used simultaneously to differentiate SMS voices from neurotypical voices. It will also be analyzed if there is variation in the trajectory of the formants coinciding with the subharmonics.
METHODS: To do this, the effect of subharmonics in the voices of 12 SMS individuals was isolated to determine whether they were responsible for the lower CPP values. The CPP was also evaluated in regions of subharmonic presence, measured from the peak reflecting f0 rather than from the most prominent peak, which provided a baseline CPP value in the presence of subharmonics. We then checked whether changes in the formants occurred synchronously with the appearance of those subharmonics; if so, the muscles controlling the position of the jaw and tongue would be affected at the same time as the larynx. The latter was difficult to observe because the samples were very short. Phonatory performance on a sustained /a/ was compared between a normotypical and a non-normotypical group of children, balanced and matched for age and gender. The Spanish Association of Smith-Magenis Syndrome (ASME) accounts for almost 20% of the SMS population in Spain.
RESULTS: The CPP allows differentiating between normative speakers and those with SMS, even when isolating the effect of subharmonics.
CONCLUSIONS: The CPP is a robust index for determining the degree of dysphonia. It makes it possible to differentiate pathological voices from healthy voices even when subharmonics are present. The presence of subharmonics is a characteristic of voices of SMS individuals and is not present in healthy ones. Both indexes can be used simultaneously to differentiate SMS voices from neurotypical voices.
Additional Links: PMID-39438167
@article {pmid39438167,
year = {2024},
author = {Martínez-Olalla, R and Hidalgo-De la Guía, I and Gayarzábal-Heinze, E and Fernández-Ruiz, R and Núñez-Vidal, E and Álvarez-Marquina, A and Palacios-Alonso, D},
title = {Analysis of Voice Quality in Children With Smith-Magenis Syndrome.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.026},
pmid = {39438167},
issn = {1873-4588},
}
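Cepstral peak prominence, the index this study leans on, can be computed in a few lines: take the cepstrum of the dB spectrum, find the peak in the quefrency range where f0 is plausible, and measure its height above a fitted trend line. The sketch below is a simplified illustration, not Praat's exact algorithm:

# Simplified cepstral peak prominence (CPP) sketch on a synthetic signal.
import numpy as np

def cpp(x, fs, f0_range=(60, 330)):
    spec_db = 20 * np.log10(np.abs(np.fft.fft(x * np.hamming(len(x)))) + 1e-12)
    ceps = np.fft.ifft(spec_db).real                  # real cepstrum of the dB spectrum
    quef = np.arange(len(x)) / fs                     # quefrency axis, in seconds
    lo, hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    peak = lo + np.argmax(ceps[lo:hi])                # cepstral peak where f0 is plausible
    slope, intercept = np.polyfit(quef[lo:hi], ceps[lo:hi], 1)
    return ceps[peak] - (slope * quef[peak] + intercept)  # height above the trend line

rng = np.random.default_rng(6)
fs = 16000
t = np.arange(fs) / fs                                # 1 s of a synthetic "voice" at 150 Hz
x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in (1, 2, 3)) + 0.05 * rng.normal(size=fs)
print(f"CPP of a strongly periodic signal: {cpp(x, fs):.1f} dB")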
RevDate: 2024-10-17
Divided Attention Has Limited Effects on Speech Sensorimotor Control.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.
METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.
RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.
CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.
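Measuring compensation only within the first 300 ms of the vowel, as the authors recommend, is straightforward with a formant tracker. The sketch below is a minimal illustration using the parselmouth bindings to Praat; the 5 ms step, the 5500 Hz formant ceiling, and the helper's name are assumptions, not the authors' code.

```python
import numpy as np
import parselmouth  # Python bindings to Praat

def mean_f1_early_window(wav_path, vowel_onset_s, window_s=0.300, step=0.005):
    """Mean F1 (Hz) over the first `window_s` seconds after vowel onset.

    Restricting the analysis window this way avoids the late portion of
    the vowel where attention effects were observed (> 300 ms).
    """
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=step, maximum_formant=5500.0)
    times = np.arange(vowel_onset_s, vowel_onset_s + window_s, step)
    f1 = np.array([formants.get_value_at_time(1, t) for t in times])
    return float(np.nanmean(f1))  # tracker returns NaN in unvoiced spans
```

Online compensation would then be quantified as the difference between such windowed means on perturbed versus unperturbed trials.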
Additional Links: PMID-39418590
@article {pmid39418590,
year = {2024},
author = {Krakauer, J and Naber, C and Niziolek, CA and Parrell, B},
title = {Divided Attention Has Limited Effects on Speech Sensorimotor Control.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-11},
doi = {10.1044/2024_JSLHR-24-00098},
pmid = {39418590},
issn = {1558-9102},
abstract = {PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.
METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.
RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.
CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.},
}
RevDate: 2024-10-16
The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00334-5 [Epub ahead of print].
The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis in people over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with values for males being lower and smaller than those for females. These gender differences were consistent with /hʌ/ LDDK regularity, as confirmed by the greater regularity found in females. No significant differences were observed for other acoustic parameters, and no significant interactions were revealed. Based on these preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity of differentiating between genders when assessing presbyphagia with acoustic parameters, with particular attention to swallowing function in elderly males in Ningbo.
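The acoustic vowel space mentioned above is commonly quantified as the area of the polygon spanned by the corner vowels in the F1-F2 plane. A minimal sketch using the shoelace formula follows; the formant values in the example are illustrative, not data from the study.

```python
def vowel_space_area(corner_formants):
    """Polygon area (Hz^2) spanned by (F1, F2) corner-vowel means,
    given in order around the polygon, via the shoelace formula."""
    area = 0.0
    n = len(corner_formants)
    for i in range(n):
        f1a, f2a = corner_formants[i]
        f1b, f2b = corner_formants[(i + 1) % n]
        area += f1a * f2b - f1b * f2a
    return abs(area) / 2.0

# Illustrative /a/, /i/, /u/ means (Hz) for one speaker
print(vowel_space_area([(850.0, 1220.0), (310.0, 2790.0), (350.0, 870.0)]))
```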
Additional Links: PMID-39414424
@article {pmid39414424,
year = {2024},
author = {He, Y and Wang, X and Huang, T and Zhao, W and Fu, Z and Zheng, Q and Jin, L and Kim, H and Liu, H},
title = {The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.041},
pmid = {39414424},
issn = {1873-4588},
abstract = {The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis in people over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with values for males being lower and smaller than those for females. These gender differences were consistent with /hʌ/ LDDK regularity, as confirmed by the greater regularity found in females. No significant differences were observed for other acoustic parameters, and no significant interactions were revealed. Based on these preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity of differentiating between genders when assessing presbyphagia with acoustic parameters, with particular attention to swallowing function in elderly males in Ningbo.},
}
RevDate: 2024-10-16
Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00316-3 [Epub ahead of print].
OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.
METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).
RESULTS: Male singers show a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers, though at a slightly higher frequency than that of Western classical singers. Female singers do not show a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort, and all are higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.
CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined here and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.
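LTAS-derived measures like the α ratio and singing power ratio reduce to band-energy comparisons on a long-term average spectrum. A minimal sketch with SciPy's Welch estimator is below; the band edges follow common conventions in the voice literature and may not match the paper's exact definitions.

```python
import numpy as np
from scipy.signal import welch

def ltas_ratios(x, sr):
    """Alpha ratio and singing power ratio (both in dB) for a recording."""
    freqs, pxx = welch(x, fs=sr, nperseg=4096)  # long-term average spectrum

    def band(lo, hi):
        return pxx[(freqs >= lo) & (freqs < hi)]

    # Alpha ratio: energy above 1 kHz relative to 50 Hz-1 kHz.
    alpha = 10.0 * np.log10(band(1000, 5000).sum() / band(50, 1000).sum())
    # Singing power ratio: strongest 2-4 kHz peak vs. strongest peak below 2 kHz.
    spr = 10.0 * np.log10(band(2000, 4000).max() / band(50, 2000).max())
    return alpha, spr
```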
Additional Links: PMID-39414423
@article {pmid39414423,
year = {2024},
author = {Wang, Y and Zhao, Y},
title = {Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.022},
pmid = {39414423},
issn = {1873-4588},
abstract = {OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.
METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).
RESULTS: Male singers show a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers, though at a slightly higher frequency than that of Western classical singers. Female singers do not show a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort, and all are higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.
CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined here and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.},
}
RevDate: 2024-10-14
CmpDate: 2024-10-14
Dynamic acoustic vowel distances within and across dialects.
The Journal of the Acoustical Society of America, 156(4):2497-2507.
Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.
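The first of the paper's measures, dynamic time warping over mel-frequency cepstral coefficients, can be sketched in a few lines with librosa. The sampling rate, coefficient count, and path-length normalization below are assumptions chosen for illustration, not the paper's exact settings.

```python
import librosa

def vowel_dtw_distance(wav_a, wav_b, sr=16000, n_mfcc=13):
    """DTW-based acoustic distance between two vowel recordings."""
    y_a, _ = librosa.load(wav_a, sr=sr)
    y_b, _ = librosa.load(wav_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)
    cost, path = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
    return cost[-1, -1] / len(path)  # normalize by warping-path length
```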
Additional Links: PMID-39400271
@article {pmid39400271,
year = {2024},
author = {Clopper, CG},
title = {Dynamic acoustic vowel distances within and across dialects.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2497-2507},
doi = {10.1121/10.0032385},
pmid = {39400271},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; *Speech Production Measurement/methods ; Voice Quality ; Acoustics ; Female ; Male ; Time Factors ; Language ; Sound Spectrography ; Adult ; },
abstract = {Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.},
}
RevDate: 2024-10-13
Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.
Audiology & neuro-otology pii:000541716 [Epub ahead of print].
INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABI) throughout the first three years of life are difficult to predict. ABI users face challenges resulting from delays in language proficiency and the acquisition of reading comprehension, even though ABI technology offers auditory experiences that enhance spoken language development. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.
METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.
RESULTS: The results showed a relationship between language proficiency and reading comprehension in children with ABI. In the present study, children who had poor language proficiency and were enrolled in the school for the deaf also had low total scores for reading comprehension skills. These children use short, basic sentences, often repeat words and phrases, and have a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not recall events in a logical order.
CONCLUSION: Children with ABI may have compromised reading comprehension abilities owing to a lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication mode, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated here for the first time in the literature, and this study may constitute a starting point for the examination of variables affecting reading comprehension in this area.
Additional Links: PMID-39396508
@article {pmid39396508,
year = {2024},
author = {Ozkan Atak, HB and Aslan, F and Sennaroglu, G and Sennaroglu, L},
title = {Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.},
journal = {Audiology & neuro-otology},
volume = {},
number = {},
pages = {1-23},
doi = {10.1159/000541716},
pmid = {39396508},
issn = {1421-9700},
abstract = {INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABI) throughout the first three years of life are difficult to predict. ABI users face challenges resulting from delays in language proficiency and the acquisition of reading comprehension, even though ABI technology offers auditory experiences that enhance spoken language development. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.
METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.
RESULTS: The results showed a relationship between language proficiency and reading comprehension in children with ABI. In the present study, children who had poor language proficiency and were enrolled in the school for the deaf also had low total scores for reading comprehension skills. These children use short, basic sentences, often repeat words and phrases, and have a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not recall events in a logical order.
CONCLUSION: Children with ABI may have compromised reading comprehension abilities owing to a lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication mode, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated here for the first time in the literature, and this study may constitute a starting point for the examination of variables affecting reading comprehension in this area.},
}
RevDate: 2024-10-11
CmpDate: 2024-10-11
Processing group delay spectrograms for study of formant and harmonic contours in speech signals.
The Journal of the Acoustical Society of America, 156(4):2422-2433.
This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband-equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals; harmonic frequency contours can be observed in the corresponding narrowband-equivalent GD spectrogram. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours match the given ground-truth formant contours approximately, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating spurious points without forcing the number of formants.
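The group delay spectrum itself can be computed without phase unwrapping through a standard DFT identity: with Y the transform of n·x[n] and X the transform of x[n], the group delay is Re{Y·X*}/|X|². The sketch below implements only this textbook computation, not the paper's modified SFF analysis.

```python
import numpy as np

def group_delay_spectrum(frame):
    """Group delay (in samples) of one windowed speech frame.

    Peaks of the group delay spectrum tend to align with resonances
    (formants), which is what makes GD displays useful here.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)          # DFT of n * x[n]
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```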
Additional Links: PMID-39392353
@article {pmid39392353,
year = {2024},
author = {Yegnanarayana, B and Pannala, V},
title = {Processing group delay spectrograms for study of formant and harmonic contours in speech signals.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2422-2433},
doi = {10.1121/10.0032364},
pmid = {39392353},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; Sound Spectrography ; Signal Processing, Computer-Assisted ; Speech Production Measurement/methods ; Voice Quality ; Time Factors ; Phonetics ; },
abstract = {This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband-equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals; harmonic frequency contours can be observed in the corresponding narrowband-equivalent GD spectrogram. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours match the given ground-truth formant contours approximately, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating spurious points without forcing the number of formants.},
}
RevDate: 2024-10-02
Sensorimotor adaptation to a non-uniform formant perturbation generalizes to untrained vowels.
Journal of neurophysiology [Epub ahead of print].
When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a non-uniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.
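The "centralization" perturbation has a simple geometric form: each produced vowel is pushed toward the center of the speaker's vowel space, so peripheral vowels are shifted more than central ones. The sketch below illustrates that shape only; the 30% gain and the center coordinates are assumptions, not the authors' experimental values.

```python
import numpy as np

def centralize(f1, f2, center=(500.0, 1500.0), gain=0.3):
    """Shift a produced (F1, F2) pair toward the vowel-space center.

    The perturbation magnitude grows with distance from the center,
    making it non-uniform across the vowel space.
    """
    v = np.array([f1, f2], dtype=float)
    c = np.asarray(center, dtype=float)
    return v + gain * (c - v)

print(centralize(300.0, 2700.0))  # a peripheral /i/-like token moves a lot
print(centralize(550.0, 1400.0))  # a near-central token barely moves
```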
Additional Links: PMID-39356074
@article {pmid39356074,
year = {2024},
author = {Parrell, B and Niziolek, CA and Chen, T},
title = {Sensorimotor adaptation to a non-uniform formant perturbation generalizes to untrained vowels.},
journal = {Journal of neurophysiology},
volume = {},
number = {},
pages = {},
doi = {10.1152/jn.00240.2024},
pmid = {39356074},
issn = {1522-1598},
support = {R01 DC019134/DC/NIDCD NIH HHS/United States ; R01 DC017091/DC/NIDCD NIH HHS/United States ; BCS 2120506//National Science Foundation (NSF)/ ; },
abstract = {When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a non-uniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.},
}
RevDate: 2024-09-25
Acoustic Analysis of Mandarin-Speaking Transgender Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00291-1 [Epub ahead of print].
OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).
METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonic ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.
RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than CISW for the vowels /i/, significantly higher than CISW for the vowels /u/, and significantly higher than CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. The F4 formant was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, /u/, during passage reading, and in free speech, but was significantly higher than CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW compared with both CISW and CISM across the vowels /a/, /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/, /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free speech. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.
CONCLUSIONS: TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM without undergoing phonosurgery or voice training. The findings suggest a potential risk for voice fatigue and the development of voice disorders as TW try to modify their vocal characteristics to align with their gender identity.
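Most of the perturbation measures in this study (jitter, shimmer, HNR) are conventionally extracted with Praat. A minimal sketch using the parselmouth bindings' generic call interface follows; the pitch floor/ceiling and the remaining arguments are Praat's standard defaults, chosen here for illustration rather than taken from the paper.

```python
import parselmouth
from parselmouth.praat import call

def perturbation_measures(wav_path, f0_min=75.0, f0_max=500.0):
    """Jitter (local), shimmer (local), and mean HNR for one recording."""
    snd = parselmouth.Sound(wav_path)
    points = call(snd, "To PointProcess (periodic, cc)", f0_min, f0_max)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)
    return jitter, shimmer, hnr
```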
Additional Links: PMID-39322510
@article {pmid39322510,
year = {2024},
author = {Huang, T and Wang, X and Xu, T and Zhao, W and Cao, Y and Kim, H and Yi, B},
title = {Acoustic Analysis of Mandarin-Speaking Transgender Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.037},
pmid = {39322510},
issn = {1873-4588},
abstract = {OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).
METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonic ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.
RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than CISW for the vowels /i/, significantly higher than CISW for the vowels /u/, and significantly higher than CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. The F4 formant was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, /u/, during passage reading, and in free speech, but was significantly higher than CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW compared with both CISW and CISM across the vowels /a/, /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/, /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free speech. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.
CONCLUSIONS: TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM without undergoing phonosurgery or voice training. The findings suggest a potential risk for voice fatigue and the development of voice disorders as TW try to modify their vocal characteristics to align with their gender identity.},
}
RevDate: 2024-09-17
CmpDate: 2024-09-17
Monaural and binaural masking release with speech-like stimuli.
JASA express letters, 4(9):.
The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.
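The target stimulus here, a tone whose frequency follows an F2 trajectory, is standard frequency-modulation synthesis: instantaneous phase is the running integral of instantaneous frequency. A minimal sketch follows; the sample rate, duration, and linear glide stand in for a real consonant-vowel F2 trajectory and are illustrative only.

```python
import numpy as np

def fm_tone(freq_trajectory_hz, sr, amplitude=0.1):
    """Tone whose instantaneous frequency follows the given trajectory
    (one frequency value per output sample)."""
    phase = 2.0 * np.pi * np.cumsum(np.asarray(freq_trajectory_hz)) / sr
    return amplitude * np.sin(phase)

sr = 44100
f2_like = np.linspace(1200.0, 1800.0, sr)  # 1-s rising glide, illustrative
target = fm_tone(f2_like, sr)  # present diotically, or phase-shift one ear
```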
Additional Links: PMID-39287502
@article {pmid39287502,
year = {2024},
author = {Kim, H and Ratkute, V and Epp, B},
title = {Monaural and binaural masking release with speech-like stimuli.},
journal = {JASA express letters},
volume = {4},
number = {9},
pages = {},
doi = {10.1121/10.0028736},
pmid = {39287502},
issn = {2691-1191},
mesh = {Humans ; *Perceptual Masking/physiology ; *Speech Perception/physiology ; Adult ; Acoustic Stimulation ; Male ; Female ; Young Adult ; },
abstract = {The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.},
}
RevDate: 2024-09-16
What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.
Phonetica [Epub ahead of print].
Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.
Additional Links: PMID-39279469
@article {pmid39279469,
year = {2024},
author = {Chen, S and Whalen, DH and Mok, PPK},
title = {What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {39279469},
issn = {1423-0321},
abstract = {Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.},
}
RevDate: 2024-09-11
Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.
METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.
RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.
CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.
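The F2 slope used as a predictor here is typically obtained by regressing the tracked second formant on time over the diphthong. A minimal sketch with parselmouth is below; the 5 ms step and the interval endpoints are assumptions, not the study's settings.

```python
import numpy as np
import parselmouth

def f2_slope(wav_path, t_start, t_end, step=0.005):
    """F2 slope (Hz/s) over [t_start, t_end] via a linear fit."""
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=step)
    times = np.arange(t_start, t_end, step)
    f2 = np.array([formants.get_value_at_time(2, t) for t in times])
    voiced = ~np.isnan(f2)                     # drop untracked frames
    slope, _intercept = np.polyfit(times[voiced], f2[voiced], 1)
    return slope
```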
Additional Links: PMID-39259883
@article {pmid39259883,
year = {2024},
author = {Thompson, A and Kim, Y},
title = {Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-24-00153},
pmid = {39259883},
issn = {1558-9102},
abstract = {PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.
METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.
RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.
CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.},
}
RevDate: 2024-09-05
Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.
Frontiers in human neuroscience, 18:1424920.
Past studies have explored formant centering, a corrective behavior in which production converges, over the duration of an utterance, toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's disease (AD). We found that the pitch centering response in healthy elderly controls was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry, with a larger correction for pitch errors below the target phonation than above it. These findings indicate that pitch centering is a robust compensatory behavior in human speech. Our findings also shed light on how neurodegenerative processes that affect speech in AD may impact pitch centering.
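Centering is usually quantified by comparing the deviation from a target pitch at utterance onset with the deviation at mid-utterance. The sketch below expresses this in cents against a median-pitch target; splitting the track into thirds is one common convention and is an assumption here, not necessarily the authors' exact windowing.

```python
import numpy as np

def pitch_centering_cents(pitch_track_hz, target_hz):
    """Centering for one utterance: shrinkage of the absolute pitch
    deviation (in cents) from the first third to the middle third.
    Positive values mean the pitch moved toward the target."""
    p = np.asarray(pitch_track_hz, dtype=float)
    cents = 1200.0 * np.log2(p / target_hz)
    n = len(cents)
    initial = np.nanmean(cents[: n // 3])
    middle = np.nanmean(cents[n // 3 : 2 * n // 3])
    return abs(initial) - abs(middle)
```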
Additional Links: PMID-39234407
@article {pmid39234407,
year = {2024},
author = {Subrahmanya, A and Ranasinghe, KG and Kothare, H and Raharjo, I and Kim, KS and Houde, JF and Nagarajan, SS},
title = {Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1424920},
pmid = {39234407},
issn = {1662-5161},
abstract = {Past studies have explored formant centering, a corrective behavior of convergence over the duration of an utterance toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's Disease (AD). We found the pitch centering response in healthy elderly was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry with a larger correction for the pitch errors below the target phonation than above the target phonation. These findings indicate that pitch centering is a robust compensation behavior in human speech. Our findings also explore the potential impacts on pitch centering from neurodegenerative processes impacting speech in AD.},
}
RevDate: 2024-09-01
Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00248-0 [Epub ahead of print].
This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computer tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.
Additional Links: PMID-39218756
@article {pmid39218756,
year = {2024},
author = {Vampola, T and Horáček, J and Laukkanen, AM},
title = {Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.07.035},
pmid = {39218756},
issn = {1873-4588},
abstract = {This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computer tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.},
}
RevDate: 2024-08-31
Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00254-6 [Epub ahead of print].
UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to be a promising answer to the difficulty of assessment.
OBJECTIVE: To analyze variations in acoustic parameters in voice and speech based on vowel types according to different levels of suicidal risk among adolescents in a text reading task.
METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, along with determination of suicidal ideation through the Okasha Suicidality Scale and the Beck Depression Inventory. Acoustic analysis of the recordings was conducted with Praat for phonetic research and a Python program, using a Focusrite interface and microphone to register voice and speech acoustic parameters such as fundamental frequency, jitter, and formants. Subsequently, data from adolescents with and without suicidal risk were compared.
RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic measures, especially in females: fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. In men, differences were found in F0 and HNRdB (P < 0.05).
CONCLUSION: This study demonstrated statistically significant variations in various voice acoustic parameters among adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers for suicidal risk.
Additional Links: PMID-39217086
@article {pmid39217086,
year = {2024},
author = {Figueroa, C and Guillén, V and Huenupán, F and Vallejos, C and Henríquez, E and Urrutia, F and Sanhueza, F and Alarcón, E},
title = {Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.006},
pmid = {39217086},
issn = {1873-4588},
abstract = {UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to be a promising answer to the difficulty of assessment.
OBJECTIVE: To analyze variations in acoustic parameters in voice and speech based on vowel types according to different levels of suicidal risk among adolescents in a text reading task.
METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, along with determination of suicidal ideation through the Okasha Suicidality Scale and the Beck Depression Inventory. Acoustic analysis of the recordings was conducted with Praat for phonetic research and a Python program, using a Focusrite interface and microphone to register voice and speech acoustic parameters such as fundamental frequency, jitter, and formants. Subsequently, data from adolescents with and without suicidal risk were compared.
RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic measures, especially in females: fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. In men, differences were found in F0 and HNRdB (P < 0.05).
CONCLUSION: This study demonstrated statistically significant variations in various voice acoustic parameters among adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers for suicidal risk.},
}
RevDate: 2024-08-30
CmpDate: 2024-08-30
The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.
Trends in hearing, 28:23312165241275895.
Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.
Additional Links: PMID-39212078
@article {pmid39212078,
year = {2024},
author = {Zaltz, Y},
title = {The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.},
journal = {Trends in hearing},
volume = {28},
number = {},
pages = {23312165241275895},
doi = {10.1177/23312165241275895},
pmid = {39212078},
issn = {2331-2165},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Perception/physiology ; *Generalization, Psychological ; *Cues ; *Noise/adverse effects ; *Acoustic Stimulation ; Adult ; Recognition, Psychology ; Perceptual Masking ; Adolescent ; Speech Acoustics ; Voice Quality ; Discrimination Learning/physiology ; Voice/physiology ; },
abstract = {Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.},
}
MeSH Terms:
Humans
Male
Female
Young Adult
*Speech Perception/physiology
*Generalization, Psychological
*Cues
*Noise/adverse effects
*Acoustic Stimulation
Adult
Recognition, Psychology
Perceptual Masking
Adolescent
Speech Acoustics
Voice Quality
Discrimination Learning/physiology
Voice/physiology
RevDate: 2024-08-26
Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.
bioRxiv : the preprint server for biology pii:2024.08.13.607718.
Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.
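The adaptation measure at the heart of such paradigms is a formant value sampled from each produced vowel. A minimal Python sketch of that measurement, assuming hypothetical trial recordings and hand-labeled vowel boundaries, uses parselmouth's Burg formant analysis:

import parselmouth

def f1_at_midpoint(wav_path, vowel_start, vowel_end):
    """Burg formant analysis; F1 sampled at the vowel's temporal midpoint."""
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=0.01, max_number_of_formants=5)
    midpoint = 0.5 * (vowel_start + vowel_end)
    return formants.get_value_at_time(1, midpoint)  # formant number 1 = F1

# Hypothetical trial pair: perturbation heard on trial t, response on trial t+1
f1_t = f1_at_midpoint("trial_010.wav", 0.12, 0.30)
f1_t1 = f1_at_midpoint("trial_011.wav", 0.11, 0.29)
print(f"Single-trial F1 change: {f1_t1 - f1_t:+.1f} Hz "
      "(opposing an upward F1 perturbation would be negative)")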
Additional Links: PMID-39185222
@article {pmid39185222,
year = {2024},
author = {Parrell, B and Naber, C and Kim, OA and Nizolek, CA and McDougle, SD},
title = {Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.1101/2024.08.13.607718},
pmid = {39185222},
issn = {2692-8205},
abstract = {Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.},
}
RevDate: 2024-08-25
Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?.
Forensic science international, 363:112199 pii:S0379-0738(24)00280-9 [Epub ahead of print].
A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech-style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) adding long-term phonetic features to an MFCC-based system does not meaningfully improve system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.
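As a simplified illustration (not the paper's calibrated likelihood-ratio framework), the following Python sketch extracts long-term features of the kinds compared above, mean F0 and mean MFCCs, from two hypothetical recordings and computes naive distances between them:

import numpy as np
import librosa

def long_term_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    mean_f0 = np.nanmean(f0)                       # long-term F0
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mean_f0, mfcc.mean(axis=1)              # long-term MFCC profile

# Hypothetical questioned vs. known recordings
f0_a, mfcc_a = long_term_features("questioned.wav")
f0_b, mfcc_b = long_term_features("known.wav")
print("Mean-F0 difference:", abs(f0_a - f0_b))
print("MFCC Euclidean distance:", np.linalg.norm(mfcc_a - mfcc_b))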
Additional Links: PMID-39182457
@article {pmid39182457,
year = {2024},
author = {Chan, RKW and Wang, BX},
title = {Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?.},
journal = {Forensic science international},
volume = {363},
number = {},
pages = {112199},
doi = {10.1016/j.forsciint.2024.112199},
pmid = {39182457},
issn = {1872-6283},
abstract = {A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically-relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCCs-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.},
}
RevDate: 2024-08-23
CmpDate: 2024-08-23
Automatic speech analysis for detecting cognitive decline of older adults.
Frontiers in public health, 12:1417966.
BACKGROUND: Speech analysis is expected to serve as a screening tool for early detection of Alzheimer's disease (AD) and mild cognitive impairment (MCI). Both acoustic and linguistic features are commonly used in speech analysis. However, no studies have yet determined which type of feature provides better screening effectiveness, especially in the large aging population of China.
OBJECTIVE: Firstly, to compare the screening effectiveness of acoustic features, linguistic features, and their combination using the same dataset. Secondly, to develop a Chinese automated diagnosis model using self-collected natural discourse data obtained from native Chinese speakers.
METHODS: A total of 92 participants from communities in Shanghai completed the MoCA-B and a picture description task based on the Cookie Theft picture under the guidance of trained operators, and were divided into three groups, AD, MCI, and healthy control (HC), based on their MoCA-B scores. Acoustic features (pitch, jitter, shimmer, MFCCs, formants) and linguistic features (part-of-speech, type-token ratio, information words, information units) were extracted. The machine learning algorithms used in this study included logistic regression, random forest (RF), support vector machines (SVM), Gaussian Naive Bayes (GNB), and k-nearest neighbors (kNN). The validation accuracies of the same ML model using acoustic features, linguistic features, and their combination were compared.
RESULTS: Accuracy with linguistic features was generally higher than with acoustic features in training. The highest accuracy in differentiating HC and AD was 80.77%, achieved by SVM based on all the features extracted from the speech data, while the highest accuracy in differentiating HC from AD or MCI was 80.43%, achieved by RF based only on linguistic features.
CONCLUSION: Our results support the utility and validity of linguistic features in the automated diagnosis of cognitive impairment and validate the applicability of automated diagnosis for Chinese language data.
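The feature-set comparison can be sketched schematically in Python with scikit-learn; the data below are synthetic placeholders standing in for the study's extracted features, not its dataset:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 92                                   # matches the study's sample size
X_acoustic = rng.normal(size=(n, 20))    # e.g., pitch, jitter, shimmer, MFCCs
X_linguistic = rng.normal(size=(n, 10))  # e.g., POS ratios, information units
y = rng.integers(0, 2, size=n)           # HC vs. impaired (placeholder labels)

for name, X in [("acoustic", X_acoustic),
                ("linguistic", X_linguistic),
                ("combined", np.hstack([X_acoustic, X_linguistic]))]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>10s} features: CV accuracy = {acc:.2f}")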
Additional Links: PMID-39175901
@article {pmid39175901,
year = {2024},
author = {Huang, L and Yang, H and Che, Y and Yang, J},
title = {Automatic speech analysis for detecting cognitive decline of older adults.},
journal = {Frontiers in public health},
volume = {12},
number = {},
pages = {1417966},
doi = {10.3389/fpubh.2024.1417966},
pmid = {39175901},
issn = {2296-2565},
mesh = {Humans ; Aged ; Female ; Male ; *Cognitive Dysfunction/diagnosis ; China ; Alzheimer Disease/diagnosis ; Aged, 80 and over ; Speech ; Middle Aged ; Bayes Theorem ; Support Vector Machine ; Algorithms ; },
abstract = {BACKGROUND: Speech analysis has been expected to help as a screening tool for early detection of Alzheimer's disease (AD) and mild-cognitively impairment (MCI). Acoustic features and linguistic features are usually used in speech analysis. However, no studies have yet determined which type of features provides better screening effectiveness, especially in the large aging population of China.
OBJECTIVE: Firstly, to compare the screening effectiveness of acoustic features, linguistic features, and their combination using the same dataset. Secondly, to develop Chinese automated diagnosis model using self-collected natural discourse data obtained from native Chinese speakers.
METHODS: A total of 92 participants from communities in Shanghai, completed MoCA-B and a picture description task based on the Cookie Theft under the guidance of trained operators, and were divided into three groups including AD, MCI, and heathy control (HC) based on their MoCA-B score. Acoustic features (Pitches, Jitter, Shimmer, MFCCs, Formants) and linguistic features (part-of-speech, type-token ratio, information words, information units) are extracted. The machine algorithms used in this study included logistic regression, random forest (RF), support vector machines (SVM), Gaussian Naive Bayesian (GNB), and k-Nearest neighbor (kNN). The validation accuracies of the same ML model using acoustic features, linguistic features, and their combination were compared.
RESULTS: The accuracy with linguistic features is generally higher than acoustic features in training. The highest accuracy to differentiate HC and AD is 80.77% achieved by SVM, based on all the features extracted from the speech data, while the highest accuracy to differentiate HC and AD or MCI is 80.43% achieved by RF, based only on linguistic features.
CONCLUSION: Our results suggest the utility and validity of linguistic features in the automated diagnosis of cognitive impairment, and validated the applicability of automated diagnosis for Chinese language data.},
}
MeSH Terms:
Humans
Aged
Female
Male
*Cognitive Dysfunction/diagnosis
China
Alzheimer Disease/diagnosis
Aged, 80 and over
Speech
Middle Aged
Bayes Theorem
Support Vector Machine
Algorithms
RevDate: 2024-08-22
The effect of sexual orientation on voice acoustic properties.
Frontiers in psychology, 15:1412372.
INTRODUCTION: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, and many of these studies had methodological limitations, including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses.
METHODS: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout.
RESULTS: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex.
DISCUSSION: Recommendations for future "natural voice" investigations are outlined.
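Two of the measures discussed above, harmonics-to-noise ratio (a breathiness proxy) and local jitter (a roughness proxy), are standard Praat analyses; a minimal Python sketch via parselmouth follows, with a placeholder file name:

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speaker.wav")

harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr_db = call(harmonicity, "Get mean", 0, 0)   # lower HNR = breathier voice

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter_local = call(point_process, "Get jitter (local)",
                    0, 0, 0.0001, 0.02, 1.3)   # higher jitter = rougher voice

print(f"HNR: {hnr_db:.1f} dB, local jitter: {jitter_local:.4f}")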
Additional Links: PMID-39171236
@article {pmid39171236,
year = {2024},
author = {Holmes, L and Rieger, G and Paulmann, S},
title = {The effect of sexual orientation on voice acoustic properties.},
journal = {Frontiers in psychology},
volume = {15},
number = {},
pages = {1412372},
pmid = {39171236},
issn = {1664-1078},
abstract = {INTRODUCTION: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, but many of these studies have methodological limitations including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses.
METHODS: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout.
RESULTS: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex.
DISCUSSION: Recommendations for future "natural voice" investigations are outlined.},
}
RevDate: 2024-08-02
Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.
Annals of the New York Academy of Sciences [Epub ahead of print].
Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures, such as jaw, tongue, lip, and soft palate movements, shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study of an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame by frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first formant (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw opening angle when the seal was lying on his back rather than on his belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, in whom F1 also depends more on jaw position than F2 does. We propose in vivo investigations of seals to further test the role of the tongue in formant modulation in mammalian sound production.
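The core analysis, frame-by-frame jaw opening angles correlated with tracked formant values, can be illustrated with a toy Python sketch; the arrays below are synthetic stand-ins for the video and formant tracks:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
jaw_angle = rng.uniform(5, 40, size=200)                  # degrees, per frame
f1 = 400 + 12 * jaw_angle + rng.normal(0, 40, size=200)   # F1 tracks jaw well
f2 = 1400 + 4 * jaw_angle + rng.normal(0, 120, size=200)  # F2 more weakly

for name, formant in [("F1", f1), ("F2", f2)]:
    r, p = pearsonr(jaw_angle, formant)
    print(f"jaw angle vs {name}: r = {r:.2f}, p = {p:.1e}")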
Additional Links: PMID-39091036
@article {pmid39091036,
year = {2024},
author = {Goncharova, M and Jadoul, Y and Reichmuth, C and Fitch, WT and Ravignani, A},
title = {Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.},
journal = {Annals of the New York Academy of Sciences},
volume = {},
number = {},
pages = {},
doi = {10.1111/nyas.15189},
pmid = {39091036},
issn = {1749-6632},
support = {Advanced Grant SOMACCA/ERC_/European Research Council/International ; (#W1262-B29)//Austrian Science Foundation Grant/ ; DNRF117//Danmarks Grundforskningsfond/ ; N00014-04-1-0284//Office of Naval Research/ ; Independent Max Planck Research Group Leader funding//Max-Planck-Gesellschaft/ ; },
abstract = {Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures-such as jaw, tongue, lip, and soft palate movements-shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study with an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame-by-frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw angle opening when the seal was lying on his back rather than on the belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, where the F1 depends more on the jaw position than F2. We propose further in vivo investigations of seals to further test the role of the tongue on formant modulation in mammalian sound production.},
}
RevDate: 2024-08-01
Close approximations to the sound of a cochlear implant.
Frontiers in human neuroscience, 18:1434786.
Cochlear implant (CI) systems differ in electrode design and signal processing, so patients fit with different implant systems will likely experience different percepts when presented speech via their implant. The sound quality of speech can be evaluated by asking single-sided-deaf (SSD) listeners fit with a CI to modify clean signals presented to their typically hearing ear until they match the sound quality of signals presented to their CI ear. In this paper, we describe very close matches to CI sound quality, i.e., similarity ratings of 9.5 to 10 on a 10-point scale, by ten patients fit with a 28 mm electrode array and MED-EL signal processing. The modifications required to make close approximations to CI sound quality fell into two groups: one consisted of a restricted frequency bandwidth and spectral smearing, while the second was characterized by a wide bandwidth and no spectral smearing. Both sets of modifications differed from those found for patients with shorter electrode arrays, who chose upshifts in voice pitch and formant frequencies to match CI sound quality. These matching-based metrics of CI sound quality document that speech sound quality differs between patients fit with different CIs and among patients fit with the same CI.
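A rough Python sketch of the two signal modifications named above, band-limiting and spectral smearing, follows; here smearing is approximated by Gaussian smoothing of the whole-signal magnitude spectrum, and the cutoff frequencies and smoothing width are assumptions, not the listener-chosen settings from the study:

import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.ndimage import gaussian_filter1d

def restrict_bandwidth(x, sr, lo=100, hi=4000):
    sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)

def smear_spectrum(x, sigma_bins=8):
    spec = np.fft.rfft(x)
    mag = gaussian_filter1d(np.abs(spec), sigma=sigma_bins)  # blur magnitudes
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(x))

sr = 16000
x = np.random.default_rng(2).normal(size=sr)  # stand-in for a speech signal
y = smear_spectrum(restrict_bandwidth(x, sr))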
Additional Links: PMID-39086377
@article {pmid39086377,
year = {2024},
author = {Dorman, MF and Natale, SC and Stohl, JS and Felder, J},
title = {Close approximations to the sound of a cochlear implant.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1434786},
pmid = {39086377},
issn = {1662-5161},
abstract = {Cochlear implant (CI) systems differ in terms of electrode design and signal processing. It is likely that patients fit with different implant systems will experience different percepts when presented speech via their implant. The sound quality of speech can be evaluated by asking single-sided-deaf (SSD) listeners fit with a cochlear implant (CI) to modify clean signals presented to their typically hearing ear to match the sound quality of signals presented to their CI ear. In this paper, we describe very close matches to CI sound quality, i.e., similarity ratings of 9.5 to 10 on a 10-point scale, by ten patients fit with a 28 mm electrode array and MED EL signal processing. The modifications required to make close approximations to CI sound quality fell into two groups: One consisted of a restricted frequency bandwidth and spectral smearing while a second was characterized by a wide bandwidth and no spectral smearing. Both sets of modifications were different from those found for patients with shorter electrode arrays who chose upshifts in voice pitch and formant frequencies to match CI sound quality. The data from matching-based metrics of CI sound quality document that speech sound-quality differs for patients fit with different CIs and among patients fit with the same CI.},
}
RevDate: 2024-07-26
Persistent post-concussion symptoms include neural auditory processing in young children.
Concussion (London, England), 9(1):CNC114.
AIM: Difficulty understanding speech following concussion is likely caused by auditory processing impairments. We hypothesized that concussion disrupts the processing of pitch and phonetic cues in a sound, both of which help listeners understand a talker.
We obtained frequency-following responses to a syllable from 120 concussed children and 120 controls. Encoding of the fundamental frequency (F0), a pitch cue, and the first formant (F1), a phonetic cue, was poorer in concussed children. The F0 reduction was greater in children assessed within 2 weeks of their injuries.
CONCLUSION: Concussions affect auditory processing. These results strengthen the evidence of reduced F0 encoding in children with concussion and call for longitudinal studies monitoring the course of recovery of the auditory system.
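Encoding strength at F0 and in the F1 range can be quantified from an averaged frequency-following response as spectral amplitude in narrow bands around the stimulus frequencies. A bare-bones Python sketch, with an assumed sampling rate and a synthetic waveform in place of real EEG:

import numpy as np

fs = 8000                                    # EEG sampling rate (assumed)
t = np.arange(0, 0.2, 1 / fs)
ffr = (0.8 * np.sin(2 * np.pi * 100 * t)     # F0 component at 100 Hz
       + 0.2 * np.sin(2 * np.pi * 700 * t)   # F1-range component
       + np.random.default_rng(3).normal(0, 0.3, t.size))

spectrum = np.abs(np.fft.rfft(ffr)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def band_amplitude(lo, hi):
    return spectrum[(freqs >= lo) & (freqs <= hi)].mean()

print("F0 encoding (90-110 Hz):  ", band_amplitude(90, 110))
print("F1 encoding (650-750 Hz): ", band_amplitude(650, 750))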
Additional Links: PMID-39056002
@article {pmid39056002,
year = {2024},
author = {Bonacina, S and Krizman, J and Farley, J and Nicol, T and LaBella, CR and Kraus, N},
title = {Persistent post-concussion symptoms include neural auditory processing in young children.},
journal = {Concussion (London, England)},
volume = {9},
number = {1},
pages = {CNC114},
doi = {10.2217/cnc-2023-0013},
pmid = {39056002},
issn = {2056-3299},
abstract = {AIM: Difficulty understanding speech following concussion is likely caused by auditory processing impairments. We hypothesized that concussion disrupts pitch and phonetic processing of a sound, cues in understanding a talker.
We obtained frequency following responses to a syllable from 120 concussed and 120 control. Encoding of the fundamental frequency (F0), a pitch cue and the first formant (F1), a phonetic cue, was poorer in concussed children. The F0 reduction was greater in the children assessed within 2 weeks of their injuries.
CONCLUSION: Concussions affect auditory processing. Results strengthen evidence of reduced F0 encoding in children with concussion and call for longitudinal study aimed at monitoring the recovery course with respect to the auditory system.},
}
RevDate: 2024-07-19
Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.
bioRxiv : the preprint server for biology pii:2024.07.13.603344.
Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production and its possible relation to PSAM at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
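The "centering" index described above reduces to a simple computation: the drop in inter-trial formant variability from vowel onset to vowel midpoint. A small Python sketch with synthetic F1 values in place of measured trials:

import numpy as np

rng = np.random.default_rng(4)
n_trials = 40
f1_onset = 550 + rng.normal(0, 35, n_trials)   # more variable at onset
f1_mid = 550 + rng.normal(0, 20, n_trials)     # corrected toward the target

sd_onset, sd_mid = f1_onset.std(ddof=1), f1_mid.std(ddof=1)
centering = sd_onset - sd_mid   # positive = within-vowel adjustment occurred
print(f"SD onset: {sd_onset:.1f} Hz, SD midpoint: {sd_mid:.1f} Hz, "
      f"centering: {centering:.1f} Hz")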
Additional Links: PMID-39026879
@article {pmid39026879,
year = {2024},
author = {Li, JJ and Daliri, A and Kim, KS and Max, L},
title = {Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.1101/2024.07.13.603344},
pmid = {39026879},
issn = {2692-8205},
abstract = {Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production and its possible relation to PSAM at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).},
}
RevDate: 2024-07-17
Word and Gender Identification in the Speech of Transgender Individuals.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00178-4 [Epub ahead of print].
Listeners use speech to identify both linguistic information, such as the word being produced, and indexical attributes, such as the gender of the speaker. Previous research has shown that these two aspects of speech perception are interrelated. It is important to understand this relationship in the context of gender-affirming voice training (GAVT), where changes in speech production as part of a speaker's gender-affirming care could potentially influence listeners' recognition of the intended utterance. This study conducted a secondary analysis of data from an experiment in which trans women matched shifted targets for the second formant frequency using visual-acoustic biofeedback. Utterances were synthetically altered to feature a gender-ambiguous fundamental frequency and were presented to blinded listeners for rating on a visual analog scale representing the gender spectrum, as well as word identification in a forced-choice task. We found a statistically significant association between the accuracy of word identification and the gender rating of utterances. However, there was no statistically significant difference in word identification accuracy for the formant-shifted conditions relative to an unshifted condition. Overall, these results support previous research in finding that word identification and speaker gender identification are interrelated processes; however, the findings also suggest that a small magnitude of shift in formant frequencies (of the type that might be pursued in a GAVT context) does not have a significant negative impact on the perceptual recoverability of isolated words.
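The association tested above, trial-level word-identification accuracy related to an utterance's gender rating, can be illustrated with a toy Python sketch; the data are synthetic, the 0-100 scale is an assumption, and the study's actual design used a more complete mixed-model analysis:

import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(10)
gender_rating = rng.uniform(0, 100, 300)       # visual analog scale (assumed)
p_correct = 1 / (1 + np.exp(-(gender_rating - 50) / 25))
word_correct = rng.random(300) < p_correct     # accuracy tied to rating

r, p = pointbiserialr(word_correct.astype(int), gender_rating)
print(f"point-biserial r = {r:.2f}, p = {p:.1e}")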
Additional Links: PMID-39019670
@article {pmid39019670,
year = {2024},
author = {Doyle, KA and Harel, D and Feeny, GT and Novak, VD and McAllister, T},
title = {Word and Gender Identification in the Speech of Transgender Individuals.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.06.007},
pmid = {39019670},
issn = {1873-4588},
abstract = {Listeners use speech to identify both linguistic information, such as the word being produced, and indexical attributes, such as the gender of the speaker. Previous research has shown that these two aspects of speech perception are interrelated. It is important to understand this relationship in the context of gender-affirming voice training (GAVT), where changes in speech production as part of a speaker's gender-affirming care could potentially influence listeners' recognition of the intended utterance. This study conducted a secondary analysis of data from an experiment in which trans women matched shifted targets for the second formant frequency using visual-acoustic biofeedback. Utterances were synthetically altered to feature a gender-ambiguous fundamental frequency and were presented to blinded listeners for rating on a visual analog scale representing the gender spectrum, as well as word identification in a forced-choice task. We found a statistically significant association between the accuracy of word identification and the gender rating of utterances. However, there was no statistically significant difference in word identification accuracy for the formant-shifted conditions relative to an unshifted condition. Overall, these results support previous research in finding that word identification and speaker gender identification are interrelated processes; however, the findings also suggest that a small magnitude of shift in formant frequencies (of the type that might be pursued in a GAVT context) does not have a significant negative impact on the perceptual recoverability of isolated words.},
}
RevDate: 2024-07-10
CmpDate: 2024-07-10
Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.
Dental press journal of orthodontics, 29(3):e2423277.
OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech.
MATERIAL AND METHODS: Eligibility criteria were subjects aged 20-40 years with acceptable occlusion who were native speakers of Portuguese. The volunteers (n=21) were randomized into four groups with a 1:1:1:1 allocation ratio. In random order, each group used the four types of retainers full-time for 21 days each, with a 7-day washout period. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with a mean age of 27.08 years (SD=4.65). Speech was evaluated from recordings of vocal excerpts made before, immediately after, and 21 days after installation of each retainer, with auditory-perceptual analysis and acoustic analysis of the F1 and F2 formant frequencies of the vowels. Repeated-measures ANOVA and Friedman tests, with Tukey tests, were used for statistical comparison.
RESULTS: Speech changes increased immediately after installation of the conventional wraparound and thermoplastic retainers and diminished after 21 days, though not to normal levels. However, the increase was statistically significant only for the wraparound with an anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at the initial time point, and the changes remained for the conventional, U-shaped, and thermoplastic appliances after three weeks.
CONCLUSIONS: The thermoplastic retainer was more harmful to speech than the wraparound appliances. The conventional and U-shaped retainers interfered less with speech. The three-week period was not sufficient for speech adaptation.
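A compact Python sketch of the repeated-measures comparison used above: the same subjects' F2 values at three time points (before, immediately after, and 21 days after installation), tested with a Friedman test. The values are synthetic stand-ins:

import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(5)
n = 18                                   # matches the final sample size
f2_before = 1500 + rng.normal(0, 60, n)
f2_immediate = f2_before + rng.normal(80, 40, n)   # shift after installation
f2_day21 = f2_before + rng.normal(30, 40, n)       # partial adaptation

stat, p = friedmanchisquare(f2_before, f2_immediate, f2_day21)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")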
Additional Links: PMID-38985077
@article {pmid38985077,
year = {2024},
author = {Lorenzoni, DC and Henriques, JFC and Silva, LKD and Rosa, RR and Berretin-Felix, G and Freitas, KMS and Janson, G},
title = {Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.},
journal = {Dental press journal of orthodontics},
volume = {29},
number = {3},
pages = {e2423277},
pmid = {38985077},
issn = {2177-6709},
mesh = {Humans ; *Orthodontic Retainers ; Female ; Male ; Adult ; *Cross-Over Studies ; Orthodontic Appliance Design ; Young Adult ; Speech/physiology ; },
abstract = {OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech.
MATERIAL AND METHODS: Eligibility criteria for sample selection were: 20-40-year subjects with acceptable occlusion, native speakers of Portuguese. The volunteers (n=21) were divided in four groups randomized with a 1:1:1:1 allocation ratio. The four groups used, in random order, the four types of retainers full-time for 21 days each, with a washout period of 7-days. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with mean age of 27.08 years (SD=4.65). The speech evaluation was performed in vocal excerpts recordings made before, immediately after, and 21 days after the installation of each retainer, with auditory-perceptual and acoustic analysis of formant frequencies F1 and F2 of the vowels. Repeated measures ANOVA and Friedman with Tukey tests were used for statistical comparison.
RESULTS: Speech changes increased immediately after conventional wraparound and thermoplastic retainer installation, and reduced after 21 days, but not to normal levels. However, this increase was statistically significant only for the wraparound with anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at initial time, and the changes remained in conventional, U-shaped and thermoplastic appliances after three weeks.
CONCLUSIONS: The thermoplastic retainer was more harmful to the speech than wraparound appliances. The conventional and U-shaped retainers interfered less in speech. The three-week period was not sufficient for speech adaptation.},
}
MeSH Terms:
Humans
*Orthodontic Retainers
Female
Male
Adult
*Cross-Over Studies
Orthodontic Appliance Design
Young Adult
Speech/physiology
RevDate: 2024-07-09
Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices.
Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000540255 [Epub ahead of print].
INTRODUCTION: Benign and malignant vocal fold lesions are growths that occur on the vocal folds. However, the treatments for these two types of lesions differ significantly. Therefore, it is imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aims to determine the important acoustic characteristics specific to benign and malignant vocal fold lesions.
METHODS: The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants, including individuals with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose and Throat Hospital between May 2020 and July 2021. The acoustic variables of the participants were analyzed using Principal Component Analysis to present important acoustic characteristics that are specific to normal vocal folds, benign vocal fold lesions, and malignant vocal fold lesions. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia.
RESULTS: Using the Principal Component Analysis method, the components that accounted for the variation in the data were identified, highlighting acoustic characteristics in the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch were found to be significant only in benign voices, while higher formant frequencies and their variability were identified as contributors to the acoustic variance within the malignant group. The presence of formant dispersion as a weighted factor in Principal Component Analysis was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch and coefficients of variation for formant dispersion were observed to contribute to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia.
CONCLUSION: Experimental evidence demonstrates the utility of the Principal Component Analysis method in the identification of vibrational alterations in the acoustic characteristics of voices affected by lesions. Furthermore, the analysis highlighted underlying acoustic differences between conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which would facilitate timely intervention and management of vocal fold conditions.
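An illustrative Python sketch of the PCA workflow described above: standardize the acoustic variables, fit PCA, and inspect which variables load on the components carrying each group's variance. The feature names and data are synthetic stand-ins:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
feature_names = ["pitch_cv", "rms_cv", "F3", "F4", "formant_dispersion",
                 "subharmonic_ratio"]
X = rng.normal(size=(157, len(feature_names)))   # 157 voices, as in the study

pca = PCA(n_components=3)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
for i, component in enumerate(pca.components_):
    top = feature_names[np.argmax(np.abs(component))]
    print(f"PC{i + 1}: strongest loading on {top}")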
Additional Links: PMID-38981448
@article {pmid38981448,
year = {2024},
author = {Liu, B and Lei, J and Wischhoff, OP and Smereka, KA and Jiang, JJ},
title = {Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices.},
journal = {Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP)},
volume = {},
number = {},
pages = {},
doi = {10.1159/000540255},
pmid = {38981448},
issn = {1421-9972},
abstract = {INTRODUCTION: Benign and malignant vocal fold lesions are growths that occur on the vocal folds. However, the treatments for these two types of lesions differ significantly. Therefore, it is imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aims to determine the important acoustic characteristics specific to benign and malignant vocal fold lesions.
METHODS: The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants, including individuals with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose and Throat Hospital between May 2020 and July 2021.The acoustic variables of the participants were analyzed using Principal Component Analysis to present important acoustic characteristics that are specific to normal vocal folds, benign vocal fold lesions, and malignant vocal fold lesions. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia.
RESULTS: Using the Principal Component Analysis method, the components that accounted for the variation in the data were identified, highlighting acoustic characteristics in the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch were found to be significant only in benign voices, while higher formant frequencies and their variability were identified as contributors to the acoustic variance within the malignant group. The presence of formant dispersion as a weighted factor in Principal Component Analysis was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch and coefficients of variation for formant dispersion were observed to contribute to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia.
CONCLUSION: Experimental evidence demonstrates the utility of the Principal Component Analysis method in the identification of vibrational alterations in the acoustic characteristics of voice affected by lesions. Furthermore, the Principal Component Analysis analysis has highlighted underlying acoustic differences between various conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which will facilitate timely intervention and management of vocal fold conditions.},
}
RevDate: 2024-07-01
Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion.
Scientific reports, 14(1):15029.
Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.
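A simplified Python sketch of the tactile-vocoder idea with envelope expansion: band-limit the audio into a few channels, extract each amplitude envelope, expand it (an exponent above 1 sharpens peaks such as formants), and use it to modulate a vibrotactile carrier tone. The band edges, carrier frequencies, and exponent are illustrative assumptions, not the study's parameters:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tactile_vocoder(x, sr, bands=((100, 500), (500, 1500), (1500, 4000)),
                    carriers=(100, 170, 240), expansion=1.5):
    out = np.zeros_like(x)
    t = np.arange(len(x)) / sr
    for (lo, hi), fc in zip(bands, carriers):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))   # amplitude envelope
        env = env ** expansion                       # envelope expansion
        out += env * np.sin(2 * np.pi * fc * t)      # vibrotactile tone
    return out / len(bands)

sr = 16000
speech = np.random.default_rng(7).normal(size=sr)    # stand-in for speech
vibration = tactile_vocoder(speech, sr)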
Additional Links: PMID-38951556
@article {pmid38951556,
year = {2024},
author = {Fletcher, MD and Akis, E and Verschuur, CA and Perry, SW},
title = {Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {15029},
pmid = {38951556},
issn = {2045-2322},
support = {EP/W032422/1//Engineering and Physical Sciences Research Council/ ; EP/T517859/1//Engineering and Physical Sciences Research Council/ ; },
abstract = {Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.},
}
RevDate: 2024-06-25
Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.
Cureus, 16(5):e60873.
Background Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration.
Methodology This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. PRAAT version 6.2 was used for voice analysis.
Results The study included 41 patients with normal laryngoscopic findings after thyroid surgery; most were female (85.4%), and the average age was 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, all reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonation duration. These trends persisted at the three-month postoperative period, although values approached preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater changes observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between the hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals.
Conclusions This study of post-thyroidectomy patients with normal vocal cord movement revealed significant postoperative changes in voice parameters, even though most patients reported no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.
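The PRAAT-style measures tracked above can be computed in Python with parselmouth; a minimal sketch for one pre- and one post-operative recording follows (file names are placeholders), reporting percent change in the way the study does:

import parselmouth
from parselmouth.praat import call

def voice_measures(wav_path):
    snd = parselmouth.Sound(wav_path)
    f0 = call(snd.to_pitch(), "Get mean", 0, 0, "Hertz")
    f1 = snd.to_formant_burg().get_value_at_time(1, snd.duration / 2)
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 600)
    shimmer = call([snd, pp], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    return {"F0": f0, "F1": f1, "shimmer": shimmer}

pre = voice_measures("pre_op.wav")
post = voice_measures("post_op_1mo.wav")
for key in pre:
    change = 100 * (post[key] - pre[key]) / pre[key]
    print(f"{key}: {change:+.2f}% change at one month")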
Additional Links: PMID-38916010
@article {pmid38916010,
year = {2024},
author = {Sahoo, AK and Sahoo, PK and Gupta, V and Behera, G and Sidam, S and Mishra, UP and Chavan, A and Binu, R and Gour, S and Velayutham, DK and Pooja, and Chatterjee, T and Pal, D},
title = {Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.},
journal = {Cureus},
volume = {16},
number = {5},
pages = {e60873},
pmid = {38916010},
issn = {2168-8184},
abstract = {Background Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration. Methodology This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. The PRAAT version 6.2 software was utilized for voice analysis. Results The study included 41 patients with normal laryngoscopic findings after thyroid surgery, with the majority being female (85.4%) and the average age being 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, the majority reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonatory duration. These trends persisted at the three-month postoperative period, although values approached close to preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater values observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals. Conclusions This study on post-thyroidectomy patients with normal vocal cord movement revealed significant changes in voice parameters postoperatively, with most patients reporting no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.},
}
RevDate: 2024-06-18
A Study on Voice Measures in Patients with Parkinson's Disease.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00168-1 [Epub ahead of print].
PURPOSE: This research aims to identify acoustic features that can distinguish patients with Parkinson's disease (PD) from healthy speakers.
METHODS: Thirty PD patients and 30 healthy speakers were recruited, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features such as fundamental frequency (F0), jitter, shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, and voice onset time were analyzed. Two-sample independent t tests and the nonparametric Mann-Whitney U (MWU) test were carried out, as appropriate, to compare the acoustic measures between PD patients and healthy speakers. In addition, after identifying the acoustic features effective for distinguishing PD patients from healthy speakers, we adopted two detection methods: (1) building classifiers based on the effective acoustic features, and (2) training support vector machine classifiers on the effective acoustic features.
RESULTS: Significant differences were found between the male PD group and male healthy controls in vowel /i/ (jitter and shimmer) and /a/ (shimmer and HNR). Among female subjects, significant differences between the two groups were observed in the F0 standard deviation (F0 SD) of /u/. Additionally, significant differences between the PD group and healthy controls were found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. The HNR of vowel /a/ yielded the best classification accuracy among the acoustic features found to distinguish PD patients from healthy speakers.
CONCLUSIONS: PD can cause changes in articulation and phonation, with increases or decreases in some acoustic features. The use of acoustic features to detect PD is therefore expected to offer a low-cost, scalable diagnostic method.
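A schematic Python sketch of the two-step approach described above: test a feature for group differences with a Mann-Whitney U test, then use it in a simple SVM classifier. The values are synthetic stand-ins for the HNR of vowel /a/:

import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
hnr_pd = rng.normal(18, 3, 30)        # 30 PD patients (placeholder values)
hnr_hc = rng.normal(22, 3, 30)        # 30 healthy controls

stat, p = mannwhitneyu(hnr_pd, hnr_hc)
print(f"Mann-Whitney U = {stat:.0f}, p = {p:.4f}")

X = np.concatenate([hnr_pd, hnr_hc]).reshape(-1, 1)
y = np.array([1] * 30 + [0] * 30)     # 1 = PD, 0 = healthy
acc = cross_val_score(SVC(), X, y, cv=5).mean()
print(f"CV accuracy using HNR of /a/ alone: {acc:.2f}")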
Additional Links: PMID-38890016
@article {pmid38890016,
year = {2024},
author = {Xiu, N and Li, W and Liu, L and Liu, Z and Cai, Z and Li, L and Vaxelaire, B and Sock, R and Ling, Z and Chen, J and Wang, Y},
title = {A Study on Voice Measures in Patients with Parkinson's Disease.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.05.018},
pmid = {38890016},
issn = {1873-4588},
abstract = {PURPOSE: This research aims to identify acoustic features that can distinguish patients with Parkinson's disease (PD patients) from healthy speakers.
METHODS: Thirty PD patients and 30 healthy speakers were recruited, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features including fundamental frequency (F0), jitter, shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, and voice onset time were analyzed. A two-sample independent t test or the nonparametric Mann-Whitney U (MWU) test, as appropriate, was used to compare the acoustic measures between the PD patients and healthy speakers. In addition, after identifying the effective acoustic features for distinguishing PD patients from healthy speakers, we adopted two methods to detect PD patients: (1) building classifiers based directly on the effective acoustic features and (2) training support vector machine classifiers on the effective acoustic features.
RESULTS: Significant differences were found between the male PD group and the male healthy control group in vowel /i/ (jitter and shimmer) and /a/ (shimmer and HNR). Among female subjects, significant differences were observed in the F0 standard deviation (F0 SD) of /u/ between the two groups. Additionally, significant differences between the PD group and the healthy control group were also found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. Of the effective acoustic features identified above, the HNR of vowel /a/ yielded the best classification accuracy in distinguishing PD patients from healthy speakers.
CONCLUSIONS: PD can cause changes in the articulation and phonation of PD patients, wherein increases or decreases occur in some acoustic features. Therefore, the use of acoustic features to detect PD is expected to be a low-cost and large-scale diagnostic method.},
}
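The second detection method above (training support vector machine classifiers on the effective features) takes only a few lines in scikit-learn. A minimal sketch; the random feature matrix is a placeholder for real jitter/shimmer/HNR measurements, and the kernel choice is an assumption.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Rows: 30 PD patients then 30 healthy speakers; columns: effective
# features (e.g., jitter and shimmer of /i/, HNR of /a/). Random values
# stand in for real measurements.
X = np.random.rand(60, 3)
y = np.array([1] * 30 + [0] * 30)  # 1 = PD, 0 = healthy control

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())  # chance-level on random data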
RevDate: 2024-06-16
Effects of testosterone on speech production and perception: Linking hormone levels in males to vocal cues and female voice attractiveness ratings.
Physiology & behavior pii:S0031-9384(24)00160-4 [Epub ahead of print].
This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.
Additional Links: PMID-38880296
@article {pmid38880296,
year = {2024},
author = {Weirich, M and Simpson, AP and Knutti, N},
title = {Effects of testosterone on speech production and perception: Linking hormone levels in males to vocal cues and female voice attractiveness ratings.},
journal = {Physiology & behavior},
volume = {},
number = {},
pages = {114615},
doi = {10.1016/j.physbeh.2024.114615},
pmid = {38880296},
issn = {1873-507X},
abstract = {This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.},
}
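Stepwise manipulations of mean fo and formant frequencies, like those applied to the perception stimuli, can be approximated with Praat's "Change gender" resynthesis. A hedged parselmouth sketch; the filename and parameter values are illustrative and not the authors' actual procedure.

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("female_sentence.wav")  # hypothetical stimulus

# Arguments: pitch floor (Hz), pitch ceiling (Hz), formant shift ratio,
# new pitch median (Hz), pitch range factor, duration factor. A ratio
# above 1 raises formants (shorter implied vocal tract).
raised = call(snd, "Change gender", 100, 500, 1.08, 230, 1.0, 1.0)
raised.save("female_sentence_raised.wav", "WAV")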
RevDate: 2024-06-14
Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.
JMIR biomedical engineering, 7(2):e42386 pii:v7i2e42386.
BACKGROUND: In an age when telehealth services are increasingly being used for forward triage, there is a need for accurate suicide risk detection. Vocal characteristics analyzed using artificial intelligence are now proving capable of detecting suicide risk with accuracies superior to traditional survey-based approaches, suggesting an efficient and economical approach to ensuring ongoing patient safety.
OBJECTIVE: This systematic review aimed to identify which vocal characteristics perform best at differentiating between patients with an elevated risk of suicide in comparison with other cohorts and identify the methodological specifications of the systems used to derive each feature and the accuracies of classification that result.
METHODS: A search of MEDLINE via Ovid, Scopus, Computers and Applied Science Complete, CADTH, Web of Science, ProQuest Dissertations and Theses A&I, Australian Policy Online, and Mednar was conducted between 1995 and 2020 and updated in 2021. The inclusion criteria were human participants with no language, age, or setting restrictions applied; randomized controlled studies, observational cohort studies, and theses; studies that used some measure of vocal quality; and individuals assessed as being at high risk of suicide compared with other individuals at lower risk using a validated measure of suicide risk. Risk of bias was assessed using the Risk of Bias in Non-randomized Studies tool. A random-effects model meta-analysis was used wherever mean measures of vocal quality were reported.
RESULTS: The search yielded 1074 unique citations, of which 30 (2.79%) were screened via full text. A total of 21 studies involving 1734 participants met all inclusion criteria. Most studies (15/21, 71%) sourced participants via either the Vanderbilt II database of recordings (8/21, 38%) or the Silverman and Silverman perceptual study recording database (7/21, 33%). Candidate vocal characteristics that performed best at differentiating between high risk of suicide and comparison cohorts included timing patterns of speech (median accuracy 95%), power spectral density sub-bands (median accuracy 90.3%), and mel-frequency cepstral coefficients (median accuracy 80%). A random-effects meta-analysis was used to compare 22 characteristics nested within 14% (3/21) of the studies, which demonstrated significant standardized mean differences for frequencies within the first and second formants (standardized mean difference ranged between -1.07 and -2.56) and jitter values (standardized mean difference=1.47). In 43% (9/21) of the studies, risk of bias was assessed as moderate, whereas in the remaining studies (12/21, 57%), the risk of bias was assessed as high.
CONCLUSIONS: Although several key methodological issues prevailed among the studies reviewed, there is promise in the use of vocal characteristics to detect elevations in suicide risk, particularly in novel settings such as telehealth or conversational agents.
TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42020167413; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020167413.
Additional Links: PMID-38875684
@article {pmid38875684,
year = {2022},
author = {Iyer, R and Meyer, D},
title = {Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.},
journal = {JMIR biomedical engineering},
volume = {7},
number = {2},
pages = {e42386},
doi = {10.2196/42386},
pmid = {38875684},
issn = {2561-3278},
abstract = {BACKGROUND: In an age when telehealth services are increasingly being used for forward triage, there is a need for accurate suicide risk detection. Vocal characteristics analyzed using artificial intelligence are now proving capable of detecting suicide risk with accuracies superior to traditional survey-based approaches, suggesting an efficient and economical approach to ensuring ongoing patient safety.
OBJECTIVE: This systematic review aimed to identify which vocal characteristics perform best at differentiating between patients with an elevated risk of suicide in comparison with other cohorts and identify the methodological specifications of the systems used to derive each feature and the accuracies of classification that result.
METHODS: A search of MEDLINE via Ovid, Scopus, Computers and Applied Science Complete, CADTH, Web of Science, ProQuest Dissertations and Theses A&I, Australian Policy Online, and Mednar was conducted between 1995 and 2020 and updated in 2021. The inclusion criteria were human participants with no language, age, or setting restrictions applied; randomized controlled studies, observational cohort studies, and theses; studies that used some measure of vocal quality; and individuals assessed as being at high risk of suicide compared with other individuals at lower risk using a validated measure of suicide risk. Risk of bias was assessed using the Risk of Bias in Non-randomized Studies tool. A random-effects model meta-analysis was used wherever mean measures of vocal quality were reported.
RESULTS: The search yielded 1074 unique citations, of which 30 (2.79%) were screened via full text. A total of 21 studies involving 1734 participants met all inclusion criteria. Most studies (15/21, 71%) sourced participants via either the Vanderbilt II database of recordings (8/21, 38%) or the Silverman and Silverman perceptual study recording database (7/21, 33%). Candidate vocal characteristics that performed best at differentiating between high risk of suicide and comparison cohorts included timing patterns of speech (median accuracy 95%), power spectral density sub-bands (median accuracy 90.3%), and mel-frequency cepstral coefficients (median accuracy 80%). A random-effects meta-analysis was used to compare 22 characteristics nested within 14% (3/21) of the studies, which demonstrated significant standardized mean differences for frequencies within the first and second formants (standardized mean difference ranged between -1.07 and -2.56) and jitter values (standardized mean difference=1.47). In 43% (9/21) of the studies, risk of bias was assessed as moderate, whereas in the remaining studies (12/21, 57%), the risk of bias was assessed as high.
CONCLUSIONS: Although several key methodological issues prevailed among the studies reviewed, there is promise in the use of vocal characteristics to detect elevations in suicide risk, particularly in novel settings such as telehealth or conversational agents.
TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42020167413; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020167413.},
}
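Mel-frequency cepstral coefficients, one of the best-performing feature families in this review, are straightforward to extract. A minimal librosa sketch; the filename and the mean/std summarization are illustrative choices, not the reviewed studies' pipelines.

import numpy as np
import librosa

y, sr = librosa.load("speech_sample.wav", sr=None)  # hypothetical audio
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Collapse the time axis so each recording yields one feature vector
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)  # (26,)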
RevDate: 2024-06-09
When time does not heal all wounds: three decades' experience of immigrants living in Sweden.
Medicinski glasnik : official publication of the Medical Association of Zenica-Doboj Canton, Bosnia and Herzegovina, 21(2) [Epub ahead of print].
AIM: To investigate how immigrants from the Balkan region experienced their current life situation after living in Sweden for 30 years or more.
MATERIALS: The study was designed as a qualitative study using data from interviews with informants from five Balkan countries. The inclusion criteria were informants who were immigrants to Sweden and had lived in Sweden for more than 30 years. Five groups comprising sixteen informants were invited to participate in the study, and they all agreed.
RESULTS: The analysis of the interviews resulted in three main categories: "from someone to no one", "labour market", and "discrimination". All the informants reported that their education and life experience counted as worthless, that they had to start their lives over and re-educate themselves, that they applied for many jobs but often received no answer, and that when they finally obtained a job for which they were educated, they were humiliated daily, treated differently, and discriminated against.
CONCLUSION: All the informants described the experience as terrible: arriving in Sweden with all their problems, finding that their education and work experience counted for nothing, studying Swedish and repeating their entire education, applying for jobs without receiving answers, and finally getting a job only to be treated differently and discriminated against on a daily basis. Although similar studies already exist in Sweden, further work of this kind can help prospective immigrants and prospective employers in Sweden.
Additional Links: PMID-38852197
@article {pmid38852197,
year = {2024},
author = {Krupić, F and Moravcova, M and Dervišević, E and Čustović, S and Grbić, K and Lindström, P},
title = {When time does not heal all wounds: three decades' experience of immigrants living in Sweden.},
journal = {Medicinski glasnik : official publication of the Medical Association of Zenica-Doboj Canton, Bosnia and Herzegovina},
volume = {21},
number = {2},
pages = {},
doi = {10.17392/1696-21-02},
pmid = {38852197},
issn = {1840-2445},
abstract = {AIM: To investigate how immigrants from the Balkan region experienced their current life situation after living in Sweden for 30 years or more.
MATERIALS: The study was designed as a qualitative study using data from interviews with informants from five Balkan countries. The inclusion criteria were informants who were immigrants to Sweden and had lived in Sweden for more than 30 years. Five groups comprising sixteen informants were invited to participate in the study, and they all agreed.
RESULTS: The analysis of the interviews resulted in three main categories: "from someone to no one", "labour market", and "discrimination". All the informants reported that their education and life experience counted as worthless, that they had to start their lives over and re-educate themselves, that they applied for many jobs but often received no answer, and that when they finally obtained a job for which they were educated, they were humiliated daily, treated differently, and discriminated against.
CONCLUSION: All the informants described the experience as terrible: arriving in Sweden with all their problems, finding that their education and work experience counted for nothing, studying Swedish and repeating their entire education, applying for jobs without receiving answers, and finally getting a job only to be treated differently and discriminated against on a daily basis. Although similar studies already exist in Sweden, further work of this kind can help prospective immigrants and prospective employers in Sweden.},
}
RevDate: 2024-06-07
Classification of phonation types in singing voice using wavelet scattering network-based features.
JASA express letters, 4(6).
The automatic classification of phonation types in singing voice is essential for tasks such as identification of singing style. In this study, it is proposed to use wavelet scattering network (WSN)-based features for classification of phonation types in singing voice. WSN, which has a close similarity with auditory physiological models, generates acoustic features that greatly characterize the information related to pitch, formants, and timbre. Hence, the WSN-based features can effectively capture the discriminative information across phonation types in singing voice. The experimental results show that the proposed WSN-based features improved phonation classification accuracy by at least 9% compared to state-of-the-art features.
Additional Links: PMID-38847582
@article {pmid38847582,
year = {2024},
author = {Mittapalle, KR and Alku, P},
title = {Classification of phonation types in singing voice using wavelet scattering network-based features.},
journal = {JASA express letters},
volume = {4},
number = {6},
pages = {},
doi = {10.1121/10.0026241},
pmid = {38847582},
issn = {2691-1191},
abstract = {The automatic classification of phonation types in singing voice is essential for tasks such as identification of singing style. In this study, it is proposed to use wavelet scattering network (WSN)-based features for classification of phonation types in singing voice. WSN, which has a close similarity with auditory physiological models, generates acoustic features that greatly characterize the information related to pitch, formants, and timbre. Hence, the WSN-based features can effectively capture the discriminative information across phonation types in singing voice. The experimental results show that the proposed WSN-based features improved phonation classification accuracy by at least 9% compared to state-of-the-art features.},
}
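For readers unfamiliar with the transform, the kymatio library implements 1-D wavelet scattering. A minimal sketch; the excerpt length and the J/Q settings are illustrative, not the paper's configuration.

import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 14             # excerpt length in samples
J, Q = 8, 12            # maximum scale and wavelets per octave
scattering = Scattering1D(J=J, shape=T, Q=Q)

x = np.random.randn(T)  # placeholder for a singing-voice excerpt
Sx = scattering(x)      # first- and second-order scattering coefficients
features = Sx.mean(axis=-1)  # time-averaged feature vector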
RevDate: 2024-06-06
Exposure to bilingual or monolingual maternal speech during pregnancy affects the neurophysiological encoding of speech sounds in neonates differently.
Frontiers in human neuroscience, 18:1379660.
INTRODUCTION: Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and demonstrating an enhanced preference for their mother's voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy influence the fetus's neural mechanisms underlying speech sound encoding differently.
METHODS: In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 129 healthy term neonates within 1 to 3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were, respectively, taken as measures of pitch and formant structure neural encoding.
RESULTS: Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies.
DISCUSSION: We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them. Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.
Additional Links: PMID-38841122
@article {pmid38841122,
year = {2024},
author = {Gorina-Careta, N and Arenillas-Alcón, S and Puertollano, M and Mondéjar-Segovia, A and Ijjou-Kadiri, S and Costa-Faidella, J and Gómez-Roig, MD and Escera, C},
title = {Exposure to bilingual or monolingual maternal speech during pregnancy affects the neurophysiological encoding of speech sounds in neonates differently.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1379660},
pmid = {38841122},
issn = {1662-5161},
abstract = {INTRODUCTION: Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and demonstrating an enhanced preference for their mother's voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy influence the fetus's neural mechanisms underlying speech sound encoding differently.
METHODS: In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 129 healthy term neonates within 1 to 3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were, respectively, taken as measures of pitch and formant structure neural encoding.
RESULTS: Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies.
DISCUSSION: We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them. Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.},
}
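The spectral SNR measure used here (amplitude at a target frequency relative to neighbouring frequencies) can be sketched in plain numpy. The flanking band and the example target frequency are assumptions, not the study's exact analysis parameters.

import numpy as np

def spectral_snr(x, fs, f_target, flank=(10.0, 60.0)):
    # Amplitude at the target frequency divided by the mean amplitude
    # of bins lying flank[0]..flank[1] Hz to either side of it
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    signal = spec[np.argmin(np.abs(freqs - f_target))]
    dist = np.abs(freqs - f_target)
    noise = spec[(dist >= flank[0]) & (dist <= flank[1])].mean()
    return signal / noise

# e.g., SNR at a hypothetical 113 Hz F0 of the /o/ portion:
# snr_f0 = spectral_snr(ffr_segment, fs=16000, f_target=113.0)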
RevDate: 2024-05-31
Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender.
METHOD: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models.
RESULTS: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception.
CONCLUSIONS: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.
Additional Links: PMID-38820240
@article {pmid38820240,
year = {2024},
author = {Wu, HY},
title = {Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-23-00288},
pmid = {38820240},
issn = {1558-9102},
abstract = {PURPOSE: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender.
METHOD: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models.
RESULTS: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception.
CONCLUSIONS: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.},
}
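A minimal sketch of the kind of binary logistic regression used in this study, via statsmodels; the three columns and their synthetic values are placeholders for the 34 acoustic/EGG measures actually analyzed.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f0_mean":   rng.normal(200, 20, 100),   # Hz
    "intensity": rng.normal(65, 5, 100),     # dB
    "cq":        rng.normal(0.5, 0.05, 100), # contact quotient
    "deceptive": rng.integers(0, 2, 100),    # 1 = deceptive, 0 = truthful
})

X = sm.add_constant(df[["f0_mean", "intensity", "cq"]])
model = sm.Logit(df["deceptive"], X).fit()
print(model.summary())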
RevDate: 2024-05-24
Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00016-X [Epub ahead of print].
OBJECTIVE: To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender.
METHODS: Sound files each containing the vowels /i, æ, ɑ, u/ interspersed by brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values for fundamental frequency and 7 values for implied vocal tract length (and the associated formant frequencies) ranging from male-typical to female-typical, and 3 values for source spectral tilt approximating the voice qualities of breathy, normal, and pressed. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment.
RESULTS: Increasing fundamental frequency and decreasing implied vocal tract length resulted in increased probability of female judgment. Two interactions were identified: An increase in fundamental frequency and also a decrease in source spectral tilt (more negative) resulted in a greater increase in the probability of female judgment when the vocal tract length was relatively short.
CONCLUSIONS: The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and probability of female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. There was no threshold of fundamental frequency or implied vocal tract length that dramatically shifted the perception between male and female.
Additional Links: PMID-38789366
@article {pmid38789366,
year = {2024},
author = {Neuhaus, TJ and Scherer, RC and Whitfield, JA},
title = {Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.01.014},
pmid = {38789366},
issn = {1873-4588},
abstract = {OBJECTIVE: To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender.
METHODS: Sound files each containing the vowels /i, æ, ɑ, u/ interspersed by brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values for fundamental frequency and 7 values for implied vocal tract length (and the associated formant frequencies) ranging from male-typical to female-typical, and 3 values for source spectral tilt approximating the voice qualities of breathy, normal, and pressed. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment.
RESULTS: Increasing fundamental frequency and decreasing implied vocal tract length resulted in increased probability of female judgment. Two interactions were identified: An increase in fundamental frequency and also a decrease in source spectral tilt (more negative) resulted in a greater increase in the probability of female judgment when the vocal tract length was relatively short.
CONCLUSIONS: The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and probability of female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. There was no threshold of fundamental frequency or implied vocal tract length that dramatically shifted the perception between male and female.},
}
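"Implied vocal tract length" rests on the standard approximation that formant frequencies scale inversely with tract length. A worked example using the uniform closed-open tube model; the two lengths are generic textbook values, not the study's synthesis settings.

import numpy as np

C = 35000.0  # speed of sound in warm, moist air (cm/s)

def tube_formants(vtl_cm, n=4):
    # Odd quarter-wavelength resonances of a tube closed at the glottis
    # and open at the lips: F_k = (2k - 1) * c / (4 * L)
    k = np.arange(1, n + 1)
    return (2 * k - 1) * C / (4 * vtl_cm)

print(tube_formants(17.5))  # longer tract: 500, 1500, 2500, 3500 Hz
print(tube_formants(14.5))  # shorter tract: every formant up by ~21%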
RevDate: 2024-05-23
CmpDate: 2024-05-23
Male proboscis monkey cranionasal size and shape is associated with visual and acoustic signalling.
Scientific reports, 14(1):10715.
The large nose adorned by adult male proboscis monkeys is hypothesised to serve as an audiovisual signal of sexual selection. It serves as a visual signal of male quality and social status, and as an acoustic signal, through the expression of loud, low-formant nasalised calls in dense rainforests, where visibility is poor. However, it is unclear how the male proboscis monkey nasal complex, including the internal structure of the nose, plays a role in visual or acoustic signalling. Here, we use cranionasal data to assess whether large noses found in male proboscis monkeys serve visual and/or acoustic signalling functions. Our findings support a visual signalling function for male nasal enlargement through a relatively high degree of nasal aperture sexual size dimorphism, the craniofacial region to which nasal soft tissue attaches. We additionally find nasal aperture size increases beyond dental maturity among male proboscis monkeys, consistent with the visual signalling hypothesis. We show that the cranionasal region has an acoustic signalling role through pronounced nasal cavity sexual shape dimorphism, wherein male nasal cavity shape allows the expression of loud, low-formant nasalised calls. Our findings provide robust support for the male proboscis monkey nasal complex serving both visual and acoustic functions.
Additional Links: PMID-38782960
@article {pmid38782960,
year = {2024},
author = {Balolia, KL and Fitzgerald, PL},
title = {Male proboscis monkey cranionasal size and shape is associated with visual and acoustic signalling.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {10715},
pmid = {38782960},
issn = {2045-2322},
mesh = {Animals ; Male ; *Sex Characteristics ; Nasal Cavity/anatomy & histology/physiology ; Nose/anatomy & histology ; Animal Communication ; Acoustics ; Skull/anatomy & histology ; Vocalization, Animal/physiology ; Female ; },
abstract = {The large nose adorned by adult male proboscis monkeys is hypothesised to serve as an audiovisual signal of sexual selection. It serves as a visual signal of male quality and social status, and as an acoustic signal, through the expression of loud, low-formant nasalised calls in dense rainforests, where visibility is poor. However, it is unclear how the male proboscis monkey nasal complex, including the internal structure of the nose, plays a role in visual or acoustic signalling. Here, we use cranionasal data to assess whether large noses found in male proboscis monkeys serve visual and/or acoustic signalling functions. Our findings support a visual signalling function for male nasal enlargement through a relatively high degree of nasal aperture sexual size dimorphism, the craniofacial region to which nasal soft tissue attaches. We additionally find nasal aperture size increases beyond dental maturity among male proboscis monkeys, consistent with the visual signalling hypothesis. We show that the cranionasal region has an acoustic signalling role through pronounced nasal cavity sexual shape dimorphism, wherein male nasal cavity shape allows the expression of loud, low-formant nasalised calls. Our findings provide robust support for the male proboscis monkey nasal complex serving both visual and acoustic functions.},
}
RevDate: 2024-05-23
Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task.
Cognitive neuropsychology [Epub ahead of print].
How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.
Additional Links: PMID-38778635
@article {pmid38778635,
year = {2024},
author = {Beach, SD and Niziolek, CA},
title = {Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task.},
journal = {Cognitive neuropsychology},
volume = {},
number = {},
pages = {1-19},
doi = {10.1080/02643294.2024.2315831},
pmid = {38778635},
issn = {1464-0627},
abstract = {How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.},
}
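Formant trajectories of the kind analyzed here can be sampled frame by frame. A minimal parselmouth sketch; the filename, time step, and edge trimming are illustrative assumptions.

import numpy as np
import parselmouth

snd = parselmouth.Sound("trial_red.wav")  # hypothetical single trial
formant = snd.to_formant_burg(time_step=0.005)

times = np.arange(0.025, snd.duration - 0.025, 0.005)
f1 = np.array([formant.get_value_at_time(1, t) for t in times])
f2 = np.array([formant.get_value_at_time(2, t) for t in times])
# Averaging such trajectories within each condition (congruent vs.
# vowel-incongruent) would expose any bias away from the distractor vowel.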
RevDate: 2024-05-16
Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00117-6 [Epub ahead of print].
UNLABELLED: Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.
METHODS: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.
RESULTS: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (<0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (<0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (<0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (<0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy.
CONCLUSION: Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remain an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.
Additional Links: PMID-38755075
@article {pmid38755075,
year = {2024},
author = {Aaen, M and Sadolin, C},
title = {Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.03.039},
pmid = {38755075},
issn = {1873-4588},
abstract = {UNLABELLED: Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.
METHODS: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.
RESULTS: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (<0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (<0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (<0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (<0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy.
CONCLUSION: Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remain an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.},
}
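The Fleiss' kappa statistic reported for each panel is available in statsmodels. A minimal sketch on synthetic data; the matrix loosely mirrors a 12-rater panel, and the random category codes are placeholders.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
# One row per stimulus, one column per rater; cells are category codes
# (e.g., 0-4 for five hypothetical timbre dimensions)
ratings = rng.integers(0, 5, size=(20, 12))

table, _ = aggregate_raters(ratings)  # stimuli x category-count table
print(fleiss_kappa(table))            # near 0 for random ratings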
RevDate: 2024-05-16
The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study.
METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /ɛ/ for 3 s while their auditory feedback underwent a single shift in one of three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains.
RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction more often than loudness and pitch did.
CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.
Additional Links: PMID-38754028
@article {pmid38754028,
year = {2024},
author = {Ning, LH and Hui, TC},
title = {The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-21},
doi = {10.1044/2024_JSLHR-23-00543},
pmid = {38754028},
issn = {1558-9102},
abstract = {PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study.
METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /ɛ/ for 3 s while their auditory feedback underwent a single shift in one of three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains.
RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction more often than loudness and pitch did.
CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.},
}
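The perturbation magnitudes above reduce to simple multiplicative factors; the conversions below are standard psychoacoustic arithmetic, not code from the study.

def cents_to_ratio(cents):
    # 1200 cents per octave: a shift of c cents multiplies
    # frequency by 2 ** (c / 1200)
    return 2.0 ** (cents / 1200.0)

def db_to_amplitude_gain(db):
    # a level change of d dB multiplies amplitude by 10 ** (d / 20)
    return 10.0 ** (db / 20.0)

print(cents_to_ratio(100))        # ~1.0595, the +100-cent pitch shift
print(db_to_amplitude_gain(6.0))  # ~1.995, the +6 dB loudness shift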
RevDate: 2024-05-14
Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022).
American journal of primatology [Epub ahead of print].
The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers-both purported challenges to Lieberman's theoretical work-and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position-that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.
Additional Links: PMID-38741274
@article {pmid38741274,
year = {2024},
author = {Ekström, AG},
title = {Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022).},
journal = {American journal of primatology},
volume = {},
number = {},
pages = {e23637},
doi = {10.1002/ajp.23637},
pmid = {38741274},
issn = {1098-2345},
abstract = {The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers-both purported challenges to Lieberman's theoretical work-and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position-that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.},
}
RevDate: 2024-05-13
Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review.
Journal of thoracic disease, 16(4):2654-2667.
BACKGROUND AND OBJECTIVE: Obstructive sleep apnea (OSA) is a common chronic disorder characterized by repeated breathing pauses during sleep caused by upper airway narrowing or collapse. The gold standard for OSA diagnosis is the polysomnography test, which is time consuming, expensive, and invasive. In recent years, more cost-effective approaches to OSA detection based on the predictive value of speech and snoring sounds have emerged. In this paper, we offer a comprehensive summary of current research progress on the applications of speech or snoring sounds for the automatic detection of OSA and discuss the key challenges that need to be overcome for future research into this novel approach.
METHODS: PubMed, IEEE Xplore, and Web of Science databases were searched with related keywords. Literature published between 1989 and 2022 examining the potential of using speech or snoring sounds for automated OSA detection was reviewed.
KEY CONTENT AND FINDINGS: Speech and snoring sounds contain a large amount of information about OSA, and they have been extensively studied in the automatic screening of OSA. By importing features extracted from speech and snoring sounds into artificial intelligence models, clinicians can automatically screen for OSA. Features such as formants, linear prediction cepstral coefficients, and mel-frequency cepstral coefficients, together with artificial intelligence algorithms including support vector machines, Gaussian mixture models, and hidden Markov models, have been extensively studied for the detection of OSA.
CONCLUSIONS: Due to the significant advantages of noninvasive, low-cost, and contactless data collection, an automatic approach based on speech or snoring sounds seems to be a promising tool for the detection of OSA.
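As a rough illustration of the feature-plus-classifier pipeline this review surveys, the following Python sketch extracts mel-frequency cepstral coefficients with librosa and fits a support vector machine. It is not any reviewed system; the file names, labels, and two-file "dataset" are placeholders, and a real study would use many speakers and proper cross-validation.

import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=13):
    # Mean and standard deviation of MFCCs over the clip -> one fixed-length vector.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

paths = ["osa_speaker.wav", "control_speaker.wav"]  # hypothetical recordings
labels = [1, 0]                                     # 1 = OSA, 0 = control
X = np.vstack([mfcc_features(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))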
Additional Links: PMID-38738242
@article {pmid38738242,
year = {2024},
author = {Cao, S and Rosenzweig, I and Bilotta, F and Jiang, H and Xia, M},
title = {Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review.},
journal = {Journal of thoracic disease},
volume = {16},
number = {4},
pages = {2654-2667},
pmid = {38738242},
issn = {2072-1439},
}
RevDate: 2024-05-08
CmpDate: 2024-05-08
Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.
The Journal of the Acoustical Society of America, 155(5):3071-3089.
This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-ɪ/, /u-ʊ/, /ɑ-ʌ/, and /æ-ε/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-ɪ/, /u-ʊ/, and /ɑ-ʌ/ pairs, although they maintained sufficient spectral differences for /æ-ε/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.
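For readers unfamiliar with the two analysis tools named in this abstract, the sketch below shows, under stated assumptions, how a formant trajectory can be summarized with discrete cosine transform (DCT) coefficients and how such features can feed a discriminant analysis. The F1 track and the feature matrix are synthetic stand-ins, not the study's data.

import numpy as np
from scipy.fft import dct
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

t = np.linspace(0, 1, 20)
f1_track = 400 + 80 * t                    # a rising, synthetic F1 trajectory in Hz
coefs = dct(f1_track, norm="ortho")[:3]    # DCT coefs 0-2 roughly capture mean, slope, curvature
print(coefs)

# With one feature row per vowel token and tense/lax labels, a linear
# discriminant analysis shows which acoustic parameters separate the classes:
X = np.random.default_rng(0).normal(size=(40, 3))   # placeholder feature matrix
y = np.repeat(["tense", "lax"], 20)
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))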
Additional Links: PMID-38717213
@article {pmid38717213,
year = {2024},
author = {Feng, H and Wang, L},
title = {Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {5},
pages = {3071-3089},
doi = {10.1121/10.0025931},
pmid = {38717213},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; Male ; Female ; *Multilingualism ; Young Adult ; Speech Production Measurement ; Adult ; Language ; Acoustics ; Learning ; Voice Quality ; Sound Spectrography ; East Asian People ; },
}
RevDate: 2024-05-07
CmpDate: 2024-05-07
No evidence that averaging voices influences attractiveness.
Scientific reports, 14(1):10488.
Vocal attractiveness influences important social outcomes. While most research on the acoustic parameters that influence vocal attractiveness has focused on the possible roles of sexually dimorphic characteristics of voices, such as fundamental frequency (i.e., pitch) and formant frequencies (i.e., a correlate of body size), other work has reported that increasing vocal averageness increases attractiveness. Here we investigated the roles these three characteristics play in judgments of the attractiveness of male and female voices. In Study 1, we found that increasing vocal averageness significantly decreased distinctiveness ratings, demonstrating that participants could detect manipulations of vocal averageness in this stimulus set and using this testing paradigm. However, in Study 2, we found no evidence that increasing averageness significantly increased attractiveness ratings of voices. In Study 3, we found that fundamental frequency was negatively correlated with male vocal attractiveness and positively correlated with female vocal attractiveness. By contrast with these results for fundamental frequency, vocal attractiveness and formant frequencies were not significantly correlated. Collectively, our results suggest that averageness may not necessarily significantly increase attractiveness judgments of voices and are consistent with previous work reporting significant associations between attractiveness and voice pitch.
Additional Links: PMID-38714709
@article {pmid38714709,
year = {2024},
author = {Ostrega, J and Shiramizu, V and Lee, AJ and Jones, BC and Feinberg, DR},
title = {No evidence that averaging voices influences attractiveness.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {10488},
pmid = {38714709},
issn = {2045-2322},
support = {EP/T023783/1//Engineering and Physical Sciences Research Council/ ; RGPIN-2023-05146//Natural Sciences and Engineering Research Council of Canada/ ; },
mesh = {Humans ; Male ; Female ; *Voice/physiology ; Adult ; Young Adult ; *Beauty ; Judgment/physiology ; Adolescent ; },
}
RevDate: 2024-05-04
Long-term Acoustic Effects of Gender-Affirming Voice Training in Transgender Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00123-1 [Epub ahead of print].
OBJECTIVES: One role of a speech-language pathologist (SLP) is to help transgender clients develop healthy, gender-congruent communication. Transgender women frequently approach SLPs to train their voices to sound more feminine; however, the long-term acoustic effects of such training need to be rigorously examined in effectiveness studies. The aim of this study was to investigate the long-term effects (follow-up 1: 3 months and follow-up 2: 1 year after the last session) of gender-affirming voice training for transgender women, in terms of acoustic parameters.
STUDY DESIGN: This study was a randomized sham-controlled trial with a cross-over design.
METHODS: Twenty-six transgender women were included for follow-up 1 and 18 for follow-up 2. All participants received 14 weeks of gender-affirming voice training (4 weeks of sham training and 10 weeks of voice feminization training: 5 weeks of pitch elevation training and 5 weeks of articulation-resonance training), but in a different order. Speech samples were recorded with Praat at four different time points (pre, post, follow-up 1, follow-up 2). Acoustic analysis included fo of the sustained vowel /a:/, reading, and spontaneous speech. Formant frequencies (F1-F2-F3) of the vowels /a/, /i/, and /u/ were determined and vowel space was calculated. A linear mixed model was used to compare the acoustic voice measurements between measurements (pre - post, pre - follow-up 1, pre - follow-up 2, post - follow-up 1, post - follow-up 2, follow-up 1 - follow-up 2).
RESULTS: Most of the fo measurements and formant frequencies that increased immediately after the intervention were stable at both follow-up measurements. The median fo during the sustained vowel, reading, and spontaneous speech stayed increased at both follow-ups compared to the pre-measurement. However, a decrease of 16 Hz/1.7 ST (reading) and 12 Hz/1.5 ST (spontaneous speech) was detected between the post-measurement (169 Hz for reading, 144 Hz for spontaneous speech) and 1 year after the last session (153 Hz and 132 Hz, respectively). The lower limit of fo did not change during reading and spontaneous speech, either directly after the intervention or at the follow-ups. F1-F2 of the vowel /a/ and the vowel space increased after the intervention and at both follow-ups. Individual analyses showed that more aspects should be controlled after the intervention, such as exercises performed at home or the duration of extra gender-affirming voice training sessions.
CONCLUSIONS: After 10 sessions of voice feminization training and follow-up measurements after 3 months and 1 year, stable increases were found for some, but not all, formant frequencies and fo measurements. More time should be spent on increasing the fifth percentile of fo, as the lower limit of fo also contributes to the perception of a more feminine voice.
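Because several entries in this bibliography rely on Praat-based measures of fo, formant frequencies, and vowel space, here is a minimal sketch using the praat-parselmouth package. The file name "vowel_a.wav" and the (F1, F2) values in the /a i u/ triangle are hypothetical; the shoelace formula gives the triangle's area as one simple vowel-space measure.

import numpy as np
import parselmouth

snd = parselmouth.Sound("vowel_a.wav")        # hypothetical recording
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
print("median fo:", np.median(f0[f0 > 0]), "Hz")  # median over voiced frames

formant = snd.to_formant_burg(maximum_formant=5500.0)
mid = snd.duration / 2
print([formant.get_value_at_time(i, mid) for i in (1, 2, 3)])  # F1-F3 at midpoint

def triangle_area(pts):
    # Shoelace formula for the area of the /a i u/ triangle in the F1 x F2 plane.
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs(x1*(y2-y3) + x2*(y3-y1) + x3*(y1-y2)) / 2

triangle = [(900, 1400), (300, 2500), (350, 800)]  # illustrative (F1, F2) of /a/, /i/, /u/
print("vowel space area:", triangle_area(triangle), "Hz^2")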
Additional Links: PMID-38704279
@article {pmid38704279,
year = {2024},
author = {Leyns, C and Adriaansen, A and Daelman, J and Bostyn, L and Meerschman, I and T'Sjoen, G and D'haeseleer, E},
title = {Long-term Acoustic Effects of Gender-Affirming Voice Training in Transgender Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.04.007},
pmid = {38704279},
issn = {1873-4588},
}
RevDate: 2024-05-02
Acoustic and Articulatory Visual Feedback in Classroom L2 Vowel Remediation.
Language and speech [Epub ahead of print].
This paper presents L2 vowel remediation in a classroom setting via two real-time visual feedback methods: articulatory ultrasound tongue imaging, which shows tongue shape and position, and a newly developed acoustic formant analyzer, which visualizes a point correlating with the combined effect of tongue position and lip rounding in a vowel quadrilateral. Ten Czech students of the Swedish language participated in the study. Swedish vowel production is difficult for Czech speakers since the two languages differ significantly in their vowel systems. The students selected the vowel targets on their own and practiced in two classroom groups, with six students receiving two ultrasound training lessons followed by one acoustic lesson, and four students receiving two acoustic lessons followed by one ultrasound lesson. Audio data were collected pre-training, after the two sessions employing the first visual feedback method, and post-training, allowing measurement of Euclidean distances among selected groups of vowels and observation of the direction of change within the vowel quadrilateral as a result of practice. Perception tests performed before and after training revealed that most learners perceived the selected vowels correctly even before the practice. The study showed that both feedback methods can be successfully applied to L2 classroom learning, and that both lead to improvement in the pronunciation of the selected vowels, as well as of the Swedish vowel set as a whole. However, ultrasound tongue imaging seems to have an advantage, as it resulted in a greater number of improved targets.
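The Euclidean distance measure mentioned above is simple to compute; a sketch with made-up Hz values follows. Whether to normalize formants (e.g., to Bark, or per speaker) before computing distances is a design choice the sketch deliberately ignores.

import math

def vowel_distance(v1, v2):
    # Euclidean distance between two (F1, F2) points in Hz.
    return math.dist(v1, v2)

pre    = (420, 1850)   # learner's vowel before training (fabricated)
post   = (380, 2050)   # after training
target = (360, 2100)   # native-like target
print(vowel_distance(pre, target), "->", vowel_distance(post, target))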
Additional Links: PMID-38693788
@article {pmid38693788,
year = {2024},
author = {Kocjančič, T and Bořil, T and Hofmann, S},
title = {Acoustic and Articulatory Visual Feedback in Classroom L2 Vowel Remediation.},
journal = {Language and speech},
volume = {},
number = {},
pages = {238309231223736},
doi = {10.1177/00238309231223736},
pmid = {38693788},
issn = {1756-6053},
}
RevDate: 2024-04-24
Spectral features related to the auditory perception of twang-like voices.
Logopedics, phoniatrics, vocology [Epub ahead of print].
BACKGROUND: To the best of our knowledge, studies on the relationship between spectral energy distribution and the degree of perceived twang are still sparse. Through an auditory-perceptual test, we aimed to explore the spectral features that may relate to the auditory perception of twang-like voices.
METHODS: Ten judges who were blind to the test's tasks and stimuli rated the amount of twang perceived in seventy-six audio samples. The stimuli consisted of twenty voices recorded from eight CCM singers who sustained the vowel [a:] at different pitches, with and without a twang-like voice. Forty filtered and sixteen synthesized-manipulated stimuli were also included.
RESULTS AND CONCLUSIONS: Based on the intra-rater reliability scores, four judges were identified as suitable for inclusion in the analyses. Results showed that the frequencies of F1 and F2 correlated strongly with the auditory perception of twang-like voices (0.90 and 0.74, respectively), whereas F3 showed a moderate negative correlation (-0.52). The frequency difference between F1 and F3 showed a strong negative correlation (-0.82). The mean energy between 1-2 kHz and between 2-3 kHz correlated moderately (0.51 and 0.42, respectively). The frequencies of F4 and F5 and the energy above 3 kHz showed weak correlations. Since spectral changes under 2 kHz have been associated with jaw, lip, and tongue adjustments (i.e., vowel articulation), and since a higher vertical laryngeal position might affect the frequencies of all formants (including F1 and F2), our results suggest that vowel articulation and laryngeal height may be relevant when producing twang-like voices.
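A minimal sketch of the band-energy measure used here, assuming a hypothetical mono recording "twang.wav": mean level in the 1-2 kHz and 2-3 kHz bands, read from a Welch power spectrum.

import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

sr, y = wavfile.read("twang.wav")             # hypothetical sustained [a:]
f, psd = welch(y.astype(float), fs=sr, nperseg=4096)

def band_db(lo, hi):
    # Mean power in [lo, hi) Hz, in dB.
    band = psd[(f >= lo) & (f < hi)]
    return 10 * np.log10(band.mean())

print("1-2 kHz:", band_db(1000, 2000), "dB")
print("2-3 kHz:", band_db(2000, 3000), "dB")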
Additional Links: PMID-38656176
@article {pmid38656176,
year = {2024},
author = {Saldías O'Hrens, M and Castro, C and Espinoza, VM and Stoney, J and Quezada, C and Laukkanen, AM},
title = {Spectral features related to the auditory perception of twang-like voices.},
journal = {Logopedics, phoniatrics, vocology},
volume = {},
number = {},
pages = {1-18},
doi = {10.1080/14015439.2024.2345373},
pmid = {38656176},
issn = {1651-2022},
}
RevDate: 2024-04-21
A Comparison of Countertenor Singing at Various Professional Levels Using Acoustic, Electroglottographic, and Videofluoroscopic Methods.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00111-5 [Epub ahead of print].
INTRODUCTION: The vocal characteristics of countertenors (CTTs) are poorly understood due to a lack of studies in this field. This study aims to explore differences among CTTs at various professional levels, examining both disparities and congruences in singing styles to better understand the CTT voice.
MATERIALS AND METHODS: Four CTTs (one student, one amateur, and two professionals) sang "La giustizia ha già sull'arco" from Handel's Giulio Cesare, with concurrent videofluoroscopic, electroglottographic (EGG), and acoustic data collection. Auditory-perceptual analysis was employed to rate professional level. Acoustic analysis included LH1-LH2, formant cluster prominence, and vibrato analysis. EGG data were analyzed using the FonaDyn software, while anatomical modifications were quantified from videofluoroscopic images.
RESULTS: CTTs exhibited EGG contact quotient values surpassing typical levels for inexperienced falsettos. Their vibrato characteristics aligned with expectations for classical singing, whereas a singer's formant was not observed. Variations in supraglottic adjustments among CTTs underscored the diversity of techniques employed by CTT singers.
CONCLUSIONS: CTTs exhibited vocal techniques that highlighted the influence of individual preferences, professional experience, and stylistic choices in shaping their singing characteristics. The data revealed discernible differences between professional and amateur CTTs, providing insights into the impact of varying levels of experience on vocal expression.
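One of the acoustic measures named above, LH1-LH2, is the level difference between the first two harmonics. The sketch below computes it from an FFT magnitude spectrum of a synthetic two-harmonic tone; the known fo and the +/-20 Hz harmonic search window are illustrative assumptions (a Praat-based workflow would take fo from a pitch track).

import numpy as np

def harmonic_level(spec, freqs, target, tol=20.0):
    # Peak level (dB) within +/-tol Hz of a target harmonic frequency.
    sel = np.abs(freqs - target) <= tol
    return 20 * np.log10(spec[sel].max())

sr = 44100
t = np.arange(sr) / sr
y = np.sin(2*np.pi*220*t) + 0.4*np.sin(2*np.pi*440*t)   # toy signal, fo = 220 Hz
spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
freqs = np.fft.rfftfreq(len(y), 1/sr)
f0 = 220.0
lh1_lh2 = harmonic_level(spec, freqs, f0) - harmonic_level(spec, freqs, 2*f0)
print("LH1-LH2:", lh1_lh2, "dB")   # ~8 dB for this toy signal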
Additional Links: PMID-38644071
@article {pmid38644071,
year = {2024},
author = {Cruz, TLB and Frič, M and Andrade, PA},
title = {A Comparison of Countertenor Singing at Various Professional Levels Using Acoustic, Electroglottographic, and Videofluoroscopic Methods.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.03.033},
pmid = {38644071},
issn = {1873-4588},
}
RevDate: 2024-04-17
Acoustic, phonetic, and phonological features of Drehu vowels.
The Journal of the Acoustical Society of America, 155(4):2612-2626.
This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14-vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two of the seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.
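A minimal sketch of the multi-point formant analysis described above, assuming praat-parselmouth and a hypothetical pre-segmented token "vowel.wav": F1-F3 are sampled at 20%, 50%, and 80% of the vowel's duration, so vowel inherent spectral change appears as movement across the three sample points.

import parselmouth

snd = parselmouth.Sound("vowel.wav")          # hypothetical segmented vowel
formant = snd.to_formant_burg(maximum_formant=5500.0)
for frac in (0.2, 0.5, 0.8):
    t = frac * snd.duration
    f1, f2, f3 = (formant.get_value_at_time(i, t) for i in (1, 2, 3))
    print(f"{frac:.0%}: F1={f1:.0f} F2={f2:.0f} F3={f3:.0f} Hz")
print("duration:", snd.duration, "s")         # the primary cue for length, per the study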
Additional Links: PMID-38629882
@article {pmid38629882,
year = {2024},
author = {Torres, C and Li, W and Escudero, P},
title = {Acoustic, phonetic, and phonological features of Drehu vowels.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {4},
pages = {2612-2626},
doi = {10.1121/10.0025538},
pmid = {38629882},
issn = {1520-8524},
}
RevDate: 2024-04-02
Perceptual formant discrimination during speech movement planning.
PloS one, 19(4):e0301514 pii:PONE-D-23-34985.
Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
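For illustration, the following sketch fits a logistic psychometric function to fabricated same/different data with scipy and reads off a discrimination threshold at the 50% point; the study's exact JND criterion may differ.

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    # Two-parameter logistic: x0 is the 50% point, k the slope.
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

shifts = np.array([0, 10, 20, 30, 40, 60, 80])          # formant shift in Hz
p_diff = np.array([.02, .10, .30, .55, .80, .95, .99])  # fabricated proportion "different"
(x0, k), _ = curve_fit(logistic, shifts, p_diff, p0=[30, 0.1])
print(f"threshold (50% point): {x0:.1f} Hz, slope k = {k:.3f}")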
Additional Links: PMID-38564597
@article {pmid38564597,
year = {2024},
author = {Wang, H and Ali, Y and Max, L},
title = {Perceptual formant discrimination during speech movement planning.},
journal = {PloS one},
volume = {19},
number = {4},
pages = {e0301514},
doi = {10.1371/journal.pone.0301514},
pmid = {38564597},
issn = {1932-6203},
}
RevDate: 2024-04-01
Articulatory and acoustic dynamics of fronted back vowels in American English.
The Journal of the Acoustical Society of America, 155(4):2285-2301.
Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.
Additional Links: PMID-38557735
@article {pmid38557735,
year = {2024},
author = {Havenhill, J},
title = {Articulatory and acoustic dynamics of fronted back vowels in American English.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {4},
pages = {2285-2301},
doi = {10.1121/10.0025461},
pmid = {38557735},
issn = {1520-8524},
}
RevDate: 2024-03-26
ChildAugment: Data augmentation methods for zero-resource children's speaker verification.
The Journal of the Acoustical Society of America, 155(3):2221-2232.
The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an emphasized channel attention, propagation, and aggregation in time-delay neural network (ECAPA-TDNN) recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, probabilistic linear discriminant analysis (PLDA), and neural PLDA, and we propose a low-complexity weighted cosine score for extremely low-resource children's ASV. Our findings on the CSLU kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep-learning-based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline. For reproducibility, we provide the evaluation protocols and codes here.
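Formant shifting of the general kind described here can be prototyped with Praat's "Change gender" command via praat-parselmouth. The sketch below is not the authors' ChildAugment code; the input file, the 1.25 formant shift ratio, and the remaining argument values are illustrative assumptions.

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("adult.wav")          # hypothetical adult recording
# Praat "Change gender" arguments, in order: pitch floor, pitch ceiling,
# formant shift ratio (>1 raises formants, emulating a shorter vocal tract),
# new pitch median (0 = unchanged), pitch range factor, duration factor.
childlike = call(snd, "Change gender", 75, 600, 1.25, 0, 1.0, 1.0)
childlike.save("adult_shifted.wav", "WAV")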
Additional Links: PMID-38530014
@article {pmid38530014,
year = {2024},
author = {Singh, VP and Sahidullah, M and Kinnunen, T},
title = {ChildAugment: Data augmentation methods for zero-resource children's speaker verification.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {2221-2232},
doi = {10.1121/10.0025178},
pmid = {38530014},
issn = {1520-8524},
}
RevDate: 2024-03-19
Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00023-7 [Epub ahead of print].
OBJECTIVES: To investigate acoustic outcomes of gender-affirming voice training for trans women wanting to develop a female-sounding voice, and to describe what happens acoustically when male-sounding voices become more female-sounding.
STUDY DESIGN: Prospective treatment study with repeated measures.
METHODS: N = 74 trans women completed a voice training program of 8-12 sessions and had their voices audio recorded twice before and twice after training. Reference data were obtained from N = 40 cisgender speakers. Fundamental frequency (fo), formant frequencies (F1-F4), sound pressure level (Leq), and level difference between first and second harmonic (L1-L2) were extracted from a reading passage and spontaneous speech. N = 79 naive listeners provided gender-related ratings of participants' audio recordings. A linear mixed-effects model was used to estimate average training effects. Individual level analyses determined how changes in acoustic data were related to listeners' ratings.
RESULTS: Group data showed substantial training effects on fo (average, minimum, and maximum) and formant frequencies. Individual data demonstrated that many participants also increased Leq and some increased L1-L2. The measures that most strongly predicted listener ratings of a female-sounding voice were fo, average formant frequency, and Leq.
CONCLUSIONS: This is the largest prospective study reporting on acoustic outcomes of gender-affirming voice training for trans women. We confirm findings from previous smaller-scale studies by demonstrating that listener perceptions of male- and female-sounding voices are related to acoustic voice features, and that voice training for trans women wanting to sound female is associated with desirable acoustic changes, indicating training effectiveness. Although acoustic measures can be a valuable indicator of training effectiveness, particularly from the perspective of clinicians and researchers, we contend that a combination of outcome measures, including client perspectives, is needed to provide a comprehensive evaluation of gender-affirming voice training that is relevant for all stakeholders.
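The linear mixed-effects analysis mentioned above can be sketched with statsmodels; the DataFrame below (columns "speaker", "phase", "fo") is a fabricated toy stand-in for the study's repeated measures, with random intercepts per speaker.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "speaker": [1, 1, 2, 2, 3, 3],
    "phase":   ["pre", "post"] * 3,
    "fo":      [138, 172, 121, 165, 146, 181],   # Hz, fabricated
})
# Fixed effect of training phase on fo; speaker as grouping (random intercept).
model = smf.mixedlm("fo ~ phase", df, groups=df["speaker"]).fit()
print(model.summary())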
Additional Links: PMID-38503674
@article {pmid38503674,
year = {2024},
author = {Södersten, M and Oates, J and Sand, A and Granqvist, S and Quinn, S and Dacakis, G and Nygren, U},
title = {Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.02.003},
pmid = {38503674},
issn = {1873-4588},
}
RevDate: 2024-03-19
Clinical Focus: The Development and Description of a Palette of Transmasculine Voices.
American journal of speech-language pathology [Epub ahead of print].
PURPOSE: The study of gender and speech has historically excluded studies of transmasculine individuals. Consequently, generalizations about speech and gender are based on cisgender individuals. This lack of representation hinders clinical training and clinical service delivery, particularly by speech-language pathologists providing gender-affirming communication services. This letter describes a new corpus of the speech of American English-speaking transmasculine men, transmasculine nonbinary people, and cisgender men that is open and available to clinicians and researchers.
METHOD: Twenty masculine-presenting native English speakers from the Upper Midwestern United States (including cisgender men, transmasculine men, and transmasculine nonbinary people) were recorded producing three sets of speech materials: Consensus Auditory-Perceptual Evaluation of Voice sentences, the Rainbow Passage, and a novel set of sentences developed for this project. Acoustic measures were made of the vowels (overall formant frequency scaling, vowel-space dispersion, fundamental frequency, breathiness), consonants (voice onset time of word-initial voiceless stops, spectral moments of word-initial /s/), and entire sentences (rate of speech).
RESULTS: The acoustic measures reveal a wide range for all dependent measures and low correlations among the measures. Results show that many of the voices depart considerably from the norms for men's speech in published studies.
CONCLUSION: This new corpus can be used to illustrate different ways of sounding masculine by speech-language pathologists performing gender-affirming communication services and by higher education teachers as examples of diverse ways of sounding masculine.
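Among the measures listed in this abstract, spectral moments of /s/ are computed by treating the fricative's power spectrum as a probability distribution over frequency. A self-contained sketch, with white noise standing in for a real /s/ token:

import numpy as np

rng = np.random.default_rng(1)
sr = 22050
y = rng.normal(size=sr // 10)                 # 100 ms of noise as a stand-in for /s/
spec = np.abs(np.fft.rfft(y * np.hanning(len(y)))) ** 2
f = np.fft.rfftfreq(len(y), 1 / sr)

p = spec / spec.sum()                         # normalize spectrum to a distribution
centroid = (f * p).sum()                      # first moment (center of gravity)
variance = ((f - centroid) ** 2 * p).sum()    # second moment
skew = ((f - centroid) ** 3 * p).sum() / variance ** 1.5
kurt = ((f - centroid) ** 4 * p).sum() / variance ** 2
print(centroid, variance, skew, kurt)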
Additional Links: PMID-38501906
@article {pmid38501906,
year = {2024},
author = {Dolquist, DV and Munson, B},
title = {Clinical Focus: The Development and Description of a Palette of Transmasculine Voices.},
journal = {American journal of speech-language pathology},
volume = {},
number = {},
pages = {1-14},
doi = {10.1044/2024_AJSLP-23-00398},
pmid = {38501906},
issn = {1558-9110},
}
RevDate: 2024-03-18
Effects of Deep-Brain Stimulation on Speech: Perceptual and Acoustic Data.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study examined speech changes induced by deep-brain stimulation (DBS) in speakers with Parkinson's disease (PD) using a set of auditory-perceptual and acoustic measures.
METHOD: Speech recordings from nine speakers with PD and DBS were compared between DBS-On and DBS-Off conditions using auditory-perceptual and acoustic analyses. Auditory-perceptual ratings included voice quality, articulation precision, prosody, speech intelligibility, and listening effort obtained from 44 listeners. Acoustic measures were made for voicing proportion, second formant frequency slope, vowel dispersion, articulation rate, and range of fundamental frequency and intensity.
RESULTS: No significant changes were found between DBS-On and DBS-Off for the five perceptual ratings. Four of six acoustic measures revealed significant differences between the two conditions. While articulation rate and acoustic vowel dispersion increased, voicing proportion and intensity range decreased from the DBS-Off to DBS-On condition. However, a visual examination of the data indicated that the statistical significance was mostly driven by a small number of participants, while the majority did not show a consistent pattern of such changes.
CONCLUSIONS: Our data, in general, indicate that no to minimal changes in speech production ensued from DBS. The findings are discussed with a focus on the large interspeaker variability in PD in terms of speech characteristics and the potential effects of DBS on speech.
Additional Links: PMID-38498664
@article {pmid38498664,
year = {2024},
author = {Kim, Y and Thompson, A and Nip, ISB},
title = {Effects of Deep-Brain Stimulation on Speech: Perceptual and Acoustic Data.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-23-00511},
pmid = {38498664},
issn = {1558-9102},
}
RevDate: 2024-03-18
The acoustics of Contemporary Standard Bulgarian vowels: A corpus study.
The Journal of the Acoustical Society of America, 155(3):2128-2138.
A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables-the first three formant frequencies, duration, mean f0, and mean intensity-of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.
Additional Links: PMID-38498508
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38498508,
year = {2024},
author = {Sabev, M and Andreeva, B},
title = {The acoustics of Contemporary Standard Bulgarian vowels: A corpus study.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {2128-2138},
doi = {10.1121/10.0025293},
pmid = {38498508},
issn = {1520-8524},
abstract = {A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables-the first three formant frequencies, duration, mean f0, and mean intensity-of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.},
}
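The discriminant-analysis step reported above can be sketched with scikit-learn. The six-column feature matrix (F1, F2, F3, duration, mean f0, mean intensity per token) and the vowel labels below are random stand-ins, so the printed accuracy is meaningful only as a smoke test of the pipeline shape.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))     # stand-in acoustic measurements per token
y = rng.integers(0, 6, size=600)  # stand-in labels for the six vowel phonemes

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)  # cross-validated classification
print(f"mean CV accuracy: {scores.mean():.2f}")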
RevDate: 2024-03-18
Changes in Speech Production Following Perceptual Training With Orofacial Somatosensory Inputs.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: Orofacial somatosensory inputs play an important role in speech motor control and speech learning. Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception, similar perceptual training could also alter speech production. We examined whether the production performance was changed by perceptual training with orofacial somatosensory inputs.
METHOD: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures. Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the first group of participants, somatosensory stimulation was applied as a facial skin stretch in the backward direction. We recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1, F2, and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with a different vowel continuum (/e/-/i/) for perceptual training.
RESULTS: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds. F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following the /e/-/i/ training. F2 change was significantly correlated with the perceptual shift between the first and second half of the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group displayed no effect on F2 and F3, and only a tendency toward an F1 increase.
CONCLUSION: The results suggest that somatosensory inputs associated with speech sound inputs can play a role in speech training and learning in both production and perception.
Additional Links: PMID-38497731
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38497731,
year = {2024},
author = {Ashokumar, M and Schwartz, JL and Ito, T},
title = {Changes in Speech Production Following Perceptual Training With Orofacial Somatosensory Inputs.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-12},
doi = {10.1044/2023_JSLHR-23-00249},
pmid = {38497731},
issn = {1558-9102},
abstract = {PURPOSE: Orofacial somatosensory inputs play an important role in speech motor control and speech learning. Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception, similar perceptual training could also alter speech production. We examined whether the production performance was changed by perceptual training with orofacial somatosensory inputs.
METHOD: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures. Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the first group of participants, somatosensory stimulation was applied as a facial skin stretch in the backward direction. We recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1, F2, and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with a different vowel continuum (/e/-/i/) for perceptual training.
RESULTS: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds. F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following the /e/-/i/ training. F2 change was significantly correlated with the perceptual shift between the first and second half of the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group displayed no effect on F2 and F3, and only a tendency toward an F1 increase.
CONCLUSION: The results suggest that somatosensory inputs associated with speech sound inputs can play a role in speech training and learning in both production and perception.},
}
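A pre/post formant comparison of the kind reported above reduces, in its simplest form, to a paired test over per-speaker means. A minimal sketch with invented F2 values (Hz), not data from the study:

import numpy as np
from scipy import stats

f2_pre = np.array([1840., 1872., 1795., 1910., 1866.])   # before training
f2_post = np.array([1812., 1830., 1769., 1885., 1841.])  # after training

t, p = stats.ttest_rel(f2_pre, f2_post)  # within-speaker paired t test
print(f"mean F2 change: {np.mean(f2_post - f2_pre):+.1f} Hz (t={t:.2f}, p={p:.3f})")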
RevDate: 2024-03-14
A pilot observation using ultrasonography and vowel articulation to investigate the influence of suspected obstructive sleep apnea on upper airway.
Scientific reports, 14(1):6144.
Failure to take suitable precautions before administering general anesthesia to surgical patients with obstructive sleep apnea (OSA) may lead to postoperative complications. Therefore, it is very important to screen for OSA before performing surgery, which is currently done with subjective questionnaires such as the STOP-Bang and Berlin scores. These questionnaires have 10-36% specificity in detecting sleep apnea and provide no information on the anatomy of the upper airway, which is important for intubation. To address these challenges, we performed a pilot study to understand the utility of ultrasonography and vowel articulation in screening for OSA. Our objective was to investigate the influence of OSA risk factors on vowel articulation through ultrasonography and acoustic feature analysis. To accomplish this, we recruited 18 individuals with no risk of OSA and 13 individuals at high risk of OSA and asked them to utter vowels such as /a/ (as in "Sah") and /i/ (as in "See"). An expert ultrasonographer measured the parasagittal anterior-posterior (PAP) and transverse diameters of the upper airway. From the recorded vowel sounds, we extracted 106 features, including power, pitch, formant, and Mel frequency cepstral coefficients (MFCC). We analyzed the variation of the PAP diameters and vowel features from "See" (/i/) to "Sah" (/a/) between the control and OSA groups by two-way repeated measures ANOVA. We found that the variation in upper airway diameter from "See" to "Sah" was significantly smaller in the OSA group than in the control group (OSA: ∆12.8 ± 5.3 mm vs. control: ∆22.5 ± 3.9 mm, p < 0.01). Moreover, we found that several vowel features showed exactly the same or the opposite trend as the PAP diameter variation, which led us to build a machine learning model to estimate PAP diameter from vowel features. We found a correlation coefficient of 0.75 between the estimated and measured PAP diameters after applying four estimation models and combining their outputs with a random forest model, which showed the feasibility of using acoustic features of vowel sounds to monitor upper airway diameter. Overall, this study provides proof of concept that ultrasonography and vowel sound analysis may be useful as easily accessible tools for assessing the upper airway.
Additional Links: PMID-38480766
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38480766,
year = {2024},
author = {Saha, S and Rattansingh, A and Martino, R and Viswanathan, K and Saha, A and Montazeri Ghahjaverestan, N and Yadollahi, A},
title = {A pilot observation using ultrasonography and vowel articulation to investigate the influence of suspected obstructive sleep apnea on upper airway.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {6144},
pmid = {38480766},
issn = {2045-2322},
abstract = {Failure to take suitable precautions before administering general anesthesia to surgical patients with obstructive sleep apnea (OSA) may lead to postoperative complications. Therefore, it is very important to screen for OSA before performing surgery, which is currently done with subjective questionnaires such as the STOP-Bang and Berlin scores. These questionnaires have 10-36% specificity in detecting sleep apnea and provide no information on the anatomy of the upper airway, which is important for intubation. To address these challenges, we performed a pilot study to understand the utility of ultrasonography and vowel articulation in screening for OSA. Our objective was to investigate the influence of OSA risk factors on vowel articulation through ultrasonography and acoustic feature analysis. To accomplish this, we recruited 18 individuals with no risk of OSA and 13 individuals at high risk of OSA and asked them to utter vowels such as /a/ (as in "Sah") and /i/ (as in "See"). An expert ultrasonographer measured the parasagittal anterior-posterior (PAP) and transverse diameters of the upper airway. From the recorded vowel sounds, we extracted 106 features, including power, pitch, formant, and Mel frequency cepstral coefficients (MFCC). We analyzed the variation of the PAP diameters and vowel features from "See" (/i/) to "Sah" (/a/) between the control and OSA groups by two-way repeated measures ANOVA. We found that the variation in upper airway diameter from "See" to "Sah" was significantly smaller in the OSA group than in the control group (OSA: ∆12.8 ± 5.3 mm vs. control: ∆22.5 ± 3.9 mm, p < 0.01). Moreover, we found that several vowel features showed exactly the same or the opposite trend as the PAP diameter variation, which led us to build a machine learning model to estimate PAP diameter from vowel features. We found a correlation coefficient of 0.75 between the estimated and measured PAP diameters after applying four estimation models and combining their outputs with a random forest model, which showed the feasibility of using acoustic features of vowel sounds to monitor upper airway diameter. Overall, this study provides proof of concept that ultrasonography and vowel sound analysis may be useful as easily accessible tools for assessing the upper airway.},
}
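Combining the outputs of several estimation models with a random forest, as described above, is essentially stacked regression. The sketch below shows that architecture with scikit-learn; the four base estimators, the data shapes, and all values are placeholders rather than the models the authors used.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(31, 106))      # 31 participants x 106 acoustic features
y = rng.normal(18.0, 5.0, size=31)  # stand-in measured PAP diameters (mm)

stack = StackingRegressor(
    estimators=[("ridge", Ridge()), ("svr", SVR()),
                ("knn", KNeighborsRegressor()), ("tree", DecisionTreeRegressor())],
    final_estimator=RandomForestRegressor(n_estimators=200, random_state=0),
)
stack.fit(X, y)  # base models are cross-validated internally
print(stack.predict(X[:3]))  # estimated PAP diameters for three participants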
RevDate: 2024-03-12
Attention-based speech feature transfer between speakers.
Frontiers in artificial intelligence, 7:1259641.
In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.
Additional Links: PMID-38469160
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38469160,
year = {2024},
author = {Lee, H and Cho, M and Kwon, HY},
title = {Attention-based speech feature transfer between speakers.},
journal = {Frontiers in artificial intelligence},
volume = {7},
number = {},
pages = {1259641},
pmid = {38469160},
issn = {2624-8212},
abstract = {In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.},
}
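The mechanical core of the method above, replacing one model's attention weights with another's, can be illustrated with PyTorch state dictionaries. The toy module below is hypothetical, and the direction of the swap in the authors' full pipeline is more involved; this shows only the weight-transplant step.

import torch
from torch import nn

class TinySynth(nn.Module):
    # Stand-in for a speech-synthesis network containing an attention block.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(80, 128)
        self.attention = nn.MultiheadAttention(embed_dim=128, num_heads=4)
        self.decoder = nn.Linear(128, 80)

src, tgt = TinySynth(), TinySynth()  # pretend each was trained on one speaker

tgt_state = tgt.state_dict()
for name, tensor in src.state_dict().items():
    if name.startswith("attention"):  # copy only the attention parameters
        tgt_state[name] = tensor.clone()
tgt.load_state_dict(tgt_state)  # target model now carries the other attention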
RevDate: 2024-03-08
Discrimination and sensorimotor adaptation of self-produced vowels in cochlear implant users.
The Journal of the Acoustical Society of America, 155(3):1895-1908.
Humans rely on auditory feedback to monitor and adjust their speech for clarity. Cochlear implants (CIs) have helped over a million people restore access to auditory feedback, which significantly improves speech production. However, there is substantial variability in outcomes. This study investigates the extent to which CI users can use their auditory feedback to detect self-produced sensory errors and make adjustments to their speech, given the coarse spectral resolution provided by their implants. First, we used an auditory discrimination task to assess the sensitivity of CI users to small differences in formant frequencies of their self-produced vowels. Then, CI users produced words with altered auditory feedback in order to assess sensorimotor adaptation to auditory error. Almost half of the CI users tested can detect small, within-channel differences in their self-produced vowels, and they can utilize this auditory feedback towards speech adaptation. An acoustic hearing control group showed better sensitivity to the shifts in vowels, even in CI-simulated speech, and exhibited more robust speech adaptation behavior than the CI users. Nevertheless, this study confirms that CI users can compensate for sensory errors in their speech and supports the idea that sensitivity to these errors may relate to variability in production.
Additional Links: PMID-38456732
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38456732,
year = {2024},
author = {Borjigin, A and Bakst, S and Anderson, K and Litovsky, RY and Niziolek, CA},
title = {Discrimination and sensorimotor adaptation of self-produced vowels in cochlear implant users.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {1895-1908},
doi = {10.1121/10.0025063},
pmid = {38456732},
issn = {1520-8524},
abstract = {Humans rely on auditory feedback to monitor and adjust their speech for clarity. Cochlear implants (CIs) have helped over a million people restore access to auditory feedback, which significantly improves speech production. However, there is substantial variability in outcomes. This study investigates the extent to which CI users can use their auditory feedback to detect self-produced sensory errors and make adjustments to their speech, given the coarse spectral resolution provided by their implants. First, we used an auditory discrimination task to assess the sensitivity of CI users to small differences in formant frequencies of their self-produced vowels. Then, CI users produced words with altered auditory feedback in order to assess sensorimotor adaptation to auditory error. Almost half of the CI users tested can detect small, within-channel differences in their self-produced vowels, and they can utilize this auditory feedback towards speech adaptation. An acoustic hearing control group showed better sensitivity to the shifts in vowels, even in CI-simulated speech, and exhibited more robust speech adaptation behavior than the CI users. Nevertheless, this study confirms that CI users can compensate for sensory errors in their speech and supports the idea that sensitivity to these errors may relate to variability in production.},
}
RevDate: 2024-03-05
Experienced and Inexperienced Listeners' Perception of Vocal Strain.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00024-9 [Epub ahead of print].
OBJECTIVE: The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood.
METHODS/DESIGN: This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at two pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies midway between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale.
RESULTS: Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch.
CONCLUSION: This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.
Additional Links: PMID-38443265
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38443265,
year = {2024},
author = {Stone, TC and Erickson, ML},
title = {Experienced and Inexperienced Listeners' Perception of Vocal Strain.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.02.002},
pmid = {38443265},
issn = {1873-4588},
abstract = {OBJECTIVE: The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood.
METHODS/DESIGN: This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at two pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies midway between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale.
RESULTS: Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch.
CONCLUSION: This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.},
}
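The source stimuli above are defined by a constant spectral rolloff: with a slope of s dB/octave, harmonic n lies log2(n) octaves above the fundamental and is therefore attenuated by s·log2(n) dB. A sketch of such a harmonic source in Python, with illustrative settings:

import numpy as np

sr, dur, f0 = 44100, 1.0, 220.0  # sample rate, duration, pitch A3 (F5 ~ 698 Hz)
slope_db_per_oct = -12.0         # the study varied this from -6 to -18
t = np.arange(int(sr * dur)) / sr

signal = np.zeros_like(t)
n = 1
while n * f0 < sr / 2:  # sum harmonics up to the Nyquist frequency
    atten_db = slope_db_per_oct * np.log2(n)  # harmonic n is log2(n) octaves up
    amp = 10.0 ** (atten_db / 20.0)
    signal += amp * np.sin(2 * np.pi * n * f0 * t)
    n += 1
signal /= np.max(np.abs(signal))  # normalize; vocal tract filtering would follow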
RevDate: 2024-03-05
Comparative Study on the Acoustic Analysis of Voice in Auditory Brainstem Implantees, Cochlear Implantees, and Normal Hearing Children.
Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 76(1):645-652.
The aim of the study was to compare the acoustic characteristics of voice between Auditory Brainstem Implantees, Cochlear Implantees, and normal hearing children. Voice parameters such as fundamental frequency, formant frequencies, perturbation measures, and harmonic to noise ratio were measured in a total of 30 children, of which 10 were Auditory Brainstem Implantees, 10 were Cochlear Implantees, and 10 were normal hearing children. Parametric and nonparametric statistics were used to test for significant differences between the three groups. Overall deviations were seen in the implanted groups for all acoustic parameters. However, more pronounced abnormal deviations were seen in individuals with Auditory Brainstem Implants, indicating a deficit in the auditory feedback loop that impacts voice characteristics. This feedback deficit could contribute to the poorer performance in the ABI and CI groups. The CI group performed comparatively better than the ABI group, suggesting a partially preserved feedback loop attributable to the type of implant. However, additional supporting evidence is needed, ideally from a study with a larger sample size and a longitudinal design.
Additional Links: PMID-38440592
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38440592,
year = {2024},
author = {Umashankar, A and Ramamoorthy, S and Selvaraj, JL and Dhandayutham, S},
title = {Comparative Study on the Acoustic Analysis of Voice in Auditory Brainstem Implantees, Cochlear Implantees, and Normal Hearing Children.},
journal = {Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India},
volume = {76},
number = {1},
pages = {645-652},
pmid = {38440592},
issn = {2231-3796},
abstract = {The aim of the study was to compare the acoustic characteristics of voice between Auditory Brainstem Implantees, Cochlear Implantees, and normal hearing children. Voice parameters such as fundamental frequency, formant frequencies, perturbation measures, and harmonic to noise ratio were measured in a total of 30 children, of which 10 were Auditory Brainstem Implantees, 10 were Cochlear Implantees, and 10 were normal hearing children. Parametric and nonparametric statistics were used to test for significant differences between the three groups. Overall deviations were seen in the implanted groups for all acoustic parameters. However, more pronounced abnormal deviations were seen in individuals with Auditory Brainstem Implants, indicating a deficit in the auditory feedback loop that impacts voice characteristics. This feedback deficit could contribute to the poorer performance in the ABI and CI groups. The CI group performed comparatively better than the ABI group, suggesting a partially preserved feedback loop attributable to the type of implant. However, additional supporting evidence is needed, ideally from a study with a larger sample size and a longitudinal design.},
}
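The perturbation and noise measures named above (jitter, shimmer, and harmonic-to-noise ratio) are commonly extracted by scripting Praat, for example through the praat-parselmouth library. A minimal sketch using standard Praat parameter defaults and a hypothetical file name:

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sustained_vowel.wav")  # hypothetical child phonation

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = snd.to_harmonicity_cc()
hnr = call(harmonicity, "Get mean", 0, 0)  # mean HNR over the whole file, dB
print(f"jitter={jitter:.4f}  shimmer={shimmer:.4f}  HNR={hnr:.1f} dB")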
RevDate: 2024-03-04
DIVA Meets EEG: Model Validation Using Formant-Shift Reflex.
Applied sciences (Basel, Switzerland), 13(13):.
The neurocomputational model 'Directions into Velocities of Articulators' (DIVA) was developed to account for various aspects of normal and disordered speech production and acquisition. The neural substrates of DIVA were established through functional magnetic resonance imaging (fMRI), providing physiological validation of the model. This study introduces DIVA_EEG, an extension of DIVA that utilizes electroencephalography (EEG) to leverage the high temporal resolution and broad availability of EEG over fMRI. For the development of DIVA_EEG, EEG-like signals were derived from original equations describing the activity of the different DIVA maps. Synthetic EEG associated with the utterance of syllables was generated when both unperturbed and perturbed auditory feedback (first formant perturbations) were simulated. The cortical activation maps derived from synthetic EEG closely resembled those of the original DIVA model. To validate DIVA_EEG, the EEG of individuals with typical voices (N = 30) was acquired during an altered auditory feedback paradigm. The resulting empirical brain activity maps significantly overlapped with those predicted by DIVA_EEG. In conjunction with other recent model extensions, DIVA_EEG lays the foundations for constructing a complete neurocomputational framework to tackle vocal and speech disorders, which can guide model-driven personalized interventions.
Additional Links: PMID-38435340
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38435340,
year = {2023},
author = {Cuadros, J and Z-Rivera, L and Castro, C and Whitaker, G and Otero, M and Weinstein, A and Martínez-Montes, E and Prado, P and Zañartu, M},
title = {DIVA Meets EEG: Model Validation Using Formant-Shift Reflex.},
journal = {Applied sciences (Basel, Switzerland)},
volume = {13},
number = {13},
pages = {},
pmid = {38435340},
issn = {2076-3417},
abstract = {The neurocomputational model 'Directions into Velocities of Articulators' (DIVA) was developed to account for various aspects of normal and disordered speech production and acquisition. The neural substrates of DIVA were established through functional magnetic resonance imaging (fMRI), providing physiological validation of the model. This study introduces DIVA_EEG, an extension of DIVA that utilizes electroencephalography (EEG) to leverage the high temporal resolution and broad availability of EEG over fMRI. For the development of DIVA_EEG, EEG-like signals were derived from original equations describing the activity of the different DIVA maps. Synthetic EEG associated with the utterance of syllables was generated when both unperturbed and perturbed auditory feedback (first formant perturbations) were simulated. The cortical activation maps derived from synthetic EEG closely resembled those of the original DIVA model. To validate DIVA_EEG, the EEG of individuals with typical voices (N = 30) was acquired during an altered auditory feedback paradigm. The resulting empirical brain activity maps significantly overlapped with those predicted by DIVA_EEG. In conjunction with other recent model extensions, DIVA_EEG lays the foundations for constructing a complete neurocomputational framework to tackle vocal and speech disorders, which can guide model-driven personalized interventions.},
}
RevDate: 2024-02-28
Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.
Scientific reports, 14(1):4889.
Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.
Additional Links: PMID-38418558
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38418558,
year = {2024},
author = {Fletcher, MD and Akis, E and Verschuur, CA and Perry, SW},
title = {Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {4889},
pmid = {38418558},
issn = {2045-2322},
support = {EP/W032422/1//Engineering and Physical Sciences Research Council/ ; EP/T517859/1//Engineering and Physical Sciences Research Council/ ; },
abstract = {Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.},
}
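The tactile vocoder described above is, at its core, a filter bank followed by envelope extraction and envelope-modulated low-frequency carriers. The SciPy sketch below shows that signal path; the band edges and carrier frequencies are illustrative and do not reproduce the study's formant-focused spacing.

import numpy as np
from scipy import signal

sr = 16000
speech = np.random.randn(sr)          # stand-in for one second of speech
edges = np.geomspace(100, 7000, 9)    # eight log-spaced analysis bands
tone_freqs = np.linspace(50, 225, 8)  # vibrotactile carrier tones (Hz)
t = np.arange(len(speech)) / sr

out = np.zeros_like(speech)
for lo, hi, fc in zip(edges[:-1], edges[1:], tone_freqs):
    sos = signal.butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    band = signal.sosfiltfilt(sos, speech)   # isolate one frequency band
    env = np.abs(signal.hilbert(band))       # amplitude envelope of the band
    out += env * np.sin(2 * np.pi * fc * t)  # modulate a low-frequency tone
out /= np.max(np.abs(out))                   # drive signal for the actuator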
RevDate: 2024-02-21
Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.
Primates; journal of primatology [Epub ahead of print].
Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.
Additional Links: PMID-38381271
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38381271,
year = {2024},
author = {Maya Lastra, N and Rangel Negrín, A and Coyohua Fuentes, A and Dias, PAD},
title = {Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.},
journal = {Primates; journal of primatology},
volume = {},
number = {},
pages = {},
pmid = {38381271},
issn = {1610-7365},
support = {726265//Consejo Nacional de Ciencia y Tecnología/ ; 15 1529//Consejo Veracruzano de Ciencia y Tecnología/ ; },
abstract = {Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.},
}
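The allometry invoked above rests on a standard acoustic relation: a uniform tube closed at one end resonates at odd multiples of c/4L, so adjacent formants are spaced by c/2L, and wider spacing implies a shorter tract. A brief numerical illustration of this generic textbook relation (not the paper's analysis):

C = 35000.0  # approximate speed of sound in the vocal tract, cm/s

def apparent_vtl_cm(formant_spacing_hz: float) -> float:
    # Vocal tract length implied by a given average formant spacing.
    return C / (2.0 * formant_spacing_hz)

print(apparent_vtl_cm(1000.0))  # 17.5 cm, roughly an adult human male tract
print(apparent_vtl_cm(1400.0))  # 12.5 cm: wider spacing, shorter tract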
RevDate: 2024-02-16
Auditory free classification of gender diverse speakers.
The Journal of the Acoustical Society of America, 155(2):1422-1436.
Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.
Additional Links: PMID-38364044
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38364044,
year = {2024},
author = {Merritt, B and Bent, T and Kilgore, R and Eads, C},
title = {Auditory free classification of gender diverse speakers.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {2},
pages = {1422-1436},
doi = {10.1121/10.0024521},
pmid = {38364044},
issn = {1520-8524},
abstract = {Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.},
}
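Multidimensional scaling, as applied above, takes a speaker-by-speaker dissimilarity matrix (derived from how often listeners sorted two speakers into different groups) and embeds it in a low-dimensional space. A scikit-learn sketch with a random symmetric matrix standing in for real classification counts:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_speakers = 60
d = rng.random((n_speakers, n_speakers))
dissim = (d + d.T) / 2.0       # dissimilarities must be symmetric
np.fill_diagonal(dissim, 0.0)  # and zero on the diagonal

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # 2-D perceptual coordinates per speaker
print(coords.shape, f"stress = {mds.stress_:.1f}")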
RevDate: 2024-02-15
Dynamic specification of vowels in Hijazi Arabic.
Phonetica [Epub ahead of print].
Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.
Additional Links: PMID-38358292
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38358292,
year = {2024},
author = {Almurashi, W and Al-Tamimi, J and Khattab, G},
title = {Dynamic specification of vowels in Hijazi Arabic.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {38358292},
issn = {1423-0321},
abstract = {Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.},
}
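The "seven-point model" mentioned above samples each formant trajectory at seven equidistant proportional time points instead of a single midpoint value. A sketch of that reduction, with a placeholder F2 track:

import numpy as np

f2_track = np.linspace(1500, 1900, 120)  # stand-in F2 trajectory across a vowel
props = np.linspace(0.0, 1.0, 7)         # 0%, 16.7%, ..., 100% of duration
idx = np.round(props * (len(f2_track) - 1)).astype(int)
seven_point_f2 = f2_track[idx]           # dynamic feature vector for a classifier
print(seven_point_f2)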
RevDate: 2024-02-13
Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.
Autism research : official journal of the International Society for Autism Research [Epub ahead of print].
Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.
Additional Links: PMID-38348589
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38348589,
year = {2024},
author = {Simeone, PJ and Green, JR and Tager-Flusberg, H and Chenausky, KV},
title = {Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.},
journal = {Autism research : official journal of the International Society for Autism Research},
volume = {},
number = {},
pages = {},
doi = {10.1002/aur.3102},
pmid = {38348589},
issn = {1939-3806},
support = {/NH/NIH HHS/United States ; K24 DC016312/DC/NIDCD NIH HHS/United States ; R00 DC017490/DC/NIDCD NIH HHS/United States ; P50 DC018006/DC/NIDCD NIH HHS/United States ; },
abstract = {Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.},
}
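One plausible operationalization of vowel distinctiveness for a corner-vowel pair is the Euclidean distance between the [i] and [a] means in F1 x F2 space; the study's exact metric may differ. A minimal illustration with invented values:

import math

f1_i, f2_i = 350.0, 2600.0  # hypothetical child means for [i], in Hz
f1_a, f2_a = 900.0, 1500.0  # hypothetical child means for [a], in Hz

distinctiveness = math.hypot(f1_a - f1_i, f2_a - f2_i)
print(f"[i]-[a] separation: {distinctiveness:.0f} Hz")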
RevDate: 2024-02-11
Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.
The Journal of the Acoustical Society of America, 155(2):1253-1263.
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
Additional Links: PMID-38341748
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38341748,
year = {2024},
author = {Shadle, CH and Fulop, SA and Chen, WR and Whalen, DH},
title = {Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {2},
pages = {1253-1263},
doi = {10.1121/10.0024548},
pmid = {38341748},
issn = {1520-8524},
abstract = {The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.},
}
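Reassigned spectrograms are available in librosa, which relocates each spectrogram cell's energy to its instantaneous-frequency and group-delay estimates. A sketch with a hypothetical recording; resonance picking in the study was far more careful than simply taking the strongest bins in one frame.

import numpy as np
import librosa

y, sr = librosa.load("tube_response.wav", sr=None)  # hypothetical file
freqs, times, mags = librosa.reassigned_spectrogram(y=y, sr=sr, n_fft=1024)

frame = min(50, mags.shape[1] - 1)     # an arbitrary analysis frame
top = np.argsort(mags[:, frame])[-5:]  # five strongest reassigned bins
print(np.sort(freqs[top, frame]))      # candidate resonance frequencies (Hz)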
RevDate: 2024-02-06
Exploring the impact of type II diabetes mellitus on voice quality.
European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery [Epub ahead of print].
PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.
METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests.
RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. For CPP, there was also a significant interaction between diabetes status and age.
CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.
Additional Links: PMID-38319369
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38319369,
year = {2024},
author = {Saghiri, MA and Vakhnovetsky, J and Amanabi, M and Karamifar, K and Farhadi, M and Amini, SB and Conte, M},
title = {Exploring the impact of type II diabetes mellitus on voice quality.},
journal = {European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery},
volume = {},
number = {},
pages = {},
pmid = {38319369},
issn = {1434-4726},
abstract = {PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.
METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests.
RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. For CPP, there was also a significant interaction between diabetes status and age.
CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.},
}
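The regression finding above, shimmer as the strongest predictor, can be sketched as a logistic regression over standardized features whose coefficient magnitudes rank the predictors. All data below are random stand-ins, so the printed ranking is illustrative only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["f0", "jitter", "shimmer", "HNR", "CPP", "VOT", "F1-F2"]
rng = np.random.default_rng(3)
X = rng.normal(size=(60, len(features)))  # 60 participants, 7 acoustic features
y = rng.integers(0, 2, size=60)           # 0 = non-diabetic, 1 = T2DM

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in sorted(zip(features, coefs), key=lambda pair: -abs(pair[1])):
    print(f"{name:8s} {c:+.2f}")  # larger |coefficient| = stronger predictor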
RevDate: 2024-02-01
Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.
JASA express letters, 4(2):.
The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
Additional Links: PMID-38299984
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38299984,
year = {2024},
author = {Benway, NR and Preston, JL and Salekin, A and Hitchcock, E and McAllister, T},
title = {Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.},
journal = {JASA express letters},
volume = {4},
number = {2},
pages = {},
doi = {10.1121/10.0024632},
pmid = {38299984},
issn = {2691-1191},
abstract = {The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.},
}
RevDate: 2024-01-23
Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.
Sensors (Basel, Switzerland), 24(2): pii:s24020313.
To improve the classification of pig vocalizations from vocal signals and to improve recognition accuracy, a pig vocalization classification method based on multi-feature fusion is proposed in this study. Taking the typical vocalizations of pigs in large-scale breeding houses as the research object, short-time energy, frequency centroid, formant frequency with its first-order difference, and Mel frequency cepstral coefficients with their first-order differences were extracted as the fusion features. These fusion features were then refined using principal component analysis, and a classification model based on a BP neural network optimized with a genetic algorithm was constructed. Using the refined features to recognize pig grunting, squealing, and coughing, the average recognition accuracy was 93.2%; the recognition precisions were 87.9%, 98.1%, and 92.7%, respectively (average 92.9%); and the recognition recalls were 92.0%, 99.1%, and 87.4%, respectively (average 92.8%). These results indicate that the proposed method offers good precision and recall and could serve as a reference for automatic recognition of pig vocalization information.
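A hedged Python sketch of the feature-fusion step described above: concatenated frame-level features are reduced with principal component analysis and passed to a classifier. The arrays are placeholders, and scikit-learn's MLPClassifier stands in for the paper's genetic-algorithm-optimized BP network:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
fused = rng.normal(size=(300, 40))     # e.g., energy + centroid + formants + MFCCs + deltas
labels = rng.integers(0, 3, size=300)  # 0 = grunt, 1 = squeal, 2 = cough (toy labels)

X = PCA(n_components=10).fit_transform(fused)  # decorrelated, compact fusion features
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X, labels)
print(clf.score(X, labels))  # training accuracy on the toy data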
Additional Links: PMID-38257406
@article {pmid38257406,
year = {2024},
author = {Hou, Y and Li, Q and Wang, Z and Liu, T and He, Y and Li, H and Ren, Z and Guo, X and Yang, G and Liu, Y and Yu, L},
title = {Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.},
journal = {Sensors (Basel, Switzerland)},
volume = {24},
number = {2},
pages = {},
doi = {10.3390/s24020313},
pmid = {38257406},
issn = {1424-8220},
support = {2021ZD0113803//Scientific and Technological Innovation 2030 Program of China Ministry of Science and Technology/ ; 20YFZCSN00220//Tianjin Science and Technology Planning Project/ ; JKZX202214//Beijing Academy of Agriculture and Forestry Sciences Outstanding Scientist Training Program/ ; },
}
RevDate: 2024-01-22
Formant dynamics in second language speech: Japanese speakers' production of English liquids.
The Journal of the Acoustical Society of America, 155(1):479-495.
This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2-F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2-F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately.
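To make the dynamic measure concrete, here is a toy Python sketch that tracks F2-F1 at equally spaced proportional time points across a liquid-vowel interval; the formant tracks are fabricated, and a GAMM would then be fit to such trajectories:

import numpy as np

t = np.linspace(0.0, 1.0, 11)            # proportional time through a /ɹV/ interval
f1 = 350.0 + 300.0 * t                   # toy F1 rising into the vowel
f2 = 1100.0 + 200.0 * np.sin(np.pi * t)  # toy F2 trajectory
f2_minus_f1 = f2 - f1
print(np.round(f2_minus_f1, 1))          # the time-varying input to a GAMM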
Additional Links: PMID-38252795
@article {pmid38252795,
year = {2024},
author = {Nagamine, T},
title = {Formant dynamics in second language speech: Japanese speakers' production of English liquids.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {1},
pages = {479-495},
doi = {10.1121/10.0024351},
pmid = {38252795},
issn = {1520-8524},
}
RevDate: 2024-01-16
What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.
World journal of plastic surgery, 12(3):44-56.
BACKGROUND: Regarding the impact of orthognathic surgery on the airway and voice, this study was carried out to investigate the effects of maxillary impaction surgery on patients' voices through acoustic analysis and articulation assessment.
METHODS: This quasi-experimental, before-and-after, double-blind study examined the effects of maxillary impaction surgery on the voice of orthognathic surgery patients. Before the surgery, a speech therapist conducted acoustic analysis, which included fundamental frequency (F0), jitter, shimmer, and the harmonic-to-noise ratio (HNR), as well as the first, second, and third formants (F1, F2, and F3). The patient's age, sex, degree of maxillary deformity, and impaction were documented in a checklist. Voice analysis was repeated during follow-up appointments at one and six months after the surgery in a blinded manner. The data were statistically analyzed using SPSS 23, and the significance level was set at 0.05.
RESULTS: Twenty-two patients (18 females, 4 males) were examined, with ages ranging from 18 to 40 years and an average age of 25.54 years. F2, F3, HNR, and shimmer demonstrated a significant increase over the investigation period compared to the initial phase of the study (P < 0.001 for each). Conversely, jitter exhibited a significant decrease during the follow-up assessments in comparison to the initial phase of the study (P < 0.001).
CONCLUSION: Following maxillary impaction surgery, improvements in voice quality were observed compared to the preoperative condition. However, further studies with larger samples are needed to confirm the relevancy.
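For reference, the perturbation measures reported above are commonly computed as follows: local jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and local shimmer is the analogue for peak amplitudes. A minimal Python sketch with invented period and amplitude sequences:

import numpy as np

def local_perturbation(values):
    # Mean absolute cycle-to-cycle difference, relative to the mean value
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

periods_s = np.array([0.0080, 0.0081, 0.0079, 0.0080, 0.0082])  # toy glottal periods
amps = np.array([0.61, 0.63, 0.60, 0.62, 0.61])                 # toy peak amplitudes
print(f"jitter  = {100 * local_perturbation(periods_s):.2f}%")
print(f"shimmer = {100 * local_perturbation(amps):.2f}%")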
Additional Links: PMID-38226202
@article {pmid38226202,
year = {2023},
author = {Ghaemi, H and Grillo, R and Alizadeh, O and Shirzadeh, A and Ejtehadi, B and Torkzadeh, M and Samieirad, S},
title = {What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.},
journal = {World journal of plastic surgery},
volume = {12},
number = {3},
pages = {44-56},
pmid = {38226202},
issn = {2228-7914},
}
RevDate: 2024-01-12
Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.
JASA express letters, 4(1).
Reaction times for correct vowel identification were measured to determine the effects of intertrial interval, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years, participated. Stimuli included three naturally produced syllables (/ba/, /bi/, /bu/) presented whole or segmented to isolate the formant transition or the static formant center. Participants identified the vowel, presented via loudspeaker, by mouse click. Results showed a significant effect of intertrial interval, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, that vowel identification may depend on cue duration, and that vowel bias may stem from focal structure.
Additional Links: PMID-38214609
@article {pmid38214609,
year = {2024},
author = {Hedrick, M and Thornton, K},
title = {Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.},
journal = {JASA express letters},
volume = {4},
number = {1},
pages = {},
doi = {10.1121/10.0024334},
pmid = {38214609},
issn = {2691-1191},
}
RevDate: 2024-01-04
Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.
The Journal of the Acoustical Society of America, 155(1):68-77.
Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequencies (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occur for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically, with ΔF0 varied, to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination.
Additional Links: PMID-38174963
@article {pmid38174963,
year = {2024},
author = {Sathe, NC and Kain, A and Reiss, LAJ},
title = {Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {1},
pages = {68-77},
doi = {10.1121/10.0024245},
pmid = {38174963},
issn = {1520-8524},
}
RevDate: 2024-01-02
Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.
Applied psychophysiology and biofeedback [Epub ahead of print].
Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state. There are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions, and their use can improve the objectivity of psychiatric assessments. Biofeedback evaluated with subjective symptom scales together with objective speech and physiological features therefore provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features that differed between the biofeedback intervention and wait-list groups, an artificial neural network was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the wait-list group (n = 52) and were related to the change in symptoms. Energy parameters and Mel-frequency cepstral coefficients (MFCCs) among the speech features can predict whether the biofeedback intervention effectively improves anxiety and insomnia symptoms, as well as the treatment response. The accuracy of the classification model built using the artificial neural network (ANN) for treatment response versus non-response was approximately 60%. These results provide valuable information about biofeedback for improving the mental health of college students. The study identified speech features such as energy parameters and MFCCs as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial registration: ClinicalTrials.gov ChiCTR2100045542.
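A hedged Python sketch of the kind of pipeline described above: MFCC and energy features extracted from a speech sample feed a small neural network that predicts treatment response. The file name and labels are placeholders, and the toy network below is not the authors' model:

import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

y, sr = librosa.load("speech.wav", sr=16000)  # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # per-recording MFCC means
energy = np.array([np.mean(y ** 2)])                             # a simple energy parameter
features = np.concatenate([mfcc, energy]).reshape(1, -1)

# Toy training set standing in for baseline recordings of many students
X_train = np.random.default_rng(2).normal(size=(40, 14))
y_train = np.tile([0, 1], 20)  # 0 = non-responder, 1 = responder (invented)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000).fit(X_train, y_train)
print("predicted responder" if clf.predict(features)[0] else "predicted non-responder")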
Additional Links: PMID-38165498
@article {pmid38165498,
year = {2024},
author = {Wang, L and Liu, R and Wang, Y and Xu, X and Zhang, R and Wei, Y and Zhu, R and Zhang, X and Wang, F},
title = {Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.},
journal = {Applied psychophysiology and biofeedback},
volume = {},
number = {},
pages = {},
pmid = {38165498},
issn = {1573-3270},
support = {ZD2021026//Key Project supported by Medical Science and Technology Development Foundation, Jiangsu Commission of Health/ ; 62176129//National Natural Science Foundation of China/ ; 81725005//National Science Fund for Distinguished Young Scholars/ ; U20A6005//National Natural Science Foundation Regional Innovation and Development Joint Fund/ ; BE2021617//Jiangsu Provincial Key Research and Development Program/ ; },
}
RevDate: 2023-12-29
A practical guide to calculating vocal tract length and scale-invariant formant patterns.
Behavior research methods [Epub ahead of print].
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
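The single-tube regression idea can be re-expressed compactly: for a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c / (4 VTL), so vocal tract length follows from regressing measured formants on the odd numbers 1, 3, 5, ... The Python sketch below is a simplified re-expression of that model, not the code of soundgen's estimateVTL:

import numpy as np

def estimate_vtl(formants_hz, speed_of_sound=35400.0):  # c in cm/s (about 354 m/s, warm moist air)
    n = np.arange(1, len(formants_hz) + 1)
    odd = 2 * n - 1
    # Least-squares slope through the origin for F = slope * (2n - 1)
    slope = np.dot(odd, formants_hz) / np.dot(odd, odd)
    return speed_of_sound / (4 * slope)

print(f"{estimate_vtl([500.0, 1500.0, 2500.0, 3500.0]):.1f} cm")  # an ideal schwa-like pattern gives about 17.7 cm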
Additional Links: PMID-38158551
@article {pmid38158551,
year = {2023},
author = {Anikin, A and Barreda, S and Reby, D},
title = {A practical guide to calculating vocal tract length and scale-invariant formant patterns.},
journal = {Behavior research methods},
volume = {},
number = {},
pages = {},
pmid = {38158551},
issn = {1554-3528},
}
RevDate: 2023-12-23
On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.
Bioengineering (Basel, Switzerland), 10(12): pii:bioengineering10121369.
Sound generation in human phonation and the underlying fluid-structure-acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, using a silicone vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a purely acoustic coupling, a numerical simulation model is established based on the computation of the coupled mechanical-acoustic eigenvalues. By varying the pipe length, the lowest acoustic resonance frequency was adjusted in the experiments and, likewise, in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (the lowest formant) is far from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.
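A toy calculation of the coupling condition discussed above: the lowest resonance of a straight closed-open pipe is f1 = c/(4L), so varying the pipe length moves the lowest formant toward or away from the vocal folds' vibration frequency. The numbers below are illustrative only, not values from the study:

c = 343.0    # speed of sound, m/s
f_vf = 150.0  # hypothetical vocal fold vibration frequency, Hz
for length_m in (0.15, 0.40, 0.57):
    f1 = c / (4 * length_m)  # quarter-wave resonance of a closed-open pipe
    print(f"L = {100 * length_m:4.0f} cm -> lowest resonance {f1:6.1f} Hz "
          f"({'near' if abs(f1 - f_vf) < 30 else 'far from'} f_vf = {f_vf} Hz)")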
Additional Links: PMID-38135960
@article {pmid38135960,
year = {2023},
author = {Kraxberger, F and Näger, C and Laudato, M and Sundström, E and Becker, S and Mihaescu, M and Kniesburges, S and Schoder, S},
title = {On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.},
journal = {Bioengineering (Basel, Switzerland)},
volume = {10},
number = {12},
pages = {},
doi = {10.3390/bioengineering10121369},
pmid = {38135960},
issn = {2306-5354},
support = {39480417//Austrian Research Promotion Agency/ ; 446965891//Deutsche Forschungsgemeinschaft/ ; n/a//TU Graz Open Access Publishing Fund/ ; },
}
RevDate: 2023-12-12
The Change of Vocal Tract Length in People with Parkinson's Disease.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference, 2023:1-4.
Hypokinetic dysarthria is one of the early symptoms of Parkinson's disease (PD) and has been proposed for early detection and for monitoring of the progression of the disease. PD reduces the control of vocal tract muscles such as the tongue and lips, and therefore the length of the active vocal tract is altered. However, the change in vocal tract length due to the disease has not been investigated. The aim of this study was to determine the difference in the apparent vocal tract length (AVTL) between people with PD and age-matched healthy controls. The phoneme /a/ from the UCI Parkinson's Disease Classification Dataset and the Italian Parkinson's Voice and Speech Dataset was used, and AVTL was calculated from the first four formants (F1-F4) of the sustained phoneme. The results show a correlation between Parkinson's disease and an increase in vocal tract length. The most sensitive feature was the AVTL calculated from the first formant (F1). The other significant finding reported in this article is that the difference was significant only in the male participants. However, the database is not sufficiently large to identify possible confounding factors such as severity and duration of the disease, medication, age, and comorbidities. Clinical relevance: the outcomes of this research have the potential to improve the identification of early Parkinsonian dysarthria and the monitoring of PD progression.
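For orientation, apparent vocal tract length can be computed from a single formant under the same quarter-wave model used above, VTL_n = (2n - 1)c / (4 F_n). A short Python example with an illustrative F1, not a value from the datasets:

c_cm_s = 35400.0  # assumed speed of sound in the vocal tract, cm/s
f1_hz = 500.0     # illustrative first-formant value
avtl_from_f1 = c_cm_s / (4 * f1_hz)  # n = 1, so (2n - 1) = 1
print(f"AVTL(F1) = {avtl_from_f1:.1f} cm")  # 17.7 cm; PD vs control compares such values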
Additional Links: PMID-38082914
@article {pmid38082914,
year = {2023},
author = {Pah, ND and Motin, MA and Oliveira, GC and Kumar, DK},
title = {The Change of Vocal Tract Length in People with Parkinson's Disease.},
journal = {Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference},
volume = {2023},
number = {},
pages = {1-4},
doi = {10.1109/EMBC40787.2023.10340263},
pmid = {38082914},
issn = {2694-0604},
}
RevDate: 2023-12-07
Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.
Cortex; a journal devoted to the study of the nervous system and behavior, 171:287-307 pii:S0010-9452(23)00281-2 [Epub ahead of print].
The spectral formant structure and periodicity pitch are the major features that determine the identity of vowels and the characteristics of the speaker. However, very little is known about how the processing of these features in the auditory cortex changes during development. To address this question, we independently manipulated the periodicity and formant structure of vowels while measuring auditory cortex responses using magnetoencephalography (MEG) in children aged 7-12 years and adults. We analyzed the sustained negative shift of source current associated with these vowel properties, which was present in the auditory cortex in both age groups despite differences in the transient components of the auditory response. In adults, the sustained activation associated with formant structure was lateralized to the left hemisphere early in the auditory processing stream requiring neither attention nor semantic mapping. This lateralization was not yet established in children, in whom the right hemisphere contribution to formant processing was strong and decreased during or after puberty. In contrast to the formant structure, periodicity was associated with a greater response in the right hemisphere in both children and adults. These findings suggest that left-lateralization for the automatic processing of vowel formant structure emerges relatively late in ontogenesis and pose a serious challenge to current theories of hemispheric specialization for speech processing.
Additional Links: PMID-38061210
@article {pmid38061210,
year = {2023},
author = {Orekhova, EV and Fadeev, KA and Goiaeva, DE and Obukhova, TS and Ovsiannikova, TM and Prokofyev, AO and Stroganova, TA},
title = {Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.},
journal = {Cortex; a journal devoted to the study of the nervous system and behavior},
volume = {171},
number = {},
pages = {287-307},
doi = {10.1016/j.cortex.2023.10.020},
pmid = {38061210},
issn = {1973-8102},
}
RevDate: 2023-12-07
Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.
iScience, 26(12):108457.
Perception of bistable stimuli is influenced by prior context. In some cases, the interpretation matches with how the preceding stimulus was perceived; in others, it tends to be the opposite of the previous stimulus percept. We measured high-density electroencephalography (EEG) while participants were presented with a sequence of vowels that varied in formant transition, promoting the perception of one or two auditory streams followed by an ambiguous bistable sequence. For the bistable sequence, participants were more likely to report hearing the opposite percept of the one heard immediately before. This auditory contrast effect coincided with changes in alpha power localized in the left angular gyrus and left sensorimotor and right sensorimotor/supramarginal areas. The latter correlated with participants' perception. These results suggest that the contrast effect for a bistable sequence of vowels may be related to neural adaptation in posterior auditory areas, which influences participants' perceptual construal level of ambiguous stimuli.
Additional Links: PMID-38058304
@article {pmid38058304,
year = {2023},
author = {Alain, C and Göke, K and Shen, D and Bidelman, GM and Bernstein, LJ and Snyder, JS},
title = {Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.},
journal = {iScience},
volume = {26},
number = {12},
pages = {108457},
pmid = {38058304},
issn = {2589-0042},
}
RevDate: 2023-12-05
Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.
Amyotrophic lateral sclerosis & frontotemporal degeneration [Epub ahead of print].
OBJECTIVE: To evaluate automated digital speech measures, derived from spontaneous speech (picture descriptions), in assessing bulbar motor impairments in patients with ALS-FTD spectrum disorders (ALS-FTSD).
METHODS: Automated vowel algorithms were employed to extract two vowel acoustic measures: vowel space area (VSA) and mean second formant slope (F2 slope). Vowel measures were compared between ALS patients with clinical bulbar symptoms (ALS+bulbar, n = 49; ALSFRS-R bulbar subscore: mean = 9.8, SD = 1.7), ALS patients without bulbar symptoms (ALS-nonbulbar, n = 23), patients with behavioral variant frontotemporal dementia without a motor syndrome (bvFTD, n = 25), and healthy controls (HC, n = 32). Correlations with bulbar motor clinical scales, perceived listener effort, and MRI cortical thickness of the orobuccal primary motor cortex (oral PMC) were examined. We compared vowel measures to speaking rate, a conventional metric for assessing bulbar dysfunction.
RESULTS: ALS+bulbar patients had significantly lower VSA and F2 slope than the ALS-nonbulbar (|d| = 0.94 and |d| = 1.04, respectively), bvFTD (|d| = 0.89 and |d| = 1.47), and HC (|d| = 0.73 and |d| = 0.99) groups. These reductions correlated with worse bulbar clinical scores (VSA: R = 0.33, p = 0.043; F2 slope: R = 0.38, p = 0.011), greater listener effort (VSA: R = -0.43, p = 0.041; F2 slope: p > 0.05), and cortical thinning in oral PMC (F2 slope: β = 0.0026, p = 0.017). Vowel measures demonstrated greater sensitivity and specificity for bulbar impairment than speaking rate, while remaining independent of cognitive and respiratory impairments.
CONCLUSION: Automatic vowel measures are easily derived from a brief spontaneous speech sample, are sensitive to mild-moderate stage of bulbar disease in ALS-FTSD, and may present better sensitivity to bulbar impairment compared to traditional assessments such as speaking rate.
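A hedged Python sketch of the two vowel measures used above: vowel space area as the area of the convex hull of (F1, F2) points, and F2 slope as a linear fit to an F2 track. All formant values are invented:

import numpy as np
from scipy.spatial import ConvexHull

corner_vowels = np.array([[800, 1200],   # /a/: (F1, F2) in Hz
                          [300, 2300],   # /i/
                          [350, 900]])   # /u/
vsa_hz2 = ConvexHull(corner_vowels).volume  # for 2-D points, .volume is the hull area

t = np.linspace(0.0, 0.2, 21)            # 200 ms of toy F2 track
f2_track = 1200.0 + 1500.0 * t           # rising F2
f2_slope = np.polyfit(t, f2_track, 1)[0]  # Hz per second
print(f"VSA = {vsa_hz2:.0f} Hz^2, F2 slope = {f2_slope:.0f} Hz/s")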
Additional Links: PMID-38050971
@article {pmid38050971,
year = {2023},
author = {Shellikeri, S and Cho, S and Ash, S and Gonzalez-Recober, C and Mcmillan, CT and Elman, L and Quinn, C and Amado, DA and Baer, M and Irwin, DJ and Massimo, L and Olm, CA and Liberman, MY and Grossman, M and Nevler, N},
title = {Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.},
journal = {Amyotrophic lateral sclerosis & frontotemporal degeneration},
volume = {},
number = {},
pages = {1-9},
doi = {10.1080/21678421.2023.2288106},
pmid = {38050971},
issn = {2167-9223},
}
RevDate: 2023-11-30
Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.
Frontiers in neuroscience, 17:1238941.
INTRODUCTION: Understanding speech in a noisy environment, as opposed to speech in quiet, becomes increasingly more difficult with increasing age. Using the quiet-aged gerbil, we studied the effects of aging on speech-in-noise processing. Specifically, behavioral vowel discrimination and the encoding of these vowels by single auditory-nerve fibers were compared, to elucidate some of the underlying mechanisms of age-related speech-in-noise perception deficits.
METHODS: Young-adult and quiet-aged Mongolian gerbils, of either sex, were trained to discriminate a deviant naturally-spoken vowel in a sequence of vowel standards against a speech-like background noise. In addition, we recorded responses from single auditory-nerve fibers of young-adult and quiet-aged gerbils while presenting the same speech stimuli.
RESULTS: Behavioral vowel discrimination was not significantly affected by aging. For both young-adult and quiet-aged gerbils, the behavioral discrimination between /eː/ and /iː/ was more difficult than between /eː/ and /aː/ or between /iː/ and /aː/, as evidenced by longer response times and lower d' values. In young adults, spike-timing-based vowel discrimination agreed with the behavioral vowel discrimination, while in quiet-aged gerbils it did not. Paradoxically, discrimination between vowels based on temporal responses was enhanced in aged gerbils for all vowel comparisons. Representation schemes based on the spectrum of the inter-spike interval histogram revealed stronger encoding of both the fundamental and the lower formant frequencies in fibers of quiet-aged gerbils, but no qualitative changes in vowel encoding. Elevated thresholds in combination with a fixed stimulus level, i.e., lower sensation levels of the stimuli for old individuals, can explain the enhanced temporal coding of the vowels in noise.
DISCUSSION: These results suggest that the altered auditory-nerve discrimination metrics in old gerbils may mask age-related deterioration in the central (auditory) system to the extent that behavioral vowel discrimination matches that of the young adults.
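The d' values mentioned in the results are the standard signal-detection sensitivity index, the difference between z-transformed hit and false-alarm rates. A short Python illustration with invented rates:

from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    # z-transform via the inverse of the standard normal CDF
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(f"easy pair /e:/ vs /a:/: d' = {d_prime(0.95, 0.05):.2f}")
print(f"hard pair /e:/ vs /i:/: d' = {d_prime(0.70, 0.20):.2f}")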
Additional Links: PMID-38033551
@article {pmid38033551,
year = {2023},
author = {Heeringa, AN and Jüchter, C and Beutelmann, R and Klump, GM and Köppl, C},
title = {Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.},
journal = {Frontiers in neuroscience},
volume = {17},
number = {},
pages = {1238941},
doi = {10.3389/fnins.2023.1238941},
pmid = {38033551},
issn = {1662-4548},
}
RevDate: 2023-11-29
Selectivity to acoustic features of human speech in the auditory cortex of the mouse.
Hearing research, 441:108920 pii:S0378-5955(23)00232-0 [Epub ahead of print].
A better understanding of the neural mechanisms of speech processing can have a major impact in the development of strategies for language learning and in addressing disorders that affect speech comprehension. Technical limitations in research with human subjects hinder a comprehensive exploration of these processes, making animal models essential for advancing the characterization of how neural circuits make speech perception possible. Here, we investigated the mouse as a model organism for studying speech processing and explored whether distinct regions of the mouse auditory cortex are sensitive to specific acoustic features of speech. We found that mice can learn to categorize frequency-shifted human speech sounds based on differences in formant transitions (FT) and voice onset time (VOT). Moreover, neurons across various auditory cortical regions were selective to these speech features, with a higher proportion of speech-selective neurons in the dorso-posterior region. Last, many of these neurons displayed mixed-selectivity for both features, an attribute that was most common in dorsal regions of the auditory cortex. Our results demonstrate that the mouse serves as a valuable model for studying the detailed mechanisms of speech feature encoding and neural plasticity during speech-sound learning.
Additional Links: PMID-38029503
@article {pmid38029503,
year = {2023},
author = {Mohn, JL and Baese-Berk, MM and Jaramillo, S},
title = {Selectivity to acoustic features of human speech in the auditory cortex of the mouse.},
journal = {Hearing research},
volume = {441},
number = {},
pages = {108920},
doi = {10.1016/j.heares.2023.108920},
pmid = {38029503},
issn = {1878-5891},
}
RevDate: 2023-11-27
The role of loudness in vocal intimidation.
Journal of experimental psychology. General pii:2024-28586-001 [Epub ahead of print].
Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into a-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered "large" in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness-frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Additional Links: PMID-38010781
@article {pmid38010781,
year = {2023},
author = {Anikin, A and Valente, D and Pisanski, K and Cornec, C and Bryant, GA and Reby, D},
title = {The role of loudness in vocal intimidation.},
journal = {Journal of experimental psychology. General},
volume = {},
number = {},
pages = {},
doi = {10.1037/xge0001508},
pmid = {38010781},
issn = {1939-2222},
support = {//Vetenskapsrådet/ ; //French National Research Agency (ANR)/ ; },
}
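An editorial aside on the entry above: the paper's argument rests on two frame-level acoustic measurements, loudness and fundamental frequency (f0). The minimal Python sketch below (pure numpy, written for this bibliography, not taken from the study's pipeline) shows one conventional way to compute both; the frame length, f0 search range, and synthetic test signal are illustrative assumptions.
import numpy as np
def rms_db(frame):
    # Frame loudness as dB RMS relative to full scale; epsilon avoids log(0).
    return 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
def f0_autocorr(frame, sr, fmin=75, fmax=500):
    # Crude f0 estimate: strongest autocorrelation peak in the voice range.
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
# Illustrative check: a 100 Hz periodic frame should come back near 100 Hz,
# and doubling the amplitude raises rms_db by about 6 dB.
sr = 16000
t = np.arange(sr // 10) / sr                      # 100 ms frame
frame = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 200 * t)
print(round(f0_autocorr(frame, sr)), round(rms_db(frame), 1))
The trade-off the abstract describes then falls out of physiology rather than code: in live voices, pushing rms_db up tends to drag f0 and the first formant up with it.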
RevDate: 2023-11-24
Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00322-3 [Epub ahead of print].
This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.
Additional Links: PMID-38000960
@article {pmid38000960,
year = {2023},
author = {Barrientos, E and Cataldo, E},
title = {Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2023.10.018},
pmid = {38000960},
issn = {1873-4588},
abstract = {This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.},
}
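A note for readers unfamiliar with the baseline: LPC here is linear predictive coding, and the conventional autocorrelation-method LPC that the paper compares against can be sketched in a few lines. The code below is written for this bibliography, not taken from the paper, and does not implement WLP-HPSV itself; the pre-emphasis coefficient, window, model order, and pole-pruning thresholds are illustrative assumptions.
import numpy as np
def lpc_autocorr(frame, order):
    # Levinson-Durbin solution of the LPC normal equations (autocorrelation method).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a
def formants(frame, sr, order=12):
    # Classic pipeline: pre-emphasis, window, LPC, then formants from pole angles.
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_autocorr(frame, order))
    roots = roots[np.imag(roots) > 0]              # keep one pole per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)     # pole angle -> frequency in Hz
    bws = -np.log(np.abs(roots)) * sr / np.pi      # pole radius -> bandwidth in Hz
    keep = (freqs > 90) & (bws < 400)              # discard implausible poles
    return np.sort(freqs[keep])
# Usage: formants(vowel_frame, sr) on a steady spoken /a/ typically returns an
# F1 somewhere in the 600-900 Hz region, depending on the speaker.
This baseline is exactly where the high-pitch problem lives: when f0 approaches or exceeds F1, as in soprano singing, the harmonics become too sparse to define the spectral envelope and LPC poles lock onto individual harmonics rather than vocal-tract resonances, which is the failure mode the paper's weighting scheme targets.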
RJR Experience and Expertise
Researcher
Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.
Educator
Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.
Administrator
Robbins has been involved in science administration at both the federal and institutional levels. At NSF, he was a program officer for database activities in the life sciences; at DOE, he was a program officer for information infrastructure in the Human Genome Project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.
Technologist
Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.
Publisher
While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.
Speaker
Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012 he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.
Facilitator
Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.
Designer
Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.
RJR Picks from Around the Web (updated 11 MAY 2018)
Old Science
Weird Science
Treating Disease with Fecal Transplantation
Fossils of miniature humans (hobbits) discovered in Indonesia
Paleontology
Dinosaur tail, complete with feathers, found preserved in amber.
Astronomy
Mysterious fast radio burst (FRB) detected in the distant universe.
Big Data & Informatics
Big Data: Buzzword or Big Deal?
Hacking the genome: Identifying anonymized human subjects using publicly available data.