RJ-ROBBINS Formants: Modulators of Communication

Citation:

@article {pmid40825839,
year = {2025},
author = {Kim, KS and Kitchen, NM and Mitsuya, T and Max, L},
title = {Auditory-motor adaptation and de-adaptation for speech depend more on time in the new environment than on the amount of practice.},
journal = {Communications psychology},
volume = {3},
number = {1},
pages = {127},
pmid = {40825839},
issn = {2731-9121},
support = {R01DC014510//U.S. Department of Health & Human Services | NIH | National Institute on Deafness and Other Communication Disorders (NIDCD)/ ; R01DC017444//U.S. Department of Health & Human Services | NIH | National Institute on Deafness and Other Communication Disorders (NIDCD)/ ; R01DC020707//U.S. Department of Health & Human Services | NIH | National Institute on Deafness and Other Communication Disorders (NIDCD)/ ; },
abstract = {Sensorimotor adaptation is critical for learning and refining voluntary movements. One common assumption is that the number of practice trials fully determines the amount of adaptation. It is possible, however, that for some tasks the sensorimotor system continues to learn during the time in-between executed movements as long as there is no evidence that the environment has changed. The amount of time spent in the altered environment (total exposure time) then would be more important than the number of practice movements performed during that time. In the current study, we investigated adaptation and de-adaptation as a function of practice trials versus exposure time using speech articulation as the model system. Four separate groups of 14 participants read out loud monosyllabic words at a rate of either 18 words per minute or only 6 words per minute during the adaptation and de-adaptation phases of a speaking task with formant-shifted auditory feedback. The data demonstrate that both auditory-motor adaptation and de-adaptation depend more on exposure time than amount of practice. COIN model simulations suggest that this common effect is consistent with de-adaptation constituting active re-learning of the unaltered environment rather than forgetting of the learned behavior.},
}

RevDate: 2025-08-13

Pereira ÁA, Lima DP, Almeida ANS, et al (2025)

Phonetic-acoustic and ultrasonographic characteristics of speech after lingual frenectomy: a case report.

CoDAS, 37(4):e20240202 pii:S2317-17822025000400400.

This case report aimed to verify the effect of lingual frenectomy on the functional anatomical aspects of the tongue, the phonetic-acoustic characteristics, and the magnitude of tongue movement in the phonemes [ɾ] and [l] after the lingual frenectomy. The anatomical characteristics of the lingual frenum and the functional aspects of the tongue were evaluated using the Protocol for Evaluation of the Lingual Frenum. The phonetic-acoustic particularities of speech were assessed through formant analysis using PRAAT software, and the evaluation of the magnitude of tongue movements was conducted via ultrasonographic analysis with Articulate Assistant Advanced (AAA) software. After the assessments, the patient was referred for the lingual frenectomy and was reevaluated after 7 and 14 days of healing. It was observed through the functional anatomical evaluation that the patient showed modifications in the shape of the tongue tip, greater elevation of the tongue in the oral cavity, and improvement in the contact of the tongue tip with the labial commissures. The acoustic evaluation of speech and the ultrasonographic assessment of tongue movements indicated a longer emission time for the words, increased verticalization and anteriorization of the tongue during speech production, which were more evident for the phoneme [ɾ]. Thus, the instrumental evaluations contributed to the clinical assessment, facilitating the observation of the patient's progress after the lingual frenectomy, identified in the analysis of the formants and highlighted through the ultrasonographic analysis of the tongue.

Additional Links: PMID-40802298

Citation:

@article {pmid40802298,
year = {2025},
author = {Pereira, ÁA and Lima, DP and Almeida, ANS and Pinto, PVDN and Alencar, RC and Cruz, IJADS and Alves, NTP and Lira, ZZ and Cunha, DAD and Silva, HJD},
title = {Phonetic-acoustic and ultrasonographic characteristics of speech after lingual frenectomy: a case report.},
journal = {CoDAS},
volume = {37},
number = {4},
pages = {e20240202},
doi = {10.1590/2317-1782/e20240202pt},
pmid = {40802298},
issn = {2317-1782},
abstract = {This case report aimed to verify the effect of lingual frenectomy on the functional anatomical aspects of the tongue, the phonetic-acoustic characteristics, and the magnitude of tongue movement in the phonemes [ɾ] and [l] after the lingual frenectomy. The anatomical characteristics of the lingual frenum and the functional aspects of the tongue were evaluated using the Protocol for Evaluation of the Lingual Frenum. The phonetic-acoustic particularities of speech were assessed through formant analysis using PRAAT software, and the evaluation of the magnitude of tongue movements was conducted via ultrasonographic analysis with Articulate Assistant Advanced (AAA) software. After the assessments, the patient was referred for the lingual frenectomy and was reevaluated after 7 and 14 days of healing. It was observed through the functional anatomical evaluation that the patient showed modifications in the shape of the tongue tip, greater elevation of the tongue in the oral cavity, and improvement in the contact of the tongue tip with the labial commissures. The acoustic evaluation of speech and the ultrasonographic assessment of tongue movements indicated a longer emission time for the words, increased verticalization and anteriorization of the tongue during speech production, which were more evident for the phoneme [ɾ]. Thus, the instrumental evaluations contributed to the clinical assessment, facilitating the observation of the patient's progress after the lingual frenectomy, identified in the analysis of the formants and highlighted through the ultrasonographic analysis of the tongue.},
}

RevDate: 2025-08-12

Yoshitani T, Miyazaki R, Seino S, et al (2025)

Individual vocal identity is enhanced by the enlarged external nose in male proboscis monkeys (Nasalis larvatus).

Journal of the Royal Society, Interface, 22(229):20250098.

Adult male proboscis monkeys, Nasalis larvatus, develop an enlarged external nose. Males often produce loud, long-distance calls filtered through the nasal passage. The enlarged nose probably functions as a visual badge of social status and a visual key representing the owner's physical and sexual quality, and thus is useful for females in selecting mates. In addition to such visual signalling, a larger external nose enhances the lower frequencies in calls, possibly exaggerating acoustic signals related to body size. Here, we used computational simulations with three-dimensional models of the nasal passage to show how the external nose modifies the acoustic property, indicating that the external nose develops to enhance lower frequencies in adults but varies in a specific formant position among adult males. This finding suggests that the external nose generates acoustic signals about physical-sexual maturity in adult males and individual identity among them. The unusual features of the social organization in this species, a patrilineality of a multilevel community consisting of one-male-multi-female units, may reinforce the functional importance of individual male recognition for males and females to monitor the location of both their own units and those of other males.

Additional Links: PMID-40795985

Citation:

@article {pmid40795985,
year = {2025},
author = {Yoshitani, T and Miyazaki, R and Seino, S and Edamura, K and Murata, K and Matsuda, I and Nishimura, T and Tokuda, IT},
title = {Individual vocal identity is enhanced by the enlarged external nose in male proboscis monkeys (Nasalis larvatus).},
journal = {Journal of the Royal Society, Interface},
volume = {22},
number = {229},
pages = {20250098},
doi = {10.1098/rsif.2025.0098},
pmid = {40795985},
issn = {1742-5662},
support = {//Japan Society for the Promotion of Science/ ; //Japan Science and Technology Agency/ ; },
abstract = {Adult male proboscis monkeys, Nasalis larvatus, develop an enlarged external nose. Males often produce loud, long-distance calls filtered through the nasal passage. The enlarged nose probably functions as a visual badge of social status and a visual key representing the owner's physical and sexual quality, and thus is useful for females in selecting mates. In addition to such visual signalling, a larger external nose enhances the lower frequencies in calls, possibly exaggerating acoustic signals related to body size. Here, we used computational simulations with three-dimensional models of the nasal passage to show how the external nose modifies the acoustic property, indicating that the external nose develops to enhance lower frequencies in adults but varies in a specific formant position among adult males. This finding suggests that the external nose generates acoustic signals about physical-sexual maturity in adult males and individual identity among them. The unusual features of the social organization in this species, a patrilineality of a multilevel community consisting of one-male-multi-female units, may reinforce the functional importance of individual male recognition for males and females to monitor the location of both their own units and those of other males.},
}

RevDate: 2025-08-12

Yang Q, Zeng L, B Li (2025)

Independently tunable dual-channel angle-sensitive narrow-band perfect absorber.

Applied optics, 64(6):1464-1470.

In this correspondence, we introduce a versatile adjustable absorber featuring two distinct channels. Its primary composition includes strontium titanate (STO) and graphene. The refractive index of STO is influenced by temperature variations and the existence of the structural cavity, allowing for dynamic regulation of the absorption spectrum through external temperature changes and the angle of incident light. The developed device is capable of achieving dual-channel narrow-band perfect absorption at frequencies of 0.394 and 1.24 THz, demonstrating absorption rates of 99% and 98%, respectively. Importantly, we examined the Fermi level transition from 0 to 0.5 eV, revealing that the first resonance absorption peak can be adjusted within a range of 60% to 99%, accompanied by a redshift. This phenomenon is attributed to the local surface plasmon resonance induced by the graphene layer. The absorption characteristics of the second resonance remain relatively stable due to the peak formations within the Fabry-Perot cavity situated inside the STO layer. Given that the formant is influenced by temperature, it can be utilized as a temperature sensor. Furthermore, the absorptivity can be modified by altering the angle of the incident light. As a result of this angle dependence, optical switching can be achieved with a 22 dB ON/OFF ratio and a modulation depth close to 100%. Due to the symmetry of the absorption structure, the device remains unaffected by the polarization of the incoming light. The proposed tunable absorber has potential applications in electromagnetic absorption, optical switching, and various other domains.

Additional Links: PMID-40793438

Citation:

@article {pmid40793438,
year = {2025},
author = {Yang, Q and Zeng, L and Li, B},
title = {Independently tunable dual-channel angle-sensitive narrow-band perfect absorber.},
journal = {Applied optics},
volume = {64},
number = {6},
pages = {1464-1470},
doi = {10.1364/AO.550867},
pmid = {40793438},
issn = {1539-4522},
abstract = {In this correspondence, we introduce a versatile adjustable absorber featuring two distinct channels. Its primary composition includes strontium titanate (STO) and graphene. The refractive index of STO is influenced by temperature variations and the existence of the structural cavity, allowing for dynamic regulation of the absorption spectrum through external temperature changes and the angle of incident light. The developed device is capable of achieving dual-channel narrow-band perfect absorption at frequencies of 0.394 and 1.24 THz, demonstrating absorption rates of 99% and 98%, respectively. Importantly, we examined the Fermi level transition from 0 to 0.5 eV, revealing that the first resonance absorption peak can be adjusted within a range of 60% to 99%, accompanied by a redshift. This phenomenon is attributed to the local surface plasmon resonance induced by the graphene layer. The absorption characteristics of the second resonance remain relatively stable due to the peak formations within the Fabry-Perot cavity situated inside the STO layer. Given that the formant is influenced by temperature, it can be utilized as a temperature sensor. Furthermore, the absorptivity can be modified by altering the angle of the incident light. As a result of this angle dependence, optical switching can be achieved with a 22 dB ON/OFF ratio and a modulation depth close to 100%. Due to the symmetry of the absorption structure, the device remains unaffected by the polarization of the incoming light. The proposed tunable absorber has potential applications in electromagnetic absorption, optical switching, and various other domains.},
}
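
The abstract quotes both a 22 dB ON/OFF ratio and a modulation depth "close to 100%". Under commonly used definitions (assumed here, because the abstract does not define either quantity), the ON/OFF ratio is 10·log10(A_on/A_off) and the modulation depth is (A_on - A_off)/A_on, so the two figures are linked: 22 dB corresponds to a linear ratio of about 158 and a depth of roughly 99.4%. A minimal Python sketch; the function names are hypothetical, and `a_on`/`a_off` stand for the signal level in the two switch states.

```python
import math


def on_off_ratio_db(a_on: float, a_off: float) -> float:
    """ON/OFF (extinction) ratio in dB, assuming a power-like quantity: 10*log10(on/off)."""
    return 10.0 * math.log10(a_on / a_off)


def modulation_depth(a_on: float, a_off: float) -> float:
    """Modulation depth as a fraction, assuming MD = (on - off) / on."""
    return (a_on - a_off) / a_on


def depth_from_ratio_db(ratio_db: float) -> float:
    """Modulation depth implied by an ON/OFF ratio given in dB: 1 - 10**(-ratio_db/10)."""
    return 1.0 - 10.0 ** (-ratio_db / 10.0)


if __name__ == "__main__":
    linear = 10.0 ** (22 / 10)
    print(f"22 dB -> linear ratio {linear:.1f}")          # 158.5
    print(f"implied depth {depth_from_ratio_db(22):.2%}")  # 99.37%
```

If the paper defines the depth relative to a different quantity (for example reflectance rather than absorptance), the exact numbers would shift; the sketch only shows how the two quoted figures relate under the stated definitions.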

RevDate: 2025-08-11

Kornder L, Alharbi AS, A Foltz (2025)

Second-Language Acquisition and First-Language Attrition of Speech: The Production of Arabic and English Short Vowels.

Language and speech [Epub ahead of print].

This study investigates if two groups of experienced late bilinguals (Arabic-English, English-Arabic) produce the Arabic vowels /ɪ, u, a/ and the English vowels /ɪ, ʊ, æ/ with nativelike formant values (F1, F2) compared with Arabic and English monolinguals, respectively. We aimed to characterize the relationship between second-language (L2) acquisition and first-language (L1) attrition of vowels, that is, does nativelike acquisition of an L2 vowel correspond to attrition of a phonetically similar L1 vowel, and vice versa? Moreover, we explored if nativelikeness of bilingual vowel productions is influenced by the predictor variable sound discrimination aptitude. Results show that bilinguals who produce nativelike L2 vowels are also able to maintain native L1 productions, suggesting that an increased L2 proficiency does not inevitably entail a decline in L1 proficiency.

Additional Links: PMID-40785265

Citation:

@article {pmid40785265,
year = {2025},
author = {Kornder, L and Alharbi, AS and Foltz, A},
title = {Second-Language Acquisition and First-Language Attrition of Speech: The Production of Arabic and English Short Vowels.},
journal = {Language and speech},
volume = {},
number = {},
pages = {238309251344889},
doi = {10.1177/00238309251344889},
pmid = {40785265},
issn = {1756-6053},
abstract = {This study investigates if two groups of experienced late bilinguals (Arabic-English, English-Arabic) produce the Arabic vowels /ɪ, u, a/ and the English vowels /ɪ, ʊ, æ/ with nativelike formant values (F1, F2) compared with Arabic and English monolinguals, respectively. We aimed to characterize the relationship between second-language (L2) acquisition and first-language (L1) attrition of vowels, that is, does nativelike acquisition of an L2 vowel correspond to attrition of a phonetically similar L1 vowel, and vice versa? Moreover, we explored if nativelikeness of bilingual vowel productions is influenced by the predictor variable sound discrimination aptitude. Results show that bilinguals who produce nativelike L2 vowels are also able to maintain native L1 productions, suggesting that an increased L2 proficiency does not inevitably entail a decline in L1 proficiency.},
}

RevDate: 2025-08-06
CmpDate: 2025-08-06

Lyu Y, Jiang QC, Yuan S, et al (2025)

Non-invasive acoustic classification of adult asthma using an XGBoost model with vocal biomarkers.

Scientific reports, 15(1):28682.

Traditional diagnostic methods for asthma, a widespread chronic respiratory illness, are often limited by factors such as patient cooperation with spirometry. Non-invasive acoustic analysis using machine learning offers a promising alternative for objective diagnosis by analyzing vocal characteristics. This study aimed to develop and validate a robust classification model for adult asthma using acoustic features from the vocalized /ɑː/ sound. In a case-control study, voice recordings of the /ɑː/ sound were collected from a primary cohort of 214 adults and an independent external validation cohort of 200 adults. This study extracted features using a modified extended Geneva Minimalistic Acoustic Parameter Set and compared seven machine learning models. The top-performing model, Extreme Gradient Boosting, was further assessed through ten-fold cross-validation, external validation, and feature analysis using SHapley Additive exPlanations and Local Interpretable Model-Agnostic Explanations. The Extreme Gradient Boosting classifier achieved the highest performance on the test set, with an accuracy of 0.8514, an Area Under the Curve of 0.9130, a recall of 0.8804, a precision of 0.8387, an F1-score of 0.8567, a Kappa coefficient of 0.7018, and a Matthews Correlation Coefficient of 0.7071. On the external validation set, the model maintained strong performance with an accuracy of 0.8100, AUC of 0.8755, recall of 0.8300, precision of 0.7981, F1-score of 0.8137, Kappa of 0.6200, and Matthews Correlation Coefficient of 0.6205. Interpretability analysis identified formant frequencies as the most significant acoustic predictors. An Extreme Gradient Boosting model utilizing features from the extended Geneva Minimalistic Acoustic Parameter Set is an accurate and viable non-invasive method for classifying adult asthma, holding significant potential for developing accessible tools for early diagnosis, remote monitoring, and improved asthma management.

Additional Links: PMID-40770052

Citation:

@article {pmid40770052,
year = {2025},
author = {Lyu, Y and Jiang, QC and Yuan, S and Hong, J and Chen, CF and Wu, HM and Wang, YQ and Shi, YJ and Yan, HX and Xu, J},
title = {Non-invasive acoustic classification of adult asthma using an XGBoost model with vocal biomarkers.},
journal = {Scientific reports},
volume = {15},
number = {1},
pages = {28682},
pmid = {40770052},
issn = {2045-2322},
support = {21DZ2271000//Shanghai Key Laboratory of Health Identification and Assessment Project/ ; 24KFL011//Science and Technology Development Project of Shanghai University of Traditional Chinese Medicine/ ; 81673880//National Natural Science Foundation of China/ ; ZY(2021-2023)-0212//Shanghai Three-Year Action Plan (2021-2023) for Accelerating the Development of TCM Career "Construction of a Highland for the International Standardization of TCM"/ ; },
mesh = {Humans ; *Asthma/diagnosis/physiopathology/classification ; Adult ; Male ; Female ; Middle Aged ; Case-Control Studies ; Machine Learning ; Acoustics ; Biomarkers ; *Voice/physiology ; Young Adult ; Boosting Machine Learning Algorithms ; },
abstract = {Traditional diagnostic methods for asthma, a widespread chronic respiratory illness, are often limited by factors such as patient cooperation with spirometry. Non-invasive acoustic analysis using machine learning offers a promising alternative for objective diagnosis by analyzing vocal characteristics. This study aimed to develop and validate a robust classification model for adult asthma using acoustic features from the vocalized /ɑː/ sound. In a case-control study, voice recordings of the /ɑː/ sound were collected from a primary cohort of 214 adults and an independent external validation cohort of 200 adults. This study extracted features using a modified extended Geneva Minimalistic Acoustic Parameter Set and compared seven machine learning models. The top-performing model, Extreme Gradient Boosting, was further assessed through ten-fold cross-validation, external validation, and feature analysis using SHapley Additive exPlanations and Local Interpretable Model-Agnostic Explanations. The Extreme Gradient Boosting classifier achieved the highest performance on the test set, with an accuracy of 0.8514, an Area Under the Curve of 0.9130, a recall of 0.8804, a precision of 0.8387, an F1-score of 0.8567, a Kappa coefficient of 0.7018, and a Matthews Correlation Coefficient of 0.7071. On the external validation set, the model maintained strong performance with an accuracy of 0.8100, AUC of 0.8755, recall of 0.8300, precision of 0.7981, F1-score of 0.8137, Kappa of 0.6200, and Matthews Correlation Coefficient of 0.6205. Interpretability analysis identified formant frequencies as the most significant acoustic predictors. An Extreme Gradient Boosting model utilizing features from the extended Geneva Minimalistic Acoustic Parameter Set is an accurate and viable non-invasive method for classifying adult asthma, holding significant potential for developing accessible tools for early diagnosis, remote monitoring, and improved asthma management.},
}

MeSH Terms:

Humans
*Asthma/diagnosis/physiopathology/classification
Adult
Male
Female
Middle Aged
Case-Control Studies
Machine Learning
Acoustics
Biomarkers
*Voice/physiology
Young Adult
Boosting Machine Learning Algorithms
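
Every performance figure quoted in the abstract derives from a binary confusion matrix. The Python sketch below computes them from raw counts; the `Confusion` type and `binary_metrics` function are hypothetical helpers. The illustrative counts (83 true positives, 17 false negatives, 21 false positives, 79 true negatives) are not reported by the authors; they are a split of a 200-person cohort, assuming 100 cases and 100 controls, that reproduces the external-validation figures quoted above (accuracy 0.8100, precision 0.7981, recall 0.8300, F1 0.8137, kappa 0.6200, MCC 0.6205).

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # asthma cases classified as asthma
    fn: int  # asthma cases classified as control
    fp: int  # controls classified as asthma
    tn: int  # controls classified as control


def binary_metrics(c: Confusion) -> dict:
    n = c.tp + c.fn + c.fp + c.tn
    accuracy = (c.tp + c.tn) / n
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what the marginal totals predict by chance.
    p_chance = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient.
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Illustrative counts (see lead-in); not data published by the authors.
    metrics = binary_metrics(Confusion(tp=83, fn=17, fp=21, tn=79))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Kappa and MCC are included because, unlike accuracy, they account for class balance and chance agreement, which is why the abstract reports them alongside accuracy and AUC.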

RevDate: 2025-08-05

Huang T, Liu A, Xu T, et al (2025)

Vocal Fatigue in Mandarin-Speaking Teachers: Characteristics of Phonation and Resonance.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00280-2 [Epub ahead of print].

OBJECTIVE: To investigate phonatory and resonatory characteristics in Mandarin-speaking female teachers with vocal fatigue (VF).

METHODS: A case-control study was conducted among teachers in Shanghai, China. Participants included 18 vocal-fatigued teachers (VF), 17 non-vocal-fatigued teachers (NVF), and 16 nonoccupational voice users (NOVU). Subjective assessment included the Vocal Fatigue Index (VFI), Voice Handicap Index-10 (VHI-10), and GRBAS scale. Objective assessments included maximum phonation time (MPT), dysphonia severity index (DSI), fundamental frequency (F0), cepstral peak prominence, intensity, jitter, shimmer, harmonics-to-noise ratio (HNR), and the rate and regularity (jitter) of laryngeal diadochokinesis (LDDK) (/ʔa/, /ha/, /ʔʌ/, /hʌ/). Articulatory features included vowel space area (VSA), formant centralization ratio (FCR), the formants (F1-F3), and their bandwidths (B1-B3). These measures were collected during vowels, passage reading, and free talking. A decision tree model was developed to identify key factors associated with VF.

RESULTS: Subjective voice assessments: the VF group showed significantly higher VHI-10 scores and grade (G) ratings of GRBAS compared with the NVF and NOVU (P < 0.05). Objective voice assessments: the VF had significantly lower DSI (P < 0.05), higher FCR, F1/i/, and F1/u/ (P < 0.05) than NVF and NOVU, and reduced MPT and VSA compared with the NVF (P < 0.05). F2/u/ was significantly higher in the VF than in the NVF (P < 0.05). Compared with the NOVU, the VF exhibited significantly higher jitter/i/, F2/a/, and B3/u/ (P < 0.05), as well as significantly lower HNR/u/, F2/i/, B3/i/, /ha/, and /hʌ/ rate (P < 0.05). Decision tree model: F2/u/, VHI-10, and DSI were identified as key discriminators of VF in teachers.

CONCLUSIONS: Teachers with VF demonstrate insufficient respiratory support, poor vocal quality, reduced LDDK ability, and limited articulatory movement range. F2/u/, VHI-10, and DSI play a crucial role in distinguishing VF.

Additional Links: PMID-40764155

Citation:

@article {pmid40764155,
year = {2025},
author = {Huang, T and Liu, A and Xu, T and Lee, W and Kim, H},
title = {Vocal Fatigue in Mandarin-Speaking Teachers: Characteristics of Phonation and Resonance.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.07.016},
pmid = {40764155},
issn = {1873-4588},
abstract = {OBJECTIVE: To investigate phonatory and resonatory characteristics in Mandarin-speaking female teachers with vocal fatigue (VF).

METHODS: A case-control study was conducted among teachers in Shanghai, China. Participants included 18 vocal-fatigued teachers (VF), 17 non-vocal-fatigued teachers (NVF), and 16 nonoccupational voice users (NOVU). Subjective assessment included the Vocal Fatigue Index (VFI), Voice Handicap Index-10 (VHI-10), and GRBAS scale. Objective assessments included maximum phonation time (MPT), dysphonia severity index (DSI), fundamental frequency (F0), cepstral peak prominence, intensity, jitter, shimmer, harmonics-to-noise ratio (HNR), and the rate and regularity (jitter) of laryngeal diadochokinesis (LDDK) (/ʔa/, /ha/, /ʔʌ/, /hʌ/). Articulatory features included vowel space area (VSA), formant centralization ratio (FCR), the formants (F1-F3), and their bandwidths (B1-B3). These measures were collected during vowels, passage reading, and free talking. A decision tree model was developed to identify key factors associated with VF.

RESULTS: Subjective voice assessments: the VF group showed significantly higher VHI-10 scores and grade (G) ratings of GRBAS compared with the NVF and NOVU (P < 0.05). Objective voice assessments: the VF had significantly lower DSI (P < 0.05), higher FCR, F1/i/, and F1/u/ (P < 0.05) than NVF and NOVU, and reduced MPT and VSA compared with the NVF (P < 0.05). F2/u/ was significantly higher in the VF than in the NVF (P < 0.05). Compared with the NOVU, the VF exhibited significantly higher jitter/i/, F2/a/, and B3/u/ (P < 0.05), as well as significantly lower HNR/u/, F2/i/, B3/i/, /ha/, and /hʌ/ rate (P < 0.05). Decision tree model: F2/u/, VHI-10, and DSI were identified as key discriminators of VF in teachers.

CONCLUSIONS: Teachers with VF demonstrate insufficient respiratory support, poor vocal quality, reduced LDDK ability, and limited articulatory movement range. F2/u/, VHI-10, and DSI play a crucial role in distinguishing VF.},
}
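
Huang et al. report the formant centralization ratio (FCR) and vowel space area (VSA), but the abstract does not give the formulas. The Python sketch below uses the widely cited corner-vowel definitions, which are an assumption here: FCR = (F2/u/ + F2/a/ + F1/i/ + F1/u/) / (F2/i/ + F1/a/), and triangular VSA is the area of the /i/-/a/-/u/ triangle in the F1-F2 plane, computed with the shoelace formula. The function names and formant values are hypothetical.

```python
from typing import Dict, Tuple

# Corner-vowel formants in Hz, keyed by vowel: {"i": (F1, F2), "a": (F1, F2), "u": (F1, F2)}.
Formants = Dict[str, Tuple[float, float]]


def formant_centralization_ratio(v: Formants) -> float:
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); larger values mean more centralized vowels."""
    f1i, f2i = v["i"]
    f1a, f2a = v["a"]
    f1u, f2u = v["u"]
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)


def triangular_vsa(v: Formants) -> float:
    """Area in Hz^2 of the /i/-/a/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    f1i, f2i = v["i"]
    f1a, f2a = v["a"]
    f1u, f2u = v["u"]
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))


if __name__ == "__main__":
    # Hypothetical formant values for illustration; not data from the study.
    speaker = {"i": (300.0, 2500.0), "a": (800.0, 1300.0), "u": (350.0, 900.0)}
    print(f"FCR = {formant_centralization_ratio(speaker):.3f}")  # 0.864
    print(f"VSA = {triangular_vsa(speaker):.0f} Hz^2")           # 370000
```

Centralizing the vowels (lowering F2/i/, raising F2/u/ and F1/i/) raises FCR and shrinks VSA, the same direction as the higher FCR and reduced VSA reported for the VF group.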

RevDate: 2025-08-03

Punamäki RL, Diab SY, Drosos K, et al (2025)

The impact of mother's mental health, infant characteristics and war trauma on the acoustic features of infant-directed singing.

Infant mental health journal [Epub ahead of print].

Infant-directed singing (IDSi) is a natural means of dyadic communication that contributes to children's mental health by enhancing emotion expression, close relationships, exploration and learning. Therefore, it is important to learn about factors that impact the IDSi. This study modeled the mother- (mental health), infant- (emotional responses and health status) and environment (war trauma)-related factors influencing acoustic IDSi features, such as pitch (F0) variability, amplitude and vibration, and the shapes and movements of the F0 contour. The participants were 236 mothers and infants from Gaza, the Occupied Palestinian Territories. The mothers reported their mental health problems, infants' emotionality and regulation skills, and, along with pediatric checkups, illnesses and disorders, as well as traumatic war events that were also photo-documented. The results showed that the mothers' mental health problems and infants' poor health status were associated with IDSi, characterized by narrow and lifeless amplitude and vibration, and poor health was also associated with the limited and rigid shapes and movements of F0 contours. Traumatic war events were associated with flat and narrow F0 variability and the monotonous and invariable resonance and rhythm of IDSi formants. The infants' emotional responses did not impact IDSi. The potential of protomusical singing to help war-affected dyads is discussed.

Additional Links: PMID-40753586

Citation:

@article {pmid40753586,
year = {2025},
author = {Punamäki, RL and Diab, SY and Drosos, K and Quota, SR},
title = {The impact of mother's mental health, infant characteristics and war trauma on the acoustic features of infant-directed singing.},
journal = {Infant mental health journal},
volume = {},
number = {},
pages = {},
doi = {10.1002/imhj.70036},
pmid = {40753586},
issn = {1097-0355},
support = {#5585//Jacobs Foundation/ ; #275197//The Research Council of Finland/ ; },
abstract = {Infant-directed singing (IDSi) is a natural means of dyadic communication that contributes to children's mental health by enhancing emotion expression, close relationships, exploration and learning. Therefore, it is important to learn about factors that impact the IDSi. This study modeled the mother- (mental health), infant- (emotional responses and health status) and environment (war trauma)-related factors influencing acoustic IDSi features, such as pitch (F0) variability, amplitude and vibration, and the shapes and movements of the F0 contour. The participants were 236 mothers and infants from Gaza, the Occupied Palestinian Territories. The mothers reported their mental health problems, infants' emotionality and regulation skills, and, along with pediatric checkups, illnesses and disorders, as well as traumatic war events that were also photo-documented. The results showed that the mothers' mental health problems and infants' poor health status were associated with IDSi, characterized by narrow and lifeless amplitude and vibration, and poor health was also associated with the limited and rigid shapes and movements of F0 contours. Traumatic war events were associated with flat and narrow F0 variability and the monotonous and invariable resonance and rhythm of IDSi formants. The infants' emotional responses did not impact IDSi. The potential of protomusical singing to help war-affected dyads is discussed.},
}

RevDate: 2025-07-26

Ghaemi H, Aghaie F, Ghaemi H, et al (2025)

A Study of Persian English Teachers' Voices: Do Acoustic Voice Characteristics Change When Speaking in English and Farsi?.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00273-5 [Epub ahead of print].

OBJECTIVE: Voice characteristics can vary across languages due to distinct phonological and articulatory demands. However, limited research exists on how acoustic parameters differ when bilingual speakers switch between Farsi and English. This study aimed to compare acoustic voice characteristics when Persian teachers of English switch between English and Farsi.

METHODS: Thirty-eight English language teachers (30 females, 8 males, age range: 20-28 years) with Persian as their native language and a minimum of 3 years of English teaching experience participated in this study. Participants were selected through random cluster sampling. All participants were screened for voice disorders using the Voice Handicap Index (VHI). Voice samples were collected using both Farsi and English versions of the Consensus Auditory Perceptual Evaluation of Voice (CAPE-V) sentences in a single quiet room. Recordings were made using a Shure SM58 microphone and a Sony ICD-PX240 recorder at a 10-cm mouth-to-microphone distance. Acoustic analysis was performed in Praat to obtain source-related measures (fundamental frequency, harmonics-to-noise ratio, and intensity) and vocal-tract-related measures (first (F1), second (F2), and third (F3) formant frequencies).

RESULTS: No statistically significant differences were found in source-related acoustic measures (fundamental frequency, harmonics-to-noise ratio, and intensity) between English and Farsi speech (P > 0.05). However, vocal-tract-related measures (F1, F2, F3) demonstrated statistically significant differences between the two languages, with higher values during Farsi speech compared to English (P < 0.05).

CONCLUSION: Findings demonstrated that bilingual speakers produced notably different voice patterns contingent on language and speech task, indicating that inter- and intra-speaker variability in speakers' vocal features can be attributed in part to language effects. These differences were primarily observed in vocal-tract-related characteristics rather than source-related characteristics, suggesting that articulation strategies, rather than laryngeal functions, vary between languages.

Additional Links: PMID-40713390

Citation:

@article {pmid40713390,
year = {2025},
author = {Ghaemi, H and Aghaie, F and Ghaemi, H and Deliyski, DD and Hojabr, M and Zoleykhaie, Z},
title = {A Study of Persian English Teachers' Voices: Do Acoustic Voice Characteristics Change When Speaking in English and Farsi?},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.07.009},
pmid = {40713390},
issn = {1873-4588},
abstract = {OBJECTIVE: Voice characteristics can vary across languages due to distinct phonological and articulatory demands. However, limited research exists on how acoustic parameters differ when bilingual speakers switch between Farsi and English. This study aimed to compare acoustic voice characteristics when Persian teachers of English switch between English and Farsi.

METHODS: Thirty-eight English language teachers (30 females, 8 males, age range: 20-28 years) with Persian as their native language and a minimum of 3 years of English teaching experience participated in this study. Participants were selected through random cluster sampling. All participants were screened for voice disorders using the Voice Handicap Index (VHI). Voice samples were collected using both Farsi and English versions of the Consensus Auditory Perceptual Evaluation of Voice (CAPE-V) sentences in a single quiet room. Recordings were made using a Shure SM58 microphone and a Sony ICD-PX240 recorder at a 10-cm mouth-to-microphone distance. Acoustic analysis was performed in Praat to obtain source-related measures (fundamental frequency, harmonics-to-noise ratio, and intensity) and vocal-tract-related measures (first (F1), second (F2), and third (F3) formant frequencies).

RESULTS: No statistically significant differences were found in source-related acoustic measures (fundamental frequency, harmonics-to-noise ratio, and intensity) between English and Farsi speech (P > 0.05). However, vocal-tract-related measures (F1, F2, F3) demonstrated statistically significant differences between the two languages, with higher values during Farsi speech compared to English (P < 0.05).

CONCLUSION: Findings demonstrated that bilingual speakers produced notably different voice patterns contingent on language and speech task, indicating that inter- and intra-speaker variability in speakers' vocal features can be attributed in part to language effects. These differences were primarily observed in vocal-tract-related characteristics rather than source-related characteristics, suggesting that articulation strategies, rather than laryngeal functions, vary between languages.},
}

RevDate: 2025-07-22

Zhang W, J Steffman (2025)

Attentional influences on cue weighting in vowel perception: Examining prosodic prominence and informational masking.

Attention, perception & psychophysics [Epub ahead of print].

Beyond listener-external sources of variability, such as variation in talker and acoustic context, listener-internal variation also plays a role in speech perception and cue weighting. The present study examines the effects of prosodic prominence, signaled by F0, and multi-talker babble noise as means of increasing and reducing listeners' attention, respectively. Listeners categorized four English vowel contrasts, two high-vowel and two non-high-vowel contrasts, with both formant cues and vowel duration varying along a continuum. In Experiment 1, prominence boosted formant cue usage, whereas babble noise was detrimental to it, in line with their predicted roles in modulating listener attention. Listeners' use of vowel duration, a secondary cue to the contrasts, was also affected by prominence or babble noise. In Experiment 2, two methods of eliciting F0-based prominence, off-target (contextual) and on-target (target-internal), were investigated. Off-target prominence had only a very limited effect in boosting formant cue usage. Results are discussed in terms of the role of prosodic prominence in speech perception and the role of attention in perceptual processing. The data and code for the experiments are available on the OSF at https://osf.io/52khc/ .

Additional Links: PMID-40696109

Citation:

@article {pmid40696109,
year = {2025},
author = {Zhang, W and Steffman, J},
title = {Attentional influences on cue weighting in vowel perception: Examining prosodic prominence and informational masking.},
journal = {Attention, perception & psychophysics},
volume = {},
number = {},
pages = {},
pmid = {40696109},
issn = {1943-393X},
abstract = {Beyond listener-external sources of variability, such as variation in talker and acoustic context, listener-internal variation also plays a role in speech perception and cue weighting. The present study examines the effects of prosodic prominence, signaled by F0, and multi-talker babble noise as means of increasing and reducing listeners' attention, respectively. Listeners categorized four English vowel contrasts, two high-vowel and two non-high-vowel contrasts, with both formant cues and vowel duration varying along a continuum. In Experiment 1, prominence boosted formant cue usage, whereas babble noise was detrimental to it, in line with their predicted roles in modulating listener attention. Listeners' use of vowel duration, a secondary cue to the contrasts, was also affected by prominence or babble noise. In Experiment 2, two methods of eliciting F0-based prominence, off-target (contextual) and on-target (target-internal), were investigated. Off-target prominence had only a very limited effect in boosting formant cue usage. Results are discussed in terms of the role of prosodic prominence in speech perception and the role of attention in perceptual processing. The data and code for the experiments are available on the OSF at https://osf.io/52khc/ .},
}

RevDate: 2025-07-21

Devi I, Dahiya NK, Ruhil AP, et al (2025)

Prediction of different physiological conditions of riverine buffaloes (bubalus bubalis) based on their vocal cues through machine learning algorithms and a conventional statistical model.

The Journal of dairy research pii:S0022029925100976 [Epub ahead of print].

To understand animals' requirements, their calls can be analysed. This potentially enables specific and more precise individual care under different emotional and physiological conditions. This study was conducted to identify three different conditions (oestrus, delayed milking and isolation) of buffaloes based on vocalization patterns. For each condition, a total of 600 acoustic samples were collected, consisting of 300 records confirming and 300 records not confirming that condition. Important acoustic features such as amplitude (P), total energy (P²s), pitch (Hz), intensity (dB), formants (Hz), number of pulses, number of periods, mean period (sec) and unvoiced frames (%) were extracted using the MFCC (mel-frequency cepstral coefficients) technique. Models were trained by partitioning the acoustic data into training and validation sets, and three partitioning ratios were assessed: 60%-40%, 70%-30% and 80%-20%. Decision tree models were optimized using the decision and average square error (probability) options, with other parameters left at the software package's default values, to develop the best model. Algorithm performance was evaluated by accuracy rate. Decision tree models predicted the physiological conditions oestrus, isolation and delayed milking with accuracies of 66.1, 84.3 and 71.3%, respectively, while the logistic regression model predicted them with accuracies of 59.5, 71.1 and 65.7%, respectively, and the artificial neural network (ANN) model predicted these three conditions with 77.7, 85.2 and 79.4% accuracy, respectively. The ANN model was found to be the best on the basis of minimum misclassification rate (with 80%-20% partitioning). However, the decision tree algorithms also provided the additional information that intensity (maximum), amplitude (minimum) and formant (F1) are the most important vocal-signal features for identifying oestrus, isolation and delayed milking, respectively, in dairy buffalo.

Additional Links: PMID-40686040

Citation:

@article {pmid40686040,
year = {2025},
author = {Devi, I and Dahiya, NK and Ruhil, AP and Singh, Y and Tomar, DS},
title = {Prediction of different physiological conditions of riverine buffaloes (bubalus bubalis) based on their vocal cues through machine learning algorithms and a conventional statistical model.},
journal = {The Journal of dairy research},
volume = {},
number = {},
pages = {1-5},
doi = {10.1017/S0022029925100976},
pmid = {40686040},
issn = {1469-7629},
abstract = {To understand animals' requirements, their calls can be analysed. This potentially enables specific and more precise individual care under different emotional and physiological conditions. This study was conducted to identify three different conditions (oestrus, delayed milking and isolation) of buffaloes based on vocalization patterns. For each condition, a total of 600 acoustic samples were collected, consisting of 300 records confirming and 300 records not confirming that condition. Important acoustic features such as amplitude (P), total energy (P²s), pitch (Hz), intensity (dB), formants (Hz), number of pulses, number of periods, mean period (sec) and unvoiced frames (%) were extracted using the MFCC (mel-frequency cepstral coefficients) technique. Models were trained by partitioning the acoustic data into training and validation sets, and three partitioning ratios were assessed: 60%-40%, 70%-30% and 80%-20%. Decision tree models were optimized using the decision and average square error (probability) options, with other parameters left at the software package's default values, to develop the best model. Algorithm performance was evaluated by accuracy rate. Decision tree models predicted the physiological conditions oestrus, isolation and delayed milking with accuracies of 66.1, 84.3 and 71.3%, respectively, while the logistic regression model predicted them with accuracies of 59.5, 71.1 and 65.7%, respectively, and the artificial neural network (ANN) model predicted these three conditions with 77.7, 85.2 and 79.4% accuracy, respectively. The ANN model was found to be the best on the basis of minimum misclassification rate (with 80%-20% partitioning). However, the decision tree algorithms also provided the additional information that intensity (maximum), amplitude (minimum) and formant (F1) are the most important vocal-signal features for identifying oestrus, isolation and delayed milking, respectively, in dairy buffalo.},
}

RevDate: 2025-07-17

Munson B, DV Dolquist (2025)

The Perception of (Trans)masculinity in Speech: Effects of Acoustic Characteristics and Rater Identity.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Gender-affirming communication services are based on studies of speech produced and perceived by cisgender men and women. The current study examined the perception of gender and gender orientation (i.e., whether someone is cisgender or transgender) in the Palette of Voices, an openly available corpus of the speech of transgender and cisgender men, by cisgender heterosexual men (CHM) and cisgender heterosexual women (CHF), and a group of gender and sexuality expansive (GSE) listeners. We examined how both the acoustic characteristics of speech and listener identity affect gender and gender orientation categorization.

METHOD: Participants (n = 199) categorized the gender and gender orientation of 240 sentence productions produced by 20 male talkers in an online experiment, including tokens whose fundamental frequency (F0) and formant frequency scaling had been altered, and unmanipulated tokens.

RESULTS: Consistent with previous research, productions with lower F0 and lower formant frequencies were more likely to be categorized as male than ones with higher F0s and formants. The weighting of these variables differed systematically across listener groups, with the GSE group weighting these variables less than the CHM and CHF groups when categorizing gender, but more when categorizing gender orientation.

CONCLUSIONS: The relationship between the acoustic characteristics of a talker's speech and the categorization of their gender and gender orientation is highly variable across and within groups. The perception data and speech samples in this study are openly available. Suggestions are given for how they might be used to supplement existing gender-affirming communication services.

Additional Links: PMID-40673777

Citation:

@article {pmid40673777,
year = {2025},
author = {Munson, B and Dolquist, DV},
title = {The Perception of (Trans)masculinity in Speech: Effects of Acoustic Characteristics and Rater Identity.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-25},
doi = {10.1044/2025_JSLHR-24-00756},
pmid = {40673777},
issn = {1558-9102},
abstract = {PURPOSE: Gender-affirming communication services are based on studies of speech produced and perceived by cisgender men and women. The current study examined the perception of gender and gender orientation (i.e., whether someone is cisgender or transgender) in the Palette of Voices, an openly available corpus of the speech of transgender and cisgender men, by cisgender heterosexual men (CHM) and cisgender heterosexual women (CHF), and a group of gender and sexuality expansive (GSE) listeners. We examined how both the acoustic characteristics of speech and listener identity affect gender and gender orientation categorization.

METHOD: Participants (n = 199) categorized the gender and gender orientation of 240 sentence productions produced by 20 male talkers in an online experiment, including tokens whose fundamental frequency (F0) and formant frequency scaling had been altered, and unmanipulated tokens.

RESULTS: Consistent with previous research, productions with lower F0 and lower formant frequencies were more likely to be categorized as male than ones with higher F0s and formants. The weighting of these variables differed systematically across listener groups, with the GSE group weighting these variables less than the CHM and CHF groups when categorizing gender, but more when categorizing gender orientation.

CONCLUSIONS: The relationship between the acoustic characteristics of a talker's speech and the categorization of their gender and gender orientation is highly variable across and within groups. The perception data and speech samples in this study are openly available. Suggestions are given for how they might be used to supplement existing gender-affirming communication services.},
}

RevDate: 2025-07-16
CmpDate: 2025-07-16

Chouksey G, Singh GP, Gupta V, et al (2025)

A pilot study comparing conventional and digital impression techniques for speech analysis using Hindi vowels in maxillectomy patients rehabilitated with an obturator.

Journal of Indian Prosthodontic Society, 25(3):266-275.

AIM: Maxillectomy patients frequently have speech impairments resulting from the loss of the oral-nasal partition. Prosthodontic rehabilitation with an obturator helps restore speech intelligibility, with its success largely dependent on accurate impression recording of maxillary defects. This investigation evaluated the effectiveness of conventional versus digital impression techniques in the context of speech analysis, specifically using Hindi vowels, in maxillectomy patients rehabilitated with obturators.

STUDY SETTING AND DESIGN: This research, designed as a quasi-experimental study, was undertaken at a tertiary care hospital.

MATERIAL AND METHODS: The study included 20 patients needing obturators, assigned to two groups: one received prostheses fabricated with conventional impressions, and the other with digital techniques. Speech parameters, including fundamental frequency, formant frequencies (F1, F2, F3), intensity, jitter, shimmer, and maximum phonation duration (MPD), were analyzed using Praat software before and three months after rehabilitation. Hindi vowels aa /a:/, ii /i:/, and uu /u:/ were sustained at a controlled intensity. Swallowing efficiency was assessed via the water swallow test.

STATISTICAL ANALYSIS USED: The Wilcoxon Rank Sum test or exact test was used to compare the data, with a p-value < 0.05 considered significant.

RESULTS: The most common maxillectomy defects were Brown class 2b. After three months, significant improvement in speech parameters and swallowing efficiency was seen in both groups of participants. However, no statistically significant differences (P > 0.05) were found between the conventional and digital impression groups.

CONCLUSIONS: This novel study compared conventional and digital impressions for speech analysis using Hindi vowels in maxillectomy patients after rehabilitation with an obturator. Voice recording and acoustic analysis using Hindi vowels provide valuable insights into speech rehabilitation outcomes in maxillectomy patients. Both conventional and digital impression techniques effectively fabricate obturators, improving speech characteristics and intelligibility. Both methods can be used for maxillectomy patients, allowing flexibility in clinical practice.

Additional Links: PMID-40669000

Citation:

@article {pmid40669000,
year = {2025},
author = {Chouksey, G and Singh, GP and Gupta, V and Sahoo, P and Choure, R and Goyal, A},
title = {A pilot study comparing conventional and digital impression techniques for speech analysis using Hindi vowels in maxillectomy patients rehabilitated with an obturator.},
journal = {Journal of Indian Prosthodontic Society},
volume = {25},
number = {3},
pages = {266-275},
doi = {10.4103/jips.jips_64_25},
pmid = {40669000},
issn = {1998-4057},
mesh = {Humans ; Pilot Projects ; *Palatal Obturators ; *Maxilla/surgery ; Male ; Female ; *Dental Impression Technique ; Middle Aged ; Speech Intelligibility ; Adult ; Aged ; },
abstract = {AIM: Maxillectomy patients frequently have speech impairments resulting from the loss of the oral-nasal partition. Prosthodontic rehabilitation with an obturator helps restore speech intelligibility, with its success largely dependent on accurate impression recording of maxillary defects. This investigation evaluated the effectiveness of conventional versus digital impression techniques in the context of speech analysis, specifically using Hindi vowels, in maxillectomy patients rehabilitated with obturators.

STUDY SETTING AND DESIGN: This research, designed as a quasi-experimental study, was undertaken at a tertiary care hospital.

MATERIAL AND METHODS: The study included 20 patients needing obturators, assigned to two groups: one received prostheses fabricated with conventional impressions, and the other with digital techniques. Speech parameters, including fundamental frequency, formant frequencies (F1, F2, F3), intensity, jitter, shimmer, and maximum phonation duration (MPD), were analyzed using Praat software before and three months after rehabilitation. Hindi vowels aa /a:/, ii /i:/, and uu /u:/ were sustained at a controlled intensity. Swallowing efficiency was assessed via the water swallow test.

STATISTICAL ANALYSIS USED: The Wilcoxon Rank Sum test or exact test was used to compare the data, with a p-value < 0.05 considered significant.

RESULTS: The most common maxillectomy defects were Brown class 2b. After three months, significant improvement in speech parameters and swallowing efficiency was seen in both groups of participants. However, no statistically significant differences (P > 0.05) were found between the conventional and digital impression groups.

CONCLUSIONS: This novel study compared conventional and digital impressions for speech analysis using Hindi vowels in maxillectomy patients after rehabilitation with an obturator. Voice recording and acoustic analysis using Hindi vowels provide valuable insights into speech rehabilitation outcomes in maxillectomy patients. Both conventional and digital impression techniques effectively fabricate obturators, improving speech characteristics and intelligibility. Both methods can be used for maxillectomy patients, allowing flexibility in clinical practice.},
}

MeSH Terms:

Humans
Pilot Projects
*Palatal Obturators
*Maxilla/surgery
Male
Female
*Dental Impression Technique
Middle Aged
Speech Intelligibility
Adult
Aged

RevDate: 2025-07-10

Aaen M, M Frič (2025)

Going Beyond the Register-Vocal Mode Categorization Across Four Octaves in Professional Male and Female Singing Voice Using Voice Range Profile, EGG, Acoustic, and Vibroacoustic Measurements: Double-Case Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00222-X [Epub ahead of print].

INTRODUCTION: Registers are a fundamental yet controversial topic, owing to the lack of generally accepted definitions and of consensus on pedagogical application. Recent discoveries suggest that singers are able to use specific vocal tract configurations across upward of three octaves without register changes. Previous research has shown differences between registers at the glottal level as well as in perceptual, acoustic, physiological, and aerodynamic dimensions, while other studies have suggested that singers can maintain specific vocal configurations past previously observed register shifts. This study investigates several loudness and vocal quality conditions across 3-4 octaves.

METHODS: In this double-case study, two professional singers (F1-AFAB, M1-AMAB) performed vocal tasks based on Complete Vocal Technique (CVT) modes, singing sustained vowels ([e:], [ʌ:] for Curbing) at varying intensities (MP, F, and FF) across their vocal ranges (three and four octaves, respectively). A total of 572 (singer F1) and 516 (singer M1) samples were collected and analyzed using MATLAB and Praat, focusing on sound pressure level (SPL), fo, formants (F1-F4), spectral parameters (SPR, L1-L2, and SCO-2.5), jitter, shimmer, smoothed cepstral peak prominence (CPPS), noise-to-harmonic ratio (NHR), and electroglottographic (EGG) analysis. Statistical methods included normality testing, t tests, Wilcoxon tests, Bonferroni correction, linear regression, and factor analysis.

RESULTS: Both singers produced samples in M1- and M2-type vibratory mechanisms extending beyond the pitch regions typically observed for traditional registers: the male singer covered a total of four octaves (C2-C6), with all conditions represented in three octaves, while the female singer covered a total of three octaves (D3-D6), with all conditions represented in almost two octaves. Metal and density parameters varied systematically and linearly with fo and influenced SPL, acoustic spectra, and EGG measures. The vibratory patterns observed for the fuller and reduced metallic and fuller and reduced density conditions resembled those previously reported for modal (M1-type) vibration, whereas falsetto conditions resembled those typically reported for M2 (head voice-type) vibration. Fuller metallic conditions exhibited higher SPL and Qx, while non-metallic conditions showed lower SPL and EGG-waveform magnitude. Factor analysis revealed distinct statistical differentiation for metal and density variations. Classical samples showed restricted use of vocal modes throughout the range, whereas Contemporary Commercial Music (CCM) samples did not. The findings highlight the complex and nuanced interplay between glottal behavior and vocal tract setup and offer new perspectives on how singers navigate their vocal range.

CONCLUSION: Metal and density parameters offer stable, measurable, and statistically significant distinctions beyond traditional register classifications. Participants produced stable vocal configurations across more than three octaves. Register boundaries are fluid, and singers can manipulate vocal fold vibratory patterns and vocal tract setup to maintain timbral consistency across an extended range.

Additional Links: PMID-40640024

Citation:

@article {pmid40640024,
year = {2025},
author = {Aaen, M and Frič, M},
title = {Going Beyond the Register-Vocal Mode Categorization Across Four Octaves in Professional Male and Female Singing Voice Using Voice Range Profile, EGG, Acoustic, and Vibroacoustic Measurements: Double-Case Study.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.06.003},
pmid = {40640024},
issn = {1873-4588},
abstract = {INTRODUCTION: Registers are a fundamental yet controversial topic, owing to the lack of generally accepted definitions and of consensus on pedagogical application. Recent discoveries suggest that singers are able to use specific vocal tract configurations across upward of three octaves without register changes. Previous research has shown differences between registers at the glottal level as well as in perceptual, acoustic, physiological, and aerodynamic dimensions, while other studies have suggested that singers can maintain specific vocal configurations past previously observed register shifts. This study investigates several loudness and vocal quality conditions across 3-4 octaves.

METHODS: In this double-case study, two professional singers (F1-AFAB, M1-AMAB) performed vocal tasks based on Complete Vocal Technique (CVT) modes, singing sustained vowels ([e:], [ʌ:] for Curbing) at varying intensities (MP, F, and FF) across their vocal ranges (three and four octaves, respectively). A total of 572 (singer F1) and 516 (singer M1) samples were collected and analyzed using MATLAB and Praat, focusing on sound pressure level (SPL), fo, formants (F1-F4), spectral parameters (SPR, L1-L2, and SCO-2.5), jitter, shimmer, smoothed cepstral peak prominence (CPPS), noise-to-harmonic ratio (NHR), and electroglottographic (EGG) analysis. Statistical methods included normality testing, t tests, Wilcoxon tests, Bonferroni correction, linear regression, and factor analysis.

RESULTS: Both singers produced samples in M1- and M2-type vibratory mechanisms extending beyond the pitch regions typically observed for traditional registers: the male singer covered a total of four octaves (C2-C6), with all conditions represented in three octaves, while the female singer covered a total of three octaves (D3-D6), with all conditions represented in almost two octaves. Metal and density parameters varied systematically and linearly with fo and influenced SPL, acoustic spectra, and EGG measures. The vibratory patterns observed for the fuller and reduced metallic and fuller and reduced density conditions resembled those previously reported for modal (M1-type) vibration, whereas falsetto conditions resembled those typically reported for M2 (head voice-type) vibration. Fuller metallic conditions exhibited higher SPL and Qx, while non-metallic conditions showed lower SPL and EGG-waveform magnitude. Factor analysis revealed distinct statistical differentiation for metal and density variations. Classical samples showed restricted use of vocal modes throughout the range, whereas Contemporary Commercial Music (CCM) samples did not. The findings highlight the complex and nuanced interplay between glottal behavior and vocal tract setup and offer new perspectives on how singers navigate their vocal range.

CONCLUSION: Metal and density parameters offer stable, measurable, and statistically significant distinctions beyond traditional register classifications. Participants produced stable vocal configurations across more than three octaves. Register boundaries are fluid, and singers can manipulate vocal fold vibratory patterns and vocal tract setup to maintain timbral consistency across an extended range.},
}

RevDate: 2025-07-10

Türüdü S, F Gül (2025)

Effects of Moderate and Severe Obstructive Sleep Apnea on Subjective and Objective Voice Outcomes Excluding Confounding Factors.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00252-8 [Epub ahead of print].

OBJECTIVES: This study aimed to determine the impact of obstructive sleep apnea (OSA) severity (moderate vs severe) on subjective and objective voice outcomes after excluding common confounders such as obesity, smoking, alcohol, and laryngopharyngeal reflux.

METHODS: In total, 104 patients (49 female, 55 male) with moderate (n = 53) or severe (n = 51) OSA participated. Subjective voice was assessed using Turkish versions of the Voice Handicap Index (Tr-VHI) and the Voice-Related Quality of Life (Tr-V-RQOL). Objective voice analysis of sustained Turkish vowels measured fundamental frequency (F0), formants, perturbation (jitter, shimmer), harmonicity (HNR), and spectral parameters. Group comparisons and correlations with the Apnea-Hypopnea Index (AHI), controlling for age and gender, were performed.

RESULTS: Polysomnography confirmed greater disease severity in the severe OSA group. No significant differences emerged in subjective voice outcomes (Tr-VHI, Tr-V-RQOL) between moderate and severe OSA groups. Objective acoustic analyses, however, revealed significant alterations in the severe OSA group, including increased shimmer, reduced HNR, and changes in specific formant frequencies. The AHI correlated significantly with several objective acoustic parameters but not with subjective voice scores.

CONCLUSION: This confounder-controlled study reveals that severe OSA results in a more pronounced detrimental impact on objective acoustic parameters than moderate OSA. While subjective voice outcomes did not differ by OSA severity, the identified objective vocal alterations in severe OSA suggest clinical relevance, enhancing the understanding of voice changes as an OSA-related comorbidity.

Additional Links: PMID-40640022

@article {pmid40640022,
year = {2025},
author = {Türüdü, S and Gül, F},
title = {Effects of Moderate and Severe Obstructive Sleep Apnea on Subjective and Objective Voice Outcomes Excluding Confounding Factors.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.06.025},
pmid = {40640022},
issn = {1873-4588},
abstract = {OBJECTIVES: This study aimed to determine the impact of obstructive sleep apnea (OSA) severity (moderate vs severe) on subjective and objective voice outcomes after excluding common confounders such as obesity, smoking, alcohol, and laryngopharyngeal reflux.

METHODS: In total, 104 patients (49 female, 55 male) with moderate (n = 53) or severe (n = 51) OSA participated. Subjective voice was assessed using Turkish versions of the Voice Handicap Index (Tr-VHI) and the Voice-Related Quality of Life (Tr-V-RQOL). Objective voice analysis of sustained Turkish vowels measured fundamental frequency (F0), formants, perturbation (jitter, shimmer), harmonicity (HNR), and spectral parameters. Group comparisons and correlations with the Apnea-Hypopnea Index (AHI), controlling for age and gender, were performed.

RESULTS: Polysomnography confirmed greater disease severity in the severe OSA group. No significant differences emerged in subjective voice outcomes (Tr-VHI, Tr-V-RQOL) between moderate and severe OSA groups. Objective acoustic analyses, however, revealed significant alterations in the severe OSA group, including increased shimmer, reduced HNR, and changes in specific formant frequencies. The AHI correlated significantly with several objective acoustic parameters but not with subjective voice scores.

CONCLUSION: This confounder-controlled study reveals that severe OSA results in a more pronounced detrimental impact on objective acoustic parameters than moderate OSA. While subjective voice outcomes did not differ by OSA severity, the identified objective vocal alterations in severe OSA suggest clinical relevance, enhancing the understanding of voice changes as an OSA-related comorbidity.},
}

RevDate: 2025-07-10

Daşdöğen Ü, Awan SN, Koilar R, et al (2025)

Preliminary Data on Cortical and Acoustical Correlates of Voice Training Under Internal vs External Focus of Attention Conditions.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00232-2 [Epub ahead of print].

PURPOSE: Prior research has emphasized the superiority of an external focus of attention for motor learning. According to the Constrained Action Hypothesis, an external focus encourages automaticity by directing attention to movement outcomes, while an internal focus is thought to impair learning by promoting controlled processing directed at biomechanics. However, prior research has largely conflated two attentional constructs: the locus of attention (external vs internal) and the target of attention (movement outcomes vs biomechanics). Therefore, the purpose of this study was to experimentally separate these constructs within the context of a voice learning task.

METHOD: Sixteen participants aged 18-38 years, with no history of voice disorders or voice training, were randomly assigned to either an external or internal focus of attention group. The task was to produce voice as "clearly" as possible, using an external (sound in the room) or an internal (anterior oral vibrations) focus. The Tower of Hanoi game (TOH) served as a cognitive control task. Dependent variables across baseline, intervention, and retention phases were spectral moments, spectral slope, and formant cluster prominence (FCP), as well as cognitive effort estimated from functional near-infrared spectroscopy (fNIRS).

RESULTS: The external focus group showed significant increases in spectral mean, standard deviation, and FCP over time, while the internal focus group showed declines or stabilization for the same measures. Neurophysiological oxygenation measures were lower during voice tasks than for the TOH, with no group effect.

CONCLUSIONS: Findings challenge the notion that an internal focus is inferior for motor learning. When locus of attention (external vs internal) and target (outcomes vs biomechanics) are decoupled, external focus does not consistently outperform internal focus. Rather than magnitude of learning, locus of attention appeared to affect learning content. Participants instructed to focus on oral vibratory sensations aligned spectral output with oral vibratory receptor frequency sensitivities, while those focused on room sound aligned spectral output with auditory receptor frequency sensitivities. An internal focus on movement outcomes was not associated with increased cognitive effort over an external focus. Findings also call into question traditional stage models of motor learning suggesting a fixed progression from controlled to automatic processes earlier vs later in learning.

Additional Links: PMID-40640020

@article {pmid40640020,
year = {2025},
author = {Daşdöğen, Ü and Awan, SN and Koilar, R and Getchell, N and Roth, D and Verdolini-Abbott, K},
title = {Preliminary Data on Cortical and Acoustical Correlates of Voice Training Under Internal vs External Focus of Attention Conditions.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.06.013},
pmid = {40640020},
issn = {1873-4588},
abstract = {PURPOSE: Prior research has emphasized the superiority of an external focus of attention for motor learning. According to the Constrained Action Hypothesis, an external focus encourages automaticity by directing attention to movement outcomes, while an internal focus is thought to impair learning by promoting controlled processing directed at biomechanics. However, prior research has largely conflated two attentional constructs: the locus of attention (external vs internal) and the target of attention (movement outcomes vs biomechanics). Therefore, the purpose of this study was to experimentally separate these constructs within the context of a voice learning task.

METHOD: Sixteen participants aged 18-38 years, with no history of voice disorders or voice training, were randomly assigned to either an external or internal focus of attention group. The task was to produce voice as "clearly" as possible, using an external (sound in the room) or an internal (anterior oral vibrations) focus. The Tower of Hanoi game (TOH) served as a cognitive control task. Dependent variables across baseline, intervention, and retention phases were spectral moments, spectral slope, and formant cluster prominence (FCP), as well as cognitive effort estimated from functional near-infrared spectroscopy (fNIRS).

RESULTS: The external focus group showed significant increases in spectral mean, standard deviation, and FCP over time, while the internal focus group showed declines or stabilization for the same measures. Neurophysiological oxygenation measures were lower during voice tasks than for the TOH, with no group effect.

CONCLUSIONS: Findings challenge the notion that an internal focus is inferior for motor learning. When locus of attention (external vs internal) and target (outcomes vs biomechanics) are decoupled, external focus does not consistently outperform internal focus. Rather than magnitude of learning, locus of attention appeared to affect learning content. Participants instructed to focus on oral vibratory sensations aligned spectral output with oral vibratory receptor frequency sensitivities, while those focused on room sound aligned spectral output with auditory receptor frequency sensitivities. An internal focus on movement outcomes was not associated with increased cognitive effort over an external focus. Findings also call into question traditional stage models of motor learning suggesting a fixed progression from controlled to automatic processes earlier vs later in learning.},
}

RevDate: 2025-07-04

Saraç AB, Cangi ME, G Yılmaz (2025)

A comparison of motor speech parameters of school-age children who do and do not stutter after visual and auditory emotional stimuli.

International journal of pediatric otorhinolaryngology, 195:112423 pii:S0165-5876(25)00210-1 [Epub ahead of print].

OBJECTIVE: This study compared the effects of emotional reactivity displayed by children who stutter (CWS) and children who do not stutter (CWNS) in response to visual and auditory stimuli on motor-speech parameters.

METHOD: In total, 61 children (20 CWS and 41 CWNS) aged 7-12 years participated in the study. The International Affective Picture System (IAPS) and the International Affective Digitized Sounds (IADS) were used to observe pleasant, unpleasant, and neutral emotions, and the Self-Assessment Manikin (SAM) was used to evaluate subjective perception of emotional reactivity. The second formant (F2) transition rate, diadochokinetic (DDK) rate, and syllabic rate parameters were taken as variables within the scope of speech-motor performance.

RESULTS: CWS showed significantly lower performance in all motor parameters of all stimulus types (visual-auditory) and qualities (pleasant-unpleasant-neutral) than CWNS. Regardless of stimulus type, significantly reduced rates were observed for unpleasant stimuli in the syllabic and DDK rates in both groups. In the examination of the changes in the interaction effect plot of the DDK parameter, which had an interaction effect of stimulus quality and type, it was observed that the DDK rate significantly decreased from the neutral stimuli to the unpleasant stimuli in CWS. In both groups, these changes were more noticeable in the IADS stimuli than in the IAPS stimuli.

CONCLUSION: The consensus that CWS are characterized by emotional reactivity and motor deficits was supported by the variables included in the study and extended by including the effects of visual-emotional and auditory-emotional stimuli in the design.

Additional Links: PMID-40614368

@article {pmid40614368,
year = {2025},
author = {Saraç, AB and Cangi, ME and Yılmaz, G},
title = {A comparison of motor speech parameters of school-age children who do and do not stutter after visual and auditory emotional stimuli.},
journal = {International journal of pediatric otorhinolaryngology},
volume = {195},
number = {},
pages = {112423},
doi = {10.1016/j.ijporl.2025.112423},
pmid = {40614368},
issn = {1872-8464},
abstract = {OBJECTIVE: This study compared the effects of emotional reactivity displayed by children who stutter (CWS) and children who do not stutter (CWNS) in response to visual and auditory stimuli on motor-speech parameters.

METHOD: In total, 61 children (20 CWS and 41 CWNS) aged 7-12 years participated in the study. The International Affective Picture System (IAPS) and the International Affective Digitized Sounds (IADS) were used to observe pleasant, unpleasant, and neutral emotions, and the Self-Assessment Manikin (SAM) was used to evaluate subjective perception of emotional reactivity. The second formant (F2) transition rate, diadochokinetic (DDK) rate, and syllabic rate parameters were taken as variables within the scope of speech-motor performance.

RESULTS: CWS showed significantly lower performance in all motor parameters of all stimulus types (visual-auditory) and qualities (pleasant-unpleasant-neutral) than CWNS. Regardless of stimulus type, significantly reduced rates were observed for unpleasant stimuli in the syllabic and DDK rates in both groups. In the examination of the changes in the interaction effect plot of the DDK parameter, which had an interaction effect of stimulus quality and type, it was observed that the DDK rate significantly decreased from the neutral stimuli to the unpleasant stimuli in CWS. In both groups, these changes were more noticeable in the IADS stimuli than in the IAPS stimuli.

CONCLUSION: The consensus that CWS are characterized by emotional reactivity and motor deficits was supported by the variables included in the study and extended by including the effects of visual-emotional and auditory-emotional stimuli in the design.},
}

RevDate: 2025-07-03
CmpDate: 2025-07-03

Berg KA, Noble JH, Dawant BM, et al (2025)

Evaluating Selective Apical Electrode Deactivation for Improving Cochlear Implant Outcomes.

Trends in hearing, 29:23312165251353638.

This prospective study investigated the potential benefits of deactivating the second-most apical electrode to improve access to lower-frequency pitch and first formant information and thereby improve speech and music outcomes with a cochlear implant (CI). Twenty-one adults (30 ears) with cochlear implants completed an A-B-A-B study comparing each participant's clinical map with all electrodes active (A) and their clinical map with the second-most apical electrode deactivated (B). Test measures included pitch discrimination, speech understanding in noise, and subjective musical sound quality and enjoyment ratings. This study also investigated the impact of participant demographic and electrode placement factors on the degree of benefit derived from the experimental map (B). There was no significant difference between the two conditions on any measure at the group level. However, individual participants demonstrated improvements in pitch discrimination (33.3%), speech perception in noise (43.3%), musical sound quality (50.0%), and musical enjoyment (40.0%). Musical sound quality and enjoyment ratings were strongly correlated, and speech perception correlated with musical enjoyment but not sound quality. Electrodes outside scala tympani, smaller electrode-to-modiolus distances, and certain device manufacturers (Cochlear and MED-EL) predicted greater benefit from deactivating the second-most apical electrode. Certain adult cochlear implant users may benefit from selective apical electrode deactivation, depending on their demographic and electrode placement profile. Clinicians could consider deactivating the second-most apical electrode for patients who report poor musical sound quality, or who have disengaged from music since receiving their CI, to assess potential benefits individually.

Additional Links: PMID-40607992

@article {pmid40607992,
year = {2025},
author = {Berg, KA and Noble, JH and Dawant, BM and Gifford, RH},
title = {Evaluating Selective Apical Electrode Deactivation for Improving Cochlear Implant Outcomes.},
journal = {Trends in hearing},
volume = {29},
number = {},
pages = {23312165251353638},
doi = {10.1177/23312165251353638},
pmid = {40607992},
issn = {2331-2165},
mesh = {Humans ; *Cochlear Implants ; Male ; Female ; Middle Aged ; Speech Perception ; Prospective Studies ; *Cochlear Implantation/instrumentation/methods ; Aged ; Music ; Adult ; *Persons with Hearing Disabilities/rehabilitation/psychology ; Treatment Outcome ; Noise/adverse effects ; Pitch Discrimination ; Acoustic Stimulation ; Young Adult ; Aged, 80 and over ; Prosthesis Design ; Pleasure ; Pitch Perception ; },
abstract = {This prospective study investigated the potential benefits of deactivating the second-most apical electrode to improve access to lower-frequency pitch and first formant information and thereby improve speech and music outcomes with a cochlear implant (CI). Twenty-one adults (30 ears) with cochlear implants completed an A-B-A-B study comparing each participant's clinical map with all electrodes active (A) and their clinical map with the second-most apical electrode deactivated (B). Test measures included pitch discrimination, speech understanding in noise, and subjective musical sound quality and enjoyment ratings. This study also investigated the impact of participant demographic and electrode placement factors on the degree of benefit derived from the experimental map (B). There was no significant difference between the two conditions on any measure at the group level. However, individual participants demonstrated improvements in pitch discrimination (33.3%), speech perception in noise (43.3%), musical sound quality (50.0%), and musical enjoyment (40.0%). Musical sound quality and enjoyment ratings were strongly correlated, and speech perception correlated with musical enjoyment but not sound quality. Electrodes outside scala tympani, smaller electrode-to-modiolus distances, and certain device manufacturers (Cochlear and MED-EL) predicted greater benefit from deactivating the second-most apical electrode. Certain adult cochlear implant users may benefit from selective apical electrode deactivation, depending on their demographic and electrode placement profile. Clinicians could consider deactivating the second-most apical electrode for patients who report poor musical sound quality, or who have disengaged from music since receiving their CI, to assess potential benefits individually.},
}

MeSH Terms:

Humans
*Cochlear Implants
Male
Female
Middle Aged
Speech Perception
Prospective Studies
*Cochlear Implantation/instrumentation/methods
Aged
Music
Adult
*Persons with Hearing Disabilities/rehabilitation/psychology
Treatment Outcome
Noise/adverse effects
Pitch Discrimination
Acoustic Stimulation
Young Adult
Aged, 80 and over
Prosthesis Design
Pleasure
Pitch Perception
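The individual-benefit percentages in the Berg et al. (2025) abstract become easier to interpret as counts. The arithmetic below assumes that the denominator is the 30 implanted ears. The abstract does not state this explicitly. However, 50.0% and 43.3% cannot be whole-number fractions of the 21 participants, while every reported value matches a whole number of the 30 ears:

```latex
\begin{aligned}
\text{pitch discrimination:}  &\quad \tfrac{10}{30} \approx 33.3\% \\
\text{speech in noise:}       &\quad \tfrac{13}{30} \approx 43.3\% \\
\text{musical sound quality:} &\quad \tfrac{15}{30} = 50.0\% \\
\text{musical enjoyment:}     &\quad \tfrac{12}{30} = 40.0\%
\end{aligned}
```

Under this reading, between 10 and 15 of the 30 ears showed improvement on each measure, even though no group-level difference was found.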

RevDate: 2025-06-29

Çınar B, Yılmaz G, Konrot A, et al (2025)

Effects of Negative Emotions and Personality Traits on Laryngeal and Speech Motor Control.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00217-6 [Epub ahead of print].

OBJECTIVE: The purpose of this study was to identify the effects of negative affective stimuli on oral and laryngeal motor control. It also aimed to examine the role of personality traits in emotion-motor interaction.

METHOD: Thirty-five female volunteers (age range: 18-25) were included in the study. Affective stimuli were selected from within the Nencki Affective Picture System (70 pictures with neutral affective valence and 70 with negative affective valence). Participants' personality traits were assessed using the five-factor personality inventory (FFPI). Skin conductance response and heart rate variability assessments were made simultaneously with the presentation of affective pictures. Oral and laryngeal motor skills were assessed via seven vocal tasks: one task was based on electroglottographic analysis (fundamental frequency-F0, vocal fold contact quotient-CQ, and EGG-jitter), and the remaining six tasks were based on acoustic analyses and included oral and laryngeal diadochokinesis (O-DDK and L-DDK), syllabic rate, and second formant transition rate (F2 rate) assessments.

RESULTS: The abductor L-DDK rate, adductor L-DDK rate, syllabic rate, and EGG-CQ values obtained in the presence of negative affective stimuli were statistically significantly higher than those obtained in the presence of neutral affective stimuli (P < 0.05). There was no statistically significant difference between the two stimulus types in terms of their F2 rate and O-DDK rate values (P > 0.05). Moreover, neuroticism and extraversion as personality traits were significantly correlated with the F0, EGG-CQ, and EGG-jitter values obtained in the presence of negative affective stimuli.

CONCLUSION: Negative affective stimuli led to an increase in oral and laryngeal motor movement speed, likely by activating the sympathetic nervous system. There was also an increase in the CQ of the vocal folds. Furthermore, participants who scored higher on the FFPI personality trait of "extraversion" had lower EGG-CQ scores, while those who scored higher on the personality trait of "neuroticism" had higher EGG average jitter and F0 scores.

Additional Links: PMID-40582948

@article {pmid40582948,
year = {2025},
author = {Çınar, B and Yılmaz, G and Konrot, A and Zraick, RI and Saraç, AB and Eteş, H and Değer, Z},
title = {Effects of Negative Emotions and Personality Traits on Laryngeal and Speech Motor Control.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.05.032},
pmid = {40582948},
issn = {1873-4588},
abstract = {OBJECTIVE: The purpose of this study was to identify the effects of negative affective stimuli on oral and laryngeal motor control. It also aimed to examine the role of personality traits in emotion-motor interaction.

METHOD: Thirty-five female volunteers (age range: 18-25) were included in the study. Affective stimuli were selected from within the Nencki Affective Picture System (70 pictures with neutral affective valence and 70 with negative affective valence). Participants' personality traits were assessed using the five-factor personality inventory (FFPI). Skin conductance response and heart rate variability assessments were made simultaneously with the presentation of affective pictures. Oral and laryngeal motor skills were assessed via seven vocal tasks: one task was based on electroglottographic analysis (fundamental frequency-F0, vocal fold contact quotient-CQ, and EGG-jitter), and the remaining six tasks were based on acoustic analyses and included oral and laryngeal diadochokinesis (O-DDK and L-DDK), syllabic rate, and second formant transition rate (F2 rate) assessments.

RESULTS: The abductor L-DDK rate, adductor L-DDK rate, syllabic rate, and EGG-CQ values obtained in the presence of negative affective stimuli were statistically significantly higher than those obtained in the presence of neutral affective stimuli (P < 0.05). There was no statistically significant difference between the two stimulus types in terms of their F2 rate and O-DDK rate values (P > 0.05). Moreover, neuroticism and extraversion as personality traits were significantly correlated with the F0, EGG-CQ, and EGG-jitter values obtained in the presence of negative affective stimuli.

CONCLUSION: Negative affective stimuli led to an increase in oral and laryngeal motor movement speed, likely by activating the sympathetic nervous system. There was also an increase in the CQ of the vocal folds. Furthermore, participants who scored higher on the FFPI personality trait of "extraversion" had lower EGG-CQ scores, while those who scored higher on the personality trait of "neuroticism" had higher EGG average jitter and F0 scores.},
}

RevDate: 2025-06-30

Merrikhi Y, Parsa M, A Daliri (2025)

An Integrated Approach to Concurrently Measure Corrective and Adaptive Responses to Auditory Errors.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The brain relies on feedforward and feedback control systems to produce speech movements. Both control systems use auditory errors to generate responses that ensure the accuracy of speech movements. Traditionally, separate auditory perturbation paradigms are used to examine these control systems in isolation; however, this conventional practice is time-consuming and poses practical challenges. This study aimed to develop a new paradigm to examine both control systems concurrently.

METHOD: We applied different auditory perturbation magnitudes (0, 125, 250, and 500 Hz) and directions (ε-to-ɪ and ε-to-æ) that randomly changed every six trials. We measured formant changes during early (0-100 ms) and late (200-300 ms) time points of production. The early response was used to calculate adaptive responses (a measure of the feedforward control system). The difference between late and early responses was used to calculate corrective responses (a measure of the feedback control system).

RESULTS: We found that participants produced (a) adaptive and corrective responses opposite to the perturbation direction and (b) proportionally larger adaptive and corrective responses to the smallest perturbation in the ε-to-ɪ direction. Additionally, participants who responded more to ε-to-ɪ perturbations also responded more to ε-to-æ perturbations.

CONCLUSION: These findings suggest that (a) the brain may have similar error sensitivity in the ε-to-ɪ and ε-to-æ directions and considers error magnitudes in preparing its responses to errors, and (b) our proposed paradigm is a promising approach to efficiently and concurrently measure the contributions of the feedback and feedforward control systems.

Additional Links: PMID-40587266

@article {pmid40587266,
year = {2025},
author = {Merrikhi, Y and Parsa, M and Daliri, A},
title = {An Integrated Approach to Concurrently Measure Corrective and Adaptive Responses to Auditory Errors.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-11},
doi = {10.1044/2025_JSLHR-24-00572},
pmid = {40587266},
issn = {1558-9102},
abstract = {PURPOSE: The brain relies on feedforward and feedback control systems to produce speech movements. Both control systems use auditory errors to generate responses that ensure the accuracy of speech movements. Traditionally, separate auditory perturbation paradigms are used to examine these control systems in isolation; however, this conventional practice is time-consuming and poses practical challenges. This study aimed to develop a new paradigm to examine both control systems concurrently.

METHOD: We applied different auditory perturbation magnitudes (0, 125, 250, and 500 Hz) and directions (ε-to-ɪ and ε-to-æ) that randomly changed every six trials. We measured formant changes during early (0-100 ms) and late (200-300 ms) time points of production. The early response was used to calculate adaptive responses (a measure of the feedforward control system). The difference between late and early responses was used to calculate corrective responses (a measure of the feedback control system).

RESULTS: We found that participants produced (a) adaptive and corrective responses opposite to the perturbation direction and (b) proportionally larger adaptive and corrective responses to the smallest perturbation in the ε-to-ɪ direction. Additionally, participants who responded more to ε-to-ɪ perturbations also responded more to ε-to-æ perturbations.

CONCLUSION: These findings suggest that (a) the brain may have similar error sensitivity in the ε-to-ɪ and ε-to-æ directions and considers error magnitudes in preparing its responses to errors, and (b) our proposed paradigm is a promising approach to efficiently and concurrently measure the contributions of the feedback and feedforward control systems.},
}
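The Merrikhi et al. (2025) METHOD defines both responses arithmetically. The adaptive response is the early (0-100 ms) formant change. The corrective response is the late (200-300 ms) change minus the early change. The minimal Python sketch below implements only that arithmetic, under several assumptions. Each trial is assumed to be one formant track, with samples timed in milliseconds from production onset. A "change" is assumed to be measured relative to a baseline value, for example the mean of unperturbed productions. The windows are assumed to be half-open. The function names and the example trial are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean


def window_mean(times_ms, values_hz, start_ms, end_ms):
    """Mean formant value over samples with start_ms <= t < end_ms."""
    selected = [v for t, v in zip(times_ms, values_hz) if start_ms <= t < end_ms]
    if not selected:
        raise ValueError(f"no samples in window {start_ms}-{end_ms} ms")
    return mean(selected)


def adaptive_and_corrective(times_ms, values_hz, baseline_hz,
                            early=(0, 100), late=(200, 300)):
    """Return (adaptive, corrective) responses in Hz for one trial.

    Changes are measured relative to baseline_hz, an assumed reference
    value (hypothetical choice, e.g. the mean of unperturbed trials).
    """
    early_change = window_mean(times_ms, values_hz, *early) - baseline_hz
    late_change = window_mean(times_ms, values_hz, *late) - baseline_hz
    adaptive = early_change                  # feedforward estimate
    corrective = late_change - early_change  # feedback estimate
    return adaptive, corrective


# Hypothetical trial: F1 sampled every 10 ms against a 600 Hz baseline.
times = list(range(0, 300, 10))
f1 = [590 if t < 100 else 580 if t < 200 else 570 for t in times]
print(adaptive_and_corrective(times, f1, 600))  # (-10, -20)
```

In this example both responses are negative, which would oppose an upward F1 perturbation. That matches the direction the abstract reports, though the numbers themselves are invented.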

RevDate: 2025-06-24

Chappie K, Kell S, Qi D, et al (2025)

Comparing Phoneme Speech Recordings and Acoustic App Data Capture Experience for Android and iOS Mobile Device Users in the Large Decentralized AcRIS Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00201-2 [Epub ahead of print].

OBJECTIVE: Growth in telehealth and interest in decentralized clinical trials have motivated the need to understand Android and iPhone Operating System (iOS) impacts on remote, mobile app speech capture. This research investigated Android and iOS device differences in 16 acoustic, physiologically based speech features extracted from phonemes /i/ and /m/, how in-app instructions impacted maximum phonation time (MPT) for /a/, and a mobile app design consideration to capture quality signals for acoustic analyses.

METHODS: Acoustic features were auto-extracted from vocalizations recorded on the personal cell phones of 6505 subjects as part of a 6-8-week longitudinal, at-home clinical trial. Feature averages were computed for 693 self-reported healthy participants (no chronic or acute conditions). Wilcoxon two-sample tests comparing mean feature values from Android and iOS speech recordings were computed for these self-reported healthy participants.

RESULTS: Periodic measures such as harmonicity differed between Android and iOS, with iOS registering more periodic content. Energy-related features demonstrated lower levels of high-frequency energy in the iOS results. Signal-to-noise ratio and coefficient of variation in fundamental frequency measured similarly across Android and iOS, as did the first three formants for /i/. All features showed more variability in Android devices than in iOS devices. Average background noise intensity levels were lower for Android. MPT averages were longer on the sustained /a/ task after a study pause period.

CONCLUSIONS: Measurement differences were found between Android and iOS devices for several features that have historically been used to describe disease change. Device differences impacted participant experience in recording their speech, with iOS users having more difficulty passing in-app intensity-based background noise checks. An app instruction update to the sustained /a/ task during the study pause period resulted in longer MPT averages, demonstrating the important role that app instructions play in remote clinical trials.

TRIAL REGISTRATION NUMBER: NCT04748445.

Additional Links: PMID-40555587

@article {pmid40555587,
year = {2025},
author = {Chappie, K and Kell, S and Qi, D and Selig, J and Christakis, Y and Moreno, X and Severson, J and Best, A and Wacnik, P and Santamaria, M and Zhang, Y and Fry, BA and Mather, RJ},
title = {Comparing Phoneme Speech Recordings and Acoustic App Data Capture Experience for Android and iOS Mobile Device Users in the Large Decentralized AcRIS Study.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.05.016},
pmid = {40555587},
issn = {1873-4588},
abstract = {OBJECTIVE: Growth in telehealth and interest in decentralized clinical trials have motivated the need to understand Android and iPhone Operating System (iOS) impacts on remote, mobile app speech capture. This research investigated Android and iOS device differences in 16 acoustic, physiologically based speech features extracted from phonemes /i/ and /m/, how in-app instructions impacted maximum phonation time (MPT) for /a/, and a mobile app design consideration to capture quality signals for acoustic analyses.

METHODS: Acoustic features were auto-extracted from vocalizations recorded on the personal cell phones of 6505 subjects as part of a 6-8-week longitudinal, at-home clinical trial. Feature averages were computed for 693 self-reported healthy participants (no chronic or acute conditions). Wilcoxon two-sample tests comparing mean feature values from Android and iOS speech recordings were computed for these self-reported healthy participants.

RESULTS: Periodic measures such as harmonicity differed between Android and iOS, with iOS registering more periodic content. Energy-related features demonstrated lower levels of high-frequency energy in the iOS results. Signal-to-noise ratio and coefficient of variation in fundamental frequency measured similarly across Android and iOS, as did the first three formants for /i/. All features showed more variability in Android devices than in iOS devices. Average background noise intensity levels were lower for Android. MPT averages were longer on the sustained /a/ task after a study pause period.

CONCLUSIONS: Measurement differences were found between Android and iOS devices for several features that have historically been used to describe disease change. Device differences also affected participants' experience of recording their speech, with iOS users having more difficulty passing in-app intensity-based background noise checks. An update to the app instructions for the sustained /a/ task during the study pause period resulted in longer MPT averages, demonstrating the important role that app instructions play in remote clinical trials.

TRIAL REGISTRATION NUMBER: NCT04748445.},
}
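The Chappie et al. entry compares Android and iOS feature means with Wilcoxon two-sample tests. As a rough illustration of what that test computes, the Python sketch below implements the rank-sum (Mann-Whitney U) statistic with tie-averaged ranks and a plain normal approximation. It assumes no tie or continuity correction, so it is not the authors' analysis code; for real data a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred. The function names and example values are hypothetical.

```python
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for sample x, two-sided p-value).
    No tie or continuity correction is applied (a simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    r1 = sum(ranks[:n1])              # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2       # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


if __name__ == "__main__":
    # Hypothetical per-participant feature means (e.g., harmonicity in dB).
    android = [14.2, 15.1, 13.8, 16.0, 14.9]
    ios = [16.3, 17.0, 15.8, 16.9, 17.4]
    u, p = wilcoxon_rank_sum(android, ios)
    print(f"U = {u:.1f}, p = {p:.4f}")
```

With samples as small as the usage example, an exact test would be more appropriate than the normal approximation; the approximation is shown only because it keeps the arithmetic visible.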

RevDate: 2025-06-24

Yılmaz G, Saraç AB, Konrot A, et al (2025)

Speech motor control and laryngeal diadochokinesis in typically developing normophonic children.

International journal of pediatric otorhinolaryngology, 195:112435 pii:S0165-5876(25)00222-8 [Epub ahead of print].

OBJECTIVE: This study aimed to evaluate speech motor control and laryngeal diadochokinesis in terms of age and sex in a typically developing, normophonic pediatric population using a computer-assisted analysis method, and to establish normative data for the assessed parameters.

METHODS: The sample of the study included 427 typically developing, normophonic children between the ages of 7 and 16 years. While 48.01 % (n = 205) of the participants were female, 51.99 % (n = 222) were male. The participants were divided into 3 age groups: 7-9 (male n = 87; female n = 82), 10-12 (male n = 50; female n = 47), and 13-16 (male n = 85; female n = 77). The acoustic analyses were carried out using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ, USA). The analysis protocols consisted of oral diadochokinetic (DDK) rate, laryngeal DDK rate, second formant (F2) transition rate, and general syllabic rate. The alternate motion rate (AMR) and sequential motion rate (SMR) tasks were used for oral DDK assessments. Laryngeal DDK was assessed using 2 tasks, abductor (/hʌ/ and /hi/) and adductor (/ʔʌ/ and /ʔi/).

RESULTS: Normative data were obtained for speech motor control and laryngeal DDK for children aged 7-16 years who were native speakers of Turkish. As age increased, oral and laryngeal DDK rates, F2 transition rates, and syllabic rates increased in both sexes. Additionally, DDK stability increased with age (i.e., DDK jitter decreased) only in the oral AMR-DDK analyses; no statistically significant age-related differences in DDK jitter were observed in the other DDK analyses (p > 0.05). Sex-based differences were observed only in the syllabic rate analyses, where female participants had lower syllabic rates than male participants in all age groups.

CONCLUSION: The pediatric normative database presented in this study can offer reference ranges for further studies involving the analyses of changes in oral and laryngeal motor control that may arise due to various developmental or neurological problems.
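The Yılmaz et al. abstract reports DDK rates and a DDK-jitter stability measure computed by the Motor Speech Profile software. The sketch below only approximates these ideas from a list of syllable onset times: rate as the inverse of the mean inter-syllable period, and a jitter-style perturbation as the mean absolute difference between consecutive periods relative to the mean period. Both definitions are assumptions chosen for illustration; the MSP software's exact algorithms are not reproduced here, and `ddk_measures` is a hypothetical name.

```python
def ddk_measures(onsets_s):
    """Hypothetical DDK summary from sorted syllable onset times in seconds.

    Returns (rate in syllables per second, period perturbation in percent).
    """
    if len(onsets_s) < 3:
        raise ValueError("need at least three syllable onsets")
    # Inter-syllable periods between consecutive onsets.
    periods = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    mean_period = sum(periods) / len(periods)
    if mean_period <= 0:
        raise ValueError("onsets must be increasing")
    rate = 1.0 / mean_period
    # Jitter-style measure (assumed): mean |P(k+1) - P(k)| relative to mean period.
    diffs = [abs(q - p) for p, q in zip(periods, periods[1:])]
    jitter_pct = 100.0 * (sum(diffs) / len(diffs)) / mean_period
    return rate, jitter_pct


if __name__ == "__main__":
    # Hypothetical onsets for a short /pa/ repetition train.
    onsets = [0.00, 0.18, 0.37, 0.55, 0.74, 0.92]
    rate, jitter = ddk_measures(onsets)
    print(f"rate = {rate:.2f} syll/s, jitter = {jitter:.1f}%")
```

A perfectly regular train gives a jitter near zero, so lower values correspond to the greater DDK stability that the study reports for older children in the AMR task.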

Additional Links: PMID-40554903

Citation:

@article {pmid40554903,
year = {2025},
author = {Yılmaz, G and Saraç, AB and Konrot, A and Baki, SN and Alpan, E and Bilgiç, HC and Demiryürek, P and Doğan, NN},
title = {Speech motor control and laryngeal diadochokinesis in typically developing normophonic children.},
journal = {International journal of pediatric otorhinolaryngology},
volume = {195},
number = {},
pages = {112435},
doi = {10.1016/j.ijporl.2025.112435},
pmid = {40554903},
issn = {1872-8464},
abstract = {OBJECTIVE: This study aimed to evaluate speech motor control and laryngeal diadochokinesis in terms of age and sex in a typically developing, normophonic pediatric population using a computer-assisted analysis method, and to establish normative data for the assessed parameters.

METHODS: The sample of the study included 427 typically developing, normophonic children between the ages of 7 and 16 years. While 48.01 % (n = 205) of the participants were female, 51.99 % (n = 222) were male. The participants were divided into 3 age groups: 7-9 (male n = 87; female n = 82), 10-12 (male n = 50; female n = 47), and 13-16 (male n = 85; female n = 77). The acoustic analyses were carried out using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ, USA). The analysis protocols consisted of oral diadochokinetic (DDK) rate, laryngeal DDK rate, second formant (F2) transition rate, and general syllabic rate. The alternate motion rate (AMR) and sequential motion rate (SMR) tasks were used for oral DDK assessments. Laryngeal DDK was assessed using 2 tasks, abductor (/hʌ/ and /hi/) and adductor (/ʔʌ/ and /ʔi/).

RESULTS: Normative data were obtained for speech motor control and laryngeal DDK for children aged 7-16 years who were native speakers of Turkish. As age increased, oral and laryngeal DDK rates, F2 transition rates, and syllabic rates increased in both sexes. Additionally, DDK stability increased with age (i.e., DDK jitter decreased) only in the oral AMR-DDK analyses; no statistically significant age-related differences in DDK jitter were observed in the other DDK analyses (p > 0.05). Sex-based differences were observed only in the syllabic rate analyses, where female participants had lower syllabic rates than male participants in all age groups.

CONCLUSION: The pediatric normative database presented in this study can offer reference ranges for further studies involving the analyses of changes in oral and laryngeal motor control that may arise due to various developmental or neurological problems.},
}

RevDate: 2025-06-24
CmpDate: 2025-06-24

Li J, Wang Y, Wang F, et al (2025)

Using Speech Features and Machine Learning Models to Predict Emotional and Behavioral Problems in Chinese Adolescents.

Depression and anxiety, 2025:5734107.

Background: Current assessments of adolescent emotional and behavioral problems rely heavily on subjective reports, which are prone to biases. Aim: This study is the first to explore the potential of speech signals as objective markers for predicting emotional and behavioral problems (hyperactivity, emotional symptoms, conduct problems, and peer problems) in adolescents using machine learning techniques. Materials and Methods: We analyzed speech data from 8215 adolescents aged 12-18 years, extracting four categories of speech features: mel-frequency cepstral coefficients (MFCC), mel energy spectrum (MELS), prosodic features (PROS), and formant features (FORM). Logistic regression (LR), support vector machine (SVM), and gradient boosting decision tree (GBDT) models were employed to classify hyperactivity, emotional symptoms, conduct problems, and peer problems as defined by the Strengths and Difficulties Questionnaire (SDQ). Model performance was assessed using area under the curve (AUC) and F1-score, and feature contributions were examined with Shapley additive explanations (SHAP) values. Results: The GBDT model achieved the highest performance for predicting hyperactivity (AUC = 0.78) and emotional symptoms (AUC = 0.74 for males and 0.66 for females), while performance was weaker for conduct and peer problems. SHAP analysis revealed gender-specific feature importance patterns, with certain speech features being more critical for males than females. Conclusion: These findings demonstrate the feasibility of using speech features to objectively predict emotional and behavioral problems in adolescents and identify gender-specific markers. This study lays the foundation for developing speech-based assessment tools for early identification and intervention, offering an objective alternative to traditional subjective evaluation methods.
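The Li et al. results are reported as AUC values (e.g., 0.78 for hyperactivity). One standard way to read an AUC is as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way by brute force; `roc_auc` is a hypothetical helper for intuition, not the authors' evaluation code, and a library function such as scikit-learn's `roc_auc_score` is the usual choice for real data.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels are 1 (positive) or 0 (negative); a tie between a positive and a
    negative score counts as one half. O(n_pos * n_neg), fine for small data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores and binary problem/no-problem labels.
    scores = [0.91, 0.40, 0.65, 0.22, 0.78, 0.35]
    labels = [1, 1, 0, 0, 1, 0]
    print(f"AUC = {roc_auc(scores, labels):.3f}")
```

Read this way, an AUC of 0.78 means the model ranks a random affected adolescent above a random unaffected one about 78% of the time; 0.5 is chance level.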

Additional Links: PMID-40551876

Citation:

@article {pmid40551876,
year = {2025},
author = {Li, J and Wang, Y and Wang, F and Zhang, R and Wang, N and Zhu, Y and Zhao, T},
title = {Using Speech Features and Machine Learning Models to Predict Emotional and Behavioral Problems in Chinese Adolescents.},
journal = {Depression and anxiety},
volume = {2025},
number = {},
pages = {5734107},
pmid = {40551876},
issn = {1520-6394},
mesh = {Humans ; Adolescent ; Male ; Female ; Child ; *Machine Learning ; *Problem Behavior/psychology ; *Speech/physiology ; *Affective Symptoms/diagnosis ; China ; *Adolescent Behavior ; East Asian People ; },
abstract = {Background: Current assessments of adolescent emotional and behavioral problems rely heavily on subjective reports, which are prone to biases. Aim: This study is the first to explore the potential of speech signals as objective markers for predicting emotional and behavioral problems (hyperactivity, emotional symptoms, conduct problems, and peer problems) in adolescents using machine learning techniques. Materials and Methods: We analyzed speech data from 8215 adolescents aged 12-18 years, extracting four categories of speech features: mel-frequency cepstral coefficients (MFCC), mel energy spectrum (MELS), prosodic features (PROS), and formant features (FORM). Logistic regression (LR), support vector machine (SVM), and gradient boosting decision tree (GBDT) models were employed to classify hyperactivity, emotional symptoms, conduct problems, and peer problems as defined by the Strengths and Difficulties Questionnaire (SDQ). Model performance was assessed using area under the curve (AUC) and F1-score, and feature contributions were examined with Shapley additive explanations (SHAP) values. Results: The GBDT model achieved the highest performance for predicting hyperactivity (AUC = 0.78) and emotional symptoms (AUC = 0.74 for males and 0.66 for females), while performance was weaker for conduct and peer problems. SHAP analysis revealed gender-specific feature importance patterns, with certain speech features being more critical for males than females. Conclusion: These findings demonstrate the feasibility of using speech features to objectively predict emotional and behavioral problems in adolescents and identify gender-specific markers. This study lays the foundation for developing speech-based assessment tools for early identification and intervention, offering an objective alternative to traditional subjective evaluation methods.},
}

MeSH Terms:

Humans
Adolescent
Male
Female
Child
*Machine Learning
*Problem Behavior/psychology
*Speech/physiology
*Affective Symptoms/diagnosis
China
*Adolescent Behavior
East Asian People

RevDate: 2025-06-10
CmpDate: 2025-06-10

Beaudry L, Gerber S, Perrier P, et al (2025)

Effects of a simultaneous lip tube and auditory feedback perturbation on the production of the French vowel /u/.

The Journal of the Acoustical Society of America, 157(6):4285-4299.

This study investigates the relative weight of somatosensory and auditory feedback in the production of the French vowel /u/ under a simultaneous lip tube and formant shift perturbation. To do so, 20 native Quebec French speakers were recruited. Three experimental conditions involving a lip tube were devised, each with a different auditory condition. In the first condition, auditory feedback was corrected by canceling the auditory effects of the lip tube using a formant shift. In the second condition, the corrected auditory feedback was replaced with white noise. Finally, access to natural auditory feedback was restored. The results reveal a diversity of compensation strategies across participants. Although some participants rely on auditory feedback to compensate for the lip tube, others compensate before access to natural auditory feedback is restored. It is argued that this could be achieved through internal predictions of auditory feedback derived from somatosensory feedback, in line with, among others, the dual stream prediction model of Tian and Poeppel [J. Cognit. Neurosci. 25(7), 1020-1036 (2013)].

Additional Links: PMID-40492699

Citation:

@article {pmid40492699,
year = {2025},
author = {Beaudry, L and Gerber, S and Perrier, P and Ménard, L},
title = {Effects of a simultaneous lip tube and auditory feedback perturbation on the production of the French vowel /u/.},
journal = {The Journal of the Acoustical Society of America},
volume = {157},
number = {6},
pages = {4285-4299},
doi = {10.1121/10.0036827},
pmid = {40492699},
issn = {1520-8524},
mesh = {Humans ; Male ; Female ; Adult ; *Feedback, Sensory ; Young Adult ; *Speech Acoustics ; *Lip/physiology/innervation ; Acoustic Stimulation ; *Phonetics ; *Speech Perception ; *Voice Quality ; Speech Production Measurement ; },
abstract = {This study investigates the relative weight of somatosensory and auditory feedback in the production of the French vowel /u/ under a simultaneous lip tube and formant shift perturbation. To do so, 20 native Quebec French speakers were recruited. Three experimental conditions involving a lip tube were devised, each with a different auditory condition. In the first condition, auditory feedback was corrected by canceling the auditory effects of the lip tube using a formant shift. In the second condition, the corrected auditory feedback was replaced with white noise. Finally, access to natural auditory feedback was restored. The results reveal a diversity of compensation strategies across participants. Although some participants rely on auditory feedback to compensate for the lip tube, others compensate before access to natural auditory feedback is restored. It is argued that this could be achieved through internal predictions of auditory feedback derived from somatosensory feedback, in line with, among others, the dual stream prediction model of Tian and Poeppel [J. Cognit. Neurosci. 25(7), 1020--1036 (2013)].},
}

MeSH Terms:

Humans
Male
Female
Adult
*Feedback, Sensory
Young Adult
*Speech Acoustics
*Lip/physiology/innervation
Acoustic Stimulation
*Phonetics
*Speech Perception
*Voice Quality
Speech Production Measurement

RevDate: 2025-05-27
CmpDate: 2025-05-27

Singer N, Zaltz Y (2025)

Auditory Learning and Generalization in Older Adults: Evidence from Voice Discrimination Training.

Trends in hearing, 29:23312165251342436.

Auditory learning is essential for adapting to continuously changing acoustic environments. This adaptive capability, however, may be impacted by age-related declines in sensory and cognitive functions, potentially limiting learning efficiency and generalization in older adults. This study investigated auditory learning and generalization in 24 older (65-82 years) and 24 younger (18-34 years) adults through voice discrimination (VD) training. Participants were divided into training (12 older, 12 younger adults) and control groups (12 older, 12 younger adults). Trained participants completed five sessions: Two testing sessions assessing VD performance using a 2-down 1-up adaptive procedure with F0-only, formant-only, and combined F0 + formant cues, and three training sessions focusing exclusively on VD with F0 cues. Control groups participated only in the two testing sessions, with no intermediate training. Results revealed significant training-induced improvements in VD with F0 cues for both younger and older adults, with comparable learning efficiency and gains across groups. However, generalization to the formant-only cue was observed only in younger adults, suggesting limited learning transfer in older adults. Additionally, VD training did not improve performance in the combined F0 + formant condition beyond control group improvements, underscoring the specificity of perceptual learning. These findings provide novel insights into auditory learning in older adults, showing that while they retain the ability for significant auditory skill acquisition, age-related declines in perceptual flexibility may limit broader generalization. This study highlights the importance of designing targeted auditory interventions for older adults, considering their specific limitations in generalizing learning gains across different acoustic cues.
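The Singer and Zaltz abstract mentions a 2-down 1-up adaptive procedure. Under this transformed up-down rule (Levitt, 1971), the cue difference is reduced after two consecutive correct responses and increased after any error, so the track converges near the 70.7%-correct point. The Python sketch below assumes a fixed step size and estimates the threshold as the mean of the last few reversal levels; the class name, step size, and averaging rule are illustrative assumptions, not the study's parameters.

```python
class TwoDownOneUp:
    """2-down 1-up staircase on a cue difference (smaller = harder).

    Converges near 70.7% correct. The step size is fixed here for simplicity.
    """

    def __init__(self, start, step, floor=0.0):
        self.level = start
        self.step = step
        self.floor = floor
        self.correct_run = 0
        self.last_direction = None  # -1 = made harder, +1 = made easier
        self.reversals = []         # levels at which the direction flipped

    def update(self, correct):
        """Record one response and return the level for the next trial."""
        direction = None
        if correct:
            self.correct_run += 1
            if self.correct_run == 2:   # two correct in a row: harder
                direction = -1
                self.correct_run = 0
        else:                           # any error: easier
            direction = +1
            self.correct_run = 0
        if direction is not None:
            if self.last_direction is not None and direction != self.last_direction:
                self.reversals.append(self.level)
            self.last_direction = direction
            self.level = max(self.floor, self.level + direction * self.step)
        return self.level

    def threshold(self, n_last=6):
        """Mean of the last n_last reversal levels."""
        if not self.reversals:
            raise ValueError("no reversals yet")
        last = self.reversals[-n_last:]
        return sum(last) / len(last)


if __name__ == "__main__":
    # Hypothetical deterministic listener: correct whenever the cue is >= 5 units.
    track = TwoDownOneUp(start=10, step=1)
    for _ in range(40):
        track.update(track.level >= 5)
    print("threshold estimate:", track.threshold())
```

Real listeners respond probabilistically, so in practice the track wanders around the 70.7% point rather than alternating strictly; the deterministic listener is only there to make the reversal logic easy to follow.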

Additional Links: PMID-40420656

Citation:

@article {pmid40420656,
year = {2025},
author = {Singer, N and Zaltz, Y},
title = {Auditory Learning and Generalization in Older Adults: Evidence from Voice Discrimination Training.},
journal = {Trends in hearing},
volume = {29},
number = {},
pages = {23312165251342436},
doi = {10.1177/23312165251342436},
pmid = {40420656},
issn = {2331-2165},
mesh = {Humans ; Aged ; Male ; Female ; Young Adult ; Adult ; Aged, 80 and over ; Adolescent ; *Aging/psychology ; *Generalization, Psychological ; Cues ; Age Factors ; Acoustic Stimulation ; *Auditory Perception ; Transfer, Psychology ; *Learning ; *Speech Perception ; *Discrimination Learning ; },
abstract = {Auditory learning is essential for adapting to continuously changing acoustic environments. This adaptive capability, however, may be impacted by age-related declines in sensory and cognitive functions, potentially limiting learning efficiency and generalization in older adults. This study investigated auditory learning and generalization in 24 older (65-82 years) and 24 younger (18-34 years) adults through voice discrimination (VD) training. Participants were divided into training (12 older, 12 younger adults) and control groups (12 older, 12 younger adults). Trained participants completed five sessions: Two testing sessions assessing VD performance using a 2-down 1-up adaptive procedure with F0-only, formant-only, and combined F0 + formant cues, and three training sessions focusing exclusively on VD with F0 cues. Control groups participated only in the two testing sessions, with no intermediate training. Results revealed significant training-induced improvements in VD with F0 cues for both younger and older adults, with comparable learning efficiency and gains across groups. However, generalization to the formant-only cue was observed only in younger adults, suggesting limited learning transfer in older adults. Additionally, VD training did not improve performance in the combined F0 + formant condition beyond control group improvements, underscoring the specificity of perceptual learning. These findings provide novel insights into auditory learning in older adults, showing that while they retain the ability for significant auditory skill acquisition, age-related declines in perceptual flexibility may limit broader generalization. This study highlights the importance of designing targeted auditory interventions for older adults, considering their specific limitations in generalizing learning gains across different acoustic cues.},
}

MeSH Terms:

Humans
Aged
Male
Female
Young Adult
Adult
Aged, 80 and over
Adolescent
*Aging/psychology
*Generalization, Psychological
Cues
Age Factors
Acoustic Stimulation
*Auditory Perception
Transfer, Psychology
*Learning
*Speech Perception
*Discrimination Learning

RevDate: 2025-05-22
CmpDate: 2025-05-22

Cao Y, Cheng Y, Liu S, et al (2025)

Left hemisphere lateralization in unilateral upper motor neuron dysarthria via quantitative acoustic analysis.

Scientific reports, 15(1):17776.

This paper aimed to identify specific acoustic parameters (F1, F2, vowel space area [VSA], vowel articulation index [VAI], and formant centralization ratio [FCR]) for evaluating speech in Mandarin-speaking individuals with unilateral upper motor neuron (UUMN) dysarthria. Additionally, it explored the correlation between dysarthria severity and lesion side based on these parameters and scale results. This comparative study used acoustic spectral analysis to compare phonetic features between UUMN dysarthria (UUMND) patients and neurologically normal adults, and between the left-sided and right-sided upper motor neuron dysarthria (UMND) groups. The Mandibular-Oral Motor Function Assessment Scale (MOMFAS) was also used. F1, F2, VSA, VAI, and FCR differed significantly between individuals with UUMN dysarthria and neurologically normal adults. Compared with right-sided UMN dysarthria patients, the left-sided group showed a considerable increase in FCR and significant decreases in VSA and VAI. The mean scale score of left-sided UMN dysarthria patients was also significantly lower than that of right-sided patients. The severity of UUMND was more pronounced in individuals with left-sided lesions, providing supportive evidence of left-hemisphere lateralization. The acoustic indices F1, F2, VSA, VAI, and FCR can sensitively reflect vowel changes in UUMND patients. They could be used not only to describe the acoustic properties of UUMND patients but also to assess the effectiveness of rehabilitation therapy for impaired vowel articulation in such patients.
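The vowel indices named in the Cao et al. abstract have widely used definitions based on the corner vowels /i/, /a/, and /u/: FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), VAI as its reciprocal, and a triangular VSA equal to the area of the /i/-/a/-/u/ triangle in the F1-F2 plane. The abstract does not state which exact variants the authors used, so the sketch below is an assumption-based illustration; the `Vowel` class, function names, and formant values are hypothetical. Vowel centralization, as reported for the left-sided group, pushes FCR up and VAI and VSA down.

```python
from dataclasses import dataclass


@dataclass
class Vowel:
    f1: float  # first formant, Hz
    f2: float  # second formant, Hz


def vowel_space_area(i, a, u):
    """Triangular /i/-/a/-/u/ area in the F1-F2 plane (Hz^2), shoelace formula."""
    return abs(i.f1 * (a.f2 - u.f2) + a.f1 * (u.f2 - i.f2) + u.f1 * (i.f2 - a.f2)) / 2


def formant_centralization_ratio(i, a, u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); rises with centralization."""
    return (u.f2 + a.f2 + i.f1 + u.f1) / (i.f2 + a.f1)


def vowel_articulation_index(i, a, u):
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a), the reciprocal of FCR."""
    return (i.f2 + a.f1) / (i.f1 + u.f1 + u.f2 + a.f2)


if __name__ == "__main__":
    # Hypothetical corner-vowel formants for one speaker.
    i, a, u = Vowel(300, 2300), Vowel(800, 1300), Vowel(350, 800)
    print(f"VSA = {vowel_space_area(i, a, u):.0f} Hz^2")
    print(f"FCR = {formant_centralization_ratio(i, a, u):.3f}")
    print(f"VAI = {vowel_articulation_index(i, a, u):.3f}")
```

FCR and VAI were designed as ratios so that speaker-dependent scaling of formant values partly cancels, whereas VSA keeps absolute units (Hz^2) and is more sensitive to vocal tract size.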

Additional Links: PMID-40404868

Citation:

@article {pmid40404868,
year = {2025},
author = {Cao, Y and Cheng, Y and Liu, S and Mou, Z},
title = {Left hemisphere lateralization in unilateral upper motor neuron dysarthria via quantitative acoustic analysis.},
journal = {Scientific reports},
volume = {15},
number = {1},
pages = {17776},
pmid = {40404868},
issn = {2045-2322},
support = {2021A1515220049 and 2022A0505040007//Basic and Applied Basic Research Project of Guangzhou Basic Research Program/ ; 2019SKJ003 and KTP20190222//Special Project of Chinese government for Science and Technology of Guangdong Province/ ; 202201020046//Science and Technology Projects in Guangzhou City/ ; },
mesh = {Humans ; *Dysarthria/physiopathology ; Male ; Female ; Middle Aged ; Adult ; *Functional Laterality/physiology ; Aged ; *Speech Acoustics ; *Motor Neuron Disease/physiopathology ; *Motor Neurons/physiology/pathology ; Speech/physiology ; Phonetics ; },
abstract = {This paper aimed to identify specific acoustic parameters (F1, F2, vowel space area [VSA], vowel articulation index [VAI], and formant centralization ratio [FCR]) for evaluating speech in Mandarin-speaking individuals with unilateral upper motor neuron (UUMN) dysarthria. Additionally, it explored the correlation between dysarthria severity and lesion side based on these parameters and scale results. This comparative study used acoustic spectral analysis to compare phonetic features between UUMN dysarthria (UUMND) patients and neurologically normal adults, and between the left-sided and right-sided upper motor neuron dysarthria (UMND) groups. The Mandibular-Oral Motor Function Assessment Scale (MOMFAS) was also used. F1, F2, VSA, VAI, and FCR differed significantly between individuals with UUMN dysarthria and neurologically normal adults. Compared with right-sided UMN dysarthria patients, the left-sided group showed a considerable increase in FCR and significant decreases in VSA and VAI. The mean scale score of left-sided UMN dysarthria patients was also significantly lower than that of right-sided patients. The severity of UUMND was more pronounced in individuals with left-sided lesions, providing supportive evidence of left-hemisphere lateralization. The acoustic indices F1, F2, VSA, VAI, and FCR can sensitively reflect vowel changes in UUMND patients. They could be used not only to describe the acoustic properties of UUMND patients but also to assess the effectiveness of rehabilitation therapy for impaired vowel articulation in such patients.},
}

MeSH Terms:

Humans
*Dysarthria/physiopathology
Male
Female
Middle Aged
Adult
*Functional Laterality/physiology
Aged
*Speech Acoustics
*Motor Neuron Disease/physiopathology
*Motor Neurons/physiology/pathology
Speech/physiology
Phonetics

RevDate: 2025-05-22

Nakamura G, Yamada H, Hirose A, et al (2025)

Discovery of sexual dimorphism of the laryngeal sac in the common minke whale Balaenoptera acutorostrata.

Anatomical record (Hoboken, N.J. : 2007) [Epub ahead of print].

Mysticetes, or baleen whales, have an air sac on the ventral surface of the larynx known as the "laryngeal sac." The primary hypothesis regarding this structure's function is that it is involved in sound production. However, several other functions have been proposed, including air recycling, air storage, and even buoyancy control. In this study, we analyzed the ontogenetic development and sexual dimorphism of the laryngeal sac with the aim of elucidating the function of this organ. The larynges of 61 common minke whales (Balaenoptera acutorostrata; male: n = 40, female: n = 21) collected off the Japanese coast were used in the present study. We isolated the larynx, situated between the hyoid bone and the trachea, during the flensing process. Seven linear measurements were taken using calipers, and weight was obtained using a digital scale. Allometric equations and proportions relative to total body length or weight were used to compare laryngeal morphology between sexes and maturity stages. Laryngeal sac measurements were significantly larger in sexually mature males. Furthermore, comparison of two males of approximately the same body length but different maturity showed that the sexually mature male had a larger laryngeal sac than the sexually immature male. The thickness of the laryngeal sac's muscle wall and the volume of the sac's lumen may be related to testes development (sexually mature whales have heavier testes). Of all measurement sites, only the width of the hyoid bone (basihyal and paired thyrohyals) was proportionally constant regardless of sex or maturity. We propose that baleen whales use their well-developed muscular laryngeal sac in a manner analogous to the human tongue, actively modifying its shape and volume to influence vocal production. Specifically, this structure may function as a resonance filter that creates a formant structure and contributes to the modification of phonemes generated by the U-folds of the larynx. Furthermore, the ability to produce complex vocalizations through this mechanism may have led to the enlargement of the laryngeal sac in males via sexual selection, where it also serves as a signal of their reproductive status.

Additional Links: PMID-40401379

Citation:

@article {pmid40401379,
year = {2025},
author = {Nakamura, G and Yamada, H and Hirose, A and Maeda, H and Reidenberg, JS and Kato, H and Park, S and Fujise, Y},
title = {Discovery of sexual dimorphism of the laryngeal sac in the common minke whale Balaenoptera acutorostrata.},
journal = {Anatomical record (Hoboken, N.J. : 2007)},
volume = {},
number = {},
pages = {},
doi = {10.1002/ar.25681},
pmid = {40401379},
issn = {1932-8494},
support = {//Fisheries Agency of Japan/ ; },
abstract = {Mysticetes, or baleen whales, have an air sac on the ventral surface of the larynx known as the "laryngeal sac." The primary hypothesis regarding this structure's function is that it is involved in sound production. However, several other functions have been proposed, including air recycling, air storage, and even buoyancy control. In this study, we analyzed the ontogenetic development and sexual dimorphism of the laryngeal sac with the aim of elucidating the function of this organ. The larynges of 61 common minke whales (Balaenoptera acutorostrata; male: n = 40, female: n = 21) collected off the Japanese coast were used in the present study. We isolated the larynx, situated between the hyoid bone and the trachea, during the flensing process. Seven linear measurements were taken using calipers, and weight was obtained using a digital scale. Allometric equations and proportions relative to total body length or weight were used to compare laryngeal morphology between sexes and maturity stages. Laryngeal sac measurements were significantly larger in sexually mature males. Furthermore, comparison of two males of approximately the same body length but different maturity showed that the sexually mature male had a larger laryngeal sac than the sexually immature male. The thickness of the laryngeal sac's muscle wall and the volume of the sac's lumen may be related to testes development (sexually mature whales have heavier testes). Of all measurement sites, only the width of the hyoid bone (basihyal and paired thyrohyals) was proportionally constant regardless of sex or maturity. We propose that baleen whales use their well-developed muscular laryngeal sac in a manner analogous to the human tongue, actively modifying its shape and volume to influence vocal production. Specifically, this structure may function as a resonance filter that creates a formant structure and contributes to the modification of phonemes generated by the U-folds of the larynx. Furthermore, the ability to produce complex vocalizations through this mechanism may have led to the enlargement of the laryngeal sac in males via sexual selection, where it also serves as a signal of their reproductive status.},
}

RevDate: 2025-05-15
CmpDate: 2025-05-15

Wright D, Westander J, Jensen P (2025)

Domestication effects on crowing in chickens: variation between wild and captive red junglefowl and domestic white Leghorn and the genetic architecture of crowing vocalizations.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 380(1926):20240199.

The crowing of the male chicken is a charismatic example of vocal display in a bird. It is regarded as the main territorial announcement of the ancestral red junglefowl. The call has been preserved throughout domestication, although several of its elements have been altered. To assess these alterations, we assayed crowing spectrograms from wild and captive-held red junglefowl populations from India, along with two red junglefowl populations held in long-term captivity in Sweden, and a domestic white Leghorn breed. We find consistent differences between the different Indian red junglefowl populations and the domestic white Leghorn for a range of characteristics, including the duration of the last syllable and the number of formants and their frequency in the last and second-to-last syllables. To analyse the genetic architecture of crowing vocalization, we performed a quantitative trait loci (QTL) experiment using a wild × domestic advanced intercross to identify QTL that explained a large percentage of the variation present for the duration of the last syllable and the number of formants in the second-to-last syllable. With this study we thus demonstrate consistent differences in red junglefowl and white Leghorn chickens and identify a relatively simple genetic architecture for some of these traits. This article is part of the theme issue 'Unravelling domestication: multi-disciplinary perspectives on human and non-human relationships in the past, present and future'.

Additional Links: PMID-40370017

Citation:

@article {pmid40370017,
year = {2025},
author = {Wright, D and Westander, J and Jensen, P},
title = {Domestication effects on crowing in chickens: variation between wild and captive red junglefowl and domestic white Leghorn and the genetic architecture of crowing vocalizations.},
journal = {Philosophical transactions of the Royal Society of London. Series B, Biological sciences},
volume = {380},
number = {1926},
pages = {20240199},
doi = {10.1098/rstb.2024.0199},
pmid = {40370017},
issn = {1471-2970},
support = {//Svenska Forskningsrådet Formas/ ; //Vetenskapsrådet/ ; },
mesh = {Animals ; *Chickens/genetics/physiology ; *Domestication ; *Vocalization, Animal ; Quantitative Trait Loci ; Male ; Sweden ; India ; },
abstract = {The crowing of the male chicken is a charismatic example of vocal display in a bird. It is regarded as the main territorial announcement of the ancestral red junglefowl. The call has been preserved throughout domestication, although several of its elements have been altered. To assess these alterations, we assayed crowing spectrograms from wild and captive-held red junglefowl populations from India, along with two red junglefowl populations held in long-term captivity in Sweden, and a domestic white Leghorn breed. We find consistent differences between the different Indian red junglefowl and the domestic white Leghorn for a range of characteristics, including the duration of the last syllable and the number of formants and their frequency in the last and second-to-last syllable. To analyse the genetic architecture of crowing vocalization, we performed a quantitative trait loci (QTL) experiment using a wild × domestic advanced intercross to identify QTL that explained a large percentage of the variation present for the duration of the last syllable and the number of formants in the second to last syllable. With this study we thus demonstrate consistent differences in red junglefowl and white Leghorn chickens and identify a relatively simple genetic architecture for some of these traits. This article is part of the theme issue 'Unravelling domestication: multi-disciplinary perspectives on human and non-human relationships in the past, present and future'.},
}

MeSH Terms:

Animals
*Chickens/genetics/physiology
*Domestication
*Vocalization, Animal
Quantitative Trait Loci
Male
Sweden
India

RevDate: 2025-05-06

Irineu RA, Dassie-Leite AP, Pereira EC, et al (2025)

Vocal Markers in the Gender Perception of Trans Women and Trans Men.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00095-5 [Epub ahead of print].

OBJECTIVE: To identify vocal gender markers in trans people, considering the relationship between gender perception and the acoustic and auditory-perceptual parameters of the voice.

METHODS: Observational, cross-sectional study, approved by the Research Ethics Committee (n. 5.353.501). The judges completed auditory-perceptual judgment (APJ) and acoustic analysis of 30 transgender women and 23 transgender men, aged between 18 and 43 years, based on the production of the sustained vowel /a/ and connected speech (number counting and days of the week). The APJ was made in consensus by two judges; vocal deviation was analyzed using the GRBASI scale; the parameters pitch (high, medium, and low) loudness (strong, adequate, and weak), resonance (laryngopharyngeal, balanced, and nasal), articulation (locked, adequate, and exaggerated), intonation (descending, level, and ascending), and gender perception (feminine, masculine, and neutral). For the acoustic evaluation, the software PRAAT was used to extract the parameters oscillatory frequency (fo), fo deviation, minimum and maximum frequency (fomin/fomax), first (F1), second (F2), third (F3), and fourth (F4) formant frequencies. The Kruskal-Wallis test, chi-square test, and Fisher's exact test were used for the statistical analysis of the data. For the regression analysis, the data were analyzed descriptively and inferentially using SPSS 29.0 software. A binary logistic regression model was applied to predict the binary nominal qualitative dependent variable of gender congruence through voice. In all statistical tests, a significance level of 5% (P < 0.05) was used.

RESULTS: The average fo was 146.289 Hz for trans women and 157.409 Hz for trans men. For trans women, gender perception was related to the parameters pitch (P = 0.013), articulation (P = 0.017), and intonation (P = 0.000). In trans men, gender perception was related to hormone use (P = 0.016), GRBASI tension parameter (P = 0.028), pitch (P = 0.001), loudness (P = 0.033), intonation (P = 0.001), fo (P = 0.034), fomin (P = 0.029), fomax (P = 0.018), and F1 (P = 0.038). In the results obtained from binary logistic regression for predicting gender congruence based on voice, ascending intonation was an auditory-perceptual predictor (P = 0.001) in the group of transgender women, and F1 was an acoustic predictor (P = 0.050) in the group of transgender men, both in connected speech.

CONCLUSION: In trans women, high pitch, adequate articulation, and ascending intonations were observed as markers of female gender. Most of the trans women's voices were perceived as feminine, even when they had a low pitch. In trans men, more tense vocal quality, descending intonations, and average fo in the range considered masculine were observed as markers of male gender. The parameters high pitch and ascending intonations were markers of female gender for both trans women and trans men. Ascending intonation was an auditory-perceptual predictor of vocal femininity in the transgender women group, and F1 frequency was an acoustic predictor of vocal masculinity in the transgender men group, both in connected speech.

Additional Links: PMID-40328556

Citation:

@article {pmid40328556,
year = {2025},
author = {Irineu, RA and Dassie-Leite, AP and Pereira, EC and Ferreira, T and Martins, PDN},
title = {Vocal Markers in the Gender Perception of Trans Women and Trans Men.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.03.002},
pmid = {40328556},
issn = {1873-4588},
abstract = {OBJECTIVE: To identify vocal gender markers in trans people, considering the relationship between gender perception and the acoustic and auditory-perceptual parameters of the voice.

METHODS: Observational, cross-sectional study, approved by the Research Ethics Committee (n. 5.353.501). The judges completed auditory-perceptual judgment (APJ) and acoustic analysis of 30 transgender women and 23 transgender men, aged between 18 and 43 years, based on the production of the sustained vowel /a/ and connected speech (number counting and days of the week). The APJ was made in consensus by two judges; vocal deviation was analyzed using the GRBASI scale; the parameters pitch (high, medium, and low) loudness (strong, adequate, and weak), resonance (laryngopharyngeal, balanced, and nasal), articulation (locked, adequate, and exaggerated), intonation (descending, level, and ascending), and gender perception (feminine, masculine, and neutral). For the acoustic evaluation, the software PRAAT was used to extract the parameters oscillatory frequency (fo), fo deviation, minimum and maximum frequency (fomin/fomax), first (F1), second (F2), third (F3), and fourth (F4) formant frequencies. The Kruskal-Wallis test, chi-square test, and Fisher's exact test were used for the statistical analysis of the data. For the regression analysis, the data were analyzed descriptively and inferentially using SPSS 29.0 software. A binary logistic regression model was applied to predict the binary nominal qualitative dependent variable of gender congruence through voice. In all statistical tests, a significance level of 5% (P < 0.05) was used.

RESULTS: The average fo was 146.289 Hz for trans women and 157.409 Hz for trans men. For trans women, gender perception was related to the parameters pitch (P = 0.013), articulation (P = 0.017), and intonation (P = 0.000). In trans men, gender perception was related to hormone use (P = 0.016), GRBASI tension parameter (P = 0.028), pitch (P = 0.001), loudness (P = 0.033), intonation (P = 0.001), fo (P = 0.034), fomin (P = 0.029), fomax (P = 0.018), and F1 (P = 0.038). In the results obtained from binary logistic regression for predicting gender congruence based on voice, ascending intonation was an auditory-perceptual predictor (P = 0.001) in the group of transgender women, and F1 was an acoustic predictor (P = 0.050) in the group of transgender men, both in connected speech.

CONCLUSION: In trans women, high pitch, adequate articulation, and ascending intonations were observed as markers of female gender. Most of the trans women's voices were perceived as feminine, even when they had a low pitch. In trans men, more tense vocal quality, descending intonations, and average fo in the range considered masculine were observed as markers of male gender. The parameters high pitch and ascending intonations were markers of female gender for both trans women and trans men. Ascending intonation was an auditory-perceptual predictor of vocal femininity in the transgender women group, and F1 frequency was an acoustic predictor of vocal masculinity in the transgender men group, both in connected speech.},
}

RevDate: 2025-04-25
CmpDate: 2025-04-25

Yuan F, Lu-Lu L, Liang YY, et al (2025)

[Abnormal characteristics of tongue consonants and their correlation with articulatory movement parameters in patients with tongue cancer after surgery].

Shanghai kou qiang yi xue = Shanghai journal of stomatology, 34(1):74-78.

PURPOSE: To study the abnormal characteristics of tongue consonants and their correlation with articulatory movement parameters in patients with tongue cancer after operation.

METHODS: A total of 119 patients with tongue cancer who received surgical treatment at First Affiliated Hospital of Bengbu Medical University from March 2019 to May 2023 were selected. The patients were divided into tongue margin group (n=38), tongue body group (n=40) and tongue base group (n=41). Twenty-five monosyllabic words in Huang Zhaoming-Han Zhijuan Vocabulary List for evaluating tongue consonants were used as speech assessment tools to evaluate the errors of each tongue consonant. The articulation speech measurement and training instrument were used to extract the second formants (F2) of the /i/ and /u/ vowels of the patients by linear predictive spectrum, and the articulation movement parameters such as tongue distance and F2i/F2u were calculated according to the formula. SPSS 26.0 software package was used for data analysis.

RESULTS: The rate of tongue consonant error in each group was as follows: in tongue margin group, preapical sound (49.5%) > apical middle sound (27.8%) > apical postapical sound (17.5%) > lingual facial sound (9.4%) > lingual base sound (6.1%). In tongue body group, preapical sound (55.0%) > apical middle sound (47.1%) > apical postapical sound (25.4%) > lingual facial sound (12.1%) > lingual base sound (3.3%). In tongue base group, preapical sound (60.0%) > postapical sound (52.0%) > apical medium sound (51.9%) > lingual base sound (44.3%) > lingual facial sound (34.8%). The error frequency of tongue apex medium sound in tongue body group and tongue base group was significantly higher than that in tongue margin group, and the error frequency of tongue apex posterior sound, tongue surface sound and tongue base sound in tongue base group was significantly higher than that in tongue body group and tongue margin group (P < 0.05). Tongue distance and F2i/F2u in tongue base group were significantly lower than those in tongue margin group and tongue body group, and tongue distance and F2i/F2u in tongue body group were significantly lower than those in tongue margin group (P < 0.05). Tongue distance, F2i/F2u were significantly negatively correlated with the error frequency of apical midpoint, apical postpoint and base sound in all groups (r < 0, P < 0.05).

CONCLUSIONS: Most patients with tongue cancer after operation have abnormal tongue tip, and the most serious problem is the pretip. In clinical practice, objective parameters such as tongue distance and F2i/F2u can be used to quantitatively and indirectly evaluate the articulation status and dynamic rehabilitation effect of tongue cancer patients after surgery.

Additional Links: PMID-40275664

Citation:

@article {pmid40275664,
year = {2025},
author = {Yuan, F and Lu-Lu, L and Liang, YY and Cao, MM and Xu, ZH and Qian, CR and Wang, D and Zhang, K},
title = {[Abnormal characteristics of tongue consonants and their correlation with articulatory movement parameters in patients with tongue cancer after surgery].},
journal = {Shanghai kou qiang yi xue = Shanghai journal of stomatology},
volume = {34},
number = {1},
pages = {74-78},
pmid = {40275664},
issn = {1006-7248},
mesh = {Humans ; *Tongue Neoplasms/surgery/physiopathology ; *Tongue/physiopathology ; Male ; Female ; Middle Aged ; Aged ; Adult ; *Speech ; *Phonetics ; Movement ; Postoperative Period ; },
abstract = {PURPOSE: To study the abnormal characteristics of tongue consonants and their correlation with articulatory movement parameters in patients with tongue cancer after operation.

METHODS: A total of 119 patients with tongue cancer who received surgical treatment at First Affiliated Hospital of Bengbu Medical University from March 2019 to May 2023 were selected. The patients were divided into tongue margin group (n=38), tongue body group (n=40) and tongue base group (n=41). Twenty-five monosyllabic words in Huang Zhaoming-Han Zhijuan Vocabulary List for evaluating tongue consonants were used as speech assessment tools to evaluate the errors of each tongue consonant. The articulation speech measurement and training instrument were used to extract the second formants (F2) of the /i/ and /u/ vowels of the patients by linear predictive spectrum, and the articulation movement parameters such as tongue distance and F2i/F2u were calculated according to the formula. SPSS 26.0 software package was used for data analysis.

RESULTS: The rate of tongue consonant error in each group was as follows: in tongue margin group, preapical sound (49.5%) > apical middle sound (27.8%) > apical postapical sound (17.5%) > lingual facial sound (9.4%) > lingual base sound (6.1%). In tongue body group, preapical sound (55.0%) > apical middle sound (47.1%) > apical postapical sound (25.4%) > lingual facial sound (12.1%) > lingual base sound (3.3%). In tongue base group, preapical sound (60.0%) > postapical sound (52.0%) > apical medium sound (51.9%) > lingual base sound (44.3%) > lingual facial sound (34.8%). The error frequency of tongue apex medium sound in tongue body group and tongue base group was significantly higher than that in tongue margin group, and the error frequency of tongue apex posterior sound, tongue surface sound and tongue base sound in tongue base group was significantly higher than that in tongue body group and tongue margin group (P < 0.05). Tongue distance and F2i/F2u in tongue base group were significantly lower than those in tongue margin group and tongue body group, and tongue distance and F2i/F2u in tongue body group were significantly lower than those in tongue margin group (P < 0.05). Tongue distance, F2i/F2u were significantly negatively correlated with the error frequency of apical midpoint, apical postpoint and base sound in all groups (r < 0, P < 0.05).

CONCLUSIONS: Most patients with tongue cancer after operation have abnormal tongue tip, and the most serious problem is the pretip. In clinical practice, objective parameters such as tongue distance and F2i/F2u can be used to quantitatively and indirectly evaluate the articulation status and dynamic rehabilitation effect of tongue cancer patients after surgery.},
}

MeSH Terms:

Humans
*Tongue Neoplasms/surgery/physiopathology
*Tongue/physiopathology
Male
Female
Middle Aged
Aged
Adult
*Speech
*Phonetics
Movement
Postoperative Period

RevDate: 2025-04-21

Hong Y, Chen S, Jiang H (2025)

Does Musical Experience Facilitate Phonetic Accommodation During Human-Robot Interaction?

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study investigated the effect of musical training on phonetic accommodation in a second language (L2) after interacting with a social robot, exploring the motivations and reasons behind their accommodation strategies.

METHOD: Fifteen L2 English speakers with long-term musical training experience (musician group) and 15 speakers without musical training experience (nonmusician group) were recruited to complete four conversational tasks with the social robot Furhat. Their production of a list of key words and carrier sentences was collected before and after conversations and used to quantify their phonetic accommodations. The spectral cues and prosodic cues of the production were extracted and analyzed.

RESULTS: Both groups showed similar convergence patterns but different divergence patterns. Specifically, the musician group showed divergence from the robot's production on more prosodic cues (mean fundamental frequency and duration) than the nonmusician group. Both groups converged their vowel formants toward the robot without group differences.

CONCLUSIONS: The findings reflect individuals' assessment of the robot's speech characteristics and their efforts to enhance communication efficiency, which might indicate a special speech register used for addressing the robot. The finding is more noticeable in the musician group compared to the nonmusician group. We proposed two possible explanations of the effect of musical training on phonetic accommodations: one involves the training of auditory attention and working memory and the other relates to the refinement of phonetic talent in L2 acquisition, contributing to theories on the relationship between music and language. This study also has implications for applying musical training to speech communication training in clinical populations and for designing social robots to better serve as speech therapy partners.
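The abstract does not state which accommodation metric was computed from the pre- and post-conversation recordings. One measure often used in the phonetic-convergence literature is the difference-in-distance (DID) score, which compares a speaker's distance from the model talker before and after the interaction. The sketch below assumes per-cue means have already been extracted; the function name and all numbers are hypothetical illustrations, not data or code from this study.

```python
import numpy as np


def difference_in_distance(baseline, post, model):
    """Difference-in-distance (DID) score per acoustic cue (hypothetical helper).

    DID = |baseline - model| - |post - model|
      > 0  speaker moved toward the model talker (convergence)
      < 0  speaker moved away from the model talker (divergence)
    """
    baseline = np.asarray(baseline, dtype=float)
    post = np.asarray(post, dtype=float)
    model = np.asarray(model, dtype=float)
    return np.abs(baseline - model) - np.abs(post - model)


if __name__ == "__main__":
    # Hypothetical per-cue means: F1 (Hz) and mean f0 (Hz).
    baseline = [700.0, 180.0]  # speaker before the conversations
    post = [660.0, 200.0]      # speaker after the conversations
    robot = [600.0, 170.0]     # model talker (the robot voice)
    print(difference_in_distance(baseline, post, robot))
```

With these made-up values F1 converges (+40 Hz) while f0 diverges (-20 Hz), the kind of cue-by-cue split between convergence and divergence that the abstract describes.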

Additional Links: PMID-40258124

Citation:

@article {pmid40258124,
year = {2025},
author = {Hong, Y and Chen, S and Jiang, H},
title = {Does Musical Experience Facilitate Phonetic Accommodation During Human-Robot Interaction?.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-16},
doi = {10.1044/2025_JSLHR-24-00495},
pmid = {40258124},
issn = {1558-9102},
abstract = {PURPOSE: This study investigated the effect of musical training on phonetic accommodation in a second language (L2) after interacting with a social robot, exploring the motivations and reasons behind their accommodation strategies.

METHOD: Fifteen L2 English speakers with long-term musical training experience (musician group) and 15 speakers without musical training experience (nonmusician group) were recruited to complete four conversational tasks with the social robot Furhat. Their production of a list of key words and carrier sentences was collected before and after conversations and used to quantify their phonetic accommodations. The spectral cues and prosodic cues of the production were extracted and analyzed.

RESULTS: Both groups showed similar convergence patterns but different divergence patterns. Specifically, the musician group showed divergence from the robot's production on more prosodic cues (mean fundamental frequency and duration) than the nonmusician group. Both groups converged their vowel formants toward the robot without group differences.

CONCLUSIONS: The findings reflect individuals' assessment of the robot's speech characteristics and their efforts to enhance communication efficiency, which might indicate a special speech register used for addressing the robot. The finding is more noticeable in the musician group compared to the nonmusician group. We proposed two possible explanations of the effect of musical training on phonetic accommodations: one involves the training of auditory attention and working memory and the other relates to the refinement of phonetic talent in L2 acquisition, contributing to theories on the relationship between music and language. This study also has implications for applying musical training to speech communication training in clinical populations and for designing social robots to better serve as speech therapy partners.},
}

RevDate: 2025-04-19
CmpDate: 2025-04-19

Filippa M, Tissot H, Mancinelli T, et al (2025)

Maternal and paternal infant directed speech is modulated by the child's age in two and three person interactions.

Scientific reports, 15(1):13624.

Prosody in infant-directed speech (IDS) serves important functions for the infant's attention, regulation, and emotional expression. However, how the structural characteristics of this vocal signal are influenced by the presence or absence of one or two parents at different infant ages remains under-investigated. This study aimed to identify the acoustic characteristics of parental vocalizations in 69 families during specific phases of the Lausanne Trilogue Play (LTP) setting. Vocalizations were analyzed in both two-person contexts (mother-baby or father-baby interacting with the infant individually) and three-person contexts (mother-baby or father-baby interactions in the presence of the other parent) at three time points: when the infant was 3, 9, and 18 months old. Videos of interactions were coded, and the parental vocalizations were extracted. Five components of acoustic features related to the prosodic aspects of speech were extracted for subsequent analysis: intensity and its variability, pitch and pitch variability, formant amplitude, the intensity of specific speech frequency bands affecting sound timbre, and the rate of voiced and unvoiced segments per second. The study demonstrated a main effect of infant age on parental acoustic prosodic characteristics, along with significant interactions between infant age and interaction context (two- versus three-person) and between infant age and parental role (mother versus father). Across contexts and parental roles, intensity, pitch, and their variability consistently increased from 3 to 9 months. By 9 months, distinct prosodic patterns emerged, including a reduced syllable rate and formant amplitude, along with an increase in pauses. The mother's voice exhibited a steady increase in intensity, as well as in pitch and intensity variability. 
Interestingly, when comparing parents across the two contexts, IDS in the three-person context is characterized by a higher rate of syllables and fewer pauses, with the most pronounced changes observed at 9 months of age. The development of prosodic characteristics in IDS is not constant across age and it is influenced by the complex interactions between age phases, parental gender, and contextual factors, with a dynamic adaptation of the communication strategies in three-person contexts. The current study underscores the importance of taking a comprehensive perspective in analyzing infant-directed speech within an interactive context involving both fathers and mothers in two- and three-person settings.

Additional Links: PMID-40253572

Citation:

@article {pmid40253572,
year = {2025},
author = {Filippa, M and Tissot, H and Mancinelli, T and Favez, N and Grandjean, D},
title = {Maternal and paternal infant directed speech is modulated by the child's age in two and three person interactions.},
journal = {Scientific reports},
volume = {15},
number = {1},
pages = {13624},
pmid = {40253572},
issn = {2045-2322},
mesh = {Humans ; Infant ; Female ; Male ; *Speech/physiology ; *Mother-Child Relations ; Adult ; Mothers ; Age Factors ; Fathers ; Father-Child Relations ; Speech Acoustics ; },
abstract = {Prosody in infant-directed speech (IDS) serves important functions for the infant's attention, regulation, and emotional expression. However, how the structural characteristics of this vocal signal are influenced by the presence or absence of one or two parents at different infant ages remains under-investigated. This study aimed to identify the acoustic characteristics of parental vocalizations in 69 families during specific phases of the Lausanne Trilogue Play (LTP) setting. Vocalizations were analyzed in both two-person contexts (mother-baby or father-baby interacting with the infant individually) and three-person contexts (mother-baby or father-baby interactions in the presence of the other parent) at three time points: when the infant was 3, 9, and 18 months old. Videos of interactions were coded, and the parental vocalizations were extracted. Five components of acoustic features related to the prosodic aspects of speech were extracted for subsequent analysis: intensity and its variability, pitch and pitch variability, formant amplitude, the intensity of specific speech frequency bands affecting sound timbre, and the rate of voiced and unvoiced segments per second. The study demonstrated a main effect of infant age on parental acoustic prosodic characteristics, along with significant interactions between infant age and interaction context (two- versus three-person) and between infant age and parental role (mother versus father). Across contexts and parental roles, intensity, pitch, and their variability consistently increased from 3 to 9 months. By 9 months, distinct prosodic patterns emerged, including a reduced syllable rate and formant amplitude, along with an increase in pauses. The mother's voice exhibited a steady increase in intensity, as well as in pitch and intensity variability. 
Interestingly, when comparing parents across the two contexts, IDS in the three-person context is characterized by a higher rate of syllables and fewer pauses, with the most pronounced changes observed at 9 months of age. The development of prosodic characteristics in IDS is not constant across age and it is influenced by the complex interactions between age phases, parental gender, and contextual factors, with a dynamic adaptation of the communication strategies in three-person contexts. The current study underscores the importance of taking a comprehensive perspective in analyzing infant-directed speech within an interactive context involving both fathers and mothers in two- and three-person settings.},
}

MeSH Terms:

Humans
Infant
Female
Male
*Speech/physiology
*Mother-Child Relations
Adult
Mothers
Age Factors
Fathers
Father-Child Relations
Speech Acoustics

RevDate: 2025-04-19

M A, I M, A M J, et al (2025)

A Pitch-Synchronous Study of Formants.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00127-4 [Epub ahead of print].

INTRODUCTION: Formants are of fundamental importance in voice science. To date, formants have typically been studied using pitch-asynchronous methods, such as linear-prediction analysis. The results are often incomplete (without level), not objective (with frequencies depending on the preset order p), and require many pitch periods of stationary signals. A method that is accurate, complete, reproducible, and widely applicable is needed.

METHOD: This study presents a pitch-synchronous method for measuring formants. From the waveform of each pitch period, formants are obtained with high reproducibility, including all formant parameters such as central frequency, level, and bandwidth.

RESULTS: The method was tested on 78 utterances of recorded sustained vowels with simultaneously acquired electroglottograph signals, segmented into 4730 individual pitch periods. For each waveform segment, Fourier analysis was applied to obtain an amplitude spectrum. Formants with three parameters were obtained from each amplitude spectrum. Using these formants, the voice waveforms were regenerated showing strong similarity to the original waveforms. The spectra can be averaged over many pitch periods to reduce noise and to estimate standard deviation.

CONCLUSIONS: Measuring formants from the waveform in each pitch period yields accurate, complete, and reproducible results. The method is applicable to live voices, including both speech and singing signals. The results can be used for voice research, speech and singing synthesis, and a quantitative study of phonetics.
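The core pitch-synchronous step, one Fourier amplitude spectrum per pitch period followed by averaging across periods, can be sketched with NumPy. This is a generic illustration under stated assumptions, not the authors' implementation: period boundaries are assumed to be supplied (for example, glottal closure instants taken from the electroglottograph channel), spectra from periods of different length are placed on a shared frequency grid by linear interpolation, and the step that fits formant central frequency, level, and bandwidth to each spectrum is not reproduced. The function names are hypothetical.

```python
import numpy as np


def period_spectra(signal, fs, boundaries):
    """Amplitude spectrum (dB) of each pitch period (hypothetical helper).

    `boundaries` holds sample indices of period starts, assumed given
    (e.g. glottal closure instants from an EGG channel). Each period is
    transformed on its own, so bin k lies at k / T_period, i.e. at the
    k-th harmonic of that period's f0.
    """
    spectra = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        period = np.asarray(signal[start:end], dtype=float)
        n = len(period)
        amps = np.abs(np.fft.rfft(period)) / n      # magnitude per harmonic
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # k * fs / n = k * f0
        # Small floor avoids log10(0) for empty bins.
        spectra.append((freqs, 20 * np.log10(amps + 1e-12)))
    return spectra


def average_spectra(spectra, grid):
    """Interpolate each period's spectrum onto a shared frequency grid and
    return the mean and standard deviation across periods (dB).
    np.interp holds edge values constant beyond a period's highest bin."""
    stacked = np.array([np.interp(grid, f, a) for f, a in spectra])
    return stacked.mean(axis=0), stacked.std(axis=0)


if __name__ == "__main__":
    # Synthetic test vowel: f0 = 200 Hz plus a weaker third harmonic.
    fs, f0, n_per = 16000, 200, 80
    t = np.arange(n_per * 10) / fs
    sig = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 3 * f0 * t)
    spectra = period_spectra(sig, fs, list(range(0, len(sig) + 1, n_per)))
    grid = np.arange(0, fs / 2 + 1, f0)
    mean_db, std_db = average_spectra(spectra, grid)
    print("strongest component at", grid[np.argmax(mean_db)], "Hz")
```

Because every period is analysed separately, the frequency resolution equals that period's f0; averaging over periods, as the abstract notes, trades time detail for lower noise and gives a spread estimate per frequency.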

Additional Links: PMID-40253259

Citation:

@article {pmid40253259,
year = {2025},
author = {M, A and I, M and A M, J and I, H and J C, C},
title = {A Pitch-Synchronous Study of Formants.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.03.033},
pmid = {40253259},
issn = {1873-4588},
abstract = {INTRODUCTION: Formants are of fundamental importance in voice science. To date, formants have typically been studied using pitch-asynchronous methods, such as linear-prediction analysis. The results are often incomplete (without level), not objective (with frequencies depending on the preset order p), and require many pitch periods of stationary signals. A method that is accurate, complete, reproducible, and widely applicable is needed.

METHOD: This study presents a pitch-synchronous method for measuring formants. From the waveform of each pitch period, formants are obtained with high reproducibility, including all formant parameters such as central frequency, level, and bandwidth.

RESULTS: The method was tested on 78 utterances of recorded sustained vowels with simultaneously acquired electroglottograph signals, segmented into 4730 individual pitch periods. For each waveform segment, Fourier analysis was applied to obtain an amplitude spectrum. Formants with three parameters were obtained from each amplitude spectrum. Using these formants, the voice waveforms were regenerated showing strong similarity to the original waveforms. The spectra can be averaged over many pitch periods to reduce noise and to estimate standard deviation.

CONCLUSIONS: Measuring formants from the waveform in each pitch period yields accurate, complete, and reproducible results. The method is applicable to live voices, including both speech and singing signals. The results can be used for voice research, speech and singing synthesis, and a quantitative study of phonetics.},
}

RevDate: 2025-04-19

Chabib L, Yulianto, Ananda PWR, et al (2025)

Ethyl Cellulose-Based In-Situ Film of Itraconazole for Enhanced Treatment of Fungal Infections.

Annales pharmaceutiques francaises pii:S0003-4509(25)00072-0 [Epub ahead of print].

OBJECTIVES: Fungal infections represent a significant global health challenge, requiring effective treatments to prevent complications and improve patient outcomes. This study aimed to develop an in-situ film-forming system (IFFS) for transcutaneous delivery of itraconazole (ITZ) as an alternative to oral administration, addressing issues such as low bioavailability, reduced efficacy, and potential side effects.

MATERIALS AND METHODS: The IFFS was formulated using ethyl cellulose as the primary polymer, PEG 400 as a plasticizer, and a eutectic mixture of menthol and camphor as penetration enhancers. The system was characterized for viscosity, pH, drying time, water vapor permeability, bioadhesion, and physicochemical interactions (DSC and FTIR). Ex vivo skin permeation and retention studies were conducted using Franz diffusion cells, and antifungal efficacy was tested on an ex vivo Candida albicans infection model. Skin integrity and hemolysis tests were performed to evaluate safety.

RESULTS: The IFFS exhibited desirable physicochemical properties, with increased polymer concentrations enhancing skin retention and bioadhesive strength while reducing permeation rates. Ex vivo studies showed sustained ITZ release and enhanced skin retention. The antifungal activity test demonstrated complete eradication of Candida albicans within 48 hours. Safety assessments confirmed no skin irritation or toxicity.

CONCLUSION: The developed IFFS provides a safe and effective transcutaneous delivery system for ITZ. This innovative approach enhances antifungal efficacy, improves skin retention, and offers a promising alternative to oral administration, minimizing systemic side effects.

Additional Links: PMID-40253000

Citation:

@article {pmid40253000,
year = {2025},
author = {Chabib, L and Yulianto, and Ananda, PWR and Utami, RN and Mir, M and Elim, D and Fitri, AMN and Zaman, HS and Aziz, AYR and Fauziah, N and Rahman, L and Pandoman Febrian, M and Permana, AD},
title = {Ethyl Cellulose-Based In-Situ Film of Itraconazole for Enhanced Treatment of Fungal Infections.},
journal = {Annales pharmaceutiques francaises},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.pharma.2025.04.002},
pmid = {40253000},
issn = {0003-4509},
abstract = {OBJECTIVES: Fungal infections represent a significant global health challenge, requiring effective treatments to prevent complications and improve patient outcomes. This study aimed to develop an in-situ film-forming system (IFFS) for transcutaneous delivery of itraconazole (ITZ) as an alternative to oral administration, addressing issues such as low bioavailability, reduced efficacy, and potential side effects.

MATERIALS AND METHODS: The IFFS was formulated using ethyl cellulose as the primary polymer, PEG 400 as a plasticizer, and a eutectic mixture of menthol and camphor as penetration enhancers. The system was characterized for viscosity, pH, drying time, water vapor permeability, bioadhesion, and physicochemical interactions (DSC and FTIR). Ex vivo skin permeation and retention studies were conducted using Franz diffusion cells, and antifungal efficacy was tested on an ex vivo Candida albicans infection model. Skin integrity and hemolysis tests were performed to evaluate safety.

RESULTS: The IFFS exhibited desirable physicochemical properties, with increased polymer concentrations enhancing skin retention and bioadhesive strength while reducing permeation rates. Ex vivo studies showed sustained ITZ release and enhanced skin retention. The antifungal activity test demonstrated complete eradication of Candida albicans within 48 hours. Safety assessments confirmed no skin irritation or toxicity.

CONCLUSION: The developed IFFS provides a safe and effective transcutaneous delivery system for ITZ. This innovative approach enhances antifungal efficacy, improves skin retention, and offers a promising alternative to oral administration, minimizing systemic side effects.},
}

RevDate: 2025-04-17
CmpDate: 2025-04-17

Behroozmand R, Khoshhal Mollasaraei Z, Nejati V, et al (2025)

Vocal and articulatory speech control deficits in individuals with post-stroke aphasia.

Scientific reports, 15(1):13350.

Individuals with post-stroke aphasia exhibit deficits in regulating vocal (i.e., laryngeal) pitch control during speech vowel production; however, it has not been determined whether such deficits also exist when they control their supra-laryngeal speech articulators during word production. To address this question, 19 subjects with post-stroke aphasia and 20 controls were tested under an altered auditory feedback paradigm in which they received +30% shifts in their vowel first-formant frequency during word production. In addition, 17 aphasia subjects and 19 controls from the same groups completed steady vowel vocalizations while receiving randomized pitch shifts of ±100 cents. Consistent with previous findings, our data showed that the magnitude of compensatory vocal responses to pitch-shifted vowel productions was significantly reduced in individuals with aphasia vs. controls. We also found that the magnitude of compensatory articulatory responses to formant-shifted vowels during word production was significantly diminished in the aphasia group compared with controls. However, no significant correlation was found between the vocal and articulatory compensatory responses to pitch and formant alterations. These findings suggest that vocal and articulatory motor speech control are regulated via independent mechanisms, and stroke-induced damage to left-hemispheric brain networks can selectively impair them in stroke survivors with aphasia.

Additional Links: PMID-40246982

Citation:

@article {pmid40246982,
year = {2025},
author = {Behroozmand, R and Khoshhal Mollasaraei, Z and Nejati, V and Daliri, A and Fridriksson, J},
title = {Vocal and articulatory speech control deficits in individuals with post-stroke aphasia.},
journal = {Scientific reports},
volume = {15},
number = {1},
pages = {13350},
pmid = {40246982},
issn = {2045-2322},
support = {R01DC018523/NH/NIH HHS/United States ; R01DC019905/NH/NIH HHS/United States ; P50DC014664/NH/NIH HHS/United States ; },
mesh = {Humans ; *Stroke/complications/physiopathology ; *Aphasia/physiopathology/etiology ; Male ; Female ; Middle Aged ; Aged ; *Speech/physiology ; Adult ; Case-Control Studies ; *Voice/physiology ; Speech Acoustics ; },
abstract = {Individuals with post-stroke aphasia exhibit deficits in regulating vocal (i.e., laryngeal) pitch control during speech vowel production; however, it has not been determined whether such deficits also exist when they control their supra-laryngeal speech articulators during word production. To address this question, 19 subjects with post-stroke aphasia and 20 controls were tested under an altered auditory feedback paradigm in which they received +30% shifts in their vowel first-formant frequency during word production. In addition, 17 aphasia subjects and 19 controls from the same groups completed steady vowel vocalizations while receiving randomized pitch shifts of ±100 cents. Consistent with previous findings, our data showed that the magnitude of compensatory vocal responses to pitch-shifted vowel productions was significantly reduced in individuals with aphasia vs. controls. We also found that the magnitude of compensatory articulatory responses to formant-shifted vowels during word production was significantly diminished in the aphasia group compared with controls. However, no significant correlation was found between the vocal and articulatory compensatory responses to pitch and formant alterations. These findings suggest that vocal and articulatory motor speech control are regulated via independent mechanisms, and stroke-induced damage to left-hemispheric brain networks can selectively impair them in stroke survivors with aphasia.},
}

MeSH Terms:

Humans
*Stroke/complications/physiopathology
*Aphasia/physiopathology/etiology
Male
Female
Middle Aged
Aged
*Speech/physiology
Adult
Case-Control Studies
*Voice/physiology
Speech Acoustics

RevDate: 2025-04-14

Kang MJ, Ryu JY, Lee JS, et al (2025)

Acoustic analysis of nasalance and formants in VPI patients: Implications for clinical practice and mobile application development.

Journal of cranio-maxillo-facial surgery : official publication of the European Association for Cranio-Maxillo-Facial Surgery pii:S1010-5182(25)00114-3 [Epub ahead of print].

Velopharyngeal insufficiency (VPI) often results in speech abnormalities, making accurate evaluation essential for understanding its relationship with structural anomalies. This retrospective study, spanning January 2019 to December 2022, investigates the role of formant analysis in speech evaluation and treatment. We analyzed speech data from 100 adults, 55 children, and 10 pediatric patients with VPI using Nasometer and PRAAT software, focusing on the sounds Pa, Pi, Pu, Pe, and Po. Nasalance scores and formants 1-4 were measured both pre- and post-VPI surgery and correlated with age, gender, and surgical outcomes. In both normal adults and children, the distributions of formants 1 and 2 for the vowels |a|, |e|, |i|, |o|, and |u| showed variations by age. Gender differences were significant in adults for the vowels |a|, |o|, and |u|, but not in children. VPI surgery significantly improved nasalance scores, and notable changes in formants 1 and 2 were observed post-surgery in VPI patients for the vowels |a|, |e|, and |i|. This study emphasizes the importance of formant analysis in speech therapy and introduces the potential for mobile app-based self-assessment. This approach reduces the reliance on specialized tools, such as nasometers, and provides a more accessible method for speech management.

Additional Links: PMID-40229175

Citation:

@article {pmid40229175,
year = {2025},
author = {Kang, MJ and Ryu, JY and Lee, JS and Yang, JD and Chung, HY and Choi, KY},
title = {Acoustic analysis of nasalance and formants in VPI patients: Implications for clinical practice and mobile application development.},
journal = {Journal of cranio-maxillo-facial surgery : official publication of the European Association for Cranio-Maxillo-Facial Surgery},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jcms.2025.03.018},
pmid = {40229175},
issn = {1878-4119},
abstract = {Velopharyngeal insufficiency (VPI) often results in speech abnormalities, making accurate evaluation essential for understanding its relationship with structural anomalies. This retrospective study, spanning January 2019 to December 2022, investigates the role of formant analysis in speech evaluation and treatment. We analyzed speech data from 100 adults, 55 children, and 10 pediatric patients with VPI using Nasometer and PRAAT software, focusing on the sounds Pa, Pi, Pu, Pe, and Po. Nasalance scores and formants 1-4 were measured both pre- and post-VPI surgery and correlated with age, gender, and surgical outcomes. In both normal adults and children, the distributions of formants 1 and 2 for the vowels |a|, |e|, |i|, |o|, and |u| showed variations by age. Gender differences were significant in adults for the vowels |a|, |o|, and |u|, but not in children. VPI surgery significantly improved nasalance scores, and notable changes in formants 1 and 2 were observed post-surgery in VPI patients for the vowels |a|, |e|, and |i|. This study emphasizes the importance of formant analysis in speech therapy and introduces the potential for mobile app-based self-assessment. This approach reduces the reliance on specialized tools, such as nasometers, and provides a more accessible method for speech management.},
}

RevDate: 2025-04-14
CmpDate: 2025-04-14

Su Z, Jiang H, Yang Y, et al (2025)

Acoustic Features for Identifying Suicide Risk in Crisis Hotline Callers: Machine Learning Approach.

Journal of medical Internet research, 27:e67772 pii:v27i1e67772.

BACKGROUND: Crisis hotlines serve as a crucial avenue for the early identification of suicide risk, which is of paramount importance for suicide prevention and intervention. However, assessing the risk of callers in the crisis hotline context is constrained by factors such as lack of nonverbal communication cues, anonymity, time limits, and single-occasion intervention. Therefore, it is necessary to develop approaches, including acoustic features, for identifying the suicide risk among hotline callers early and quickly. Given the complicated features of sound, adopting artificial intelligence models to analyze callers' acoustic features is promising.

OBJECTIVE: In this study, we investigated the feasibility of using acoustic features to predict suicide risk in crisis hotline callers. We also adopted a machine learning approach to analyze the complex acoustic features of hotline callers, with the aim of developing suicide risk prediction models.

METHODS: We collected 525 suicide-related calls from the records of a psychological assistance hotline in a province in northwest China. Callers were categorized as low or high risk based on suicidal ideation, suicidal plans, and history of suicide attempts, with risk assessments verified by a team of 18 clinical psychology raters. A total of 164 clearly categorized risk recordings were analyzed, including 102 low-risk and 62 high-risk calls. We extracted 273 audio segments, each exceeding 2 seconds in duration, which were labeled by raters as containing suicide-related expressions for subsequent model training and evaluation. Basic acoustic features (eg, Mel Frequency Cepstral Coefficients, formant frequencies, jitter, shimmer) and high-level statistical function (HSF) features (using OpenSMILE [Open-Source Speech and Music Interpretation by Large-Space Extraction] with the ComParE 2016 configuration) were extracted. Four supervised machine learning algorithms (logistic regression, support vector machine, random forest, and extreme gradient boosting) were trained and evaluated using grouped 5-fold cross-validation and a test set, with performance metrics, including accuracy, F1-score, recall, and false negative rate.

RESULTS: The development of machine learning models utilizing HSF acoustic features has been demonstrated to enhance recognition performance compared to models based solely on basic acoustic features. The random forest classifier, developed with HSFs, achieved the best performance in detecting the suicide risk among the models evaluated (accuracy=0.75, F1-score=0.70, recall=0.76, false negative rate=0.24).

CONCLUSIONS: The results of our study demonstrate the potential of developing artificial intelligence-based early warning systems using acoustic features for identifying the suicide risk among crisis hotline callers. Our work also has implications for employing acoustic features to identify suicide risk in salient voice contexts.

Additional Links: PMID-40228243

Citation:

@article {pmid40228243,
year = {2025},
author = {Su, Z and Jiang, H and Yang, Y and Hou, X and Su, Y and Yang, L},
title = {Acoustic Features for Identifying Suicide Risk in Crisis Hotline Callers: Machine Learning Approach.},
journal = {Journal of medical Internet research},
volume = {27},
number = {},
pages = {e67772},
doi = {10.2196/67772},
pmid = {40228243},
issn = {1438-8871},
mesh = {Humans ; *Machine Learning ; *Hotlines ; *Suicide ; *Acoustics ; Female ; Male ; Risk Assessment/methods ; China ; Adult ; *Suicide Prevention ; Suicidal Ideation ; },
abstract = {BACKGROUND: Crisis hotlines serve as a crucial avenue for the early identification of suicide risk, which is of paramount importance for suicide prevention and intervention. However, assessing the risk of callers in the crisis hotline context is constrained by factors such as lack of nonverbal communication cues, anonymity, time limits, and single-occasion intervention. Therefore, it is necessary to develop approaches, including acoustic features, for identifying the suicide risk among hotline callers early and quickly. Given the complicated features of sound, adopting artificial intelligence models to analyze callers' acoustic features is promising.

OBJECTIVE: In this study, we investigated the feasibility of using acoustic features to predict suicide risk in crisis hotline callers. We also adopted a machine learning approach to analyze the complex acoustic features of hotline callers, with the aim of developing suicide risk prediction models.

METHODS: We collected 525 suicide-related calls from the records of a psychological assistance hotline in a province in northwest China. Callers were categorized as low or high risk based on suicidal ideation, suicidal plans, and history of suicide attempts, with risk assessments verified by a team of 18 clinical psychology raters. A total of 164 clearly categorized risk recordings were analyzed, including 102 low-risk and 62 high-risk calls. We extracted 273 audio segments, each exceeding 2 seconds in duration, which were labeled by raters as containing suicide-related expressions for subsequent model training and evaluation. Basic acoustic features (eg, Mel Frequency Cepstral Coefficients, formant frequencies, jitter, shimmer) and high-level statistical function (HSF) features (using OpenSMILE [Open-Source Speech and Music Interpretation by Large-Space Extraction] with the ComParE 2016 configuration) were extracted. Four supervised machine learning algorithms (logistic regression, support vector machine, random forest, and extreme gradient boosting) were trained and evaluated using grouped 5-fold cross-validation and a test set, with performance metrics, including accuracy, F1-score, recall, and false negative rate.

RESULTS: The development of machine learning models utilizing HSF acoustic features has been demonstrated to enhance recognition performance compared to models based solely on basic acoustic features. The random forest classifier, developed with HSFs, achieved the best performance in detecting the suicide risk among the models evaluated (accuracy=0.75, F1-score=0.70, recall=0.76, false negative rate=0.24).

CONCLUSIONS: The results of our study demonstrate the potential of developing artificial intelligence-based early warning systems using acoustic features for identifying the suicide risk among crisis hotline callers. Our work also has implications for employing acoustic features to identify suicide risk in salient voice contexts.},
}

MeSH Terms:

Humans
*Machine Learning
*Hotlines
*Suicide
*Acoustics
Female
Male
Risk Assessment/methods
China
Adult
*Suicide Prevention
Suicidal Ideation

RevDate: 2025-04-10

Zeng Y, Niziolek CA, Parrell B (2025)

Simultaneous acquisition of multiple auditory-motor transformations reveals suprasyllabic motor planning in speech production.

Journal of experimental psychology. General pii:2026-03037-001 [Epub ahead of print].

Motor planning forms a critical bridge between psycholinguistic and motoric models of word production. While syllables are often considered the core speech motor planning unit, growing evidence hints at suprasyllabic planning that may correspond to words, but firm experimental support is still lacking. We use differential adaptation to altered auditory feedback to provide novel, straightforward evidence for word-level planning. By introducing opposing perturbations to shared segmental content in near real time during speaking (e.g., raising the first vowel formant of "ped" in "pedigree" but lowering it in "pedicure," so speakers hear something akin to "padigree" and "pidicure"), we assess whether participants can use the larger word context to separately oppose the two perturbations (i.e., by producing "pidigree" and "padicure"). Critically, limb control research shows that such differential learning is possible only when the shared movement forms part of distinct motor plans, allowing a straightforward assay of the scope of planning in multisyllabic words. We found differential adaptation in multisyllabic words but of smaller size relative to monosyllabic words. Our results strongly suggest that speech relies on an interactive motor planning process encompassing both syllables and words. (PsycInfo Database Record (c) 2025 APA, all rights reserved).

Additional Links: PMID-40208724

Citation:

@article {pmid40208724,
year = {2025},
author = {Zeng, Y and Niziolek, CA and Parrell, B},
title = {Simultaneous acquisition of multiple auditory-motor transformations reveals suprasyllabic motor planning in speech production.},
journal = {Journal of experimental psychology. General},
volume = {},
number = {},
pages = {},
doi = {10.1037/xge0001744},
pmid = {40208724},
issn = {1939-2222},
support = {//National Science Foundation; Division of Behavioral and Cognitive Sciences/ ; },
abstract = {Motor planning forms a critical bridge between psycholinguistic and motoric models of word production. While syllables are often considered the core speech motor planning unit, growing evidence hints at suprasyllabic planning that may correspond to words, but firm experimental support is still lacking. We use differential adaptation to altered auditory feedback to provide novel, straightforward evidence for word-level planning. By introducing opposing perturbations to shared segmental content in near real time during speaking (e.g., raising the first vowel formant of "ped" in "pedigree" but lowering it in "pedicure," so speakers hear something akin to "padigree" and "pidicure"), we assess whether participants can use the larger word context to separately oppose the two perturbations (i.e., by producing "pidigree" and "padicure"). Critically, limb control research shows that such differential learning is possible only when the shared movement forms part of distinct motor plans, allowing a straightforward assay of the scope of planning in multisyllabic words. We found differential adaptation in multisyllabic words but of smaller size relative to monosyllabic words. Our results strongly suggest that speech relies on an interactive motor planning process encompassing both syllables and words. (PsycInfo Database Record (c) 2025 APA, all rights reserved).},
}

RevDate: 2025-04-06
CmpDate: 2025-04-06

Fitch WT, Anikin A, Pisanski K, et al (2025)

Formant analysis of vertebrate vocalizations: achievements, pitfalls, and promises.

BMC biology, 23(1):92.

When applied to vertebrate vocalizations, source-filter theory, initially developed for human speech, has revolutionized our understanding of animal communication, resulting in major insights into the form and function of animal sounds. However, animal calls and human nonverbal vocalizations can differ qualitatively from human speech, often having more chaotic and higher-frequency sources, making formant measurement challenging. We review the considerable achievements of the "formant revolution" in animal vocal communication research, then highlight several important methodological problems in formant analysis. We offer concrete recommendations for effectively applying source-filter theory to non-speech vocalizations and discuss promising avenues for future research in this area.

Brief: Formants (vocal tract resonances) play key roles in animal communication, offering researchers exciting promise but also potential pitfalls.

Additional Links: PMID-40189499

Citation:

@article {pmid40189499,
year = {2025},
author = {Fitch, WT and Anikin, A and Pisanski, K and Valente, D and Reby, D},
title = {Formant analysis of vertebrate vocalizations: achievements, pitfalls, and promises.},
journal = {BMC biology},
volume = {23},
number = {1},
pages = {92},
pmid = {40189499},
issn = {1741-7007},
support = {W1262-B29//Austrian Science Fund/ ; 2023-00850//Vetenskapsrådet/ ; ANR-21-CE28-0007-01//French National Research Agency/ ; },
mesh = {Animals ; *Vocalization, Animal/physiology ; *Vertebrates/physiology ; Humans ; },
abstract = {When applied to vertebrate vocalizations, source-filter theory, initially developed for human speech, has revolutionized our understanding of animal communication, resulting in major insights into the form and function of animal sounds. However, animal calls and human nonverbal vocalizations can differ qualitatively from human speech, often having more chaotic and higher-frequency sources, making formant measurement challenging. We review the considerable achievements of the "formant revolution" in animal vocal communication research, then highlight several important methodological problems in formant analysis. We offer concrete recommendations for effectively applying source-filter theory to non-speech vocalizations and discuss promising avenues for future research in this area. Brief: Formants (vocal tract resonances) play key roles in animal communication, offering researchers exciting promise but also potential pitfalls.},
}

MeSH Terms:

Animals
*Vocalization, Animal/physiology
*Vertebrates/physiology
Humans

RevDate: 2025-04-04

Song JY, Rojas C, Pycha A (2025)

Factors modulating perception and production of speech by AI tools: a test case of Amazon Alexa and Polly.

Frontiers in psychology, 16:1520111.

To develop AI tools that can communicate on par with human speakers and listeners, we need a deeper understanding of the factors that affect their perception and production of spoken language. Thus, the goal of this study was to examine to what extent two AI tools, Amazon Alexa and Polly, are impacted by factors that are known to modulate speech perception and production in humans. In particular, we examined the role of lexical (word frequency, phonological neighborhood density) and stylistic (speaking rate) factors. In the domain of perception, high-frequency words and slow speaking rate significantly improved Alexa's recognition of words produced in real time by native speakers of American English (n = 21). Alexa also recognized words with low neighborhood density with greater accuracy, but only at fast speaking rates. In contrast to human listeners, Alexa showed no evidence of adaptation to the speaker over time. In the domain of production, Polly's vowel duration and formants were unaffected by the lexical characteristics of words, unlike human speakers. Overall, these findings suggest that, despite certain patterns that humans and AI tools share, AI tools lack some of the flexibility that is the hallmark of human speech perception and production.

Additional Links: PMID-40181888

Citation:

@article {pmid40181888,
year = {2025},
author = {Song, JY and Rojas, C and Pycha, A},
title = {Factors modulating perception and production of speech by AI tools: a test case of Amazon Alexa and Polly.},
journal = {Frontiers in psychology},
volume = {16},
number = {},
pages = {1520111},
doi = {10.3389/fpsyg.2025.1520111},
pmid = {40181888},
issn = {1664-1078},
abstract = {To develop AI tools that can communicate on par with human speakers and listeners, we need a deeper understanding of the factors that affect their perception and production of spoken language. Thus, the goal of this study was to examine to what extent two AI tools, Amazon Alexa and Polly, are impacted by factors that are known to modulate speech perception and production in humans. In particular, we examined the role of lexical (word frequency, phonological neighborhood density) and stylistic (speaking rate) factors. In the domain of perception, high-frequency words and slow speaking rate significantly improved Alexa's recognition of words produced in real time by native speakers of American English (n = 21). Alexa also recognized words with low neighborhood density with greater accuracy, but only at fast speaking rates. In contrast to human listeners, Alexa showed no evidence of adaptation to the speaker over time. In the domain of production, Polly's vowel duration and formants were unaffected by the lexical characteristics of words, unlike human speakers. Overall, these findings suggest that, despite certain patterns that humans and AI tools share, AI tools lack some of the flexibility that is the hallmark of human speech perception and production.},
}

RevDate: 2025-04-03

Atilgan H, Walker KM, King AJ, et al (2025)

Auditory training alters the cortical representation of complex sounds.

The Journal of neuroscience : the official journal of the Society for Neuroscience pii:JNEUROSCI.0989-24.2025 [Epub ahead of print].

Auditory learning is supported by long-term changes in the neural processing of sound. We examined these task-dependent changes in auditory cortex by mapping neural sensitivity to timbre, pitch and location cues in trained (n = 5) and untrained control female ferrets (n = 5). Trained animals either identified vowels in a two-alternative forced choice task (n = 3) or discriminated when a repeating vowel changed in identity or pitch (n = 2). Neural responses were recorded under anesthesia in two primary auditory cortical fields and two tonotopically organized non-primary fields. In trained animals, the overall sensitivity to sound timbre was reduced across three cortical fields compared to control animals, but maintained in a non-primary field (the posterior pseudosylvian field). While training did not increase sensitivity to timbre across auditory cortex, it did change the way in which neurons integrated spectral information: neural responses in trained animals became more sensitive to first and second formant frequencies, whereas in control animals, cortical sensitivity to spectral timbre depended mostly on the second formant. Animals trained on timbre identification were required to generalize across pitch when discriminating timbre, and their neurons became less modulated by fundamental frequency relative to control animals. Finally, both trained groups showed increased spatial sensitivity and an enhanced response to sound source locations close to the midline, where the loudspeaker was located in the training chamber. These results demonstrate that training elicited widespread alterations in the cortical representation of complex sounds.

Significance Statement: Learning a task can elicit widespread changes in the brain. Here, we trained animals to discriminate sound timbre using synthetic vowel sounds. Somewhat surprisingly, we observed that in 3 out of 4 of the brain regions studied, neural responses became less sensitive to timbre, while in the 4th area sensitivity was maintained. This suggests that training does not simply rewire more neurons to represent learned stimuli. Neurons also changed the way in which they processed stimuli, becoming more sensitive to the formant cues that determine vowel identity and tuned preferentially to the region of space in which sounds were presented during training. Together, these results suggest that learning results in complex changes in how and whether neurons represent learned sounds.

Additional Links: PMID-40180572

Citation:

@article {pmid40180572,
year = {2025},
author = {Atilgan, H and Walker, KM and King, AJ and Schnupp, JW and Bizley, JK},
title = {Auditory training alters the cortical representation of complex sounds.},
journal = {The Journal of neuroscience : the official journal of the Society for Neuroscience},
volume = {},
number = {},
pages = {},
doi = {10.1523/JNEUROSCI.0989-24.2025},
pmid = {40180572},
issn = {1529-2401},
abstract = {Auditory learning is supported by long-term changes in the neural processing of sound. We examined these task-dependent changes in auditory cortex by mapping neural sensitivity to timbre, pitch and location cues in trained (n = 5) and untrained control female ferrets (n = 5). Trained animals either identified vowels in a two-alternative forced choice task (n = 3) or discriminated when a repeating vowel changed in identity or pitch (n = 2). Neural responses were recorded under anesthesia in two primary auditory cortical fields and two tonotopically organized non-primary fields. In trained animals, the overall sensitivity to sound timbre was reduced across three cortical fields compared to control animals, but maintained in a non-primary field (the posterior pseudosylvian field). While training did not increase sensitivity to timbre across auditory cortex, it did change the way in which neurons integrated spectral information: neural responses in trained animals became more sensitive to first and second formant frequencies, whereas in control animals, cortical sensitivity to spectral timbre depended mostly on the second formant. Animals trained on timbre identification were required to generalize across pitch when discriminating timbre, and their neurons became less modulated by fundamental frequency relative to control animals. Finally, both trained groups showed increased spatial sensitivity and an enhanced response to sound source locations close to the midline, where the loudspeaker was located in the training chamber. These results demonstrate that training elicited widespread alterations in the cortical representation of complex sounds. Significance Statement: Learning a task can elicit widespread changes in the brain. Here, we trained animals to discriminate sound timbre using synthetic vowel sounds. Somewhat surprisingly, we observed that in 3 out of 4 of the brain regions studied, neural responses became less sensitive to timbre, while in the 4th area sensitivity was maintained. This suggests that training does not simply rewire more neurons to represent learned stimuli. Neurons also changed the way in which they processed stimuli, becoming more sensitive to the formant cues that determine vowel identity and tuned preferentially to the region of space in which sounds were presented during training. Together, these results suggest that learning results in complex changes in how and whether neurons represent learned sounds.},
}

RevDate: 2025-04-03

Almurashi W (2025)

Acoustic Evidence for the Tenseness and Laxity Distinction in Hijazi Arabic: A Pilot Study Using Static and Dynamic Analysis.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Standard Arabic has a simple three-vowel system with short and long distinctions, specifically /i iː a aː u uː/, traditionally believed to differ solely in duration. However, studies on regional Arabic dialects using a static approach (e.g., measuring formant values at the vowel's midpoint) have suggested that these vowels differ in both quality and quantity. This study aimed to investigate whether Hijazi Arabic (HA) exhibits a tense/lax distinction and, importantly, whether a dynamic analysis (particularly Vowel Inherent Spectral Change) could better capture this distinction, an area relatively underexplored in Arabic acoustic studies.

METHOD: Data were collected from 20 native HA speakers, who produced six HA vowels in various consonantal environments. The first two formant values and vowel duration were automatically extracted. Static formant values were measured at the vowel's midpoint, while dynamic spectral changes were measured at three points during the vowel's duration.

RESULTS: The findings revealed a significant distinction between short and long HA vowels, not only in duration but also in their acoustic properties. In the static model, short vowels were more centralized, while long vowels were more peripheral. In the dynamic model, the spectral changes of short vowels differed significantly from those of their long counterparts.

CONCLUSIONS: These results underscore the existence of a tense/lax distinction in HA, challenging the traditional view that the distinction is based solely on duration. They also highlight the value of dynamic vowel analysis for a comprehensive understanding of vowel behavior in phonological systems.

Additional Links: PMID-40178361

Citation:

@article {pmid40178361,
year = {2025},
author = {Almurashi, W},
title = {Acoustic Evidence for the Tenseness and Laxity Distinction in Hijazi Arabic: A Pilot Study Using Static and Dynamic Analysis.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-14},
doi = {10.1044/2025_JSLHR-24-00692},
pmid = {40178361},
issn = {1558-9102},
abstract = {PURPOSE: Standard Arabic has a simple three-vowel system with short and long distinctions, specifically /i iː a aː u uː/, traditionally believed to differ solely in duration. However, studies on regional Arabic dialects using a static approach (e.g., measuring formant values at the vowel's midpoint) have suggested that these vowels differ in both quality and quantity. This study aimed to investigate whether Hijazi Arabic (HA) exhibits a tense/lax distinction and, importantly, whether a dynamic analysis (particularly Vowel Inherent Spectral Change) could better capture this distinction, an area relatively underexplored in Arabic acoustic studies.

METHOD: Data were collected from 20 native HA speakers, who produced six HA vowels in various consonantal environments. The first two formant values and vowel duration were automatically extracted. Static formant values were measured at the vowel's midpoint, while dynamic spectral changes were measured at three points during the vowel's duration.

RESULTS: The findings revealed a significant distinction between short and long HA vowels, not only in duration but also in their acoustic properties. In the static model, short vowels were more centralized, while long vowels were more peripheral. In the dynamic model, the spectral changes of short vowels differed significantly from those of their long counterparts.

CONCLUSIONS: These results underscore the existence of a tense/lax distinction in HA, challenging the traditional view that the distinction is based solely on duration. They also highlight the value of dynamic vowel analysis for a comprehensive understanding of vowel behavior in phonological systems.},
}
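The static versus dynamic contrast in the Almurashi entry above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's procedure: the three measurement points (taken here to be roughly 20%, 50% and 80% of vowel duration), the helper names `midpoint_formants` and `trajectory_length`, and the formant values are all hypothetical. Trajectory length (summed Euclidean distance between successive F1-F2 points) is one common way of summarizing vowel inherent spectral change; the abstract does not say which VISC metric was used.

```python
import math
from typing import Sequence, Tuple

# One (F1, F2) measurement in Hz (hypothetical representation).
Point = Tuple[float, float]


def midpoint_formants(track: Sequence[Point]) -> Point:
    """Static measure: the (F1, F2) pair at the middle measurement point."""
    return track[len(track) // 2]


def trajectory_length(track: Sequence[Point]) -> float:
    """Dynamic (VISC-style) measure: summed Euclidean distance in the F1-F2
    plane between successive measurement points."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


# Hypothetical track sampled at ~20%, 50% and 80% of the vowel.
track = [(300.0, 2300.0), (350.0, 2200.0), (400.0, 2100.0)]
print(midpoint_formants(track))            # (350.0, 2200.0)
print(round(trajectory_length(track), 1))  # 223.6
```

Two tokens can share the same midpoint values yet differ in trajectory length, which is the kind of difference a midpoint-only analysis cannot capture.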

RevDate: 2025-04-03
CmpDate: 2025-04-03

Anikin A, Reby D, Pisanski K (2025)

Nonlinear vocal phenomena and speech intelligibility.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 380(1923):20240254.

At some point in our evolutionary history, humans lost vocal membranes and air sacs, representing an unexpected simplification of the vocal apparatus relative to other great apes. One hypothesis is that these simplifications represent anatomical adaptations for speech because a simpler larynx provides a suitably stable and tonal vocal source with fewer nonlinear vocal phenomena (NLP). The key assumption that NLP reduce speech intelligibility is indirectly supported by studies of dysphonia, but it has not been experimentally tested. Here, we manipulate NLP in vocal stimuli ranging from single vowels to sentences, showing that the vocal source needs to be stable, but not necessarily tonal, for speech to be readily understood. When the task is to discriminate synthesized monophthong and diphthong vowels, continuous NLP (subharmonics, amplitude modulation and even deterministic chaos) actually improve vowel perception in high-pitched voices, likely because the resulting dense spectrum reveals formant transitions. Rough-sounding voices also remain highly intelligible when continuous NLP are added to recorded words and sentences. In contrast, voicing interruptions and pitch jumps dramatically reduce speech intelligibility, likely by interfering with voicing contrasts and normal intonation. We argue that NLP were not eliminated from the human vocal repertoire as we evolved for speech, but only brought under better control. This article is part of the theme issue 'Nonlinear phenomena in vertebrate vocalizations: mechanisms and communicative functions'.

Additional Links: PMID-40176514

Citation:

@article {pmid40176514,
year = {2025},
author = {Anikin, A and Reby, D and Pisanski, K},
title = {Nonlinear vocal phenomena and speech intelligibility.},
journal = {Philosophical transactions of the Royal Society of London. Series B, Biological sciences},
volume = {380},
number = {1923},
pages = {20240254},
doi = {10.1098/rstb.2024.0254},
pmid = {40176514},
issn = {1471-2970},
support = {//Vetenskapsrådet/ ; },
mesh = {Humans ; *Speech Intelligibility ; Male ; Female ; *Voice ; Adult ; *Speech Acoustics ; *Speech Perception ; Animals ; },
abstract = {At some point in our evolutionary history, humans lost vocal membranes and air sacs, representing an unexpected simplification of the vocal apparatus relative to other great apes. One hypothesis is that these simplifications represent anatomical adaptations for speech because a simpler larynx provides a suitably stable and tonal vocal source with fewer nonlinear vocal phenomena (NLP). The key assumption that NLP reduce speech intelligibility is indirectly supported by studies of dysphonia, but it has not been experimentally tested. Here, we manipulate NLP in vocal stimuli ranging from single vowels to sentences, showing that the vocal source needs to be stable, but not necessarily tonal, for speech to be readily understood. When the task is to discriminate synthesized monophthong and diphthong vowels, continuous NLP (subharmonics, amplitude modulation and even deterministic chaos) actually improve vowel perception in high-pitched voices, likely because the resulting dense spectrum reveals formant transitions. Rough-sounding voices also remain highly intelligible when continuous NLP are added to recorded words and sentences. In contrast, voicing interruptions and pitch jumps dramatically reduce speech intelligibility, likely by interfering with voicing contrasts and normal intonation. We argue that NLP were not eliminated from the human vocal repertoire as we evolved for speech, but only brought under better control. This article is part of the theme issue 'Nonlinear phenomena in vertebrate vocalizations: mechanisms and communicative functions'.},
}

MeSH Terms:

Humans
*Speech Intelligibility
Male
Female
*Voice
Adult
*Speech Acoustics
*Speech Perception
Animals
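The Anikin et al. abstract attributes improved vowel perception in high-pitched voices to the denser spectrum produced by continuous NLP. The Python sketch below illustrates that mechanism for amplitude modulation only: multiplying a harmonic source by (1 + d·sin(2π·fm·t)) adds sidebands at each harmonic ± fm, filling in the wide gaps between harmonics of a high f0. It is not the authors' stimulus synthesis; the parameter values (f0 = 400 Hz, fm = 100 Hz, d = 0.5) and the helper names are hypothetical, and spectral magnitudes are probed with a single-bin DFT to keep the example dependency-free.

```python
import math
from typing import List


def am_harmonic_source(f0: float, fm: float, depth: float, n_harmonics: int,
                       sr: int, dur: float) -> List[float]:
    """Sum of equal-amplitude sine harmonics of f0, amplitude-modulated at fm."""
    out = []
    for i in range(int(sr * dur)):
        t = i / sr
        carrier = sum(math.sin(2 * math.pi * k * f0 * t)
                      for k in range(1, n_harmonics + 1))
        out.append((1 + depth * math.sin(2 * math.pi * fm * t)) * carrier)
    return out


def dft_magnitude(x: List[float], freq: float, sr: int) -> float:
    """Magnitude of one DFT component at `freq`, normalised by len(x)."""
    re = sum(v * math.cos(2 * math.pi * freq * i / sr) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / sr) for i, v in enumerate(x))
    return math.hypot(re, im) / len(x)


sr, dur = 8000, 0.5  # 4000 samples: 2 Hz bins line up with every probe below
plain = am_harmonic_source(400, 100, 0.0, 3, sr, dur)
rough = am_harmonic_source(400, 100, 0.5, 3, sr, dur)
for f in (300, 400, 500):
    print(f, round(dft_magnitude(plain, f, sr), 3), round(dft_magnitude(rough, f, sr), 3))
```

For the unmodulated source the 300 and 500 Hz probes are essentially zero, while the modulated source shows sidebands of about d/4 = 0.125 there, next to the unchanged 0.5 at the 400 Hz harmonic.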

RevDate: 2025-03-31

Owino G, Bernard Shibwabo B (2025)

A Systematic Review of Advances in Infant Cry Paralinguistic Classification: Methods, Implementation, and Applications.

JMIR rehabilitation and assistive technologies [Epub ahead of print].

BACKGROUND: Effective communication is essential for human interaction, yet infants can only express their needs through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.

OBJECTIVE: This study seeks to systematically review the advancements in methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on the different infant cry classification techniques, feature extraction methods, and the practical applications. Furthermore, we aimed to identify recent trends and directions in the field of infant cry signal processing to address both academic and practical needs.

METHODS: A systematic literature review was conducted by using nine electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, with 126 studies meeting the eligibility criteria after screening by two independent reviewers. The methodological quality of the studies was assessed using the Cochrane risk-of-bias tool version 2 (RoB2), with 92% (n=116) of the studies indicating a low risk of bias and 8% (n=10) of the studies showing some concerns regarding bias. The overall quality assessment was performed using the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R version 3.64.

RESULTS: Notable advancements in infant cry classification methods were realized, particularly from 2019 onwards, employing machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients (MFCCs), spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile applications and web-based platforms for real-time analysis, although 90% (n=113) of the models remained undeployed in real-world applications. Denoising techniques and federated learning, which enhance model robustness and ensure data confidentiality, were employed in only 5% (n=6) of the studies. Practical applications spanned healthcare monitoring, diagnostics, and caregiver support.

CONCLUSIONS: Infant cry classification has progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment for practical use. Further research is therefore proposed to develop standardized foundational multimodal audio approaches, incorporating a broader range of audio features and ensuring data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse healthcare settings.

Additional Links: PMID-40163619

Citation:

@article {pmid40163619,
year = {2025},
author = {Owino, G and Bernard Shibwabo, B},
title = {A Systematic Review of Advances in Infant Cry Paralinguistic Classification: Methods, Implementation, and Applications.},
journal = {JMIR rehabilitation and assistive technologies},
volume = {},
number = {},
pages = {},
doi = {10.2196/69457},
pmid = {40163619},
issn = {2369-2529},
abstract = {BACKGROUND: Effective communication is essential for human interaction, yet infants can only express their needs through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.

OBJECTIVE: This study seeks to systematically review the advancements in methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on the different infant cry classification techniques, feature extraction methods, and the practical applications. Furthermore, we aimed to identify recent trends and directions in the field of infant cry signal processing to address both academic and practical needs.

METHODS: A systematic literature review was conducted by using nine electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, with 126 studies meeting the eligibility criteria after screening by two independent reviewers. The methodological quality of the studies was assessed using the Cochrane risk-of-bias tool version 2 (RoB2), with 92% (n=116) of the studies indicating a low risk of bias and 8% (n=10) of the studies showing some concerns regarding bias. The overall quality assessment was performed using the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R version 3.64.

RESULTS: Notable advancements in infant cry classification methods were realized, particularly from 2019 onwards, employing machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients (MFCCs), spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile applications and web-based platforms for real-time analysis, although 90% (n=113) of the models remained undeployed in real-world applications. Denoising techniques and federated learning, which enhance model robustness and ensure data confidentiality, were employed in only 5% (n=6) of the studies. Practical applications spanned healthcare monitoring, diagnostics, and caregiver support.

CONCLUSIONS: Infant cry classification has progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment for practical use. Further research is therefore proposed to develop standardized foundational multimodal audio approaches, incorporating a broader range of audio features and ensuring data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse healthcare settings.},
}
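Zero-crossing rate, one of the common features listed in the Owino and Bernard Shibwabo review, measures how often a signal changes sign within a frame. The Python sketch below is a minimal illustration, not code from the review; the function names, frame length and hop size are hypothetical, and normalization conventions differ between toolkits (this version divides by the number of adjacent sample pairs and treats zero as non-negative).

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def framewise_zcr(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """ZCR for each full frame of `frame_len` samples, advancing by `hop` samples."""
    return [zero_crossing_rate(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


# A rapidly alternating segment followed by a constant one.
print(framewise_zcr([1, -1, 1, -1, 1, 1, 1, 1], frame_len=4, hop=4))  # [1.0, 0.0]
```

Noisy, aperiodic segments tend to yield a higher ZCR than strongly voiced ones, which is one reason the feature is often paired with MFCCs and pitch in cry classifiers.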

RevDate: 2025-03-30

Terband H, Bhat B (2025)

Intrinsic fundamental frequency of vowels in children with Childhood Apraxia of Speech (CAS).

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000545595 [Epub ahead of print].

Background: Intrinsic pitch (IF0) is an inherent property of vowels whereby high vowels are produced with a higher fundamental frequency than low vowels. Although well studied in adults, it remains underexplored in children. IF0 reflects combined biomechanical effects as well as a deliberate effort by speakers to produce distinct vowels and enhance vowel contrasts. Vowel errors and inconsistent vowel production are well-known characteristics of Childhood Apraxia of Speech (CAS). We aimed to investigate whether children with CAS exhibit IF0 and, if present, how it compares with that of typically developing (TD) children. Method: 17 children with CAS and 8 TD children were asked to repeat simple bisyllabic non-word utterances of the type [dəCV] six times. The stimuli contained a consonant C (/b, d/) and a vowel V drawn from the corner vowels of the Dutch vowel space (/a, i, u/). The target stimulus was produced in a carrier sentence (/he dəCV wɪːr/; 'hey the CV again'). Mean pitch (F0) and formant (F1 to F3) values were extracted from the recorded speech samples around the vowel midpoint and Bark-transformed prior to further analyses. Statistical analyses were carried out using linear mixed models for each outcome measure separately. Results: The main finding of our study is that IF0 is present in children with CAS, with a pattern generally similar to that of TD children. Additionally, we observed differences in vowel characteristics in children with CAS that are ambiguous; rather, we observed vowel-specific differences. Children with CAS produced the /a/ vowel with exaggerated openness, whereas they produced /u/ more fronted compared to TD children. Also, children with CAS generally produced their vowels with a higher pitch and a longer duration compared to TD children. In both groups, pitch and duration were correlated (negatively) only for the vowel /a/. 
Conclusions: Whereas intrinsic pitch appears to be preserved in children with CAS, they do show vowel-specific differences in the articulatory dimensions of vowel production compared to TD children. Clinicians should take these vowel-specific differences into account when choosing therapeutic targets.

Additional Links: PMID-40159307

Citation:

@article {pmid40159307,
year = {2025},
author = {Terband, H and Bhat, B},
title = {Intrinsic fundamental frequency of vowels in children with Childhood Apraxia of Speech (CAS).},
journal = {Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP)},
volume = {},
number = {},
pages = {1-18},
doi = {10.1159/000545595},
pmid = {40159307},
issn = {1421-9972},
abstract = {Background: Intrinsic pitch (IF0) is an inherent property of vowels whereby high vowels are produced with a higher fundamental frequency than low vowels. Although well studied in adults, it remains underexplored in children. IF0 reflects combined biomechanical effects as well as a deliberate effort by speakers to produce distinct vowels and enhance vowel contrasts. Vowel errors and inconsistent vowel production are well-known characteristics of Childhood Apraxia of Speech (CAS). We aimed to investigate whether children with CAS exhibit IF0 and, if present, how it compares with that of typically developing (TD) children. Method: 17 children with CAS and 8 TD children were asked to repeat simple bisyllabic non-word utterances of the type [dəCV] six times. The stimuli contained a consonant C (/b, d/) and a vowel V drawn from the corner vowels of the Dutch vowel space (/a, i, u/). The target stimulus was produced in a carrier sentence (/he dəCV wɪːr/; 'hey the CV again'). Mean pitch (F0) and formant (F1 to F3) values were extracted from the recorded speech samples around the vowel midpoint and Bark-transformed prior to further analyses. Statistical analyses were carried out using linear mixed models for each outcome measure separately. Results: The main finding of our study is that IF0 is present in children with CAS, with a pattern generally similar to that of TD children. Additionally, we observed differences in vowel characteristics in children with CAS that are ambiguous; rather, we observed vowel-specific differences. Children with CAS produced the /a/ vowel with exaggerated openness, whereas they produced /u/ more fronted compared to TD children. Also, children with CAS generally produced their vowels with a higher pitch and a longer duration compared to TD children. In both groups, pitch and duration were correlated (negatively) only for the vowel /a/. 
Conclusions: Whereas intrinsic pitch appears to be preserved in children with CAS, they do show vowel-specific differences in the articulatory dimensions of vowel production compared to TD children. Clinicians should take these vowel-specific differences into account when choosing therapeutic targets.},
}
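The Terband and Bhat entry reports that F0 and formant values were Bark-transformed before analysis, but the abstract does not say which Hz-to-Bark formula was used. A minimal Python sketch, assuming the Traunmüller (1990) approximation with its low- and high-end corrections, is shown below; the function name and the example frequencies are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark using Traunmüller's (1990) approximation (assumed here)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if z < 2.0:      # low-frequency correction
        z += 0.15 * (2.0 - z)
    elif z > 20.1:   # high-frequency correction
        z += 0.22 * (z - 20.1)
    return z


# Hypothetical child vowel values (Hz) for F0, F1 and F2.
for label, hz in (("F0", 300.0), ("F1", 400.0), ("F2", 3000.0)):
    print(label, round(hz_to_bark(hz), 2))
```

Because the scale is compressive at higher frequencies, a given difference in Hz counts for less in F2 than the same difference in F1 once converted, which is one motivation for analysing formants in Bark.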

RevDate: 2025-03-30

Eyisaraç Ş, Özel HE, Selçuk A, et al (2025)

Vocal Resonance Alterations Following Anterior Palatoplasty and Expansion Sphincter Pharyngoplasty.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00106-7 [Epub ahead of print].

AIM: This study aims to examine the effects of combined anterior palatoplasty (AP) and expansion sphincter pharyngoplasty (ESP) on vocal resonance and nasalization in patients with mild to moderate obstructive sleep apnea syndrome (OSAS), utilizing objective testing methods.

MATERIALS AND METHODS: A total of 28 patients with mild to moderate OSAS, determined by polysomnography, were included in the study. Preoperative assessments and postoperative evaluations at the 1st and 6th months were conducted, during which patients produced steady sustained phonation of the vowels /ɑ/, /ɛ/, /ɯ/, /i/, /ɔ/, /œ/, /u/, and /y/. The fundamental frequency (F0) and formant frequencies (F1, F2, F3, and F4) were analyzed. Additionally, nasalization was evaluated using the vowel /ɑ/ in the syllable /ɟ ɑ ɟ/ and quantified by analyzing F0, F1, F2, F3, F4, and A1P0 (A1 minus P0) values, where A1 represents the amplitude of the first formant harmonic peak and P0 represents the amplitude of the lowest nasal peak.

RESULTS: No statistically significant changes were observed in the fundamental frequency (F0) of any vowels before and after surgery. At 6 months postoperatively, significant decreases in F1 for /ɑ/ (P = 0.047) and F3 for /u/ (P = 0.017) were noted. Nasalization measurements at 6 months showed significant changes, including a decrease in F3 (P = 0.023), an increase in F4 (P = 0.025), and a decrease in A1P0 values for nasalized /ɑ/ (P = 0.013).

CONCLUSION: Combined AP + ESP affects vocal resonance specifically in back vowels (/ɑ/, /u/) and leads to nasalization, consistent with the surgical focus on the velopharyngeal region, while preserving fundamental frequency across all vowels. These alterations might influence how individuals perceive their voice, possibly having particular relevance for professional voice users.

Additional Links: PMID-40158914

Citation:

@article {pmid40158914,
year = {2025},
author = {Eyisaraç, Ş and Özel, HE and Selçuk, A and Bayakır, F and Başer, S and Altıparmak, E and Genç, S and Özdoğan, F and Köroğlu, E},
title = {Vocal Resonance Alterations Following Anterior Palatoplasty and Expansion Sphincter Pharyngoplasty.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.03.010},
pmid = {40158914},
issn = {1873-4588},
abstract = {AIM: This study aims to examine the effects of combined anterior palatoplasty (AP) and expansion sphincter pharyngoplasty (ESP) on vocal resonance and nasalization in patients with mild to moderate obstructive sleep apnea syndrome (OSAS), utilizing objective testing methods.

MATERIALS AND METHODS: A total of 28 patients with mild to moderate OSAS, determined by polysomnography, were included in the study. Preoperative assessments and postoperative evaluations at the 1st and 6th months were conducted, during which patients produced steady sustained phonation of the vowels /ɑ/, /ɛ/, /ɯ/, /i/, /ɔ/, /œ/, /u/, and /y/. The fundamental frequency (F0) and formant frequencies (F1, F2, F3, and F4) were analyzed. Additionally, nasalization was evaluated using the vowel /ɑ/ in the syllable /ɟ ɑ ɟ/ and quantified by analyzing F0, F1, F2, F3, F4, and A1P0 (A1 minus P0) values, where A1 represents the amplitude of the first formant harmonic peak and P0 represents the amplitude of the lowest nasal peak.

RESULTS: No statistically significant changes were observed in the fundamental frequency (F0) of any vowels before and after surgery. At 6 months postoperatively, significant decreases in F1 for /ɑ/ (P = 0.047) and F3 for /u/ (P = 0.017) were noted. Nasalization measurements at 6 months showed significant changes, including a decrease in F3 (P = 0.023), an increase in F4 (P = 0.025), and a decrease in A1P0 values for nasalized /ɑ/ (P = 0.013).

CONCLUSION: Combined AP + ESP affects vocal resonance specifically in back vowels (/ɑ/, /u/) and leads to nasalization, consistent with the surgical focus on the velopharyngeal region, while preserving fundamental frequency across all vowels. These alterations might influence how individuals perceive their voice, possibly having particular relevance for professional voice users.},
}
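A1P0 in the Eyisaraç et al. entry is the difference, in dB, between the amplitude of the harmonic at the first formant (A1) and the amplitude of the low nasal peak (P0); lower values are usually read as stronger nasalization. The Python sketch below is a simplified illustration under stated assumptions, not the study's measurement procedure: it takes f0 and F1 as already known, assumes P0 falls on the harmonic nearest a hypothetical nasal-peak frequency of 250 Hz, and reads harmonic amplitudes from single DFT bins of an unwindowed frame. Real analyses typically use windowed spectra and peak picking.

```python
import math
from typing import Sequence


def component_db(frame: Sequence[float], freq: float, sr: int) -> float:
    """Level in dB of one DFT component at `freq` (unwindowed, normalised by N)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sr) for n, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * freq * n / sr) for n, x in enumerate(frame))
    mag = math.hypot(re, im) / len(frame)
    return 20 * math.log10(max(mag, 1e-12))  # floor avoids log10(0)


def nearest_harmonic(f0: float, target_hz: float) -> float:
    """Frequency of the harmonic of f0 closest to target_hz (at least the 1st)."""
    return max(1, round(target_hz / f0)) * f0


def a1_minus_p0(frame: Sequence[float], sr: int, f0: float, f1: float,
                nasal_peak_hz: float = 250.0) -> float:
    """A1 - P0 in dB; nasal_peak_hz is a hypothetical default."""
    a1 = component_db(frame, nearest_harmonic(f0, f1), sr)
    p0 = component_db(frame, nearest_harmonic(f0, nasal_peak_hz), sr)
    return a1 - p0


# Synthetic frame: f0 = 125 Hz, strong 6th harmonic (750 Hz, near F1),
# weak 2nd harmonic (250 Hz, standing in for the nasal peak).
sr, n = 8000, 3200
frame = [1.0 * math.sin(2 * math.pi * 750 * i / sr)
         + 0.1 * math.sin(2 * math.pi * 250 * i / sr) for i in range(n)]
print(round(a1_minus_p0(frame, sr, f0=125.0, f1=760.0), 2))  # 20.0
```

Raising the 250 Hz component relative to the 750 Hz one would lower the result, mirroring the direction of change reported for nasalized /ɑ/.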

RevDate: 2025-03-28

Wang Q, Xu F, Wang X, et al (2025)

How Anxiety State Influences Speech Parameters: A Network Analysis Study from a Real Stressed Scenario.

Brain sciences, 15(3): pii:brainsci15030262.

Background/Objectives: Voice analysis has shown promise in anxiety assessment, yet traditional approaches examining isolated acoustic features yield inconsistent results. This study aimed to explore the relationship between anxiety states and vocal parameters from a network perspective in ecologically valid settings. Methods: A cross-sectional study was conducted with 316 undergraduate students (191 males, 125 females; mean age 20.3 ± 0.85 years) who completed a standardized picture description task while their speech was recorded. Participants were categorized into low-anxiety (n = 119) and high-anxiety (n = 197) groups based on self-reported anxiety ratings. Five acoustic parameters (jitter, fundamental frequency [F0], formant frequencies [F1/F2], intensity, and speech rate) were analyzed using network analysis. Results: Network analysis revealed a robust negative relationship between jitter and state anxiety, with jitter as the sole speech parameter consistently linked to state anxiety in the total group. Additionally, higher anxiety levels were associated with a coupling between intensity and F1/F2, whereas the low-anxiety network displayed a sparser organization with no intensity-F1/F2 connection. Conclusions: Anxiety could be recognized by speech parameter networks in ecological settings. The distinct pattern, comprising the negative jitter-anxiety relationship in the total network and the connection between intensity and F1/F2 in high-anxiety states, suggests potential speech markers for anxiety assessment. These findings suggest that state anxiety may directly influence jitter and fundamentally restructure the relationships among speech features, highlighting the importance of examining jitter and speech parameter interactions rather than isolated values in speech-based detection of anxiety.

Additional Links: PMID-40149785

Citation:

@article {pmid40149785,
year = {2025},
author = {Wang, Q and Xu, F and Wang, X and Wu, S and Ren, L and Liu, X},
title = {How Anxiety State Influences Speech Parameters: A Network Analysis Study from a Real Stressed Scenario.},
journal = {Brain sciences},
volume = {15},
number = {3},
pages = {},
doi = {10.3390/brainsci15030262},
pmid = {40149785},
issn = {2076-3425},
support = {2023RCJB04//Air Force Medical University/ ; },
abstract = {Background/Objectives: Voice analysis has shown promise in anxiety assessment, yet traditional approaches examining isolated acoustic features yield inconsistent results. This study aimed to explore the relationship between anxiety states and vocal parameters from a network perspective in ecologically valid settings. Methods: A cross-sectional study was conducted with 316 undergraduate students (191 males, 125 females; mean age 20.3 ± 0.85 years) who completed a standardized picture description task while their speech was recorded. Participants were categorized into low-anxiety (n = 119) and high-anxiety (n = 197) groups based on self-reported anxiety ratings. Five acoustic parameters (jitter, fundamental frequency [F0], formant frequencies [F1/F2], intensity, and speech rate) were analyzed using network analysis. Results: Network analysis revealed a robust negative relationship between jitter and state anxiety, with jitter as the sole speech parameter consistently linked to state anxiety in the total group. Additionally, higher anxiety levels were associated with a coupling between intensity and F1/F2, whereas the low-anxiety network displayed a sparser organization with no intensity-F1/F2 connection. Conclusions: Anxiety could be recognized by speech parameter networks in ecological settings. The distinct pattern, comprising the negative jitter-anxiety relationship in the total network and the connection between intensity and F1/F2 in high-anxiety states, suggests potential speech markers for anxiety assessment. These findings suggest that state anxiety may directly influence jitter and fundamentally restructure the relationships among speech features, highlighting the importance of examining jitter and speech parameter interactions rather than isolated values in speech-based detection of anxiety.},
}
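Network analyses like the one in the Wang et al. entry typically draw edges between variables as partial correlations, that is, the association between two parameters after controlling for all the others. The abstract does not state the estimator, and psychological network studies often use a regularized version (for example, the graphical lasso); the Python sketch below shows only the unregularized core, obtaining partial correlations from the precision matrix Ω (the inverse covariance) as ρ_ij = −Ω_ij / √(Ω_ii·Ω_jj). Function names and the example correlations are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix; each row is one observation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def partial_correlations(cov: Matrix) -> Matrix:
    """Partial correlation of each pair of variables given all the others."""
    prec = invert(cov)
    k = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(k)] for i in range(k)]


# Hypothetical: variables 0 and 1 each correlate 0.5 with variable 2 and
# correlate with each other only through it (0.25 = 0.5 * 0.5).
cov = [[1.0, 0.25, 0.5],
       [0.25, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
pc = partial_correlations(cov)
print(abs(pc[0][1]) < 1e-9)  # True: no direct 0-1 edge
print(round(pc[0][2], 3))    # 0.447: the 0-2 edge survives
```

Given raw per-speaker measurements, `covariance(rows)` would supply the input matrix; the point of the example is that a marginal correlation (0.25 here) can disappear once the shared variable is controlled for.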

RevDate: 2025-03-27
CmpDate: 2025-03-27

Stepanović M, Hardmeier C, Scharenborg O (2025)

Formant-based vowel categorization for cross-lingual phone recognition.

The Journal of the Acoustical Society of America, 157(3):2248-2262.

Multilingual phone recognition models can learn language-independent pronunciation patterns from large volumes of spoken data and recognize them across languages. This potential can be harnessed to improve speech technologies for underresourced languages. However, these models are typically trained on phonological representations of speech sounds, which do not necessarily reflect the phonetic realization of speech. A mismatch between a phonological symbol and its phonetic realizations can lead to phone confusions and reduce performance. This work introduces formant-based vowel categorization aimed at improving cross-lingual vowel recognition by uncovering a vowel's phonetic quality from its formant frequencies, and reorganizing the vowel categories in a multilingual speech corpus to increase their consistency across languages. The work investigates vowel categories obtained from a trilingual multi-dialect speech corpus of Danish, Norwegian, and Swedish using three categorization techniques. Cross-lingual phone recognition experiments reveal that uniting vowel categories of different languages into a set of shared formant-based categories improves cross-lingual recognition of the shared vowels, but also interferes with recognition of vowels not present in one or more training languages. Cross-lingual evaluation on regional dialects provides inconclusive results. Nevertheless, improved recognition of individual vowels can translate to improvements in overall phone recognition on languages unseen during training.
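The Stepanović et al. abstract compares three formant-based categorization techniques without naming them. As a hedged illustration of the general idea only, the Python sketch below groups vowel tokens by their (F1, F2) values with plain k-means and a deterministic farthest-first initialization; it is a stand-in, not one of the paper's techniques, and the token values and function names are hypothetical. In practice formants are usually normalized (for example, speaker-wise z-scores or a Bark transform) before clustering, since raw Hz gives F2 more weight.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (F1, F2) in Hz


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first token, then repeatedly add the
    token farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: assign each token to its nearest centroid, then move
    each centroid to the mean of its tokens, until the centroids stop moving."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical tokens: two /i/-like and two /a/-like vowels.
tokens = [(300, 2300), (310, 2250), (700, 1200), (720, 1150)]
labels, cents = kmeans(tokens, 2)
print(labels)  # [0, 0, 1, 1]
```

Pooling tokens from several languages before clustering is what lets a shared category emerge from phonetically similar vowels that carry different phonological symbols.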

Additional Links: PMID-40145791

Citation:

@article {pmid40145791,
year = {2025},
author = {Stepanović, M and Hardmeier, C and Scharenborg, O},
title = {Formant-based vowel categorization for cross-lingual phone recognition.},
journal = {The Journal of the Acoustical Society of America},
volume = {157},
number = {3},
pages = {2248-2262},
doi = {10.1121/10.0036222},
pmid = {40145791},
issn = {1520-8524},
mesh = {Humans ; *Phonetics ; *Multilingualism ; *Speech Perception ; *Speech Acoustics ; Female ; Male ; Adult ; Language ; Young Adult ; Recognition, Psychology ; },
abstract = {Multilingual phone recognition models can learn language-independent pronunciation patterns from large volumes of spoken data and recognize them across languages. This potential can be harnessed to improve speech technologies for underresourced languages. However, these models are typically trained on phonological representations of speech sounds, which do not necessarily reflect the phonetic realization of speech. A mismatch between a phonological symbol and its phonetic realizations can lead to phone confusions and reduce performance. This work introduces formant-based vowel categorization aimed at improving cross-lingual vowel recognition by uncovering a vowel's phonetic quality from its formant frequencies, and reorganizing the vowel categories in a multilingual speech corpus to increase their consistency across languages. The work investigates vowel categories obtained from a trilingual multi-dialect speech corpus of Danish, Norwegian, and Swedish using three categorization techniques. Cross-lingual phone recognition experiments reveal that uniting vowel categories of different languages into a set of shared formant-based categories improves cross-lingual recognition of the shared vowels, but also interferes with recognition of vowels not present in one or more training languages. Cross-lingual evaluation on regional dialects provides inconclusive results. Nevertheless, improved recognition of individual vowels can translate to improvements in overall phone recognition on languages unseen during training.},
}

MeSH Terms:

Humans
*Phonetics
*Multilingualism
*Speech Perception
*Speech Acoustics
Female
Male
Adult
Language
Young Adult
Recognition, Psychology

RevDate: 2025-03-25
CmpDate: 2025-03-25

Chen F, Pan C, Hu H, et al (2025)

Understanding the Lombard Effect for Mandarin: Relation Between Speech Recognition Thresholds and Acoustic Parameters.

Trends in hearing, 29:23312165251324266.

The present work quantifies the Lombard effect across native speakers of Mandarin Chinese using the Matrix sentence test, which is optimized for precisely assessing speech recognition thresholds (SRTs) in noise. Specifically, we studied the effects of speaker gender, fundamental frequency (F0), formant frequencies (F1 and F2), the duration and rate of voiced segments, and frequency-specific energy redistribution characterized by alpha ratio and speech-weighted signal-to-noise ratio (swSNR) on the recognition of Mandarin in plain and Lombard speech. The Mandarin Chinese matrix test was recorded with plain and Lombard speech from 11 native-Mandarin speakers. SRTs in stationary noise were measured with native-Mandarin, normal-hearing listeners. Results showed that on average, Mandarin Lombard speech was more intelligible than Mandarin plain speech for both female and male speakers, and the Mandarin Lombard gain of female speakers was larger than that of males. In addition, various acoustic analyses involving all speakers showed that (a) only swSNR was significantly correlated with the SRT of the Mandarin plain speech; (b) most acoustic measures were significantly correlated with the SRT of the Mandarin Lombard speech; and (c) alpha ratio and swSNR were significantly correlated with the SRT Lombard gain. In addition, a gender effect was found in the correlational analysis between acoustic parameters and SRT as well as Lombard gain in SRT. The findings highlight the impact of increased high-frequency energy on the observed Lombard gain in Mandarin speech, whereas the changes in individual acoustic parameters (e.g., F0 and F1) appear to play only a minor role.

Additional Links: PMID-40129406

Citation:

@article {pmid40129406,
year = {2025},
author = {Chen, F and Pan, C and Hu, H and Hochmuth, S and Kollmeier, B and Warzybok, A},
title = {Understanding the Lombard Effect for Mandarin: Relation Between Speech Recognition Thresholds and Acoustic Parameters.},
journal = {Trends in hearing},
volume = {29},
number = {},
pages = {23312165251324266},
doi = {10.1177/23312165251324266},
pmid = {40129406},
issn = {2331-2165},
mesh = {Humans ; Female ; Male ; *Speech Perception/physiology ; *Speech Acoustics ; Young Adult ; Adult ; *Acoustic Stimulation ; *Noise/adverse effects ; Speech Reception Threshold Test ; Auditory Threshold/physiology ; Sex Factors ; Speech Intelligibility ; Recognition, Psychology ; Perceptual Masking/physiology ; Voice Quality ; Language ; },
abstract = {The present work quantifies the Lombard effect across native speakers of Mandarin Chinese using the Matrix sentence test, which is optimized for precisely assessing speech recognition thresholds (SRTs) in noise. Specifically, we studied the effects of speaker gender, fundamental frequency (F0), formant frequencies (F1 and F2), the duration and rate of voiced segments, and frequency-specific energy redistribution characterized by alpha ratio and speech-weighted signal-to-noise ratio (swSNR) on the recognition of Mandarin in plain and Lombard speech. The Mandarin Chinese matrix test was recorded with plain and Lombard speech from 11 native-Mandarin speakers. SRTs in stationary noise were measured with native-Mandarin, normal-hearing listeners. Results showed that on average, Mandarin Lombard speech was more intelligible than Mandarin plain speech for both female and male speakers, and the Mandarin Lombard gain of female speakers was larger than that of males. In addition, various acoustic analyses involving all speakers showed that (a) only swSNR was significantly correlated with the SRT of the Mandarin plain speech; (b) most acoustic measures were significantly correlated with the SRT of the Mandarin Lombard speech; and (c) alpha ratio and swSNR were significantly correlated with the SRT Lombard gain. In addition, a gender effect was found in the correlational analysis between acoustic parameters and SRT as well as Lombard gain in SRT. The findings highlight the impact of increased high-frequency energy on the observed Lombard gain in Mandarin speech, whereas the changes in individual acoustic parameters (e.g., F0 and F1) appear to play only a minor role.},
}

MeSH Terms:

Humans
Female
Male
*Speech Perception/physiology
*Speech Acoustics
Young Adult
Adult
*Acoustic Stimulation
*Noise/adverse effects
Speech Reception Threshold Test
Auditory Threshold/physiology
Sex Factors
Speech Intelligibility
Recognition, Psychology
Perceptual Masking/physiology
Voice Quality
Language
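The Lombard study above attributes most of the Lombard gain to increased high-frequency energy, characterized partly by the alpha ratio. The abstract does not give the exact definition, and conventions differ between papers (band edges, and whether the high band or the low band is the numerator). The sketch below assumes one common form: energy in 1-5 kHz relative to energy in 50 Hz-1 kHz, expressed in dB, computed from an already estimated power spectrum. The function name, band edges, and example spectrum are all illustrative assumptions.

```python
import math


def alpha_ratio_db(freqs_hz, powers, low=(50.0, 1000.0), high=(1000.0, 5000.0)):
    """Energy in the high band relative to the low band, in dB.

    Assumed convention: 10*log10(E_high / E_low), with half-open bands
    [low0, low1) and [high0, high1). Other papers use other band edges
    or invert the ratio, so check the definition before comparing values.
    """
    low_energy = sum(p for f, p in zip(freqs_hz, powers) if low[0] <= f < low[1])
    high_energy = sum(p for f, p in zip(freqs_hz, powers) if high[0] <= f < high[1])
    if low_energy <= 0 or high_energy <= 0:
        raise ValueError("both bands need positive energy")
    return 10.0 * math.log10(high_energy / low_energy)


# Hypothetical power spectrum falling with frequency (bins at 100 Hz steps).
freqs = [100.0 * k for k in range(1, 50)]
powers = [1.0 / k for k in range(1, 50)]
print(f"alpha ratio: {alpha_ratio_db(freqs, powers):.1f} dB")
```

Under this convention, a shift of energy toward higher frequencies (as reported for Lombard speech) makes the value less negative.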

RevDate: 2025-03-20

Mou Z, Peng K, Ye W, et al (2025)

Acoustic Properties of Vowel Production in Mandarin-Speaking Patients With Parkinson Disease-Related Hypokinetic Dysarthria.

The Journal of craniofacial surgery [Epub ahead of print].

OBJECTIVE: The objective of the present study is to identify acoustic parameters for speech evaluation in patients who speak Mandarin, with Parkinson disease-related hypokinetic dysarthria (PDHD).

METHODS: The authors' sample included 31 patients with PDHD and 38 neurologically normal adults in a similar age range. The authors recorded each participant articulating a list of Mandarin monosyllables that included 6 monophthong vowels (i.e., /a, i, u, ɤ, y, o/). The authors identified the vowel duration (V-dur) and formants (F1 and F2) of each vowel token. On the basis of the formants, the authors calculated and analyzed the acoustic indexes of vowel space area (VSA), vowel articulation index (VAI), and formant centralization ratio (FCR) of the vowels.

RESULTS: Compared with healthy speakers, patients with PDHD had a significantly longer vowel duration for all 6 vowels (P < 0.01). The differences in VSA, VAI, and FCR between the case and normal groups were all statistically significant.

CONCLUSIONS: Differences in vowel acoustic indexes (V-dur, VSA, VAI, and FCR) between the 2 groups revealed that these 4 indexes were sensitive to the variation in vowel production in patients with PDHD. These indexes can be used to evaluate reduced speech intelligibility caused by impaired vowel pronunciation in patients with PDHD, as well as the outcome of rehabilitation therapy.

Additional Links: PMID-40111024

Citation:

@article {pmid40111024,
year = {2025},
author = {Mou, Z and Peng, K and Ye, W and Xu, J and Chen, Y and Tong, M and Lu, J},
title = {Acoustic Properties of Vowel Production in Mandarin-Speaking Patients With Parkinson Disease-Related Hypokinetic Dysarthria.},
journal = {The Journal of craniofacial surgery},
volume = {},
number = {},
pages = {},
pmid = {40111024},
issn = {1536-3732},
support = {2024B03J1341//Science and Technology Projects in Guangzhou/ ; A2023353//Guangdong Medical Science and Technology Research Foundation of China/ ; 2022A0505040007//Special Project of Guangdong Province for technology innovation strategy/ ; 202201020046//Science and Technology Projects in Guangzhou/ ; 2021A1515220049//Government-enterprise Joint Programs of Natural Science Foundation of Guangdong Province/ ; 20202042//Administration of Traditional Chinese Medicine of Guangdong Province/ ; },
abstract = {OBJECTIVE: The objective of the present study is to identify acoustic parameters for speech evaluation in patients who speak Mandarin, with Parkinson disease-related hypokinetic dysarthria (PDHD).

METHODS: The authors' sample included 31 patients with PDHD and 38 neurologically normal adults in a similar age range. The authors recorded each participant articulating a list of Mandarin monosyllables that included 6 monophthong vowels (i.e., /a, i, u, ɤ, y, o/). The authors identified the vowel duration (V-dur) and formants (F1 and F2) of each vowel token. On the basis of the formants, the authors calculated and analyzed the acoustic indexes of vowel space area (VSA), vowel articulation index (VAI), and formant centralization ratio (FCR) of the vowels.

RESULTS: Compared with healthy speakers, patients with PDHD had a significantly longer vowel duration for all 6 vowels (P < 0.01). The differences in VSA, VAI, and FCR between the case and normal groups were all statistically significant.

CONCLUSIONS: Differences in vowel acoustic indexes (V-dur, VSA, VAI, and FCR) between the 2 groups revealed that these 4 indexes were sensitive to the variation in vowel production in patients with PDHD. These indexes can be used to evaluate reduced speech intelligibility caused by impaired vowel pronunciation in patients with PDHD, as well as the outcome of rehabilitation therapy.},
}
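The Mou et al. abstract reports vowel space area (VSA), vowel articulation index (VAI), and formant centralization ratio (FCR) but does not give their formulas. The sketch below uses the widely used corner-vowel definitions based on /a/, /i/, and /u/: a triangular VSA in the F1-F2 plane (shoelace formula), VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a), and FCR as its reciprocal. It assumes mean F1/F2 values per vowel (in Hz) are already measured. The study recorded six vowels and may have computed VSA differently, and the formant values below are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Formants:
    f1: float  # first formant, Hz
    f2: float  # second formant, Hz


def triangular_vsa(a, i, u):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    return 0.5 * abs(
        i.f1 * (a.f2 - u.f2)
        + a.f1 * (u.f2 - i.f2)
        + u.f1 * (i.f2 - a.f2)
    )


def vowel_articulation_index(a, i, u):
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); lower values indicate centralization."""
    return (i.f2 + a.f1) / (i.f1 + u.f1 + u.f2 + a.f2)


def formant_centralization_ratio(a, i, u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), the reciprocal of VAI; higher values indicate centralization."""
    return 1.0 / vowel_articulation_index(a, i, u)


# Hypothetical mean formants for the corner vowels (Hz), not study data.
a = Formants(850.0, 1300.0)
i = Formants(300.0, 2300.0)
u = Formants(350.0, 800.0)
print(f"VSA = {triangular_vsa(a, i, u):.0f} Hz^2, "
      f"VAI = {vowel_articulation_index(a, i, u):.3f}, "
      f"FCR = {formant_centralization_ratio(a, i, u):.3f}")
```

Vowel centralization, as expected in hypokinetic dysarthria, shrinks VSA, lowers VAI, and raises FCR; the ratio measures are often preferred because they are less sensitive to speaker-related formant scaling than raw area.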

RevDate: 2025-03-19

Celenk C, Ulkumen B, Celik O (2025)

The Effect of Concomitant Septoplasty and Turbinate Surgery on Nasality-Related Voice Parameters.

Clinical otolaryngology : official journal of ENT-UK ; official journal of Netherlands Society for Oto-Rhino-Laryngology & Cervico-Facial Surgery [Epub ahead of print].

INTRODUCTION: Our study aimed to reveal whether septoplasty and inferior turbinate reduction significantly impact the acoustic properties of nasalized syllables and alter subjective and objective voice parameters.

MATERIALS AND METHODS: Forty patients with nasal septal deviation and bilateral inferior turbinate hypertrophy of grade ≥ 2 who underwent septoplasty and bilateral inferior turbinoplasty were enrolled. Participants completed the VHI-10, VAS, and NOSE scales preoperatively and at 6 months postoperatively. Changes in VAS and NOSE scores were calculated as VAS[change] and NOSE[change] values. Voice recordings of the sustained vowel /a/ and the word /mini/ were analysed using MDVP. Acoustic analysis was performed with the sustained vowel /a/, and spectrographic analysis was conducted with the consonants /m/, /n/, and the vowel /i/ in /mini/. Recordings were taken preoperatively and at 6 months postoperatively. Statistical analysis compared pre- and postoperative values for significant changes using SPSS Version 21.0 (IBM Corp.; Armonk, NY, USA).

RESULTS: A statistically significant decrease in VAS and NOSE scores was observed at 6 months postoperatively (p < 0.05). No significant difference was found in VHI-10 scores (p > 0.05). Acoustic analysis showed a significant change in pre- and postoperative F0 values (p < 0.05), but not in jitter, jitter%, shimmer, shimmer%, and NHR (p > 0.05). Spectrographic analysis revealed significant postoperative changes in the F3 and F4 formants of consonants /m/, /n/, and vowel /i/ in the word /mini/. A significant correlation was found between the VAS[change] value and postoperative changes in the F3 and F4 values of the consonants /m/ and /n/. For the NOSE[change] value, a significant correlation was found only with the change in the F3 value of the consonant /m/.

CONCLUSION: Nasal surgeries, particularly septo-turbinoplasty, can influence voice timbre by modifying F3 and F4, which is of notable concern for professional voice users, such as singers and actors, due to the potential impact on the singer's formant cluster and overall vocal quality. Although it may not be appropriate to generalise for all rhinological surgeries, the significant changes in the F3 and F4 formants in a specific and refined patient group suggest that caution should be exercised in such surgeries, especially for professional voice users.

Additional Links: PMID-40103316

Citation:

@article {pmid40103316,
year = {2025},
author = {Celenk, C and Ulkumen, B and Celik, O},
title = {The Effect of Concomitant Septoplasty and Turbinate Surgery on Nasality-Related Voice Parameters.},
journal = {Clinical otolaryngology : official journal of ENT-UK ; official journal of Netherlands Society for Oto-Rhino-Laryngology & Cervico-Facial Surgery},
volume = {},
number = {},
pages = {},
doi = {10.1111/coa.14304},
pmid = {40103316},
issn = {1749-4486},
abstract = {INTRODUCTION: Our study aimed to reveal whether septoplasty and inferior turbinate reduction significantly impact the acoustic properties of nasalized syllables and alter subjective and objective voice parameters.

MATERIALS AND METHODS: Forty patients with nasal septal deviation and bilateral inferior turbinate hypertrophy of grade ≥ 2 who underwent septoplasty and bilateral inferior turbinoplasty were enrolled. Participants completed the VHI-10, VAS, and NOSE scales preoperatively and at 6 months postoperatively. Changes in VAS and NOSE scores were calculated as VAS[change] and NOSE[change] values. Voice recordings of the sustained vowel /a/ and the word /mini/ were analysed using MDVP. Acoustic analysis was performed with the sustained vowel /a/, and spectrographic analysis was conducted with the consonants /m/, /n/, and the vowel /i/ in /mini/. Recordings were taken preoperatively and at 6 months postoperatively. Statistical analysis compared pre- and postoperative values for significant changes using SPSS Version 21.0 (IBM Corp.; Armonk, NY, USA).

RESULTS: A statistically significant decrease in VAS and NOSE scores was observed at 6 months postoperatively (p < 0.05). No significant difference was found in VHI-10 scores (p > 0.05). Acoustic analysis showed a significant change in pre- and postoperative F0 values (p < 0.05), but not in jitter, jitter%, shimmer, shimmer%, and NHR (p > 0.05). Spectrographic analysis revealed significant postoperative changes in the F3 and F4 formants of consonants /m/, /n/, and vowel /i/ in the word /mini/. A significant correlation was found between the VAS[change] value and postoperative changes in the F3 and F4 values of the consonants /m/ and /n/. For the NOSE[change] value, a significant correlation was found only with the change in the F3 value of the consonant /m/.

CONCLUSION: Nasal surgeries, particularly septo-turbinoplasty, can influence voice timbre by modifying F3 and F4, which is of notable concern for professional voice users, such as singers and actors, due to the potential impact on the singer's formant cluster and overall vocal quality. Although it may not be appropriate to generalise for all rhinological surgeries, the significant changes in the F3 and F4 formants in a specific and refined patient group suggest that caution should be exercised in such surgeries, especially for professional voice users.},
}

RevDate: 2025-03-17

Benz KR, Hauswald A, Weisz N (2025)

Influence of visual analogue of speech envelope, formants, and word onsets on word recognition is not pronounced.

Hearing research, 460:109237 pii:S0378-5955(25)00056-5 [Epub ahead of print].

In noisy environments, filtering out the relevant speech signal from the background noise is a major challenge. Visual cues, such as lip movements, can improve speech understanding. This suggests that lip movements carry information about speech features (e.g. speech envelope, formants, word onsets) that can be used to aid speech understanding. Moreover, the isolated visual or tactile presentation of the speech envelope can also aid word recognition. However, the evidence in this area is rather mixed, and formants and word onsets have not been studied in this context. This online study investigates the effect of different visually presented speech features (speech envelope, formants, word onsets) on word recognition during two-talker audio. The speech features were presented as a circle whose size was modulated over time based on the dynamics of the three speech features. The circle was modulated according to the speech features of the target speaker, the distractor speaker, or an unrelated control sentence. After each sentence, the participants' word recognition was tested by having them write down what they heard. We show that word recognition is not enhanced for any of the visual features relative to the visual control condition.

Additional Links: PMID-40096812

Citation:

@article {pmid40096812,
year = {2025},
author = {Benz, KR and Hauswald, A and Weisz, N},
title = {Influence of visual analogue of speech envelope, formants, and word onsets on word recognition is not pronounced.},
journal = {Hearing research},
volume = {460},
number = {},
pages = {109237},
doi = {10.1016/j.heares.2025.109237},
pmid = {40096812},
issn = {1878-5891},
abstract = {In noisy environments, filtering out the relevant speech signal from the background noise is a major challenge. Visual cues, such as lip movements, can improve speech understanding. This suggests that lip movements carry information about speech features (e.g. speech envelope, formants, word onsets) that can be used to aid speech understanding. Moreover, the isolated visual or tactile presentation of the speech envelope can also aid word recognition. However, the evidence in this area is rather mixed, and formants and word onsets have not been studied in this context. This online study investigates the effect of different visually presented speech features (speech envelope, formants, word onsets) on word recognition during two-talker audio. The speech features were presented as a circle whose size was modulated over time based on the dynamics of the three speech features. The circle was modulated according to the speech features of the target speaker, the distractor speaker, or an unrelated control sentence. After each sentence, the participants' word recognition was tested by having them write down what they heard. We show that word recognition is not enhanced for any of the visual features relative to the visual control condition.},
}

RevDate: 2025-03-11

Bakhshaee M, Sadri AB, Sobhani D, et al (2025)

The Effect of Rhinoplasty on the Acoustic Characteristics of Resonance and Sound Production.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 77(1):401-411.

Rhinoplasty is the most common cosmetic surgery procedure in Iran. One complication of this procedure that has received less attention is its probable effect on voice. This study aimed to assess the influence of rhinoplasty on acoustic characteristics of resonance and sound production. This prospective study was conducted on 25 patients undergoing rhinoplasty or septorhinoplasty. All patients were referred to a speech therapy clinic for voice recording. Participants were asked to read material containing nasal vowels, nasal consonants, syllables, and sentences with and without nasal consonants, with a microphone placed 5 cm from the mouth in a silent room, before surgery and three times after surgery (at one, three, and six months). A consultant speech therapist analyzed the recording data. Acoustic parameters, including formants 1-5, LTAS, and HNR, were measured and compared before and after surgery. The fourth and fifth formants were the formants most affected by rhinoplasty, although the changes were not significant. In addition, the other acoustic parameters investigated, including LTAS and HNR, did not differ meaningfully after the procedure. Acoustic analysis of nasal vowels, nasal consonants, syllables, words, and sentences with and without nasal consonants did not reveal any significant differences after rhinoplasty.

Additional Links: PMID-40066383

Citation:

@article {pmid40066383,
year = {2025},
author = {Bakhshaee, M and Sadri, AB and Sobhani, D and Morovatdar, N and Rasoulian, B},
title = {The Effect of Rhinoplasty on the Acoustic Characteristics of Resonance and Sound Production.},
journal = {Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India},
volume = {77},
number = {1},
pages = {401-411},
doi = {10.1007/s12070-024-05208-3},
pmid = {40066383},
issn = {2231-3796},
abstract = {Rhinoplasty is the most common cosmetic surgery procedure in Iran. One complication of this procedure that has received less attention is its probable effect on voice. This study aimed to assess the influence of rhinoplasty on acoustic characteristics of resonance and sound production. This prospective study was conducted on 25 patients undergoing rhinoplasty or septorhinoplasty. All patients were referred to a speech therapy clinic for voice recording. Participants were asked to read material containing nasal vowels, nasal consonants, syllables, and sentences with and without nasal consonants, with a microphone placed 5 cm from the mouth in a silent room, before surgery and three times after surgery (at one, three, and six months). A consultant speech therapist analyzed the recording data. Acoustic parameters, including formants 1-5, LTAS, and HNR, were measured and compared before and after surgery. The fourth and fifth formants were the formants most affected by rhinoplasty, although the changes were not significant. In addition, the other acoustic parameters investigated, including LTAS and HNR, did not differ meaningfully after the procedure. Acoustic analysis of nasal vowels, nasal consonants, syllables, words, and sentences with and without nasal consonants did not reveal any significant differences after rhinoplasty.},
}

RevDate: 2025-03-06

Ambros GDA, Andrada E Silva MA (2025)

Resonance Strategies in the Upper Range of Western Operatic Tenors.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00071-2 [Epub ahead of print].

BACKGROUND: High notes pose a challenge for classical tenors due to physiological and acoustic aspects. According to nonlinear source-filter interactions, it is beneficial in these notes to position the resonances just above the frequency of their closest harmonics, amplifying them while avoiding phonatory discontinuities. Intentional tuning of resonances to harmonics in the high tenor tessitura has been described in the literature.

OBJECTIVES: Identify the resonance strategies employed by operatic tenors in high notes.

METHOD: Five professional tenors were recorded producing the vowels /a, e, i, o, u/, sung in ascending scales between the notes C3 (131 Hz) and C5 (523 Hz) and spoken in carrier sentences. The frequencies of the first two resonances were extracted through inverse filtering, as well as the amplitudes of the first four harmonics and the peak in the singer's formant region in the radiated spectrum.

RESULTS: From low to high notes, the frequencies of the first two resonances of all vowels tended to converge. Resonance tuning was most employed in the passaggio (first resonance tuned to the second harmonic, second resonance to the fourth harmonic) and at its upper limit (second resonance tuned to the third harmonic). In the highest notes, the balanced distribution of energy among the lower harmonics was more frequent, with the more dramatic voices exhibiting an equally strong singer's formant. Only in the vowel /i/ did first resonance tunings to the first harmonic occur.

CONCLUSIONS: The vowels became progressively less distinguishable towards the high notes. Systematic resonance tuning was not observed in the high notes, with a greater occurrence of similarly strong lower harmonics, without strong distinct spectrum envelope peaks. Where resonance tuning was identified, there was no apparent preference for positioning the resonances above or below the frequency of their closest harmonics.

Additional Links: PMID-40050171

Citation:

@article {pmid40050171,
year = {2025},
author = {Ambros, GDA and Andrada E Silva, MA},
title = {Resonance Strategies in the Upper Range of Western Operatic Tenors.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.02.025},
pmid = {40050171},
issn = {1873-4588},
abstract = {BACKGROUND: High notes pose a challenge for classical tenors due to physiological and acoustic aspects. According to nonlinear source-filter interactions, it is beneficial in these notes to position the resonances just above the frequency of their closest harmonics, amplifying them while avoiding phonatory discontinuities. Intentional tuning of resonances to harmonics in the high tenor tessitura has been described in the literature.

OBJECTIVES: Identify the resonance strategies employed by operatic tenors in high notes.

METHOD: Five professional tenors were recorded producing the vowels /a, e, i, o, u/, sung in ascending scales between the notes C3 (131 Hz) and C5 (523 Hz) and spoken in carrier sentences. The frequencies of the first two resonances were extracted through inverse filtering, as well as the amplitudes of the first four harmonics and the peak in the singer's formant region in the radiated spectrum.

RESULTS: From low to high notes, the frequencies of the first two resonances of all vowels tended to converge. Resonance tuning was most employed in the passaggio (first resonance tuned to the second harmonic, second resonance to the fourth harmonic) and at its upper limit (second resonance tuned to the third harmonic). In the highest notes, the balanced distribution of energy among the lower harmonics was more frequent, with the more dramatic voices exhibiting an equally strong singer's formant. Only in the vowel /i/ did first resonance tunings to the first harmonic occur.

CONCLUSIONS: The vowels became progressively less distinguishable towards the high notes. Systematic resonance tuning was not observed in the high notes, with a greater occurrence of similarly strong lower harmonics, without strong distinct spectrum envelope peaks. Where resonance tuning was identified, there was no apparent preference for positioning the resonances above or below the frequency of their closest harmonics.},
}
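The tenor study above describes resonance tuning as placing a vocal-tract resonance near a particular harmonic (for example, R1 near the second harmonic in the passaggio). Because harmonic n of a sung note lies at n times f0, checking which harmonic a resonance sits closest to, and whether it lies above or below it, is simple arithmetic. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz (consistent with the C3 ≈ 131 Hz and C5 ≈ 523 Hz quoted in the abstract) and handles natural notes only. The function names and the resonance value are hypothetical; the study itself estimated resonances by inverse filtering, which is not shown here.

```python
# Semitone offsets of natural notes from A within the same octave number.
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_to_hz(name, octave, a4=440.0):
    """Equal-tempered frequency of a natural note, assuming A4 = 440 Hz."""
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return a4 * 2 ** (semitones / 12)


def nearest_harmonic(resonance_hz, f0):
    """Return (n, offset_hz) for the harmonic n*f0 closest to a resonance.

    A positive offset means the resonance lies above that harmonic, the
    placement the abstract describes as beneficial under nonlinear
    source-filter interaction.
    """
    n = max(1, round(resonance_hz / f0))
    return n, resonance_hz - n * f0


f0 = note_to_hz("G", 4)  # hypothetical example note, about 392 Hz
r1_hz = 820.0  # hypothetical first-resonance estimate, not a measured value
n, offset = nearest_harmonic(r1_hz, f0)
print(f"f0 = {f0:.1f} Hz; R1 is nearest H{n}, offset {offset:+.1f} Hz")
```

As f0 rises, harmonics spread further apart, so a fixed resonance can end up far from any harmonic; this is one way to read the abstract's finding that systematic tuning gave way to a more balanced energy distribution in the highest notes.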

RevDate: 2025-03-05

Lou Q, Wang X, Wan T, et al (2024)

Speech Acoustic Analysis in Adult Patients With Cleft Palate After Cleft Palate Repair and Speech Therapy.

The Journal of craniofacial surgery pii:00001665-990000000-01814 [Epub ahead of print].

OBJECTIVE: This study aims to evaluate the enhancement of speech functionality in adult patients with cleft palate through acoustic analysis, assessing pronunciation level improvements before and after palatopharyngoplasty and speech treatment. The findings aim to provide an objective assessment of the treatment efficacy for older patients with cleft palate.

PARTICIPANTS AND INTERVENTION: The study involved acoustic comparisons encompassing vowel formants, voice onset time (VOT) of consonant syllables, syllable duration, and voice characteristic analysis. Speech functionality in each adult cleft palate patient was evaluated thrice: before palatopharyngoplasty, after palatopharyngoplasty, and following speech therapy, using a self-comparative analysis method to discern phonological differences.

RESULTS: No significant alteration in vowel formants was observed in adult cleft palate patients pre-palatopharyngoplasty and post-palatopharyngoplasty. Post-speech treatment, the F2 and F3 values for the anterior high vowel /i/ significantly improved, aligning closely with those of the normal adult group. Similarly, while consonant parameters (VOT value and syllable duration) remained unchanged post-surgery, both metrics showed significant improvement after speech therapy. Except for the prolonged syllable duration of /s/ compared with normal adults, other indicators were not significantly different. Voice parameter analysis revealed no significant change post-operation; however, both HNR and CPPS values post-speech treatment notably increased, matching those of normal adults.

CONCLUSION: Surgical intervention addresses the physical closure of the cleft palate and reconstructs the resonator's structure. Conversely, consonant improvement predominantly occurs through targeted speech therapy aimed at rectifying pronunciation habits and tutoring patients on the effective utilization of repaired articulatory organs. The combined intervention of cleft palate surgery and speech therapy plays a complementary role in speech restoration for cleft palate patients.

Additional Links: PMID-40043206

Citation:

@article {pmid40043206,
year = {2024},
author = {Lou, Q and Wang, X and Wan, T and Wang, B},
title = {Speech Acoustic Analysis in Adult Patients With Cleft Palate After Cleft Palate Repair and Speech Therapy.},
journal = {The Journal of craniofacial surgery},
volume = {},
number = {},
pages = {},
doi = {10.1097/SCS.0000000000010495},
pmid = {40043206},
issn = {1536-3732},
abstract = {OBJECTIVE: This study aims to evaluate the enhancement of speech functionality in adult patients with cleft palate through acoustic analysis, assessing pronunciation level improvements before and after palatopharyngoplasty and speech treatment. The findings aim to provide an objective assessment of the treatment efficacy for older patients with cleft palate.

PARTICIPANTS AND INTERVENTION: The study involved acoustic comparisons encompassing vowel formants, voice onset time (VOT) of consonant syllables, syllable duration, and voice characteristic analysis. Speech functionality in each adult cleft palate patient was evaluated thrice: before palatopharyngoplasty, after palatopharyngoplasty, and following speech therapy, using a self-comparative analysis method to discern phonological differences.

RESULTS: No significant alteration in vowel formants was observed in adult cleft palate patients pre-palatopharyngoplasty and post-palatopharyngoplasty. Post-speech treatment, the F2 and F3 values for the anterior high vowel /i/ significantly improved, aligning closely with those of the normal adult group. Similarly, while consonant parameters (VOT value and syllable duration) remained unchanged post-surgery, both metrics showed significant improvement after speech therapy. Except for the prolonged syllable duration of /s/ compared with normal adults, other indicators were not significantly different. Voice parameter analysis revealed no significant change post-operation; however, both HNR and CPPS values post-speech treatment notably increased, matching those of normal adults.

CONCLUSION: Surgical intervention addresses the physical closure of the cleft palate and reconstructs the resonator's structure. Consonant improvement, by contrast, occurs predominantly through targeted speech therapy aimed at correcting pronunciation habits and teaching patients to use the repaired articulatory organs effectively. Cleft palate surgery and speech therapy therefore play complementary roles in speech restoration for cleft palate patients.},
}

RevDate: 2025-02-28

Muegge JB, B McMurray (2025)

Understanding the Process of Integration in Binaural Cochlear Implant Configurations.

Ear and hearing [Epub ahead of print].

OBJECTIVES: Cochlear implant (CI) users with access to hearing in both ears (binaural configurations) tend to perform better in speech perception tasks than users with only one hearing ear. This benefit derives from several sources, but one central contributor may be that binaural hearing allows listeners to integrate content across ears. A substantial literature demonstrates that binaural integration differs between CI users and normal-hearing controls. However, questions remain about the underlying process of this integration. Here, we test both normal-hearing listeners and CI users to examine this process.

DESIGN: Twenty-three CI users (7 bimodal, 7 bilateral, and 9 single-sided deafness CI users) and 28 age-matched normal-hearing listeners completed a dichotic listening task, in which the first and second formants of one of four vowels were played to each ear in various configurations: with both formants heard diotically, with one formant heard diotically, or with one formant heard in one ear and the second formant heard in the other (dichotically). Each formant heard alone should provide minimal information for identifying the vowel. Thus, listeners must successfully integrate information from both ears if they are to show good performance in the dichotic condition.

RESULTS: Normal-hearing listeners showed no noticeable difference in performance when formants were heard diotically or dichotically. CI users showed significantly reduced performance in the dichotic condition relative to when formants were heard diotically. A deeper examination of individual participants suggests that CI users show important variation in their integration process.

CONCLUSIONS: Using a dichotic listening task, we provide evidence that, while normal-hearing listeners successfully integrate content dichotically, CI users show remarkable differences in how they approach integration. This opens further questions regarding the circumstances in which listeners display different integration profiles and has implications for understanding variation in real-world performance outcomes.

Additional Links: PMID-40016877

Citation:

@article {pmid40016877,
year = {2025},
author = {Muegge, JB and McMurray, B},
title = {Understanding the Process of Integration in Binaural Cochlear Implant Configurations.},
journal = {Ear and hearing},
volume = {},
number = {},
pages = {},
pmid = {40016877},
issn = {1538-4667},
abstract = {OBJECTIVES: Cochlear implant (CI) users with access to hearing in both ears (binaural configurations) tend to perform better in speech perception tasks than users with only one hearing ear. This benefit derives from several sources, but one central contributor may be that binaural hearing allows listeners to integrate content across ears. A substantial literature demonstrates that binaural integration differs between CI users and normal-hearing controls. However, questions remain about the underlying process of this integration. Here, we test both normal-hearing listeners and CI users to examine this process.

DESIGN: Twenty-three CI users (7 bimodal, 7 bilateral, and 9 single-sided deafness CI users) and 28 age-matched normal-hearing listeners completed a dichotic listening task, in which the first and second formants of one of four vowels were played to each ear in various configurations: with both formants heard diotically, with one formant heard diotically, or with one formant heard in one ear and the second formant heard in the other (dichotically). Each formant heard alone should provide minimal information for identifying the vowel. Thus, listeners must successfully integrate information from both ears if they are to show good performance in the dichotic condition.

RESULTS: Normal-hearing listeners showed no noticeable difference in performance when formants were heard diotically or dichotically. CI users showed significantly reduced performance in the dichotic condition relative to when formants were heard diotically. A deeper examination of individual participants suggests that CI users show important variation in their integration process.

CONCLUSIONS: Using a dichotic listening task, we provide evidence that, while normal-hearing listeners successfully integrate content dichotically, CI users show remarkable differences in how they approach integration. This opens further questions regarding the circumstances in which listeners display different integration profiles and has implications for understanding variation in real-world performance outcomes.},
}

RevDate: 2025-02-25
CmpDate: 2025-02-25

Persson A, Barreda S, TF Jaeger (2025)

Comparing accounts of formant normalization against US English listeners' vowel perception.

The Journal of the Acoustical Society of America, 157(2):1458-1482.

Human speech recognition tends to be robust, despite substantial cross-talker variability. Believed to be critical to this ability are auditory normalization mechanisms whereby listeners adapt to individual differences in vocal tract physiology. This study investigates the computations involved in such normalization. Two 8-way alternative forced-choice experiments assessed L1 listeners' categorizations across the entire US English vowel space, both for unaltered and synthesized stimuli. Listeners' responses in these experiments were compared against the predictions of 20 influential normalization accounts that differ starkly in the inference and memory capacities they imply for speech perception. These include variants of estimation-free transformations into psycho-acoustic spaces, intrinsic normalizations relative to concurrent acoustic properties, and extrinsic normalizations relative to talker-specific statistics. Listeners' responses were best explained by extrinsic normalization, suggesting that listeners learn and store distributional properties of talkers' speech. Specifically, computationally simple (single-parameter) extrinsic normalization best fit listeners' responses. This simple extrinsic normalization also clearly outperformed Lobanov normalization, a computationally more complex account that remains popular in research on phonetics and phonology, sociolinguistics, typology, and language acquisition.
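
To make the contrast between the two kinds of extrinsic normalization concrete, a minimal Python sketch follows. Lobanov normalization z-scores each formant against the talker's own mean and standard deviation for that formant, so it needs two talker statistics per formant. For the single-parameter side, the sketch uses Nearey-style uniform log-mean scaling, which subtracts one talker-specific constant from every log formant. The abstract does not say which single-parameter account fit best, so that choice is an illustrative assumption; the function names `lobanov` and `log_mean` and the example formant values are hypothetical.

```python
import math
from statistics import mean, pstdev

# A vowel token is a tuple of formant frequencies in Hz, e.g. (F1, F2).
Token = tuple[float, ...]


def lobanov(tokens: list[Token]) -> list[Token]:
    """Lobanov normalization: z-score each formant against this talker's
    mean and population SD for that formant (two parameters per formant)."""
    n = len(tokens[0])
    mus = [mean([t[i] for t in tokens]) for i in range(n)]
    sds = [pstdev([t[i] for t in tokens]) for i in range(n)]
    return [tuple((t[i] - mus[i]) / sds[i] for i in range(n)) for t in tokens]


def log_mean(tokens: list[Token]) -> list[Token]:
    """Uniform log-mean scaling (assumed example of a single-parameter
    extrinsic account): subtract one talker-specific constant, the mean
    log formant over all tokens and formants, from every log formant."""
    k = mean([math.log(f) for t in tokens for f in t])
    return [tuple(math.log(f) - k for f in t) for t in tokens]


if __name__ == "__main__":
    # Invented (F1, F2) values; talker_b has every formant 20% higher.
    talker_a = [(300.0, 2300.0), (700.0, 1200.0), (350.0, 900.0)]
    talker_b = [(1.2 * f1, 1.2 * f2) for f1, f2 in talker_a]
    print(log_mean(talker_a))
    print(log_mean(talker_b))  # same values as above, up to floating-point rounding
    print(lobanov(talker_a))
```

Both functions remove a uniform scale difference between talkers, such as the one simulated by `talker_b`; they differ in how many talker-specific statistics a listener would have to estimate and store (one constant versus a mean and standard deviation per formant), which is the difference in computational complexity the abstract refers to.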

Additional Links: PMID-39998127

Citation:

@article {pmid39998127,
year = {2025},
author = {Persson, A and Barreda, S and Jaeger, TF},
title = {Comparing accounts of formant normalization against US English listeners' vowel perception.},
journal = {The Journal of the Acoustical Society of America},
volume = {157},
number = {2},
pages = {1458-1482},
doi = {10.1121/10.0035476},
pmid = {39998127},
issn = {1520-8524},
mesh = {Humans ; *Speech Perception ; *Phonetics ; Female ; Male ; *Speech Acoustics ; Adult ; Young Adult ; Language ; Acoustic Stimulation ; Recognition, Psychology ; },
abstract = {Human speech recognition tends to be robust, despite substantial cross-talker variability. Believed to be critical to this ability are auditory normalization mechanisms whereby listeners adapt to individual differences in vocal tract physiology. This study investigates the computations involved in such normalization. Two 8-way alternative forced-choice experiments assessed L1 listeners' categorizations across the entire US English vowel space, both for unaltered and synthesized stimuli. Listeners' responses in these experiments were compared against the predictions of 20 influential normalization accounts that differ starkly in the inference and memory capacities they imply for speech perception. These include variants of estimation-free transformations into psycho-acoustic spaces, intrinsic normalizations relative to concurrent acoustic properties, and extrinsic normalizations relative to talker-specific statistics. Listeners' responses were best explained by extrinsic normalization, suggesting that listeners learn and store distributional properties of talkers' speech. Specifically, computationally simple (single-parameter) extrinsic normalization best fit listeners' responses. This simple extrinsic normalization also clearly outperformed Lobanov normalization, a computationally more complex account that remains popular in research on phonetics and phonology, sociolinguistics, typology, and language acquisition.},
}

MeSH Terms:

Humans
*Speech Perception
*Phonetics
Female
Male
*Speech Acoustics
Adult
Young Adult
Language
Acoustic Stimulation
Recognition, Psychology

RevDate: 2025-02-15

Liu W, Y Wang (2025)

Acoustic Characteristics of Tenors and Sopranos in Chinese National Singing and Bel Canto.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(25)00038-4 [Epub ahead of print].

BACKGROUND: With the advancement of vocal arts, Chinese National Singing and Western Classical Singing (Bel Canto) encounter challenges in cross-cultural adaptation. Investigating formant tuning strategies and the singer's formant is crucial for scientifically characterizing the vocal production techniques in Chinese singing styles.

METHOD: Eight singers (Chinese National Singing tenors, Chinese National Singing sopranos, Bel Canto tenors, and Bel Canto sopranos) were recruited. The fundamental frequency (F0), intensity, formants, and long-term average spectrum (LTAS) were analyzed using a series of designed tasks to examine the phonation and articulation characteristics of these two singing genres in the context of cross-cultural adaptation.

RESULTS: A positive correlation between F0 and intensity was generally observed, though variations existed across vowels and singers. Both linear and non-linear relationships were found between F0 and formants. The first formant (F1) was proportional to F0, with greater variability for female singers in the vowel /a/. LTAS analysis revealed that the tenors exhibited the singer's formant in sung vowels and songs, whereas the sopranos did not exhibit this feature when singing vowels but did so in specific songs. Moreover, the primary and secondary spectral peaks in Bel Canto were less influenced by songs compared to Chinese National Singing.

CONCLUSIONS: (i) Intensity can provide an objective basis for differentiating subjective differences between singing genres, and individual differences are evident in how singers handle the relationship between F0 and intensity. (ii) Vowel modification and vowel migration in sopranos reflect consistency and variability across linguistic and cultural contexts. (iii) The presence and characteristics of the singer's formant are influenced by sex, singing genre, and song. Differences in the degree of spectral influence between the two singing genres suggest that Bel Canto emphasizes yi qiang xing zi (ie, phonation drives articulation), while Chinese National Singing emphasizes yi zi xing qiang (ie, articulation drives phonation).

Additional Links: PMID-39955192

Citation:

@article {pmid39955192,
year = {2025},
author = {Liu, W and Wang, Y},
title = {Acoustic Characteristics of Tenors and Sopranos in Chinese National Singing and Bel Canto.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2025.01.039},
pmid = {39955192},
issn = {1873-4588},
abstract = {BACKGROUND: With the advancement of vocal arts, Chinese National Singing and Western Classical Singing (Bel Canto) encounter challenges in cross-cultural adaptation. Investigating formant tuning strategies and the singer's formant is crucial for scientifically characterizing the vocal production techniques in Chinese singing styles.

METHOD: Eight singers (Chinese National Singing tenors, Chinese National Singing sopranos, Bel Canto tenors, and Bel Canto sopranos) were recruited. The fundamental frequency (F0), intensity, formants, and long-term average spectrum (LTAS) were analyzed using a series of designed tasks to examine the phonation and articulation characteristics of these two singing genres in the context of cross-cultural adaptation.

RESULTS: A positive correlation between F0 and intensity was generally observed, though variations existed across vowels and singers. Both linear and non-linear relationships were found between F0 and formants. The first formant (F1) was proportional to F0, with greater variability for female singers in the vowel /a/. LTAS analysis revealed that the tenors exhibited the singer's formant in sung vowels and songs, whereas the sopranos did not exhibit this feature when singing vowels but did so in specific songs. Moreover, the primary and secondary spectral peaks in Bel Canto were less influenced by songs compared to Chinese National Singing.

CONCLUSIONS: (i) Intensity can provide an objective basis for differentiating subjective differences between singing genres, and individual differences are evident in how singers handle the relationship between F0 and intensity. (ii) Vowel modification and vowel migration in sopranos reflect consistency and variability across linguistic and cultural contexts. (iii) The presence and characteristics of the singer's formant are influenced by sex, singing genre, and song. Differences in the degree of spectral influence between the two singing genres suggest that Bel Canto emphasizes yi qiang xing zi (ie, phonation drives articulation), while Chinese National Singing emphasizes yi zi xing qiang (ie, articulation drives phonation).},
}

RevDate: 2025-02-09

Pan AY, Grail GPO, Albert G, et al (2025)

What Contributes to Masculine Perception of Voice Among Transmasculine People on Testosterone Therapy?

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00471-5 [Epub ahead of print].

Voice is a highly salient and complex signal that people use to categorize another's gender. For transmasculine individuals seeking to align their gender expression with their gender identity, vocal presentation is a major concern. Voice-gender incongruence, where one's voice does not match one's gender identity, can lead to vocal strain, fatigue, emotional distress, and increased risk of suicidality. Testosterone therapy, which uses exogenous testosterone to masculinize or androgynize the voice and other secondary sexual characteristics in individuals assigned female at birth, is one method to address this issue. However, many individuals remain dissatisfied with their voice after therapy, indicating that hormonal voice modification is a complex process that is not fully understood. In the present study, we used unmodified voice samples from 30 transmasculine individuals undergoing testosterone therapy and applied multivariate analysis to determine the relative and combined effects of four acoustic parameters on two measures of gender perception. The results show that transmasculine individuals' speech is perceived as being as "masculine" as that of cisgender males, with both groups being statistically categorized as male at similar rates. Although mean fundamental frequency and formant-estimated vocal tract length together account for a significant portion of the variance in gender perception, a substantial amount of that variance remains unexplained. Understanding the acoustic and sociolinguistic factors that contribute to masculine voice presentation can lead to more informed and individualized care for transmasculine individuals experiencing voice-gender incongruence and considering testosterone therapy. For this population, addressing voice-gender incongruence has important implications for life satisfaction, quality of life, and self-esteem.

Additional Links: PMID-39924373

Citation:

@article {pmid39924373,
year = {2025},
author = {Pan, AY and Grail, GPO and Albert, G and Groll, MD and Stepp, CE and Arnocky, SA and Hodges-Simeon, CR},
title = {What Contributes to Masculine Perception of Voice Among Transmasculine People on Testosterone Therapy?},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.12.037},
pmid = {39924373},
issn = {1873-4588},
abstract = {Voice is a highly salient and complex signal that people use to categorize another's gender. For transmasculine individuals seeking to align their gender expression with their gender identity, vocal presentation is a major concern. Voice-gender incongruence, where one's voice does not match one's gender identity, can lead to vocal strain, fatigue, emotional distress, and increased risk of suicidality. Testosterone therapy, which uses exogenous testosterone to masculinize or androgynize the voice and other secondary sexual characteristics in individuals assigned female at birth, is one method to address this issue. However, many individuals remain dissatisfied with their voice after therapy, indicating that hormonal voice modification is a complex process that is not fully understood. In the present study, we used unmodified voice samples from 30 transmasculine individuals undergoing testosterone therapy and applied multivariate analysis to determine the relative and combined effects of four acoustic parameters on two measures of gender perception. The results show that transmasculine individuals' speech is perceived as being as "masculine" as that of cisgender males, with both groups being statistically categorized as male at similar rates. Although mean fundamental frequency and formant-estimated vocal tract length together account for a significant portion of the variance in gender perception, a substantial amount of that variance remains unexplained. Understanding the acoustic and sociolinguistic factors that contribute to masculine voice presentation can lead to more informed and individualized care for transmasculine individuals experiencing voice-gender incongruence and considering testosterone therapy. For this population, addressing voice-gender incongruence has important implications for life satisfaction, quality of life, and self-esteem.},
}

RevDate: 2025-01-31

Luo X, Lv J, Liu W, et al (2024)

Double-formant PCF-SPR refractive index sensor with ultra-high double-peak-shift sensitivity and a wide detection range.

Journal of the Optical Society of America. A, Optics, image science, and vision, 41(10):1873-1883.

A dual-resonance-peak photonic crystal fiber-surface plasmon resonance (PCF-SPR) refractive index (RI) sensor is designed for different wavelength ranges. The first resonance peak of the sensor is distributed in the wavelength range of 700-2350 nm, while the second peak is distributed in the range of 2350-5550 nm. In addition to detecting analytes using the full spectrum of constraint losses (CLs), it is also possible to use a single resonance peak to achieve the detection of analytes. By systematically optimizing the nanowire diameter, the diameter of the inner and outer layer air hole, the width of the groove, the polishing depth, and the distance from the outer layer air hole to the fiber core, the optimal structure of the sensor is finally determined. In this study, the sensor was studied by numerical analysis, and the characteristics of the sensor were evaluated by wavelength detection technology. The results show that within the RI range of 1.24-1.37, the sensor has a maximum wavelength sensitivity (WS) of 54700 nm/RIU for detecting the RI of analytes. Within the above refractive index range, the regression coefficient R² of the dual-peak-resonance wavelength is 0.99993, ensuring the accuracy of the estimated resonance wavelength of the sensor. In addition, the sensor can also use dual-peak-shift sensitivity (DPSS) to detect the refractive index, which is a relatively new sensing technology. The maximum DPSS of the sensor is 95300 nm/RIU. Due to its high sensitivity and unique dual-peak characteristics, this sensor has wide application prospects in medical diagnosis, environmental monitoring, food safety, and other fields.

Additional Links: PMID-39889010

Citation:

@article {pmid39889010,
year = {2024},
author = {Luo, X and Lv, J and Liu, W and Mi, C and Wang, J and Yang, L and Chu, PK and Liu, C},
title = {Double-formant PCF-SPR refractive index sensor with ultra-high double-peak-shift sensitivity and a wide detection range.},
journal = {Journal of the Optical Society of America. A, Optics, image science, and vision},
volume = {41},
number = {10},
pages = {1873-1883},
doi = {10.1364/JOSAA.530505},
pmid = {39889010},
issn = {1520-8532},
abstract = {A dual-resonance-peak photonic crystal fiber-surface plasmon resonance (PCF-SPR) refractive index (RI) sensor is designed for different wavelength ranges. The first resonance peak of the sensor is distributed in the wavelength range of 700-2350 nm, while the second peak is distributed in the range of 2350-5550 nm. In addition to detecting analytes using the full spectrum of constraint losses (CLs), it is also possible to use a single resonance peak to achieve the detection of analytes. By systematically optimizing the nanowire diameter, the diameter of the inner and outer layer air hole, the width of the groove, the polishing depth, and the distance from the outer layer air hole to the fiber core, the optimal structure of the sensor is finally determined. In this study, the sensor was studied by numerical analysis, and the characteristics of the sensor were evaluated by wavelength detection technology. The results show that within the RI range of 1.24-1.37, the sensor has a maximum wavelength sensitivity (WS) of 54700 nm/RIU for detecting the RI of analytes. Within the above refractive index range, the regression coefficient R² of the dual-peak-resonance wavelength is 0.99993, ensuring the accuracy of the estimated resonance wavelength of the sensor. In addition, the sensor can also use dual-peak-shift sensitivity (DPSS) to detect the refractive index, which is a relatively new sensing technology. The maximum DPSS of the sensor is 95300 nm/RIU. Due to its high sensitivity and unique dual-peak characteristics, this sensor has wide application prospects in medical diagnosis, environmental monitoring, food safety, and other fields.},
}

RevDate: 2025-01-17

Đinh LG, Brunelle M, TT Tạ (2025)

Relating production and perception in two Raglai dialects at different stages of registrogenesis.

Phonetica [Epub ahead of print].

This paper explores the perception of two diachronically related and mutually intelligible phonological oppositions, the onset voicing contrast of Northern Raglai and the register contrast of Southern Raglai. It is the continuation of a previous acoustic study that revealed that Northern Raglai onset stops maintain a voicing distinction accompanied by weak formant and voice quality modulations on following vowels, while Southern Raglai has transphonologized this voicing contrast into a register contrast marked by vowel and voice quality distinctions. Our findings indicate that the two dialects partially differ in their use of identification cues, Northern Raglai listeners using both voicing and F1 as major cues while Southern Raglai listeners largely focus on F1. Production and perception are thus not perfectly aligned in Northern Raglai, because F1 plays a stronger role in perception than production in this dialect. We conclude that mutual intelligibility between dialects is possible because they both use F1 for identification.

Additional Links: PMID-39824758

Citation:

@article {pmid39824758,
year = {2025},
author = {Đinh, LG and Brunelle, M and Tạ, TT},
title = {Relating production and perception in two Raglai dialects at different stages of registrogenesis.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {39824758},
issn = {1423-0321},
abstract = {This paper explores the perception of two diachronically related and mutually intelligible phonological oppositions, the onset voicing contrast of Northern Raglai and the register contrast of Southern Raglai. It is the continuation of a previous acoustic study that revealed that Northern Raglai onset stops maintain a voicing distinction accompanied by weak formant and voice quality modulations on following vowels, while Southern Raglai has transphonologized this voicing contrast into a register contrast marked by vowel and voice quality distinctions. Our findings indicate that the two dialects partially differ in their use of identification cues, Northern Raglai listeners using both voicing and F1 as major cues while Southern Raglai listeners largely focus on F1. Production and perception are thus not perfectly aligned in Northern Raglai, because F1 plays a stronger role in perception than production in this dialect. We conclude that mutual intelligibility between dialects is possible because they both use F1 for identification.},
}

RevDate: 2025-01-08

Jv X, Wu J, Mao Q, et al (2024)

Development on Light and Thin Broadband Sound Absorption Structure Based on Unequal-Cross-Section Microperforated Plate Series Connection.

Materials (Basel, Switzerland), 17(24).

The sound absorption structure of a microperforated plate has many advantages and has great potential in the field of noise control. In order to solve the problem of broadband sound absorption of microperforated plates, a series acoustic structure of microperforated plates of unequal cross-section was designed based on the traditional microperforated plate series acoustic structure. Compared with the traditional series structure, the sudden change of cross-section increases the sound energy dissipation and greatly improves the sound absorption performance. Through the analysis of its parameters, when the overall thickness of the structure is 20 mm, its sound absorption coefficient is above 0.5 in the frequency range of 1000-3450 Hz; there are three formants, and the sound absorption coefficients corresponding to the three formants reach 1. This study provides new ideas and methods for the design of broadband acoustic structures.

Additional Links: PMID-39769881

Citation:

@article {pmid39769881,
year = {2024},
author = {Jv, X and Wu, J and Mao, Q and Li, Q and Zhang, T},
title = {Development on Light and Thin Broadband Sound Absorption Structure Based on Unequal-Cross-Section Microperforated Plate Series Connection.},
journal = {Materials (Basel, Switzerland)},
volume = {17},
number = {24},
pages = {},
pmid = {39769881},
issn = {1996-1944},
support = {51965041//National Natural Science Foundation of China/ ; YC2022-s735//Jiangxi Postgraduate Innovation Special Fund Project/ ; },
abstract = {The sound absorption structure of a microperforated plate has many advantages and has great potential in the field of noise control. In order to solve the problem of broadband sound absorption of microperforated plates, a series acoustic structure of microperforated plates of unequal cross-section was designed based on the traditional microperforated plate series acoustic structure. Compared with the traditional series structure, the sudden change of cross-section increases the sound energy dissipation and greatly improves the sound absorption performance. Through the analysis of its parameters, when the overall thickness of the structure is 20 mm, its sound absorption coefficient is above 0.5 in the frequency range of 1000-3450 Hz; there are three formants, and the sound absorption coefficients corresponding to the three formants reach 1. This study provides new ideas and methods for the design of broadband acoustic structures.},
}

RevDate: 2025-01-07
CmpDate: 2025-01-07

Caragli V, Zacheo E, Nodari R, et al (2024)

Effects of face protector devices on acoustic parameters of voice.

Acta otorhinolaryngologica Italica : organo ufficiale della Societa italiana di otorinolaringologia e chirurgia cervico-facciale, 44(6):377-391.

OBJECTIVES: The SARS-CoV-2 pandemic required the use of personal protective equipment (PPE) in medical and social contexts to reduce exposure and prevent pathogen transmission. This study aims to analyse possible changes in voice and speech parameters with and without PPE.

METHODS: Speech samples using different types of PPE were obtained. Recordings were then analysed using PRAAT software (version 6.1.42). Statistical analysis was conducted using ANOVA in Jamovi software. A post-hoc test was performed to compare PPE-related results.

RESULTS: Statistically significant differences were found in Cepstral Peak Prominence-Smoothed (CPPS), Harmonic to Noise Ratio (HNR), slope of the Long-Term Average Spectrum (LTAS), tilt of the trendline through the LTAS, shimmer parameters, mean and standard deviation of vowel HNR, and vowel and consonant formants. HNR values increased, whereas shimmer parameters and formant values decreased with PPE [combined PPE > filtering face piece (FFP) > surgical masks > no PPE].

CONCLUSIONS: Our data show improvement in many parameters of voice and speech quality and modification of speech articulation when using masks, particularly in the case of combined PPE. The most relevant changes were found with a combination of face shield and FFP2 masks. This may be due to unconscious improvements in speech articulation and increased demand on the vocal folds to achieve better speech intelligibility.

Additional Links: PMID-39763462

Citation:

@article {pmid39763462,
year = {2024},
author = {Caragli, V and Zacheo, E and Nodari, R and Genovese, E and Mancuso, A and Mazzoni, L},
title = {Effects of face protector devices on acoustic parameters of voice.},
journal = {Acta otorhinolaryngologica Italica : organo ufficiale della Societa italiana di otorinolaringologia e chirurgia cervico-facciale},
volume = {44},
number = {6},
pages = {377-391},
pmid = {39763462},
issn = {1827-675X},
mesh = {Humans ; *COVID-19/prevention & control/transmission ; Male ; Adult ; Female ; *Personal Protective Equipment ; *Voice Quality ; *Speech Acoustics ; Masks ; Young Adult ; Middle Aged ; Voice ; },
abstract = {OBJECTIVES: The SARS-CoV-2 pandemic required the use of personal protective equipment (PPE) in medical and social contexts to reduce exposure and prevent pathogen transmission. This study aims to analyse possible changes in voice and speech parameters with and without PPE.

METHODS: Speech samples were recorded with different types of PPE and without PPE. Recordings were then analysed using PRAAT software (version 6.1.42). Statistical analysis was conducted using ANOVA in Jamovi software, and a post-hoc test was performed to compare PPE-related results.

RESULTS: Statistically significant differences were found in Cepstral Peak Prominence-Smoothed (CPPS), Harmonics-to-Noise Ratio (HNR), slope of the Long-Term Average Spectrum (LTAS), tilt of the trendline through the LTAS, shimmer parameters, mean and standard deviation of vowel HNR, and vowel and consonant formants. HNR values increased whereas shimmer parameters and formant values decreased with PPE [combined PPE > filtering face piece (FFP) > surgical masks > no PPE].

CONCLUSIONS: Our data show improvement in many voice and speech quality parameters, and modified speech articulation, when using masks, particularly in the case of combined PPE. The most relevant changes were found with a combination of face shield and FFP2 mask. This may be due to unconscious improvements in speech articulation and increased demand on the vocal folds to achieve better speech intelligibility.},
}

MeSH Terms:

Humans
*COVID-19/prevention & control/transmission
Male
Adult
Female
*Personal Protective Equipment
*Voice Quality
*Speech Acoustics
Masks
Young Adult
Middle Aged
Voice
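
The direction of the reported effects (HNR up, shimmer down with PPE) is easier to read with the two measures' definitions in hand. Below is a minimal Python sketch, not the authors' pipeline: it assumes per-cycle peak amplitudes and the normalized autocorrelation peak at the pitch-period lag have already been extracted (for example in Praat), and it follows our reading of the Praat manual's definitions of local shimmer and autocorrelation-based HNR. The function names are hypothetical.

```python
import math
from typing import Sequence


def local_shimmer(cycle_amplitudes: Sequence[float]) -> float:
    """Local shimmer as a ratio (multiply by 100 for a percentage).

    Mean absolute difference between the peak amplitudes of consecutive
    cycles, divided by the mean peak amplitude. The per-cycle amplitudes
    are assumed to be extracted beforehand.
    """
    if len(cycle_amplitudes) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(cycle_amplitudes, cycle_amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(cycle_amplitudes) / len(cycle_amplitudes)
    return mean_diff / mean_amp


def hnr_db(r_max: float) -> float:
    """Harmonics-to-noise ratio in dB from a normalized autocorrelation peak.

    r_max is read as the periodic share of the signal energy and 1 - r_max
    as the noise share, so HNR = 10 * log10(r_max / (1 - r_max)).
    """
    if not 0.0 < r_max < 1.0:
        raise ValueError("r_max must lie strictly between 0 and 1")
    return 10.0 * math.log10(r_max / (1.0 - r_max))


if __name__ == "__main__":
    # Steadier cycle-to-cycle amplitudes give lower shimmer.
    print(round(local_shimmer([1.0, 1.1, 1.0, 1.1]), 4))
    print(round(local_shimmer([1.0, 1.02, 1.0, 1.02]), 4))
    # A more periodic signal (higher r_max) gives a higher HNR.
    print(round(hnr_db(0.9), 2), round(hnr_db(0.99), 2))
```

Under these definitions, a more stable and more periodic phonation shows up as lower shimmer and higher HNR, which is the pattern the abstract reports for PPE conditions.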

RevDate: 2024-12-31
CmpDate: 2024-12-31

Hu Z, Zhang Z, Li H, et al (2024)

Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices.

Behavior research methods, 57(1):35.

In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.

Additional Links: PMID-39738817

@article {pmid39738817,
year = {2024},
author = {Hu, Z and Zhang, Z and Li, H and Yang, LZ},
title = {Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices.},
journal = {Behavior research methods},
volume = {57},
number = {1},
pages = {35},
pmid = {39738817},
issn = {1554-3528},
support = {82371931//Natural Science Fund of China/ ; YZJJ202207-TS//HFIPS Director's Fund/ ; 202204295107020004//Anhui Province Key Research and Development Project/ ; },
mesh = {Humans ; Reproducibility of Results ; Male ; Female ; Adult ; Young Adult ; *Speech Acoustics ; Smartphone ; Computers, Handheld ; Speech/physiology ; },
abstract = {In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.},
}

MeSH Terms:

Humans
Reproducibility of Results
Male
Female
Adult
Young Adult
*Speech Acoustics
Smartphone
Computers, Handheld
Speech/physiology
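
The abstract does not say which reliability index was used. Intraclass correlation coefficients are a common choice for test-retest and cross-device agreement, so the sketch below assumes ICC(2,1) in the Shrout and Fleiss convention (two-way random effects, absolute agreement, single measurement). Rows are participants and columns are sessions or devices; the function name and example data are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is the measurement for participant i in session/device j.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between participants
    ms_cols = ss_cols / (k - 1)                # between sessions/devices
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC undefined: no variance in the data")
    return (ms_rows - ms_error) / denom


if __name__ == "__main__":
    # Identical readings on two devices versus a constant offset between them.
    print(icc_2_1([[1, 1], [2, 2], [3, 3]]))
    print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

Because absolute agreement penalizes systematic offsets, a device that reads every participant a fixed amount higher lowers ICC(2,1) even when the ranking of participants is preserved; a consistency-type ICC would not penalize that offset.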

RevDate: 2025-01-04

Lobmaier JS, Klatt WK, Schweinberger SR (2024)

Voice of a woman: influence of interaction partner characteristics on cycle dependent vocal changes in women.

Frontiers in psychology, 15:1401158.

INTRODUCTION: Research has shown that women's vocal characteristics change during the menstrual cycle. Further, evidence suggests that individuals alter their voices depending on the context, such as when speaking to a highly attractive person, or a person with a different social status. The present study aimed at investigating the degree to which women's voices change depending on the vocal characteristics of the interaction partner, and how any such changes are modulated by the woman's current menstrual cycle phase.

METHODS: Forty-two naturally cycling women were recorded once during the late follicular phase (high fertility) and once during the luteal phase (low fertility) while reproducing utterances of men and women who were previously assessed to have either attractive or unattractive voices.

RESULTS: Phonetic analyses revealed that women's voices in response to speakers changed depending on their menstrual cycle phase (F0 variation, maximum F0, Centre of gravity) and depending on the stimulus speaker's vocal attractiveness (HNR, Formants 1-3, Centre of gravity), and sex (Formant 2). Also, the vocal characteristics differed when reproducing spoken sentences of the stimulus speakers compared to when they read out written sentences (minimum F0, Formants 2-4).

DISCUSSION: These results provide further evidence that women alter their voice depending on the vocal characteristics of the interaction partner and that these changes are modulated by the menstrual cycle phase. Specifically, the present findings suggest that cyclic shifts in women's voices may occur only in social contexts (i.e., when a putative interaction partner is involved).

Additional Links: PMID-39734777

@article {pmid39734777,
year = {2024},
author = {Lobmaier, JS and Klatt, WK and Schweinberger, SR},
title = {Voice of a woman: influence of interaction partner characteristics on cycle dependent vocal changes in women.},
journal = {Frontiers in psychology},
volume = {15},
number = {},
pages = {1401158},
pmid = {39734777},
issn = {1664-1078},
abstract = {INTRODUCTION: Research has shown that women's vocal characteristics change during the menstrual cycle. Further, evidence suggests that individuals alter their voices depending on the context, such as when speaking to a highly attractive person, or a person with a different social status. The present study aimed at investigating the degree to which women's voices change depending on the vocal characteristics of the interaction partner, and how any such changes are modulated by the woman's current menstrual cycle phase.

METHODS: Forty-two naturally cycling women were recorded once during the late follicular phase (high fertility) and once during the luteal phase (low fertility) while reproducing utterances of men and women who were previously assessed to have either attractive or unattractive voices.

RESULTS: Phonetic analyses revealed that women's voices in response to speakers changed depending on their menstrual cycle phase (F0 variation, maximum F0, Centre of gravity) and depending on the stimulus speaker's vocal attractiveness (HNR, Formants 1-3, Centre of gravity), and sex (Formant 2). Also, the vocal characteristics differed when reproducing spoken sentences of the stimulus speakers compared to when they read out written sentences (minimum F0, Formants 2-4).

DISCUSSION: These results provide further evidence that women alter their voice depending on the vocal characteristics of the interaction partner and that these changes are modulated by the menstrual cycle phase. Specifically, the present findings suggest that cyclic shifts in women's voices may occur only in social contexts (i.e., when a putative interaction partner is involved).},
}
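
Centre of gravity, one of the measures that shifted with cycle phase and with the stimulus speaker's attractiveness, is a spectral centroid. A minimal sketch, assuming a magnitude spectrum is already available and using the power-weighted form (p = 2) that we understand to be Praat's default; the function name is hypothetical.

```python
from typing import Sequence


def centre_of_gravity(freqs_hz: Sequence[float], magnitudes: Sequence[float],
                      power: float = 2.0) -> float:
    """Spectral centre of gravity: mean frequency weighted by |S(f)|**power."""
    if len(freqs_hz) != len(magnitudes) or len(freqs_hz) == 0:
        raise ValueError("freqs_hz and magnitudes must be non-empty and equal length")
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


if __name__ == "__main__":
    # Moving energy from a 500 Hz component to a 3000 Hz component raises the centroid.
    print(centre_of_gravity([500, 3000], [2.0, 1.0]))
    print(centre_of_gravity([500, 3000], [1.0, 2.0]))
```

With p = 2 the weighting follows spectral power, so strong high-frequency components (for example breathier or brighter phonation) pull the centre of gravity upward more than they would with p = 1.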

RevDate: 2024-12-25

Xiu N, Liu L, Li W, et al (2024)

Correlation Analysis Between Cortical Structural Features and Acoustic Features in Patients With Parkinson's Disease.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00426-0 [Epub ahead of print].

PURPOSE: Parkinson disease (PD) is a progressive neurodegenerative disease. The aim of this study is to investigate the association between acoustic and cortical brain features in Parkinson's disease patients.

METHODS: We recruited 19 patients with Parkinson's disease (eight females, 11 males) and 19 healthy subjects (eight females, 11 males). Speech samples of three vowels (/i/, /a/, /u/), six plosives (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/), and three voiced consonants (/l/, /m/, /n/) were collected, and the following acoustic parameters were extracted: fundamental frequency (F0), voice onset time (VOT), voicing onset-vocalic voicing onset (VO-VVO), first, second, and third formants (F1, F2, F3), first, second, and third bandwidths (B1, B2, B3), jitter, shimmer, and harmonics-to-noise ratio (HNR). Cranial magnetic resonance imaging was performed on an Ingenia CX 3.0 T scanner, and images were processed based on the Desikan-Killiany-Tourville atlas. Differences in acoustic and neuroimaging parameters between the PD and healthy control (HC) groups were assessed using Levene's test (LT), the two-sample independent t test (TT), and the Mann-Whitney U test (MWUT), and Spearman's partial correlations between acoustic and neuroimaging parameters were calculated separately for the PD and HC groups.

RESULTS: For acoustic features, the t test showed that F3 of the vowel /i/ was significantly lower in the PD group than in the HC group. Jitter on the vowel /u/ was significantly higher in the male PD group than in the male HC group. No other acoustic measure differed significantly between the two groups. For cortical features, cortical thickness, area, and volume were reduced in most brain regions of the PD patients, although a small portion of the cortex appeared thickened. In the correlation analysis between cortical and acoustic features, the acoustic parameters F0, F1, F2, F3, B2, B3, VO-VVO, jitter, HNR, and VOT showed significant and strong correlations with the thickness, area, and volume of cortical sites such as the frontal, temporal, entorhinal, fusiform, and precuneus regions in PD patients, whereas no significant correlation was found in the HC group.

CONCLUSIONS: These findings suggest that Parkinson's disease affects both acoustic features and cortical structural features in patients, and that the two are correlated.

Additional Links: PMID-39721882

@article {pmid39721882,
year = {2024},
author = {Xiu, N and Liu, L and Li, W and Cai, Z and Wang, Y and Wang, R and Vaxelaire, B and Sock, R and Ling, Z and Chen, J},
title = {Correlation Analysis Between Cortical Structural Features and Acoustic Features in Patients With Parkinson's Disease.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.11.042},
pmid = {39721882},
issn = {1873-4588},
abstract = {PURPOSE: Parkinson disease (PD) is a progressive neurodegenerative disease. The aim of this study is to investigate the association between acoustic and cortical brain features in Parkinson's disease patients.

METHODS: We recruited 19 patients with Parkinson's disease (eight females, 11 males) and 19 healthy subjects (eight females, 11 males). Speech samples of three vowels (/i/, /a/, /u/), six plosives (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/), and three voiced consonants (/l/, /m/, /n/) were collected, and the following acoustic parameters were extracted: fundamental frequency (F0), voice onset time (VOT), voicing onset-vocalic voicing onset (VO-VVO), first, second, and third formants (F1, F2, F3), first, second, and third bandwidths (B1, B2, B3), jitter, shimmer, and harmonics-to-noise ratio (HNR). Cranial magnetic resonance imaging was performed on an Ingenia CX 3.0 T scanner, and images were processed based on the Desikan-Killiany-Tourville atlas. Differences in acoustic and neuroimaging parameters between the PD and healthy control (HC) groups were assessed using Levene's test (LT), the two-sample independent t test (TT), and the Mann-Whitney U test (MWUT), and Spearman's partial correlations between acoustic and neuroimaging parameters were calculated separately for the PD and HC groups.

RESULTS: For acoustic features, the t test showed that F3 of the vowel /i/ was significantly lower in the PD group than in the HC group. Jitter on the vowel /u/ was significantly higher in the male PD group than in the male HC group. No other acoustic measure differed significantly between the two groups. For cortical features, cortical thickness, area, and volume were reduced in most brain regions of the PD patients, although a small portion of the cortex appeared thickened. In the correlation analysis between cortical and acoustic features, the acoustic parameters F0, F1, F2, F3, B2, B3, VO-VVO, jitter, HNR, and VOT showed significant and strong correlations with the thickness, area, and volume of cortical sites such as the frontal, temporal, entorhinal, fusiform, and precuneus regions in PD patients, whereas no significant correlation was found in the HC group.

CONCLUSIONS: These findings suggest that Parkinson's disease affects both acoustic features and cortical structural features in patients, and that the two are correlated.},
}
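
The abstract reports Spearman correlations between acoustic and cortical measures, computed separately for each group. A minimal sketch of Spearman's rho (the Pearson correlation of tie-averaged ranks), plus the standard first-order partial-correlation formula, which can also be applied to rank correlations when one covariate must be held constant. Which covariates, if any, the authors adjusted for is not stated in the abstract; the helper names here are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples of size >= 3")
    return pearson(average_ranks(x), average_ranks(y))


def partial_from_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For a partial Spearman coefficient, one would pass spearman(x, y), spearman(x, z), and spearman(y, z) to partial_from_r. With 19 participants per group and many acoustic-cortical pairs, multiple-comparison control also matters when interpreting such coefficients.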

RevDate: 2025-01-04

Song J, Kim H, Lee YO (2024)

Laryngeal disease classification using voice data: Octave-band vs. mel-frequency filters.

Heliyon, 10(24):e40748.

INTRODUCTION: Laryngeal cancer diagnosis relies on specialist examinations, but non-invasive methods using voice data are emerging with artificial intelligence (AI) advancements. Mel Frequency Cepstral Coefficients (MFCCs) are widely used for voice analysis, but Octave Frequency Spectrum Energy (OFSE) may offer better accuracy in detecting subtle voice changes.

PROBLEM STATEMENT: Accurate early diagnosis of laryngeal cancer through voice data is challenging with current methods like MFCC.

OBJECTIVES: This study compares the effectiveness of MFCC and OFSE in classifying voice data into healthy, laryngeal cancer, benign mucosal disease, and vocal fold paralysis categories.

METHODS: Voice samples from 363 patients were analyzed using convolutional neural network (CNN) models, employing MFCC and OFSE with 1/3 octave band filters. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize key voice features.

RESULTS: OFSE with 1/3 octave band filters outperformed MFCC in classification accuracy, especially in multi-class classification including the laryngeal cancer, benign mucosal disease, and vocal fold paralysis groups (0.9398 ± 0.0232 vs. 0.7061 ± 0.0561). Grad-CAM analysis revealed that OFSE with 1/3 octave band filters effectively distinguished laryngeal cancer from healthy voices by focusing on increased noise in the over-formant area and changes in the fundamental frequency. The analysis also highlighted that specific narrow frequency regions were critical for classification, particularly for vocal fold paralysis. Benign mucosal diseases occasionally resembled healthy voices, making AI differentiation between benign conditions and laryngeal cancer a significant challenge.

CONCLUSION: OFSE with 1/3 octave band filters provides superior accuracy in diagnosing laryngeal diseases including laryngeal cancer, showing potential for non-invasive, AI-driven early detection.

Additional Links: PMID-39720068

@article {pmid39720068,
year = {2024},
author = {Song, J and Kim, H and Lee, YO},
title = {Laryngeal disease classification using voice data: Octave-band vs. mel-frequency filters.},
journal = {Heliyon},
volume = {10},
number = {24},
pages = {e40748},
pmid = {39720068},
issn = {2405-8440},
abstract = {INTRODUCTION: Laryngeal cancer diagnosis relies on specialist examinations, but non-invasive methods using voice data are emerging with artificial intelligence (AI) advancements. Mel Frequency Cepstral Coefficients (MFCCs) are widely used for voice analysis, but Octave Frequency Spectrum Energy (OFSE) may offer better accuracy in detecting subtle voice changes.

PROBLEM STATEMENT: Accurate early diagnosis of laryngeal cancer through voice data is challenging with current methods like MFCC.

OBJECTIVES: This study compares the effectiveness of MFCC and OFSE in classifying voice data into healthy, laryngeal cancer, benign mucosal disease, and vocal fold paralysis categories.

METHODS: Voice samples from 363 patients were analyzed using convolutional neural network (CNN) models, employing MFCC and OFSE with 1/3 octave band filters. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize key voice features.

RESULTS: OFSE with 1/3 octave band filters outperformed MFCC in classification accuracy, especially in multi-class classification including the laryngeal cancer, benign mucosal disease, and vocal fold paralysis groups (0.9398 ± 0.0232 vs. 0.7061 ± 0.0561). Grad-CAM analysis revealed that OFSE with 1/3 octave band filters effectively distinguished laryngeal cancer from healthy voices by focusing on increased noise in the over-formant area and changes in the fundamental frequency. The analysis also highlighted that specific narrow frequency regions were critical for classification, particularly for vocal fold paralysis. Benign mucosal diseases occasionally resembled healthy voices, making AI differentiation between benign conditions and laryngeal cancer a significant challenge.

CONCLUSION: OFSE with 1/3 octave band filters provides superior accuracy in diagnosing laryngeal diseases including laryngeal cancer, showing potential for non-invasive, AI-driven early detection.},
}
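
The abstract contrasts MFCCs with 1/3-octave band energies (OFSE) but does not give the filter design. The sketch below assumes base-2 one-third-octave bands referenced to 1 kHz, a plain sum of power-spectrum bins per band, and the widely used 2595 * log10(1 + f/700) mel formula for comparison. It illustrates the two frequency scales only, not the authors' feature extractor or CNN; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def third_octave_bands(f_min: float, f_max: float,
                       f_ref: float = 1000.0) -> List[Tuple[float, float, float]]:
    """(lower edge, centre, upper edge) of base-2 1/3-octave bands.

    Centres are f_ref * 2**(n/3) for every integer n whose centre falls in
    [f_min, f_max]; edges sit a sixth of an octave either side of the centre.
    """
    n_lo = math.ceil(3 * math.log2(f_min / f_ref))
    n_hi = math.floor(3 * math.log2(f_max / f_ref))
    bands = []
    for n in range(n_lo, n_hi + 1):
        fc = f_ref * 2 ** (n / 3)
        bands.append((fc * 2 ** (-1 / 6), fc, fc * 2 ** (1 / 6)))
    return bands


def hz_to_mel(f_hz: float) -> float:
    """One common mel formula (O'Shaughnessy form); other variants exist."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def band_energies_db(freqs_hz: Sequence[float], power: Sequence[float],
                     bands: Sequence[Tuple[float, float, float]],
                     floor: float = 1e-12) -> List[float]:
    """Sum power-spectrum bins inside each band; return the sums in dB."""
    out = []
    for lo, _, hi in bands:
        e = sum(p for f, p in zip(freqs_hz, power) if lo <= f < hi)
        out.append(10.0 * math.log10(e + floor))  # floor avoids log10(0)
    return out


if __name__ == "__main__":
    for lo, fc, hi in third_octave_bands(100, 8000):
        print(f"{fc:8.1f} Hz  width {hi - lo:7.1f} Hz  mel {hz_to_mel(fc):7.1f}")
```

Consecutive centres differ by a constant ratio of 2**(1/3) (about 1.26), so each band's width grows in proportion to its centre frequency. The mel scale is roughly linear at low frequencies and logarithmic at high frequencies, so the two schemes allocate resolution differently across the spectrum, which is the kind of difference the study's comparison turns on.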

RevDate: 2025-01-04
CmpDate: 2024-12-10

Cavalcanti JC, Eriksson A, Barbosa PA, et al (2024)

Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles.

PloS one, 19(12):e0311363.

Differentiating subjects through the comparison of their recorded speech is a common endeavor in speaker characterization. When using an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker discriminatory power of vowel formants (resonance peaks of the vocal tract) in two different speaking styles: Dialogue and Interview. Different testing procedures were applied, specifically metrics compatible with the likelihood ratio paradigm. Only high-quality recordings were analyzed in this study. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through Multivariate Kernel Density analysis: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). As expected, the discriminatory performance was stronger for style-matched analyses than for mismatched-style analyses. In order of relevance, F3, F4, and F1 performed the best in style-matched comparisons, as indicated by lower Cllr and EER values. F2 performed the worst within style in both Dialogue and Interview. The discriminatory power of all individual formants (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of higher formants 'F3 + F4' outperformed the combination of lower formants 'F1 + F2'. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also shifted the overall probability density distribution towards lower Cllr and EER values. In general, front and central vowels were found to be more speaker-discriminatory than back vowels as far as the 'F1 + F2' relation was concerned.

Additional Links: PMID-39656685

@article {pmid39656685,
year = {2024},
author = {Cavalcanti, JC and Eriksson, A and Barbosa, PA and Madureira, S},
title = {Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles.},
journal = {PloS one},
volume = {19},
number = {12},
pages = {e0311363},
pmid = {39656685},
issn = {1932-6203},
mesh = {Humans ; Male ; Adult ; *Speech/physiology ; Speech Acoustics ; Phonetics ; Likelihood Functions ; Young Adult ; Speech Production Measurement/methods ; Language ; },
abstract = {Differentiating subjects through the comparison of their recorded speech is a common endeavor in speaker characterization. When using an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker discriminatory power of vowel formants (resonance peaks of the vocal tract) in two different speaking styles: Dialogue and Interview. Different testing procedures were applied, specifically metrics compatible with the likelihood ratio paradigm. Only high-quality recordings were analyzed in this study. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through Multivariate Kernel Density analysis: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). As expected, the discriminatory performance was stronger for style-matched analyses than for mismatched-style analyses. In order of relevance, F3, F4, and F1 performed the best in style-matched comparisons, as indicated by lower Cllr and EER values. F2 performed the worst within style in both Dialogue and Interview. The discriminatory power of all individual formants (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of higher formants 'F3 + F4' outperformed the combination of lower formants 'F1 + F2'. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also shifted the overall probability density distribution towards lower Cllr and EER values. In general, front and central vowels were found to be more speaker-discriminatory than back vowels as far as the 'F1 + F2' relation was concerned.},
}

MeSH Terms:

Humans
Male
Adult
*Speech/physiology
Speech Acoustics
Phonetics
Likelihood Functions
Young Adult
Speech Production Measurement/methods
Language
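
The study scores formants with two likelihood-ratio-based metrics. A minimal sketch of both, assuming likelihood ratios (for Cllr) and scores where higher means "more likely same speaker" (for EER) are already available, for instance from a multivariate kernel density model; it does not reproduce that model. Cllr follows the usual definition: average log2(1 + 1/LR) over same-speaker trials and log2(1 + LR) over different-speaker trials, then take half their sum. Lower values are better for both metrics. The function names are hypothetical.

```python
import math
from typing import Sequence


def cllr(same_lrs: Sequence[float], diff_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost (bits) from likelihood ratios (not log LRs).

    Same-speaker LRs are penalised for being small, different-speaker LRs
    for being large; an uninformative system (LR = 1 everywhere) scores 1.
    """
    if not same_lrs or not diff_lrs:
        raise ValueError("need same-speaker and different-speaker trials")
    if any(lr <= 0 for lr in list(same_lrs) + list(diff_lrs)):
        raise ValueError("likelihood ratios must be positive")
    ss = sum(math.log2(1 + 1 / lr) for lr in same_lrs) / len(same_lrs)
    ds = sum(math.log2(1 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (ss + ds)


def eer(same_scores: Sequence[float], diff_scores: Sequence[float]) -> float:
    """Approximate equal error rate by sweeping thresholds over observed scores.

    A trial is accepted as same-speaker when score >= threshold. Returns the
    mean of the miss and false-alarm rates at the threshold where they are
    closest; interpolation-based estimators can give slightly different values.
    """
    if not same_scores or not diff_scores:
        raise ValueError("need same-speaker and different-speaker trials")
    best_gap, best_rate = math.inf, 1.0
    for t in sorted(set(same_scores) | set(diff_scores)):
        miss = sum(s < t for s in same_scores) / len(same_scores)
        false_alarm = sum(s >= t for s in diff_scores) / len(diff_scores)
        if abs(miss - false_alarm) < best_gap:
            best_gap = abs(miss - false_alarm)
            best_rate = (miss + false_alarm) / 2
    return best_rate
```

EER only reflects how well the scores separate the two trial types, whereas Cllr also penalizes poorly calibrated LR magnitudes; reporting both, as the study does, captures discrimination and calibration together.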

RevDate: 2024-12-10
CmpDate: 2024-12-10

Cervantes Constantino F, Caputi Á (2024)

Cortical tracking of speakers' spectral changes predicts selective listening.

Cerebral cortex (New York, N.Y. : 1991), 34(12).

A social scene is particularly informative when people are distinguishable. To understand somebody amid "cocktail party" chatter, we automatically index their voice. This ability is underpinned by parallel processing of vocal spectral contours from speech sounds, but it has not yet been established how this occurs in the brain's cortex. We investigate single-trial neural tracking of slow frequency modulations in speech using electroencephalography. Participants briefly listened to unfamiliar single speakers, and in addition, they performed a cocktail party comprehension task. Using stimulus reconstruction methods, we found robust tracking in neural responses to slow (delta-theta range) modulations of frequency contours in the fourth and fifth formant bands, equivalent to the 3.5-5 kHz audible range. The spectral spacing between neighboring instantaneous frequency contours (ΔF), which also yields indexical information from the vocal tract, was similarly decodable. Moreover, EEG evidence of listeners' spectral tracking abilities predicted their chances of succeeding at selective listening when faced with two-speaker speech mixtures. In summary, the results indicate that the communicating brain can rely on locking of cortical rhythms to major changes led by upper resonances of the vocal tract. Their corresponding articulatory mechanics hence continuously issue a fundamental credential for listeners to target in real time.

Additional Links: PMID-39656649

@article {pmid39656649,
year = {2024},
author = {Cervantes Constantino, F and Caputi, Á},
title = {Cortical tracking of speakers' spectral changes predicts selective listening.},
journal = {Cerebral cortex (New York, N.Y. : 1991)},
volume = {34},
number = {12},
pages = {},
doi = {10.1093/cercor/bhae472},
pmid = {39656649},
issn = {1460-2199},
support = {FCE_1_2019_1_155889//Agencia Nacional de Investigación e Innovación/ ; },
mesh = {Humans ; Male ; Female ; *Speech Perception/physiology ; Adult ; *Electroencephalography/methods ; Young Adult ; Cerebral Cortex/physiology ; Acoustic Stimulation/methods ; },
abstract = {A social scene is particularly informative when people are distinguishable. To understand somebody amid "cocktail party" chatter, we automatically index their voice. This ability is underpinned by parallel processing of vocal spectral contours from speech sounds, but it has not yet been established how this occurs in the brain's cortex. We investigate single-trial neural tracking of slow frequency modulations in speech using electroencephalography. Participants briefly listened to unfamiliar single speakers, and in addition, they performed a cocktail party comprehension task. Using stimulus reconstruction methods, we found robust tracking in neural responses to slow (delta-theta range) modulations of frequency contours in the fourth and fifth formant bands, equivalent to the 3.5-5 kHz audible range. The spectral spacing between neighboring instantaneous frequency contours (ΔF), which also yields indexical information from the vocal tract, was similarly decodable. Moreover, EEG evidence of listeners' spectral tracking abilities predicted their chances of succeeding at selective listening when faced with two-speaker speech mixtures. In summary, the results indicate that the communicating brain can rely on locking of cortical rhythms to major changes led by upper resonances of the vocal tract. Their corresponding articulatory mechanics hence continuously issue a fundamental credential for listeners to target in real time.},
}

MeSH Terms:

Humans
Male
Female
*Speech Perception/physiology
Adult
*Electroencephalography/methods
Young Adult
Cerebral Cortex/physiology
Acoustic Stimulation/methods
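
The abstract notes that the spacing between neighboring frequency contours (ΔF) carries indexical information about the vocal tract. A standard way to see why is the uniform-tube approximation (a tube closed at the glottis and open at the lips), in which F_n = (2n - 1) c / (4L), so neighboring formants are spaced by ΔF = c / (2L). The sketch below fits ΔF by least squares through the origin and converts it to an apparent vocal tract length. The speed of sound (35,000 cm/s) and the function names are assumptions, and this is not the paper's EEG or stimulus-reconstruction analysis.

```python
from typing import Sequence

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def formant_spacing(formants_hz: Sequence[float]) -> float:
    """Least-squares ΔF for F_n = ΔF * (2n - 1) / 2, fitted through the origin.

    formants_hz[0] is F1, formants_hz[1] is F2, and so on.
    """
    if len(formants_hz) == 0:
        raise ValueError("need at least one formant")
    x = [(2 * n - 1) / 2 for n in range(1, len(formants_hz) + 1)]
    return sum(xi * f for xi, f in zip(x, formants_hz)) / sum(xi * xi for xi in x)


def apparent_vtl_cm(formants_hz: Sequence[float],
                    c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Apparent vocal tract length in cm from ΔF = c / (2L)."""
    return c / (2 * formant_spacing(formants_hz))


if __name__ == "__main__":
    # Ideal uniform-tube formants for a 17.5 cm tract, then a more widely spaced set.
    print(apparent_vtl_cm([500, 1500, 2500, 3500, 4500]))
    print(apparent_vtl_cm([600, 1800, 3000]))
```

Under this model, wider formant spacing implies a shorter apparent vocal tract, which is one reason upper-formant spacing can help listeners tell speakers apart.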

RevDate: 2024-12-12

Heiszenberger E, Reinisch E, Hartmann F, et al (2024)

Perceptually Easy Second-Language Phones Are Not Always Easy: The Role of Orthography and Phonology in Schwa Realization in Second-Language French.

Language and speech [Epub ahead of print].

Encoding and establishing a new second-language (L2) phonological category is notoriously difficult. This is particularly true for phonological contrasts that do not exist in the learners' native language (L1). Phonological categories that also exist in the L1 do not seem to pose any problems. However, foreign-language learners are not only presented with oral input. Instructed L2 learning often involves heavy reliance on written forms of the target language. The present study investigates the contribution of orthography to the quality of phonolexical encoding by examining the acoustics of French schwa produced by Austrian German learners. Schwa is a perceptually and articulatorily easy L2 phone with incongruent grapheme-phoneme correspondences between the L1 and L2. We compared production patterns in an auditory word-repetition task (without orthographic input) with those in a word-reading task. We analyzed the formant values (F1, F2, F3) of the schwa realizations of two groups of Austrian high-school students who had been learning French for 1 and 6 years, respectively. The results show that production patterns are more likely to be affected by L1 grapheme-to-phoneme correspondences when orthographic input is present. However, orthography does not appear to play the dominant role, as L2 development patterns are strongly determined by both the speaker and especially the lexical item, suggesting a highly complex interaction of multiple internal and external factors in the establishment of L2 phonological categories beyond orthography and phonology.

Additional Links: PMID-39665279

@article {pmid39665279,
year = {2024},
author = {Heiszenberger, E and Reinisch, E and Hartmann, F and Brown, E and Pustka, E},
title = {Perceptually Easy Second-Language Phones Are Not Always Easy: The Role of Orthography and Phonology in Schwa Realization in Second-Language French.},
journal = {Language and speech},
volume = {},
number = {},
pages = {238309241277995},
doi = {10.1177/00238309241277995},
pmid = {39665279},
issn = {1756-6053},
abstract = {Encoding and establishing a new second-language (L2) phonological category is notoriously difficult. This is particularly true for phonological contrasts that do not exist in the learners' native language (L1). Phonological categories that also exist in the L1 do not seem to pose any problems. However, foreign-language learners are not presented with oral input alone. Instructed L2 learning often involves heavy reliance on written forms of the target language. The present study investigates the contribution of orthography to the quality of phonolexical encoding by examining the acoustics of French schwa (a perceptually and articulatorily easy L2 phone with incongruent grapheme-phoneme correspondences between the L1 and L2) as produced by Austrian German learners. We compared production patterns in an auditory word-repetition task (without orthographic input) with those in a word-reading task. We analyzed the formant values (F1, F2, F3) of the schwa realizations of two groups of Austrian high-school students who had been learning French for 1 and 6 years, respectively. The results show that production patterns are more likely to be affected by L1 grapheme-to-phoneme correspondences when orthographic input is present. However, orthography does not appear to play the dominant role, as L2 development patterns are strongly determined by both the speaker and especially the lexical item, suggesting a highly complex interaction of multiple internal and external factors in the establishment of L2 phonological categories beyond orthography and phonology.},
}

RevDate: 2024-12-09
CmpDate: 2024-12-06

Fadeev KA, Romero Reyes IV, Goiaeva DE, et al (2024)

Attenuated processing of vowels in the left temporal cortex predicts speech-in-noise perception deficit in children with autism.

Journal of neurodevelopmental disorders, 16(1):67.

BACKGROUND: Difficulties with speech-in-noise perception in autism spectrum disorders (ASD) may be associated with impaired analysis of speech sounds, such as vowels, which represent the fundamental phoneme constituents of human speech. Vowels elicit early (< 100 ms) sustained processing negativity (SPN) in the auditory cortex that reflects the detection of an acoustic pattern based on the presence of formant structure and/or periodic envelope information (f0) and its transformation into an auditory "object".

METHODS: We used magnetoencephalography (MEG) and individual brain models to investigate whether SPN is altered in children with ASD and whether this deficit is associated with impairment in their ability to perceive speech against background noise. MEG was recorded while boys with ASD and typically developing boys passively listened to sounds that differed in the presence/absence of f0 periodicity and formant structure. Word-in-noise perception was assessed in a separate psychoacoustic experiment using stationary and amplitude-modulated noise with varying signal-to-noise ratios.

RESULTS: SPN was present in both groups with similarly early onset. In children with ASD, SPN associated with processing formant structure was reduced predominantly in the cortical areas lateral and medial to the primary auditory cortex, starting ~150-200 ms after stimulus onset. In the left hemisphere, this deficit correlated with the impaired ability of children with ASD to recognize words in amplitude-modulated noise, but not in stationary noise.

CONCLUSIONS: These results suggest that perceptual grouping of vowel formants into phonemes is impaired in children with ASD and that, in the left hemisphere, this deficit contributes to their difficulties with speech perception in fluctuating background noise.

Additional Links: PMID-39643915

Citation:

@article {pmid39643915,
year = {2024},
author = {Fadeev, KA and Romero Reyes, IV and Goiaeva, DE and Obukhova, TS and Ovsiannikova, TM and Prokofyev, AO and Rytikova, AM and Novikov, AY and Kozunov, VV and Stroganova, TA and Orekhova, EV},
title = {Attenuated processing of vowels in the left temporal cortex predicts speech-in-noise perception deficit in children with autism.},
journal = {Journal of neurodevelopmental disorders},
volume = {16},
number = {1},
pages = {67},
pmid = {39643915},
issn = {1866-1955},
mesh = {Humans ; Male ; *Speech Perception/physiology ; *Magnetoencephalography ; Child ; *Temporal Lobe/physiopathology ; *Noise ; Acoustic Stimulation ; Evoked Potentials, Auditory/physiology ; Autism Spectrum Disorder/physiopathology/complications ; Adolescent ; Auditory Cortex/physiopathology ; Autistic Disorder/physiopathology/complications ; },
abstract = {BACKGROUND: Difficulties with speech-in-noise perception in autism spectrum disorders (ASD) may be associated with impaired analysis of speech sounds, such as vowels, which represent the fundamental phoneme constituents of human speech. Vowels elicit early (< 100 ms) sustained processing negativity (SPN) in the auditory cortex that reflects the detection of an acoustic pattern based on the presence of formant structure and/or periodic envelope information (f0) and its transformation into an auditory "object".

METHODS: We used magnetoencephalography (MEG) and individual brain models to investigate whether SPN is altered in children with ASD and whether this deficit is associated with impairment in their ability to perceive speech against background noise. MEG was recorded while boys with ASD and typically developing boys passively listened to sounds that differed in the presence/absence of f0 periodicity and formant structure. Word-in-noise perception was assessed in a separate psychoacoustic experiment using stationary and amplitude-modulated noise with varying signal-to-noise ratios.

RESULTS: SPN was present in both groups with similarly early onset. In children with ASD, SPN associated with processing formant structure was reduced predominantly in the cortical areas lateral and medial to the primary auditory cortex, starting ~150-200 ms after stimulus onset. In the left hemisphere, this deficit correlated with the impaired ability of children with ASD to recognize words in amplitude-modulated noise, but not in stationary noise.

CONCLUSIONS: These results suggest that perceptual grouping of vowel formants into phonemes is impaired in children with ASD and that, in the left hemisphere, this deficit contributes to their difficulties with speech perception in fluctuating background noise.},
}

MeSH Terms:

Humans
Male
*Speech Perception/physiology
*Magnetoencephalography
Child
*Temporal Lobe/physiopathology
*Noise
Acoustic Stimulation
Evoked Potentials, Auditory/physiology
Autism Spectrum Disorder/physiopathology/complications
Adolescent
Auditory Cortex/physiopathology
Autistic Disorder/physiopathology/complications

RevDate: 2024-11-28
CmpDate: 2024-11-28

Xie B, Li Z, Wang H, et al (2024)

[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].

Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery, 38(12):1149-1153.

Objective: This study aims to explore the influence of vowels and sound intensity on formants, so as to provide a reference for the selection of sound samples and vocalization methods in acoustic detection. Methods: Thirty-eight healthy subjects (19 male and 19 female) aged 19-24 years were recruited. The formants of different vowels (/a/, /(?)/, /i/ and /u/) and different sound intensities (lowest sound, comfort sound, highest true sound and highest falsetto sound) were analyzed, and pairwise comparisons were made between groups with significant differences. Results: (1) In the first formant, the vowels /a/ and /(?)/ had larger values than /i/ and /u/, and /i/ had the largest second formant. The minimum first-formant value occurred for /i/ at the lowest sound and the maximum for /a/ at the highest sound. (2) In the chest-voice range, the first formant increased with sound intensity, while the second formant decreased significantly on entering the highest falsetto. Conclusion: Formant distributions differ across vowels and sound intensities; that is, vowel and sound intensity influence formants to different degrees. A maximum normal range can be preliminarily determined from the extreme values of the first formant, which helps improve acoustic detection.

Additional Links: PMID-39605265

Citation:

@article {pmid39605265,
year = {2024},
author = {Xie, B and Li, Z and Wang, H and Kuang, X and Ni, W and Zhong, R and Li, Y},
title = {[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].},
journal = {Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery},
volume = {38},
number = {12},
pages = {1149-1153},
doi = {10.13201/j.issn.2096-7993.2024.12.011},
pmid = {39605265},
issn = {2096-7993},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Acoustics ; Voice Quality ; Phonetics ; Voice/physiology ; Adult ; },
abstract = {Objective: This study aims to explore the influence of vowels and sound intensity on formants, so as to provide a reference for the selection of sound samples and vocalization methods in acoustic detection. Methods: Thirty-eight healthy subjects (19 male and 19 female) aged 19-24 years were recruited. The formants of different vowels (/a/, /(?)/, /i/ and /u/) and different sound intensities (lowest sound, comfort sound, highest true sound and highest falsetto sound) were analyzed, and pairwise comparisons were made between groups with significant differences. Results: (1) In the first formant, the vowels /a/ and /(?)/ had larger values than /i/ and /u/, and /i/ had the largest second formant. The minimum first-formant value occurred for /i/ at the lowest sound and the maximum for /a/ at the highest sound. (2) In the chest-voice range, the first formant increased with sound intensity, while the second formant decreased significantly on entering the highest falsetto. Conclusion: Formant distributions differ across vowels and sound intensities; that is, vowel and sound intensity influence formants to different degrees. A maximum normal range can be preliminarily determined from the extreme values of the first formant, which helps improve acoustic detection.},
}

MeSH Terms:

Humans
Male
Female
Young Adult
*Speech Acoustics
Voice Quality
Phonetics
Voice/physiology
Adult

RevDate: 2025-01-02
CmpDate: 2025-01-02

Fagniart S, Delvaux V, Harmegnies B, et al (2025)

Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.

Journal of speech, language, and hearing research : JSLHR, 68(1):301-322.

PURPOSE: The objective of the present study is to investigate nasal and oral vowel production in French-speaking children with cochlear implants (CIs) and children with typical hearing (TH). Vowel nasality relies primarily on acoustic cues that may be less effectively transmitted by the implant. The study investigates how children with CIs manage to produce these segments in French, a language with contrastive vowel nasalization.

METHOD: The children performed a task in which they repeated sentences containing a consonant-vowel-consonant-vowel-type pseudoword, the vowel being a nasal or oral vowel from French. Thirteen children with CIs and 25 children with TH completed the task. Among the children with CIs, the level of exposure to Cued Speech (CS) was either occasional (CS-) or intense (CS+). The productions were analyzed through perceptual judgments and acoustic measurements. Different acoustic cues related to nasality were collected: segmental durations, formant values, and predicted values of nasalization. Multiple regression analyses were conducted to examine which acoustic features are associated with perceived nasality in perceptual judgments.

RESULTS: The perceptual judgments made on the children's speech productions indicate that children with sustained exposure to CS (CS+) exhibited the best-identified and most distinct oral/nasal productions. Acoustic measures revealed different production profiles among the groups: children in the CS+ group seem to differentiate between nasal and oral vowels by relying on segmental duration cues and variations in oropharyngeal configuration (associated with formant differences) but less through nasal resonance.

CONCLUSION: The study highlights (a) a benefit of sustained CS practice for the intelligibility of nasal-oral segments in children with CIs, (b) privileged exploitation of temporal (segmental duration) and salient acoustic (oropharyngeal configuration) cues in the CS+ group, and (c) difficulties among children with CIs in distinguishing nasal-oral segments through nasal resonance.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.27744768.

Additional Links: PMID-39589237

Citation:

@article {pmid39589237,
year = {2025},
author = {Fagniart, S and Delvaux, V and Harmegnies, B and Huberlant, A and Huet, K and Piccaluga, M and Watterman, I and Charlier, B},
title = {Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {68},
number = {1},
pages = {301-322},
doi = {10.1044/2024_JSLHR-24-00083},
pmid = {39589237},
issn = {1558-9102},
mesh = {Humans ; *Cochlear Implants ; Female ; Male ; Child ; *Speech Acoustics ; *Phonetics ; *Cues ; *Speech Perception/physiology ; *Judgment ; Speech Production Measurement/methods ; Speech/physiology ; Nose/physiology ; Deafness/rehabilitation ; },
abstract = {PURPOSE: The objective of the present study is to investigate nasal and oral vowel production in French-speaking children with cochlear implants (CIs) and children with typical hearing (TH). Vowel nasality relies primarily on acoustic cues that may be less effectively transmitted by the implant. The study investigates how children with CIs manage to produce these segments in French, a language with contrastive vowel nasalization.

METHOD: The children performed a task in which they repeated sentences containing a consonant-vowel-consonant-vowel-type pseudoword, the vowel being a nasal or oral vowel from French. Thirteen children with CIs and 25 children with TH completed the task. Among the children with CIs, the level of exposure to Cued Speech (CS) was either occasional (CS-) or intense (CS+). The productions were analyzed through perceptual judgments and acoustic measurements. Different acoustic cues related to nasality were collected: segmental durations, formant values, and predicted values of nasalization. Multiple regression analyses were conducted to examine which acoustic features are associated with perceived nasality in perceptual judgments.

RESULTS: The perceptual judgments made on the children's speech productions indicate that children with sustained exposure to CS (CS+) exhibited the best-identified and most distinct oral/nasal productions. Acoustic measures revealed different production profiles among the groups: children in the CS+ group seem to differentiate between nasal and oral vowels by relying on segmental duration cues and variations in oropharyngeal configuration (associated with formant differences) but less through nasal resonance.

CONCLUSION: The study highlights (a) a benefit of sustained CS practice for the intelligibility of nasal-oral segments in children with CIs, (b) privileged exploitation of temporal (segmental duration) and salient acoustic (oropharyngeal configuration) cues in the CS+ group, and (c) difficulties among children with CIs in distinguishing nasal-oral segments through nasal resonance.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.27744768.},
}

MeSH Terms:

Humans
*Cochlear Implants
Female
Male
Child
*Speech Acoustics
*Phonetics
*Cues
*Speech Perception/physiology
*Judgment
Speech Production Measurement/methods
Speech/physiology
Nose/physiology
Deafness/rehabilitation

RevDate: 2024-11-16

Bøyesen B, Hide Ø (2024)

Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00363-1 [Epub ahead of print].

OBJECTIVES: In this study, we introduce an intervention based on two techniques: twang and medialization. The hypothesis is that a combination of these two techniques will enable trans women to gain feminine-sounding speech without vocal strain or harm.

METHOD: Five trans women took part in the study. A control group of five cisgender women and five cisgender men were included. A list of 14 monosyllabic words was created, where the vowel /ɑ/ was embedded in various consonant contexts. All participants were asked to read the word list three times, each time presented in a different order. The trans women read the word list before and after intervention. Acoustic analyses of fundamental frequency and the first, second, and third formant frequencies were conducted. For the perceptual analysis, 60 voice samples were selected from the entire material. Fifteen listeners were asked whether they perceived the voice samples as feminine, masculine, or uncertain. The listeners were also asked for gender judgments based on sentences read by the trans women after intervention.

RESULTS: The acoustic analyses revealed increases in fundamental frequency and in the first, second, and third formant frequencies after intervention for all five trans women, approaching the values of the female controls. The perceptual judgments showed that the majority of the trans women's voice samples were perceived as feminine after intervention.

CONCLUSIONS: The acoustic analyses and perceptual evaluations suggest that the combination of the twang and medialization techniques enabled the trans women to obtain feminine attribution. Nevertheless, the study is too small for generalizations. Still, a take-home message is that it is appropriate to focus primarily on resonance, in addition to speaking fundamental frequency, to gain feminine-sounding speech.

Additional Links: PMID-39550323

Citation:

@article {pmid39550323,
year = {2024},
author = {Bøyesen, B and Hide, Ø},
title = {Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.10.020},
pmid = {39550323},
issn = {1873-4588},
abstract = {OBJECTIVES: In this study, we introduce an intervention based on two techniques: twang and medialization. The hypothesis is that a combination of these two techniques will enable trans women to gain feminine-sounding speech without vocal strain or harm.

METHOD: Five trans women took part in the study. A control group of five cisgender women and five cisgender men were included. A list of 14 monosyllabic words was created, where the vowel /ɑ/ was embedded in various consonant contexts. All participants were asked to read the word list three times, each time presented in a different order. The trans women read the word list before and after intervention. Acoustic analyses of fundamental frequency and the first, second, and third formant frequencies were conducted. For the perceptual analysis, 60 voice samples were selected from the entire material. Fifteen listeners were asked whether they perceived the voice samples as feminine, masculine, or uncertain. The listeners were also asked for gender judgments based on sentences read by the trans women after intervention.

RESULTS: The acoustic analyses revealed increases in fundamental frequency and in the first, second, and third formant frequencies after intervention for all five trans women, approaching the values of the female controls. The perceptual judgments showed that the majority of the trans women's voice samples were perceived as feminine after intervention.

CONCLUSIONS: The acoustic analyses and perceptual evaluations suggest that the combination of the twang and medialization techniques enabled the trans women to obtain feminine attribution. Nevertheless, the study is too small for generalizations. Still, a take-home message is that it is appropriate to focus primarily on resonance, in addition to speaking fundamental frequency, to gain feminine-sounding speech.},
}

RevDate: 2024-11-12
CmpDate: 2024-11-12

Ponsonnet M, Coupé C, Pellegrino F, et al (2024)

Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.

The Journal of the Acoustical Society of America, 156(5):3118-3139.

In this comparative cross-linguistic study we test whether expressive interjections (words like ouch or yay) share similar vowel signatures across the world's languages, and whether these can be traced back to nonlinguistic vocalizations (like screams and cries) expressing the same emotions of pain, disgust, and joy. We analyze vowels in interjections from dictionaries of 131 languages (over 600 tokens) and compare these with nearly 500 vowels based on formant frequency measures from voice recordings of volitional nonlinguistic vocalizations. We show that across the globe, pain interjections feature a-like vowels and wide falling diphthongs ("ai" as in Ayyy! "aw" as in Ouch!), whereas disgust and joy interjections do not show robust vowel regularities that extend geographically. In nonlinguistic vocalizations, all emotions yield distinct vowel signatures: pain prompts open vowels such as [a], disgust schwa-like central vowels, and joy front vowels such as [i]. Our results show that pain is the only affective experience tested with a clear, robust vowel signature that is preserved between nonlinguistic vocalizations and interjections across languages. These results offer empirical evidence for iconicity in some expressive interjections. We consider potential mechanisms and origins, from evolutionary pressures and sound symbolism to colexification, proposing testable hypotheses for future research.

Additional Links: PMID-39531311

Citation:

@article {pmid39531311,
year = {2024},
author = {Ponsonnet, M and Coupé, C and Pellegrino, F and Garcia Arasco, A and Pisanski, K},
title = {Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {5},
pages = {3118-3139},
doi = {10.1121/10.0032454},
pmid = {39531311},
issn = {1520-8524},
mesh = {Humans ; *Emotions ; Phonetics ; Language ; Speech Acoustics ; Pain/psychology ; Voice Quality ; Happiness ; },
abstract = {In this comparative cross-linguistic study we test whether expressive interjections (words like ouch or yay) share similar vowel signatures across the world's languages, and whether these can be traced back to nonlinguistic vocalizations (like screams and cries) expressing the same emotions of pain, disgust, and joy. We analyze vowels in interjections from dictionaries of 131 languages (over 600 tokens) and compare these with nearly 500 vowels based on formant frequency measures from voice recordings of volitional nonlinguistic vocalizations. We show that across the globe, pain interjections feature a-like vowels and wide falling diphthongs ("ai" as in Ayyy! "aw" as in Ouch!), whereas disgust and joy interjections do not show robust vowel regularities that extend geographically. In nonlinguistic vocalizations, all emotions yield distinct vowel signatures: pain prompts open vowels such as [a], disgust schwa-like central vowels, and joy front vowels such as [i]. Our results show that pain is the only affective experience tested with a clear, robust vowel signature that is preserved between nonlinguistic vocalizations and interjections across languages. These results offer empirical evidence for iconicity in some expressive interjections. We consider potential mechanisms and origins, from evolutionary pressures and sound symbolism to colexification, proposing testable hypotheses for future research.},
}

MeSH Terms:

Humans
*Emotions
Phonetics
Language
Speech Acoustics
Pain/psychology
Voice Quality
Happiness

RevDate: 2024-11-16
CmpDate: 2024-11-08

Carranante G, Cany C, Farri P, et al (2024)

Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.

Scientific reports, 14(1):27183.

Understanding how speech sounds are decoded into linguistic units has been a central research challenge over the last century. This study follows a reverse-correlation approach to reveal the acoustic cues listeners use to categorize French stop consonants in noise. Compared to previous methods, this approach ensures an unprecedented level of detail with only minimal theoretical assumptions. Thirty-two participants performed a speech-in-noise discrimination task based on natural /aCa/ utterances, with C = /b/, /d/, /g/, /p/, /t/, or /k/. The trial-by-trial analysis of their confusions enabled us to map the spectrotemporal information they relied on for their decisions. In place-of-articulation contrasts, the results confirmed the critical role of formant consonant-vowel transitions, used by all participants, and, to a lesser extent, vowel-consonant transitions and high-frequency release bursts. Similarly, for voicing contrasts, we validated the prominent role of the voicing bar cue, with some participants also using formant transitions and burst cues. This approach revealed that most listeners use a combination of several cues for each task, with significant variability within the participant group. These insights shed new light on decades-old debates regarding the relative importance of cues for phoneme perception and suggest that research on acoustic cues should not overlook individual variability in speech perception.

Additional Links: PMID-39516258

Citation:

@article {pmid39516258,
year = {2024},
author = {Carranante, G and Cany, C and Farri, P and Giavazzi, M and Varnet, L},
title = {Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {27183},
pmid = {39516258},
issn = {2045-2322},
support = {ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-17-EURE-0017//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; },
mesh = {Humans ; *Speech Perception/physiology ; Female ; Male ; *Noise ; Adult ; *Phonetics ; Young Adult ; Language ; Cues ; Speech Acoustics ; France ; Acoustic Stimulation ; },
abstract = {Understanding how speech sounds are decoded into linguistic units has been a central research challenge over the last century. This study follows a reverse-correlation approach to reveal the acoustic cues listeners use to categorize French stop consonants in noise. Compared to previous methods, this approach ensures an unprecedented level of detail with only minimal theoretical assumptions. Thirty-two participants performed a speech-in-noise discrimination task based on natural /aCa/ utterances, with C = /b/, /d/, /g/, /p/, /t/, or /k/. The trial-by-trial analysis of their confusions enabled us to map the spectrotemporal information they relied on for their decisions. In place-of-articulation contrasts, the results confirmed the critical role of formant consonant-vowel transitions, used by all participants, and, to a lesser extent, vowel-consonant transitions and high-frequency release bursts. Similarly, for voicing contrasts, we validated the prominent role of the voicing bar cue, with some participants also using formant transitions and burst cues. This approach revealed that most listeners use a combination of several cues for each task, with significant variability within the participant group. These insights shed new light on decades-old debates regarding the relative importance of cues for phoneme perception and suggest that research on acoustic cues should not overlook individual variability in speech perception.},
}

MeSH Terms:

Humans
*Speech Perception/physiology
Female
Male
*Noise
Adult
*Phonetics
Young Adult
Language
Cues
Speech Acoustics
France
Acoustic Stimulation

RevDate: 2025-01-05
CmpDate: 2024-11-08

Lin YC, Yan HT, Lin CH, et al (2024)

Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.

Journal of medical Internet research, 26:e58466.

BACKGROUND: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice.

OBJECTIVE: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers.

METHODS: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty-energy, and hybrid-based frailty-sarcopenia. Participants were asked to pronounce a sustained vowel "/a/" for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters, namely the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4), were used to analyze changes in voice. Logistic regression was used to build the prediction model.

RESULTS: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty-sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty-energy (OR 1.43, 95% CI 1.02-2.01), respectively.

CONCLUSIONS: Vocal biomarkers may be useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype associated with insufficient energy or reduced muscle function. A3 and A4 are associated with the more complex, hybrid frailty phenotypes.

Additional Links: PMID-39515817

Citation:

@article {pmid39515817,
year = {2024},
author = {Lin, YC and Yan, HT and Lin, CH and Chang, HH},
title = {Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.},
journal = {Journal of medical Internet research},
volume = {26},
number = {},
pages = {e58466},
pmid = {39515817},
issn = {1438-8871},
mesh = {Humans ; Aged ; Cross-Sectional Studies ; *Frailty/physiopathology ; Male ; Female ; *Phenotype ; *Biomarkers ; Middle Aged ; Voice/physiology ; Aged, 80 and over ; Taiwan ; Frail Elderly/statistics & numerical data ; Sarcopenia/physiopathology/diagnosis ; },
abstract = {BACKGROUND: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice.

OBJECTIVE: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers.

METHODS: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty-energy, and hybrid-based frailty-sarcopenia. Participants were asked to pronounce a sustained vowel "/a/" for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters, namely the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4), were used to analyze changes in voice. Logistic regression was used to elucidate the prediction model.

RESULTS: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty-sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty-energy (OR 1.43, 95% CI 1.02-2.01), respectively.

CONCLUSIONS: Vocal biomarkers might be potentially useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype that is associated with insufficient energy or reduced muscle function. The assessment of A3 and A4 involves a complex frailty phenotype.},
}

MeSH Terms:

Humans
Aged
Cross-Sectional Studies
*Frailty/physiopathology
Male
Female
*Phenotype
*Biomarkers
Middle Aged
Voice/physiology
Aged, 80 and over
Taiwan
Frail Elderly/statistics & numerical data
Sarcopenia/physiopathology/diagnosis
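
The odds ratios reported above come from logistic regression, where an OR is the exponentiated coefficient for a one-unit increase in a voice parameter. Assuming the 95% CIs are symmetric Wald intervals on the log-odds scale (the abstract does not say), the coefficient and its standard error can be back-calculated, and the OR for a k-unit increase is OR raised to the power k. The helper name `coef_from_or_ci` is hypothetical; the input numbers are the published A1 estimate.

```python
import math


def coef_from_or_ci(or_value, ci_low, ci_high, z=1.96):
    """Back-calculate a logistic-regression coefficient and its SE.

    Assumes the 95% CI is a symmetric Wald interval on the log-odds scale;
    rounding in published values makes the result approximate.
    """
    beta = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# A1 and energy-based frailty: OR 0.81, 95% CI 0.68-0.96 (from the abstract).
beta, se = coef_from_or_ci(0.81, 0.68, 0.96)
print(f"beta = {beta:.3f}, SE = {se:.3f}, Wald z = {beta / se:.2f}")

# Odds ratios multiply across units, so a 2-unit rise in A1 gives OR**2.
print(f"OR for a 2-unit increase = {math.exp(2 * beta):.3f}")
```

The same arithmetic applies to the A2, A3, and A4 estimates; an OR below 1 (as for A1) means the odds of the phenotype fall as the parameter rises.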

RevDate: 2025-01-04
CmpDate: 2024-12-16

Hullebus M, Gafos A, Boll-Avetisyan N, et al (2025)

Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.

Infancy : the official journal of the International Society on Infant Studies, 30(1):e12630.

Acoustic variability in the speech input has been shown, in certain contexts, to be beneficial during infants' acquisition of sound contrasts. One approach attributes this result to the potential of variability to make the stability of individual cues visible. Another approach suggests that, instead of highlighting individual cues, variability uncovers stable relations between cues that signal a sound contrast. Here, we investigate the relation between Voice Onset Time (VOT) and the onset of F1 formant frequency, two cues that subserve the voicing contrast in German. First, we verified that German-speaking adults' use of VOT to categorize voiced and voiceless stops is dependent on the value of the F1 onset frequency, in the specific form of a so-called trading relation. Next, we tested whether 6-month-old German-learning infants exhibit differential sensitivity to stimulus continua in which the cues varied to an equal extent, but either adhered to the trading relation established in the adult experiment or adhered to a reversed relation. Our results present evidence that infants prefer listening to speech in which phonetic cues conform to certain cue trading relations over cue relations that are reversed.
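
A trading relation means that the category boundary along one cue shifts with the value of another. A minimal sketch, assuming a two-cue logistic categorization model with made-up coefficients `a`, `b`, `c` (not fitted to the study's data): the VOT at which "voiceless" responses reach 50% is where the linear predictor equals zero, so that boundary moves as F1 onset frequency changes. The sign of `b` is chosen for illustration so that a higher F1 onset pushes responses toward "voiceless".

```python
import math


def p_voiceless(vot_ms, f1_onset_hz, a=0.25, b=0.02, c=-18.0):
    """Probability of a 'voiceless' response under a hypothetical logistic model.

    a, b, c are illustrative coefficients, not estimates from the study.
    """
    return 1.0 / (1.0 + math.exp(-(a * vot_ms + b * f1_onset_hz + c)))


def vot_boundary(f1_onset_hz, a=0.25, b=0.02, c=-18.0):
    """VOT (ms) at which p_voiceless = 0.5, i.e. a*VOT + b*F1 + c = 0."""
    return -(b * f1_onset_hz + c) / a


# With b > 0, each 100 Hz rise in F1 onset lowers the boundary by b*100/a ms.
for f1 in (300, 400, 500):
    print(f1, round(vot_boundary(f1), 1))
```

A "reversed" relation in this toy model would simply flip the sign of `b`, so the boundary would move the other way as F1 onset rises.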

Additional Links: PMID-39487102

Citation:

@article {pmid39487102,
year = {2025},
author = {Hullebus, M and Gafos, A and Boll-Avetisyan, N and Langus, A and Fritzsche, T and Höhle, B},
title = {Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.},
journal = {Infancy : the official journal of the International Society on Infant Studies},
volume = {30},
number = {1},
pages = {e12630},
pmid = {39487102},
issn = {1532-7078},
support = {317633480 - SFB 1287//Deutsche Forschungsgemeinschaft/ ; },
mesh = {Humans ; *Cues ; *Speech Perception ; *Phonetics ; Male ; Female ; Infant ; Speech Acoustics ; Adult ; Acoustic Stimulation ; Language Development ; },
abstract = {Acoustic variability in the speech input has been shown, in certain contexts, to be beneficial during infants' acquisition of sound contrasts. One approach attributes this result to the potential of variability to make the stability of individual cues visible. Another approach suggests that, instead of highlighting individual cues, variability uncovers stable relations between cues that signal a sound contrast. Here, we investigate the relation between Voice Onset Time (VOT) and the onset of F1 formant frequency, two cues that subserve the voicing contrast in German. First, we verified that German-speaking adults' use of VOT to categorize voiced and voiceless stops is dependent on the value of the F1 onset frequency, in the specific form of a so-called trading relation. Next, we tested whether 6-month-old German-learning infants exhibit differential sensitivity to stimulus continua in which the cues varied to an equal extent, but either adhered to the trading relation established in the adult experiment or adhered to a reversed relation. Our results present evidence that infants prefer listening to speech in which phonetic cues conform to certain cue trading relations over cue relations that are reversed.},
}

MeSH Terms:

Humans
*Cues
*Speech Perception
*Phonetics
Male
Female
Infant
Speech Acoustics
Adult
Acoustic Stimulation
Language Development

RevDate: 2024-10-30

Ayadi H, Elbéji A, Despotovic V, et al (2024)

Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.

Digital biomarkers, 8(1):159-170.

INTRODUCTION: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking impacts voice production, we aimed to develop a gender and language-specific vocal biomarker of smoking status.

METHODS: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers.

RESULTS: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65], 64.41% are female, 56.68% are English speakers, 50% are smokers and 50% are never-smokers). We observed differences in voice features distribution: for women, the fundamental frequency F0, the formants F1, F2, and F3 frequencies and the harmonics-to-noise ratio were lower in smokers compared to never-smokers (p < 0.05) while for men no significant disparities were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for the female participants, and 0.65 and 0.68, respectively, for the male participants.

CONCLUSION: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings.
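
Accuracy and AUC summarize different things: accuracy depends on a chosen decision threshold, whereas AUC is threshold-free and equals the probability that a randomly chosen smoker receives a higher score than a randomly chosen never-smoker (ties counted as one half). A minimal sketch of both metrics in plain Python; the helper names, the 0.5 threshold, and the scores and labels are all made up for illustration and are not data from the Colive Voice study.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the label (1 = smoker)."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six speakers.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.55, 0.3, 0.2]
print(accuracy(labels, scores), auc(labels, scores))
```

In this toy set, two of six cases fall on the wrong side of the threshold (accuracy 4/6), yet 8 of 9 smoker/never-smoker pairs are ranked correctly (AUC 8/9), which is why the two figures reported in the abstract can differ.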

Additional Links: PMID-39473806

Citation:

@article {pmid39473806,
year = {2024},
author = {Ayadi, H and Elbéji, A and Despotovic, V and Fagherazzi, G},
title = {Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.},
journal = {Digital biomarkers},
volume = {8},
number = {1},
pages = {159-170},
pmid = {39473806},
issn = {2504-110X},
abstract = {INTRODUCTION: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking impacts voice production, we aimed to develop a gender and language-specific vocal biomarker of smoking status.

METHODS: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers.

RESULTS: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65], 64.41% are female, 56.68% are English speakers, 50% are smokers and 50% are never-smokers). We observed differences in voice features distribution: for women, the fundamental frequency F0, the formants F1, F2, and F3 frequencies and the harmonics-to-noise ratio were lower in smokers compared to never-smokers (p < 0.05) while for men no significant disparities were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for the female participants, and 0.65 and 0.68, respectively, for the male participants.

CONCLUSION: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings.},
}

RevDate: 2025-01-09
CmpDate: 2024-11-22

Li JJ, Daliri A, Kim KS, et al (2024)

Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?

Neuroscience letters, 843:138025.

Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production, and its possible relation to PSAM, at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
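
Centering compares the spread of formant values across repeated trials at vowel onset with the spread at vowel midpoint. The abstract does not specify the exact variability metric, so the sketch below assumes the inter-trial standard deviation of F1 in Hz and defines centering as onset SD minus midpoint SD. It also shows a Pearson correlation of the kind used to relate a per-speaker measure to PSAM. The function names and all data are synthetic and hypothetical.

```python
import numpy as np


def intertrial_sd(values_hz):
    """Sample standard deviation of one formant across trials."""
    return float(np.std(values_hz, ddof=1))


def centering(f1_onset, f1_mid):
    """Reduction in inter-trial variability from onset to midpoint (Hz).

    Positive values mean trials converge toward the middle of the vowel.
    """
    return intertrial_sd(f1_onset) - intertrial_sd(f1_mid)


rng = np.random.default_rng(0)
onset = 600 + rng.normal(0, 40, size=50)  # synthetic F1 at vowel onset
mid = 600 + 0.5 * (onset - 600)           # trials pulled halfway back to 600 Hz
print(round(centering(onset, mid), 1))

# Per-speaker correlation, e.g. a PSAM index vs onset variability (synthetic).
psam = np.array([0.8, 1.1, 1.5, 2.0, 2.4])
onset_sd = np.array([52.0, 47.0, 41.0, 35.0, 30.0])
r = np.corrcoef(psam, onset_sd)[0, 1]
print(round(r, 2))
```

Computing variability separately at onset and midpoint is what lets the two questions in the abstract be separated: whether PSAM tracks the change (centering) or the onset variability itself.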

Additional Links: PMID-39461704

Citation:

@article {pmid39461704,
year = {2024},
author = {Li, JJ and Daliri, A and Kim, KS and Max, L},
title = {Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?},
journal = {Neuroscience letters},
volume = {843},
number = {},
pages = {138025},
pmid = {39461704},
issn = {1872-7972},
support = {R01 DC014510/DC/NIDCD NIH HHS/United States ; R01 DC017444/DC/NIDCD NIH HHS/United States ; R01 DC020162/DC/NIDCD NIH HHS/United States ; R01 DC020707/DC/NIDCD NIH HHS/United States ; },
mesh = {Humans ; Male ; Female ; *Speech/physiology ; Adult ; Young Adult ; *Electroencephalography/methods ; *Speech Perception/physiology ; Auditory Cortex/physiology ; Acoustic Stimulation/methods ; Movement/physiology ; Auditory Perception/physiology ; },
abstract = {Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production, and its possible relation to PSAM, at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).},
}

MeSH Terms:

Humans
Male
Female
*Speech/physiology
Adult
Young Adult
*Electroencephalography/methods
*Speech Perception/physiology
Auditory Cortex/physiology
Acoustic Stimulation/methods
Movement/physiology
Auditory Perception/physiology

RevDate: 2024-10-24

Pekdemir A, Kemaloğlu YK, Gölaç H, et al (2024)

The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00309-6 [Epub ahead of print].

PURPOSE: The static and dynamic soft tissue changes resulting in hypopnea and/or apnea in the subjects with obstructive sleep apnea (OSA) occur in the upper airway, which also serves as the voice or speech tract. In this study, we looked for the Voice Handicap Index-10 (VHI-10) and Voice-Related Quality of Life (V-RQOL) scores in addition to perturbation and formant values of the vowels in those with snoring and OSA.

METHODS: Epworth Sleepiness Scale (ESS), STOP-Bang scores, Body-Mass Index (BMI), neck circumference (NC), modified Mallampati Index, tonsil size, Apnea-Hypopnea Index, VHI-10 and V-RQOL scores, perturbation and formant values, and fundamental frequency of the voice samples were taken to evaluate.

RESULTS: The data revealed that not the perturbation and formant values but scores of VHI-10 and V-RQOL were significantly different between the control and OSA subjects and that both were significantly correlated with ESS and NC. Further, a few significant correlations of BMI and tonsil size with the formant and perturbation values were also found.

CONCLUSIONS: Our data reveal that (i) VHI-10 and V-RQOL were good identifiers for those with OSA, and (ii) perturbation and formant values were related particularly to tonsil size and also to BMI. Hence, we could say that in an attempt to use a voice parameter to screen for OSA, VHI-10 and V-RQOL appeared to be better than the objective voice measures, which could be variable due to the tonsil size and BMI of the subjects.
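
The abstract does not list which perturbation measures were used. Two widely used ones are local jitter (the mean absolute difference between consecutive glottal periods divided by the mean period) and local shimmer (the same computation on cycle peak amplitudes); Praat reports measures under these names with these definitions. A minimal sketch, assuming period and amplitude sequences have already been extracted from a sustained vowel; the helper name and the cycle values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by the mean value.

    Applied to periods this is local jitter; applied to peak amplitudes it is
    local shimmer. Returned as a fraction (multiply by 100 for percent).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


periods_ms = [5.00, 5.05, 4.98, 5.02, 5.00]  # hypothetical cycle lengths
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]  # hypothetical peak amplitudes
print(f"jitter  = {100 * local_perturbation(periods_ms):.2f} %")
print(f"shimmer = {100 * local_perturbation(amplitudes):.2f} %")
```

Real tools add safeguards this sketch omits, such as discarding periods outside a plausible range before averaging.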

Additional Links: PMID-39448279

Citation:

@article {pmid39448279,
year = {2024},
author = {Pekdemir, A and Kemaloğlu, YK and Gölaç, H and İriz, A and Köktürk, O and Mengü, G},
title = {The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.018},
pmid = {39448279},
issn = {1873-4588},
abstract = {PURPOSE: The static and dynamic soft tissue changes resulting in hypopnea and/or apnea in the subjects with obstructive sleep apnea (OSA) occur in the upper airway, which also serves as the voice or speech tract. In this study, we looked for the Voice Handicap Index-10 (VHI-10) and Voice-Related Quality of Life (V-RQOL) scores in addition to perturbation and formant values of the vowels in those with snoring and OSA.

METHODS: Epworth Sleepiness Scale (ESS), STOP-Bang scores, Body-Mass Index (BMI), neck circumference (NC), modified Mallampati Index, tonsil size, Apnea-Hypopnea Index, VHI-10 and V-RQOL scores, perturbation and formant values, and fundamental frequency of the voice samples were taken to evaluate.

RESULTS: The data revealed that not the perturbation and formant values but scores of VHI-10 and V-RQOL were significantly different between the control and OSA subjects and that both were significantly correlated with ESS and NC. Further, a few significant correlations of BMI and tonsil size with the formant and perturbation values were also found.

CONCLUSIONS: Our data reveal that (i) VHI-10 and V-RQOL were good identifiers for those with OSA, and (ii) perturbation and formant values were related particularly to tonsil size and also to BMI. Hence, we could say that in an attempt to use a voice parameter to screen for OSA, VHI-10 and V-RQOL appeared to be better than the objective voice measures, which could be variable due to the tonsil size and BMI of the subjects.},
}

RevDate: 2024-11-04
CmpDate: 2024-10-24

Feng S, Jiang X (2024)

Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.

The Journal of the Acoustical Society of America, 156(4):2860-2876.

Language communicators use acoustic-phonetic cues to convey a variety of social information in the spoken language, and the learning of a second language affects speech production in a social setting. It remains unclear how speaking different dialects could affect the acoustic metrics underlying the intended communicative meanings. Nine Chinese Bayannur-Mandarin bidialectics produced single-digit numbers in statements of both Standard Mandarin and the Bayannur dialect with different levels of intended confidence. Fifteen listeners judged the intention presence and confidence level. Prosodically unmarked and marked stimuli exhibited significant differences in perceived intention. A higher intended level was perceived as more confident. The acoustic analysis revealed the segmental (third and fourth formants, center of gravity), suprasegmental (mean fundamental frequency, fundamental frequency range, duration), and source features (harmonic to noise ratio, cepstral peak prominence) can distinguish between confident and doubtful expressions. Most features also distinguished between dialect and Mandarin productions. Interactions on fourth formant and mean fundamental frequency suggested that speakers made greater use of acoustic parameters to encode confidence and doubt in the Bayannur dialect than in Mandarin. In machine learning experiments, the above-chance-level overall classification rates for confidence and doubt and the in-group advantage supported the dialect theory.
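
Spectral center of gravity (COG), one of the segmental cues listed above, is the frequency-weighted mean of the spectrum. A minimal sketch, assuming Hann windowing and weighting by squared magnitude (power 2; Praat's Spectrum centre-of-gravity query exposes this exponent as a power parameter); the function name, window, and test tones are illustrative assumptions, since the abstract does not give the analysis settings.

```python
import numpy as np


def center_of_gravity(frame, fs, power=2.0):
    """Spectral centre of gravity in Hz, weighting bins by |X(f)|**power."""
    x = np.asarray(frame, dtype=float)
    mags = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    weights = mags ** power
    return float(np.sum(freqs * weights) / np.sum(weights))


fs = 16000
t = np.arange(1600) / fs
low = np.sin(2 * np.pi * 500 * t)
bright = low + 0.5 * np.sin(2 * np.pi * 4000 * t)  # add high-frequency energy
print(round(center_of_gravity(low, fs)), round(center_of_gravity(bright, fs)))
```

Adding a 4 kHz component at a quarter of the power moves the COG from about 500 Hz to about 1200 Hz, illustrating why COG tracks the balance of high- versus low-frequency energy.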

Additional Links: PMID-39445770

Citation:

@article {pmid39445770,
year = {2024},
author = {Feng, S and Jiang, X},
title = {Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2860-2876},
doi = {10.1121/10.0032400},
pmid = {39445770},
issn = {1520-8524},
mesh = {Adult ; Female ; Humans ; Male ; Intention ; *Language ; Multilingualism ; Phonetics ; *Speech Acoustics ; *Speech Perception ; },
abstract = {Language communicators use acoustic-phonetic cues to convey a variety of social information in the spoken language, and the learning of a second language affects speech production in a social setting. It remains unclear how speaking different dialects could affect the acoustic metrics underlying the intended communicative meanings. Nine Chinese Bayannur-Mandarin bidialectics produced single-digit numbers in statements of both Standard Mandarin and the Bayannur dialect with different levels of intended confidence. Fifteen listeners judged the intention presence and confidence level. Prosodically unmarked and marked stimuli exhibited significant differences in perceived intention. A higher intended level was perceived as more confident. The acoustic analysis revealed the segmental (third and fourth formants, center of gravity), suprasegmental (mean fundamental frequency, fundamental frequency range, duration), and source features (harmonic to noise ratio, cepstral peak prominence) can distinguish between confident and doubtful expressions. Most features also distinguished between dialect and Mandarin productions. Interactions on fourth formant and mean fundamental frequency suggested that speakers made greater use of acoustic parameters to encode confidence and doubt in the Bayannur dialect than in Mandarin. In machine learning experiments, the above-chance-level overall classification rates for confidence and doubt and the in-group advantage supported the dialect theory.},
}

MeSH Terms:

Adult
Female
Humans
Male
Intention
*Language
Multilingualism
Phonetics
*Speech Acoustics
*Speech Perception

RevDate: 2024-12-02
CmpDate: 2024-12-02

Persson A (2024)

The acoustic characteristics of Swedish vowels.

Phonetica, 81(6):599-643.

The Swedish vowel space is relatively densely populated with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work in general has focused on subsets of the space. The present paper reports on static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip-rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.
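
Dynamic formant analyses often summarize vowel-inherent spectral change as the distance travelled in the F1-F2 plane between two time points. A minimal sketch, assuming measurements at 20% and 80% of vowel duration and a plain Euclidean distance in Hz (the paper's own dynamic measure may differ); the function name and the formant values are invented and are not SwehVd data.

```python
import math


def vector_length(f1_start, f2_start, f1_end, f2_end):
    """Euclidean distance (Hz) between two points in the F1-F2 plane."""
    return math.hypot(f1_end - f1_start, f2_end - f2_start)


# Hypothetical (F1, F2) in Hz at 20% and 80% of vowel duration.
tokens = {
    "monophthong-like": ((450, 1900), (440, 1920)),
    "diphthongizing": ((450, 1900), (350, 2150)),
}
for name, ((f1a, f2a), (f1b, f2b)) in tokens.items():
    print(name, round(vector_length(f1a, f2a, f1b, f2b), 1))
```

A larger vector length indicates more formant movement within the vowel, the kind of pattern the abstract describes as clearer diphthongization.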

Additional Links: PMID-39443329

Citation:

@article {pmid39443329,
year = {2024},
author = {Persson, A},
title = {The acoustic characteristics of Swedish vowels.},
journal = {Phonetica},
volume = {81},
number = {6},
pages = {599-643},
pmid = {39443329},
issn = {1423-0321},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; Sweden ; *Language ; Speech Perception ; Sound Spectrography ; Female ; Male ; Cues ; Adult ; },
abstract = {The Swedish vowel space is relatively densely populated with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work in general has focused on subsets of the space. The present paper reports on static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip-rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.},
}

MeSH Terms:

Humans
*Speech Acoustics
*Phonetics
Sweden
*Language
Speech Perception
Sound Spectrography
Female
Male
Cues
Adult

RevDate: 2024-10-22

Martínez-Olalla R, Hidalgo-De la Guía I, Gayarzábal-Heinze E, et al (2024)

Analysis of Voice Quality in Children With Smith-Magenis Syndrome.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00319-9 [Epub ahead of print].

UNLABELLED: The production of phonation involves very complex processes, linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, it is possible to find the imprint in the voice signal left by the deterioration of certain cortical areas or part of the neurocognitive mechanisms that are involved in speech. In previous works, the authors determined the relationship between the pathological characteristics of the voice of the speakers with Smith-Magenis syndrome (SMS) and a lower value in the cepstral peak prominence (CPP) with respect to normative speakers. They also described the presence of subharmonics in their voices.

OBJECTIVES: The present study aims to verify whether both characteristics can be used simultaneously to differentiate SMS voices from neurotypical voices. It will also be analyzed if there is variation in the trajectory of the formants coinciding with the subharmonics.

METHODS: To do this, the effect of subharmonics in the voices of 12 SMS individuals was isolated to see if they were responsible for the lower CPP values. An evaluation of the CPP was also carried out in the areas of subharmonic presence, from the peak that reflected the value of f0, rather than using the most prominent peak. This offered us a baseline for the CPP value in the presence of subharmonics. It was checked if changes in the formants occurred synchronously to the appearance of those subharmonics. If so, the muscles that control the position of the jaw and tongue would be affected at the same time as the larynx. The latter was difficult to observe since the samples were very short. A comparison of phonatory performance of a sustained /a/ between a normotypical group and non-normotypical group of children was carried out. These groups were balanced and matched in age and gender. The Spanish Association of Smith-Magenis Syndrome (ASME) provides almost 20% of the population in Spain.

RESULTS: The CPP allows differentiating between normative speakers and those with SMS, even when isolating the effect of subharmonics.

CONCLUSIONS: The CPP is a robust index for determining the degree of dysphonia. It makes it possible to differentiate pathological voices from healthy voices even when subharmonics are present. The presence of subharmonics is a characteristic of voices of SMS individuals and is not present in healthy ones. Both indexes can be used simultaneously to differentiate SMS voices from neurotypical voices.
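
CPP measures how far the cepstral peak at the speaker's fundamental period rises above a regression line fitted to the cepstrum; strongly periodic voices give a tall, well-defined peak. A minimal sketch of one common variant (dB spectrum, cepstrum expressed in dB, linear trend fitted over the pitch search range). The optional `expected_f0` argument loosely mirrors the idea in the METHODS of reading the peak that reflects f0 rather than the most prominent peak. Function name, search range, and test signals are assumptions; windowing, smoothing, and regression range differ between implementations such as Praat's, so absolute values are not comparable across tools.

```python
import numpy as np


def cpp_db(frame, fs, f0_min=60.0, f0_max=600.0, expected_f0=None):
    """Cepstral peak prominence (dB) of one frame.

    If expected_f0 is given, the cepstrum is read at the quefrency 1/expected_f0
    instead of at the tallest peak in the search range.
    """
    x = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spec_db = 20 * np.log10(np.abs(np.fft.fft(x)) + 1e-12)
    ceps_db = 20 * np.log10(np.abs(np.fft.ifft(spec_db)) + 1e-12)
    quefrency = np.arange(len(x)) / fs  # seconds
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    if expected_f0 is None:
        peak = lo + int(np.argmax(ceps_db[lo:hi + 1]))
    else:
        peak = int(round(fs / expected_f0))
    # Linear trend of the cepstrum over the search range.
    slope, intercept = np.polyfit(quefrency[lo:hi + 1], ceps_db[lo:hi + 1], 1)
    return float(ceps_db[peak] - (slope * quefrency[peak] + intercept))


fs = 16000
n = np.arange(2048)
rng = np.random.default_rng(1)
voiced = (n % 80) / 80 - 0.5 + 0.01 * rng.normal(size=n.size)  # 200 Hz sawtooth
noise = rng.normal(size=n.size)
print(round(cpp_db(voiced, fs), 1), round(cpp_db(noise, fs), 1))
```

The periodic test signal yields a much larger CPP than white noise, which is the property that lets CPP separate dysphonic from healthy voices.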

Additional Links: PMID-39438167

Citation:

@article {pmid39438167,
year = {2024},
author = {Martínez-Olalla, R and Hidalgo-De la Guía, I and Gayarzábal-Heinze, E and Fernández-Ruiz, R and Núñez-Vidal, E and Álvarez-Marquina, A and Palacios-Alonso, D},
title = {Analysis of Voice Quality in Children With Smith-Magenis Syndrome.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.026},
pmid = {39438167},
issn = {1873-4588},
abstract = {UNLABELLED: The production of phonation involves very complex processes, linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, it is possible to find the imprint in the voice signal left by the deterioration of certain cortical areas or part of the neurocognitive mechanisms that are involved in speech. In previous works, the authors determined the relationship between the pathological characteristics of the voice of the speakers with Smith-Magenis syndrome (SMS) and a lower value in the cepstral peak prominence (CPP) with respect to normative speakers. They also described the presence of subharmonics in their voices.

OBJECTIVES: The present study aims to verify whether both characteristics can be used simultaneously to differentiate SMS voices from neurotypical voices. It will also be analyzed if there is variation in the trajectory of the formants coinciding with the subharmonics.

METHODS: To do this, the effect of subharmonics in the voices of 12 SMS individuals was isolated to see if they were responsible for the lower CPP values. An evaluation of the CPP was also carried out in the areas of subharmonic presence, from the peak that reflected the value of f0, rather than using the most prominent peak. This offered us a baseline for the CPP value in the presence of subharmonics. It was checked if changes in the formants occurred synchronously to the appearance of those subharmonics. If so, the muscles that control the position of the jaw and tongue would be affected at the same time as the larynx. The latter was difficult to observe since the samples were very short. A comparison of phonatory performance of a sustained /a/ between a normotypical group and non-normotypical group of children was carried out. These groups were balanced and matched in age and gender. The Spanish Association of Smith-Magenis Syndrome (ASME) provides almost 20% of the population in Spain.

RESULTS: The CPP allows differentiating between normative speakers and those with SMS, even when isolating the effect of subharmonics.

CONCLUSIONS: The CPP is a robust index for determining the degree of dysphonia. It makes it possible to differentiate pathological voices from healthy voices even when subharmonics are present. The presence of subharmonics is a characteristic of voices of SMS individuals and is not present in healthy ones. Both indexes can be used simultaneously to differentiate SMS voices from neurotypical voices.},
}
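The CPP discussed in this entry is usually computed from the real cepstrum of a short frame. It is the height of the cepstral peak in the expected pitch-period range, measured above a regression line fitted to the cepstrum. The sketch below follows that common recipe. The Hann window, the 60-500 Hz search range, the linear baseline and the +/-10% search band are assumptions, not settings reported by the authors. The optional `f0_hint` argument (hypothetical) mimics the variant described in METHODS, which takes the peak at the f0 quefrency rather than the most prominent peak.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_range=(60.0, 500.0), f0_hint=None):
    """Return (CPP in dB, peak quefrency in s) for one frame.

    Sketch only: Hann window, cepstrum of the dB power spectrum expressed
    in dB, and a straight-line baseline fitted over the search range.
    If f0_hint (Hz) is given, the peak nearest the period 1/f0_hint is used
    instead of the most prominent peak in the range.
    """
    x = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spectrum_db = 10.0 * np.log10(np.abs(np.fft.fft(x)) ** 2 + 1e-12)
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(spectrum_db)) + 1e-12)
    quefrency = np.arange(len(x)) / fs           # seconds

    lo = int(np.ceil(fs / f0_range[1]))          # shortest period searched (samples)
    hi = int(np.floor(fs / f0_range[0]))         # longest period searched (samples)
    q, c = quefrency[lo:hi + 1], cepstrum_db[lo:hi + 1]

    if f0_hint is None:
        k = int(np.argmax(c))                    # most prominent peak
    else:
        target = int(round(fs / f0_hint)) - lo   # expected period, relative to lo
        half = max(1, int(round(0.1 * fs / f0_hint)))  # assumed +/-10% band
        a, b = max(0, target - half), min(len(c), target + half + 1)
        k = a + int(np.argmax(c[a:b]))

    slope, intercept = np.polyfit(q, c, 1)       # regression baseline
    return float(c[k] - (slope * q[k] + intercept)), float(q[k])


# Usage: a 200 Hz harmonic signal with a little noise, 128 ms at 16 kHz.
fs = 16000
t = np.arange(2048) / fs
rng = np.random.default_rng(0)
voice = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 40))
voice = voice + 0.01 * rng.standard_normal(len(t))
cpp_db, peak_q = cepstral_peak_prominence(voice, fs, f0_hint=200.0)
```

Anchoring the search at 1/f0 matters when subharmonics produce a competing cepstral peak at twice the period. In that case the most prominent peak no longer reflects f0.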

RevDate: 2024-11-17
CmpDate: 2024-11-07

Krakauer J, Naber C, Niziolek CA, et al (2024)

Divided Attention Has Limited Effects on Speech Sensorimotor Control.

Journal of speech, language, and hearing research : JSLHR, 67(11):4358-4368.

PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.

METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.

RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.

CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.
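The recommendation to measure compensation before 300 ms after vowel onset comes down to restricting the averaging window. This is a minimal sketch under several assumptions: one formant track per trial, a matched baseline track from unperturbed trials, and compensation defined as the change opposite in sign to the perturbation. The function and argument names are hypothetical, not the authors' analysis code.

```python
import numpy as np


def early_compensation(produced_hz, baseline_hz, shift_hz, t_ms, window_ms=(0.0, 300.0)):
    """Mean compensation (Hz) within a time window after vowel onset.

    Hypothetical helper, not the authors' pipeline:
    produced_hz  formant track (e.g. F1) on a perturbed trial
    baseline_hz  matched track from unperturbed trials
    shift_hz     signed feedback perturbation (e.g. +100 for an upward shift)
    t_ms         sample times in ms relative to vowel onset
    Positive output means production moved opposite to the perturbation.
    """
    produced = np.asarray(produced_hz, dtype=float)
    baseline = np.asarray(baseline_hz, dtype=float)
    t = np.asarray(t_ms, dtype=float)
    in_window = (t >= window_ms[0]) & (t < window_ms[1])
    if not in_window.any():
        raise ValueError("no formant samples fall inside the analysis window")
    change = produced[in_window] - baseline[in_window]
    return float(np.mean(-np.sign(shift_hz) * change))


# Usage: a speaker lowers F1 by 20 Hz against a +100 Hz shift, then drifts
# further after 300 ms; only the early part enters the measure.
t = np.arange(0, 400, 10)
baseline = np.full(t.shape, 700.0)
produced = np.where(t < 300, 680.0, 650.0)
comp = early_compensation(produced, baseline, +100.0, t)
```

Widening `window_ms` past 300 ms would fold in the late, attention-sensitive portion of the response that the authors advise excluding.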

Additional Links: PMID-39418590

Citation:

@article {pmid39418590,
year = {2024},
author = {Krakauer, J and Naber, C and Niziolek, CA and Parrell, B},
title = {Divided Attention Has Limited Effects on Speech Sensorimotor Control.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {67},
number = {11},
pages = {4358-4368},
pmid = {39418590},
issn = {1558-9102},
support = {R01 DC017091/DC/NIDCD NIH HHS/United States ; R01 DC019134/DC/NIDCD NIH HHS/United States ; },
mesh = {Humans ; *Attention/physiology ; Male ; Female ; Young Adult ; *Speech/physiology ; Adult ; Feedback, Sensory/physiology ; Adaptation, Physiological/physiology ; Speech Perception/physiology ; Visual Perception/physiology ; Adolescent ; },
abstract = {PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.

METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.

RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.

CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.},
}

MeSH Terms:

Humans
*Attention/physiology
Male
Female
Young Adult
*Speech/physiology
Adult
Feedback, Sensory/physiology
Adaptation, Physiological/physiology
Speech Perception/physiology
Visual Perception/physiology
Adolescent

RevDate: 2024-10-16

He Y, Wang X, Huang T, et al (2024)

The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00334-5 [Epub ahead of print].

The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence the results of acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis of the elderly population over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with lower values in males than in females. The regularity of /hʌ/ LDDK also differed, with greater regularity in females. No significant differences were observed for other acoustic parameters. No significant interactions were revealed. According to the preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity to differentiate between genders when assessing presbyphagia using acoustic parameters, especially focusing on swallowing function in elderly males in Ningbo.
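The abstract lists the acoustic vowel space and the F2i/F2u ratio without giving formulas. This sketch assumes a common choice for each: the area of the /a/-/i/-/u/ triangle in the F1-F2 plane, and the ratio of F2 in /i/ to F2 in /u/. Both are computed from per-vowel mean formants. The example formant values are illustrative only, not data from the study.

```python
import numpy as np


def triangular_vsa(formants):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula).

    formants maps vowel labels "a", "i", "u" to (F1, F2) pairs in Hz.
    """
    pts = np.array([formants[v] for v in ("a", "i", "u")], dtype=float)
    f1, f2 = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(f1, np.roll(f2, -1)) - np.dot(f2, np.roll(f1, -1)))


def f2i_f2u_ratio(formants):
    """Ratio of F2 of /i/ to F2 of /u/ (dimensionless); larger means more front-back contrast."""
    return formants["i"][1] / formants["u"][1]


# Illustrative values only (not data from the study).
example = {"a": (800.0, 1300.0), "i": (300.0, 2300.0), "u": (300.0, 800.0)}
area = triangular_vsa(example)
ratio = f2i_f2u_ratio(example)
```

The ratio needs only two formant values, so it is less sensitive than the triangle area to errors in F1 tracking.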

Additional Links: PMID-39414424

Citation:

@article {pmid39414424,
year = {2024},
author = {He, Y and Wang, X and Huang, T and Zhao, W and Fu, Z and Zheng, Q and Jin, L and Kim, H and Liu, H},
title = {The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.041},
pmid = {39414424},
issn = {1873-4588},
abstract = {The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence the results of acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis of the elderly population over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with lower values in males than in females. The regularity of /hʌ/ LDDK also differed, with greater regularity in females. No significant differences were observed for other acoustic parameters. No significant interactions were revealed. According to the preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity to differentiate between genders when assessing presbyphagia using acoustic parameters, especially focusing on swallowing function in elderly males in Ningbo.},
}

RevDate: 2024-10-16

Wang Y, Y Zhao (2024)

Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00316-3 [Epub ahead of print].

OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.

METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).

RESULTS: Male singers have a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers; its frequency is slightly higher than that of Western classical singers. Female singers do not have a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort and were higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.

CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.
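The abstract reports LTAS-based measures without defining them. The sketch below uses common textbook definitions, which are assumptions here:

- LTAS: the mean power spectrum of 50%-overlapping Hann frames.
- Singing power ratio: the strongest peak in 2-4 kHz minus the strongest peak in 0-2 kHz, in dB. Sign conventions differ between authors.
- Alpha ratio: summed energy in 1-5 kHz relative to 50 Hz-1 kHz, in dB.

The gain factor is not sketched because the abstract does not define it.

```python
import numpy as np


def ltas_db(x, fs, n_fft=2048):
    """Long-term average spectrum: mean power of 50%-overlapping Hann frames, in dB."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    hop = n_fft // 2
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)


def singing_power_ratio(freqs, level_db):
    """Strongest LTAS peak in 2-4 kHz minus strongest peak in 0-2 kHz, in dB."""
    low = level_db[(freqs > 0) & (freqs < 2000)].max()
    high = level_db[(freqs >= 2000) & (freqs <= 4000)].max()
    return float(high - low)


def alpha_ratio(freqs, level_db, split_hz=1000.0, top_hz=5000.0):
    """Energy in split_hz..top_hz relative to energy in 50 Hz..split_hz, in dB."""
    power = 10.0 ** (level_db / 10.0)
    low = power[(freqs >= 50) & (freqs < split_hz)].sum()
    high = power[(freqs >= split_hz) & (freqs <= top_hz)].sum()
    return float(10.0 * np.log10(high / low))


# Usage: a strong 500 Hz partial and a partial 20 dB weaker at 3 kHz.
fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
freqs, level = ltas_db(signal, fs)
spr = singing_power_ratio(freqs, level)
alpha = alpha_ratio(freqs, level)
```

With these definitions, both measures rise when energy near 3 kHz grows relative to the low-frequency partials. This is consistent with the increases with vocal effort reported above.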

Additional Links: PMID-39414423

Citation:

@article {pmid39414423,
year = {2024},
author = {Wang, Y and Zhao, Y},
title = {Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.022},
pmid = {39414423},
issn = {1873-4588},
abstract = {OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.

METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).

RESULTS: Male singers have a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers; its frequency is slightly higher than that of Western classical singers. Female singers do not have a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort and were higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.

CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.},
}

RevDate: 2024-10-14
CmpDate: 2024-10-14

Clopper CG (2024)

Dynamic acoustic vowel distances within and across dialects.

The Journal of the Acoustical Society of America, 156(4):2497-2507.

Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.
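Two of the measures named above can be sketched in a few lines: DTW over MFCC frames and RMS distance over formant trajectories. The sketch uses textbook unconstrained DTW with a symmetric step pattern. For the RMS measure it uses a fixed 11-point time normalization. Neither choice is stated in the abstract, which gives no step pattern, normalization or number of sampling points. MFCC extraction and GAMM fitting are left out.

```python
import numpy as np


def _as_frames(seq):
    """Treat a 1-D sequence as one feature per frame."""
    arr = np.asarray(seq, dtype=float)
    return arr[:, None] if arr.ndim == 1 else arr


def dtw_distance(a, b):
    """Dynamic time warping distance between two (frames, features) arrays.

    Local cost is the Euclidean distance between frames; the warping path
    may advance diagonally or along either sequence alone.
    """
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = local + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])


def rms_trajectory_distance(track_a, track_b, n_points=11):
    """RMS difference (Hz) between two formant tracks after time normalization.

    Each track is linearly resampled at n_points equally spaced proportions
    of its own duration, so vowels of different lengths can be compared.
    """
    grid = np.linspace(0.0, 1.0, n_points)
    ra = np.interp(grid, np.linspace(0.0, 1.0, len(track_a)), track_a)
    rb = np.interp(grid, np.linspace(0.0, 1.0, len(track_b)), track_b)
    return float(np.sqrt(np.mean((ra - rb) ** 2)))


# Usage: a slowed-down copy of a feature sequence has zero DTW distance,
# while a uniform 100 Hz offset in F2 gives an RMS distance of 100 Hz.
frames = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
slowed = np.repeat(frames, 2, axis=0)
same = dtw_distance(frames, slowed)
offset = rms_trajectory_distance([1500.0, 1600.0, 1700.0], [1600.0, 1700.0, 1800.0])
```

DTW absorbs differences in timing before comparing spectra. The RMS measure assumes the trajectories are already aligned by proportional time.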

Additional Links: PMID-39400271

Citation:

@article {pmid39400271,
year = {2024},
author = {Clopper, CG},
title = {Dynamic acoustic vowel distances within and across dialects.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2497-2507},
doi = {10.1121/10.0032385},
pmid = {39400271},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; *Speech Production Measurement/methods ; Voice Quality ; Acoustics ; Female ; Male ; Time Factors ; Language ; Sound Spectrography ; Adult ; },
abstract = {Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.},
}

MeSH Terms:

Humans
*Speech Acoustics
*Phonetics
*Speech Production Measurement/methods
Voice Quality
Acoustics
Female
Male
Time Factors
Language
Sound Spectrography
Adult

RevDate: 2024-11-10

Ozkan Atak HB, Aslan F, Sennaroglu G, et al (2024)

Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.

Audiology & neuro-otology pii:000541716 [Epub ahead of print].

INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABIs) throughout the first 3 years of life are difficult to predict. Although ABI technology offers auditory experiences that enhance spoken language development, ABI users face challenges as a result of delays in language proficiency and in the acquisition of reading comprehension. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.

METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development-Third Edition (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.

RESULTS: Studies have shown that there is a relationship between language proficiency and reading comprehension in children with ABI. In the present study, the total reading comprehension scores of children who had poor language proficiency and were enrolled in a school for the deaf were also low. These children used short, basic sentences, often repeated words and phrases, and had a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not remember events in a logical order.

CONCLUSION: Children with ABI may potentially have complicated reading comprehension abilities due to lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication model, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated in this study for the first time in the literature and may constitute a starting point for the examination of variables affecting reading comprehension in this area.

Additional Links: PMID-39396508

Citation:

@article {pmid39396508,
year = {2024},
author = {Ozkan Atak, HB and Aslan, F and Sennaroglu, G and Sennaroglu, L},
title = {Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.},
journal = {Audiology & neuro-otology},
volume = {},
number = {},
pages = {1-12},
doi = {10.1159/000541716},
pmid = {39396508},
issn = {1421-9700},
abstract = {INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABIs) throughout the first 3 years of life are difficult to predict. Although ABI technology offers auditory experiences that enhance spoken language development, ABI users face challenges as a result of delays in language proficiency and in the acquisition of reading comprehension. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.

METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development-Third Edition (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.

RESULTS: Studies have shown that there is a relationship between language proficiency and reading comprehension in children with ABI. In the present study, the total reading comprehension scores of children who had poor language proficiency and were enrolled in a school for the deaf were also low. These children used short, basic sentences, often repeated words and phrases, and had a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not remember events in a logical order.

CONCLUSION: Children with ABI may potentially have complicated reading comprehension abilities due to lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication model, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated in this study for the first time in the literature and may constitute a starting point for the examination of variables affecting reading comprehension in this area.},
}

RevDate: 2024-10-11
CmpDate: 2024-10-11

Yegnanarayana B, V Pannala (2024)

Processing group delay spectrograms for study of formant and harmonic contours in speech signals.

The Journal of the Acoustical Society of America, 156(4):2422-2433.

This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the instantaneous narrowband equivalent GD spectrogram, derived using the modified SFF analysis of speech signals. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours approximately match the given ground-truth formant contours, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating the spurious points, without forcing the number of formants.
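The abstract defines the GD spectrum as the negative derivative of the phase spectrum with respect to frequency. A standard way to compute it for a finite sequence without unwrapping the phase is the identity tau(w) = Re{Y(w) X*(w)} / |X(w)|^2, where X is the DFT of x[n] and Y is the DFT of n x[n]. The sketch below applies that identity. This is the conventional DFT-based formula, swapped in for illustration; it is not the authors' modified single frequency filtering (SFF) method.

```python
import numpy as np


def group_delay(x, n_fft=None):
    """Group delay (in samples) of a finite real sequence, without phase unwrapping.

    tau(w) = (Y_R X_R + Y_I X_I) / |X|^2 with X = DFT{x[n]}, Y = DFT{n x[n]}.
    A small constant guards against division by zero at spectral nulls.
    """
    x = np.asarray(x, dtype=float)
    n_fft = n_fft or len(x)
    idx = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(idx * x, n_fft)
    return (Y.real * X.real + Y.imag * X.imag) / (np.abs(X) ** 2 + 1e-12)


# Usage: a two-pole resonator at 500 Hz (fs = 8 kHz) has a group-delay
# peak near its resonance, which is why GD peaks can track formants.
fs, f_res, r = 8000.0, 500.0, 0.95
theta = 2 * np.pi * f_res / fs
n = np.arange(256)
h = r ** n * np.sin(theta * (n + 1)) / np.sin(theta)  # impulse response
gd = group_delay(h, 1024)
freqs = np.fft.rfftfreq(1024, d=1.0 / fs)
peak_hz = freqs[np.argmax(gd)]
```

The sharp GD peaks at resonances are what make formant contours visible in a GD spectrogram. Zeros near the unit circle produce spurious peaks, which relates to the authors' note that spurious points must be eliminated.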

Additional Links: PMID-39392353

Citation:

@article {pmid39392353,
year = {2024},
author = {Yegnanarayana, B and Pannala, V},
title = {Processing group delay spectrograms for study of formant and harmonic contours in speech signals.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2422-2433},
doi = {10.1121/10.0032364},
pmid = {39392353},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; Sound Spectrography ; Signal Processing, Computer-Assisted ; Speech Production Measurement/methods ; Voice Quality ; Time Factors ; Phonetics ; },
abstract = {This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the instantaneous narrowband equivalent GD spectrogram, derived using the modified SFF analysis of speech signals. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours approximately match the given ground-truth formant contours, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating the spurious points, without forcing the number of formants.},
}

MeSH Terms:

Humans
*Speech Acoustics
Sound Spectrography
Signal Processing, Computer-Assisted
Speech Production Measurement/methods
Voice Quality
Time Factors
Phonetics

RevDate: 2024-11-20
CmpDate: 2024-11-07

Parrell B, Niziolek CA, T Chen (2024)

Sensorimotor adaptation to a nonuniform formant perturbation generalizes to untrained vowels.

Journal of neurophysiology, 132(5):1437-1444.

When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a nonuniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.

NEW & NOTEWORTHY: We show that sensorimotor adaptation of vowels at the edges of the articulatory working space generalizes to intermediate vowels through local transfer of learning from adjacent vowels. These results extend findings on the locality of sensorimotor learning from upper limb control to speech, a complex task with an opaque and nonlinear transformation between motor actions and sensory consequences. Our results also suggest that our paradigm has potential to drive behaviorally relevant changes that improve communication effectiveness.
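The abstract does not specify the perturbation field. One simple way to picture a nonuniform centralization is to pull each vowel's feedback formants a fixed fraction of the way toward a central point. This is a hypothetical construction used purely for illustration. Under it, the shift direction depends on the vowel's position, and full compensation requires producing vowels farther from the center. That is greater acoustic contrast, the direction of adaptation the authors report. The function names, center and strength value are all hypothetical.

```python
import numpy as np


def centralize(formants, center, strength=0.5):
    """Hypothetical centralization field in the F1-F2 plane.

    Each (F1, F2) point is moved the fraction `strength` of the way toward
    `center`, so the shift direction depends on where the vowel lies
    (the perturbation is nonuniform across the vowel space).
    """
    p = np.asarray(formants, dtype=float)
    return p + strength * (np.asarray(center, dtype=float) - p)


def full_compensation(targets, center, strength=0.5):
    """Production that would make the centralized feedback match the targets.

    Solves p + strength * (center - p) = target for p.
    """
    t = np.asarray(targets, dtype=float)
    c = np.asarray(center, dtype=float)
    return (t - strength * c) / (1.0 - strength)


def mean_contrast(formants, center):
    """Mean Euclidean distance (Hz) of the vowels from the center."""
    p = np.asarray(formants, dtype=float)
    return float(np.mean(np.linalg.norm(p - np.asarray(center, dtype=float), axis=1)))


# Illustrative corner vowels /i/, /a/, /u/ and a central reference point.
vowels = np.array([[300.0, 2300.0], [800.0, 1300.0], [300.0, 800.0]])
center = np.array([500.0, 1500.0])
produced = full_compensation(vowels, center, strength=0.5)
```

With strength 0.5, the perturbed feedback halves each vowel's distance from the center. Full compensation would double the produced distance. Real adaptation is typically partial, so this marks an upper bound under the assumed field rather than a prediction.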

Additional Links: PMID-39356074

Citation:

@article {pmid39356074,
year = {2024},
author = {Parrell, B and Niziolek, CA and Chen, T},
title = {Sensorimotor adaptation to a nonuniform formant perturbation generalizes to untrained vowels.},
journal = {Journal of neurophysiology},
volume = {132},
number = {5},
pages = {1437-1444},
pmid = {39356074},
issn = {1522-1598},
support = {P50 HD105353/HD/NICHD NIH HHS/United States ; R01 DC017091/DC/NIDCD NIH HHS/United States ; R01 DC019134/DC/NIDCD NIH HHS/United States ; BCS 2120506//National Science Foundation (NSF)/ ; },
mesh = {Humans ; Male ; Female ; Adult ; *Adaptation, Physiological/physiology ; Young Adult ; *Speech/physiology ; Learning/physiology ; Speech Perception/physiology ; Generalization, Psychological/physiology ; Phonetics ; Feedback, Sensory/physiology ; },
abstract = {When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a nonuniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.

NEW & NOTEWORTHY: We show that sensorimotor adaptation of vowels at the edges of the articulatory working space generalizes to intermediate vowels through local transfer of learning from adjacent vowels. These results extend findings on the locality of sensorimotor learning from upper limb control to speech, a complex task with an opaque and nonlinear transformation between motor actions and sensory consequences. Our results also suggest that our paradigm has potential to drive behaviorally relevant changes that improve communication effectiveness.},
}

MeSH Terms:

Humans
Male
Female
Adult
*Adaptation, Physiological/physiology
Young Adult
*Speech/physiology
Learning/physiology
Speech Perception/physiology
Generalization, Psychological/physiology
Phonetics
Feedback, Sensory/physiology

RevDate: 2024-09-25

Huang T, Wang X, Xu T, et al (2024)

Acoustic Analysis of Mandarin-Speaking Transgender Women.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00291-1 [Epub ahead of print].

OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).

METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonics ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.

RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than that of CISW for the vowel /i/, significantly higher than that of CISW for the vowel /u/, and significantly higher than that of CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. The F4 formant was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, and /u/, during passage reading, and in free speech, but was significantly higher than that of CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW than in both CISW and CISM for the vowels /a/ and /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/ and /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free speech. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.

CONCLUSIONS: Even without phonosurgery or voice training, TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM. The findings suggest a potential risk of voice fatigue and voice disorders as TW try to modify their vocal characteristics to align with their gender identity.

Additional Links: PMID-39322510

Citation:

@article {pmid39322510,
year = {2024},
author = {Huang, T and Wang, X and Xu, T and Zhao, W and Cao, Y and Kim, H and Yi, B},
title = {Acoustic Analysis of Mandarin-Speaking Transgender Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.037},
pmid = {39322510},
issn = {1873-4588},
abstract = {OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).

METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonics-to-noise ratio (HNR), noise-to-harmonics ratio (NHR), fundamental frequency (f0), and intensity, were analyzed across vowels, passage reading, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.

RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than that of CISW for the vowel /i/, significantly higher than that of CISW for the vowel /u/, and significantly higher than that of CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. F4 was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, and /u/, during passage reading, and in free talking, but significantly higher than that of CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity than CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW than in both CISW and CISM for the vowels /a/ and /i/, during passage reading, and in free talking. HNR in TW was significantly lower than in both CISW and CISM for all vowels, during passage reading, and in free talking. NHR was significantly higher in TW than in CISW for all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/ and /i/, during passage reading, and in free talking. CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM for all vowels, during passage reading, and in free talking. (4) VHI-10 scores were significantly higher in TW than in both CISM and CISW.

CONCLUSIONS: Even without phonosurgery or voice training, TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM. The findings suggest a potential risk of voice fatigue and voice disorders as TW try to modify their vocal characteristics to align with their gender identity.},
}

RevDate: 2024-10-09
CmpDate: 2024-09-17

Kim H, Ratkute V, B Epp (2024)

Monaural and binaural masking release with speech-like stimuli.

JASA express letters, 4(9).

The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel syllable. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.

Additional Links: PMID-39287502

Citation:

@article {pmid39287502,
year = {2024},
author = {Kim, H and Ratkute, V and Epp, B},
title = {Monaural and binaural masking release with speech-like stimuli.},
journal = {JASA express letters},
volume = {4},
number = {9},
pages = {},
doi = {10.1121/10.0028736},
pmid = {39287502},
issn = {2691-1191},
mesh = {Humans ; *Perceptual Masking/physiology ; *Speech Perception/physiology ; Adult ; Acoustic Stimulation ; Male ; Female ; Young Adult ; },
abstract = {The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel syllable. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.},
}

MeSH Terms:

Humans
*Perceptual Masking/physiology
*Speech Perception/physiology
Adult
Acoustic Stimulation
Male
Female
Young Adult

RevDate: 2024-10-23
CmpDate: 2024-10-03

Chen S, Whalen DH, PPK Mok (2024)

What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.

Phonetica, 81(5):509-552.

Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.

Additional Links: PMID-39279469

Citation:

@article {pmid39279469,
year = {2024},
author = {Chen, S and Whalen, DH and Mok, PPK},
title = {What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.},
journal = {Phonetica},
volume = {81},
number = {5},
pages = {509-552},
pmid = {39279469},
issn = {1423-0321},
support = {R01 DC002717/DC/NIDCD NIH HHS/United States ; },
mesh = {Humans ; *Phonetics ; *Speech Acoustics ; *Tongue/physiology ; Female ; Male ; China ; *Language ; Adult ; Young Adult ; Speech Production Measurement ; Ultrasonography ; East Asian People ; },
abstract = {Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.},
}

MeSH Terms:

Humans
*Phonetics
*Speech Acoustics
*Tongue/physiology
Female
Male
China
*Language
Adult
Young Adult
Speech Production Measurement
Ultrasonography
East Asian People

RevDate: 2024-10-18
CmpDate: 2024-10-08

Thompson A, Y Kim (2024)

Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR, 67(10):3595-3611.

PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.

METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.

RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.

CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.

OPEN SCIENCE FORM: https://doi.org/10.23641/asha.27011281.

Additional Links: PMID-39259883

Citation:

@article {pmid39259883,
year = {2024},
author = {Thompson, A and Kim, Y},
title = {Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {67},
number = {10},
pages = {3595-3611},
pmid = {39259883},
issn = {1558-9102},
support = {F31 DC020121/DC/NIDCD NIH HHS/United States ; R03 DC012405/DC/NIDCD NIH HHS/United States ; },
mesh = {Humans ; *Parkinson Disease/physiopathology/complications ; *Speech Intelligibility/physiology ; Female ; Male ; Biomechanical Phenomena ; Aged ; *Dysarthria/etiology/physiopathology ; *Speech Acoustics ; Middle Aged ; Speech Production Measurement/methods ; Case-Control Studies ; Phonetics ; },
abstract = {PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.

METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.

RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.

CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.

OPEN SCIENCE FORM: https://doi.org/10.23641/asha.27011281.},
}

MeSH Terms:

Humans
*Parkinson Disease/physiopathology/complications
*Speech Intelligibility/physiology
Female
Male
Biomechanical Phenomena
Aged
*Dysarthria/etiology/physiopathology
*Speech Acoustics
Middle Aged
Speech Production Measurement/methods
Case-Control Studies
Phonetics

RevDate: 2024-09-07

Subrahmanya A, Ranasinghe KG, Kothare H, et al (2024)

Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.

Frontiers in human neuroscience, 18:1424920.

Past studies have explored formant centering, a corrective behavior of convergence over the duration of an utterance toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's disease (AD). We found that the pitch centering response in healthy elderly controls was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry, with a larger correction for pitch errors below the target than for errors above it. These findings indicate that pitch centering is a robust compensation behavior in human speech. Our findings also explore the potential impact on pitch centering of the neurodegenerative processes that affect speech in AD.

Additional Links: PMID-39234407

Citation:

@article {pmid39234407,
year = {2024},
author = {Subrahmanya, A and Ranasinghe, KG and Kothare, H and Raharjo, I and Kim, KS and Houde, JF and Nagarajan, SS},
title = {Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1424920},
pmid = {39234407},
issn = {1662-5161},
abstract = {Past studies have explored formant centering, a corrective behavior of convergence over the duration of an utterance toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's disease (AD). We found that the pitch centering response in healthy elderly controls was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry, with a larger correction for pitch errors below the target than for errors above it. These findings indicate that pitch centering is a robust compensation behavior in human speech. Our findings also explore the potential impact on pitch centering of the neurodegenerative processes that affect speech in AD.},
}

RevDate: 2024-09-01

Vampola T, Horáček J, AM Laukkanen (2024)

Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00248-0 [Epub ahead of print].

This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval (the "singer's formant cluster," SFC) for the vowels [a:] and [i:], considering epilaryngeal changes and velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tracts based on computed tomography images of a female speaker. Epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by reducing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization agree well with those of an internationally recognized operatic alto singer. With EN and VPO, the VT input reactance was positive over the entire fo singing range (ca. 75-1500 Hz). VPO increased the strength of the SFC and reduced the SPL of F1 for both vowels, but with EN, this SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. EN and VPO, alone or together, can support (singing) voice production.

Additional Links: PMID-39218756

Citation:

@article {pmid39218756,
year = {2024},
author = {Vampola, T and Horáček, J and Laukkanen, AM},
title = {Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.07.035},
pmid = {39218756},
issn = {1873-4588},
abstract = {This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval (the "singer's formant cluster," SFC) for the vowels [a:] and [i:], considering epilaryngeal changes and velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tracts based on computed tomography images of a female speaker. Epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by reducing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization agree well with those of an internationally recognized operatic alto singer. With EN and VPO, the VT input reactance was positive over the entire fo singing range (ca. 75-1500 Hz). VPO increased the strength of the SFC and reduced the SPL of F1 for both vowels, but with EN, this SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. EN and VPO, alone or together, can support (singing) voice production.},
}

RevDate: 2024-08-31

Figueroa C, Guillén V, Huenupán F, et al (2024)

Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00254-6 [Epub ahead of print].

UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to offer a promising solution to the difficulty of assessment.

OBJECTIVE: To analyze variation in voice and speech acoustic parameters by vowel type across different levels of suicidal risk among adolescents in a text-reading task.

METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, with suicidal ideation determined through the Okasha Suicidality Scale and the Beck Depression Inventory. Recordings were acquired with a Focusrite interface and a microphone, and acoustic analysis was conducted with Praat and a Python program to extract voice and speech parameters such as fundamental frequency, jitter, and formants. Data from adolescents with and without suicidal risk were then compared.

RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic parameters, especially among females, in fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. Among males, differences were found in F0 and HNRdB (P < 0.05).

CONCLUSION: This study demonstrated statistically significant differences in several voice acoustic parameters between adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers of suicidal risk.

Additional Links: PMID-39217086

Citation:

@article {pmid39217086,
year = {2024},
author = {Figueroa, C and Guillén, V and Huenupán, F and Vallejos, C and Henríquez, E and Urrutia, F and Sanhueza, F and Alarcón, E},
title = {Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.006},
pmid = {39217086},
issn = {1873-4588},
abstract = {UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to offer a promising solution to the difficulty of assessment.

OBJECTIVE: To analyze variation in voice and speech acoustic parameters by vowel type across different levels of suicidal risk among adolescents in a text-reading task.

METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, with suicidal ideation determined through the Okasha Suicidality Scale and the Beck Depression Inventory. Recordings were acquired with a Focusrite interface and a microphone, and acoustic analysis was conducted with Praat and a Python program to extract voice and speech parameters such as fundamental frequency, jitter, and formants. Data from adolescents with and without suicidal risk were then compared.

RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic parameters, especially among females, in fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. Among males, differences were found in F0 and HNRdB (P < 0.05).

CONCLUSION: This study demonstrated statistically significant differences in several voice acoustic parameters between adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers of suicidal risk.},
}

RevDate: 2024-09-04
CmpDate: 2024-08-30

Zaltz Y (2024)

The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.

Trends in hearing, 28:23312165241275895.

Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks such as speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, using combined fundamental frequency (F0) and formant-frequency voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase comprising six VD blocks with sentence stimuli, using either F0-only or formant-only cues. The second session repeated the interleaved training for the trained groups, followed by a second assessment, identical to the first session, completed by all three groups. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond the improvements seen in the control group, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, the findings suggest that training conditions affect generalization by influencing the processing levels associated with the trained task. Training in noisy conditions may engage higher-level auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights have significant theoretical and clinical implications and may advance the development of effective auditory training protocols.

Additional Links: PMID-39212078

Citation:

@article {pmid39212078,
year = {2024},
author = {Zaltz, Y},
title = {The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.},
journal = {Trends in hearing},
volume = {28},
number = {},
pages = {23312165241275895},
pmid = {39212078},
issn = {2331-2165},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Perception/physiology ; *Generalization, Psychological ; *Cues ; *Noise/adverse effects ; *Acoustic Stimulation ; Adult ; Recognition, Psychology ; Perceptual Masking ; Adolescent ; Speech Acoustics ; Voice Quality ; Discrimination Learning/physiology ; Voice/physiology ; },
abstract = {Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks such as speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, using combined fundamental frequency (F0) and formant-frequency voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase comprising six VD blocks with sentence stimuli, using either F0-only or formant-only cues. The second session repeated the interleaved training for the trained groups, followed by a second assessment, identical to the first session, completed by all three groups. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond the improvements seen in the control group, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, the findings suggest that training conditions affect generalization by influencing the processing levels associated with the trained task. Training in noisy conditions may engage higher-level auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights have significant theoretical and clinical implications and may advance the development of effective auditory training protocols.},
}

MeSH Terms:

Humans
Male
Female
Young Adult
*Speech Perception/physiology
*Generalization, Psychological
*Cues
*Noise/adverse effects
*Acoustic Stimulation
Adult
Recognition, Psychology
Perceptual Masking
Adolescent
Speech Acoustics
Voice Quality
Discrimination Learning/physiology
Voice/physiology

RevDate: 2024-12-04

Parrell B, Naber C, Kim OA, et al (2024)

Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.

bioRxiv : the preprint server for biology.

Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.

Additional Links: PMID-39185222
