RJR: Recommended Bibliography (Created: 05 Dec 2024 at 01:47)
Formants: Modulators of Communication
Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, "formant" is also sometimes used to refer to the acoustic resonance pattern of the human vocal tract. Because formants are a product of resonance, resonance is affected by the shape and material of the resonating structure, and all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal and human vocalizations. Discussions of how formants affect the production and interpretation of vocalizations are available in a few YouTube videos. For example: Formants Explained and Demonstrated or What are FORMANTS and HARMONICS? VOCAL FORMANTS AND HARMONICS Explained! or How Do We Change Our Mouths to Shape Waves? Formants
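For readers who want to see what a formant is numerically, below is a minimal Python sketch of the classic linear-predictive-coding (LPC) approach to formant estimation. It assumes a mono recording of a sustained vowel in a hypothetical file vowel.wav; the pre-emphasis constant, frame length, and LPC order are conventional illustrative choices, not fixed standards.

# Minimal LPC formant-estimation sketch; assumes a mono file "vowel.wav".
import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz

fs, x = wavfile.read("vowel.wav")          # hypothetical sustained-vowel recording
x = x.astype(float)
mid = len(x) // 2
frame = x[mid : mid + int(0.03 * fs)]      # one 30 ms analysis frame from the middle
frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
frame *= np.hamming(len(frame))            # window to reduce edge effects

order = int(2 + fs / 1000)                 # rule-of-thumb LPC order
r = np.correlate(frame, frame, "full")[len(frame) - 1 : len(frame) + order]
a = solve_toeplitz(r[:order], r[1:])       # autocorrelation method: solve R a = r
poles = np.roots(np.concatenate(([1.0], -a)))

# Formants are the vocal tract resonances: positive-frequency pole angles, in Hz.
# (Very low values may be spurious; real tools also filter roots by bandwidth.)
freqs = sorted(np.angle(p) * fs / (2 * np.pi) for p in poles if p.imag > 0)
print("estimated formants (Hz):", [round(f) for f in freqs[:4]])

Dedicated tools such as Praat implement the same idea with additional safeguards that this sketch omits.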
Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion
The Papers (from PubMed®)
RevDate: 2024-11-28
CmpDate: 2024-11-28
[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].
Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery, 38(12):1149-1153.
Objective: This study explores the influence of vowel and sound intensity on formants, to provide a reference for selecting sound samples and vocal methods in acoustic assessment. Methods: Thirty-eight healthy subjects (19 male, 19 female) aged 19-24 years were recruited. Formants were analyzed for different vowels (/a/, /(?)/, /i/, and /u/) and different sound intensities (lowest voice, comfortable voice, highest modal voice, and highest falsetto), with pairwise comparisons between groups showing significant differences. Results: ① The first formant of /a/ and /(?)/ was higher than that of /i/ and /u/, and /i/ had the highest second formant; the first formant was lowest for /i/ at the lowest intensity and highest for /a/ at the highest intensity. ② The first formant increased with sound intensity in the chest-voice range, while the second formant decreased significantly on entering the highest falsetto. Conclusion: Formant distributions differ across vowels and sound intensities; that is, vowel and sound intensity influence formants to differing degrees. The extreme values of the first formant give an initial estimate of its maximum normal range, which may help improve acoustic assessment.
Additional Links: PMID-39605265
@article {pmid39605265,
year = {2024},
author = {Xie, B and Li, Z and Wang, H and Kuang, X and Ni, W and Zhong, R and Li, Y},
title = {[The influence of vowel and sound intensity on the results of voice acoustic formant detection was analyzed].},
journal = {Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology head and neck surgery},
volume = {38},
number = {12},
pages = {1149-1153},
doi = {10.13201/j.issn.2096-7993.2024.12.011},
pmid = {39605265},
issn = {2096-7993},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Acoustics ; Voice Quality ; Phonetics ; Voice/physiology ; Adult ; },
}
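A measurement protocol like the one above can be approximated with off-the-shelf tools. The sketch below uses the praat-parselmouth Python bindings to measure mid-vowel F1 and F2 for each vowel/intensity token; the file-naming scheme and the 5500 Hz formant ceiling are hypothetical choices, not the authors' protocol.

# Sketch of the measurement step: mid-vowel F1/F2 per vowel/intensity token.
# Uses praat-parselmouth; file names like "a_comfort.wav" are hypothetical.
import parselmouth

for vowel in ["a", "i", "u"]:
    for intensity in ["lowest", "comfort", "highest"]:
        snd = parselmouth.Sound(f"{vowel}_{intensity}.wav")
        formant = snd.to_formant_burg(maximum_formant=5500)  # ~5500 Hz suits female voices, ~5000 Hz male
        t_mid = snd.duration / 2                             # measure at the vowel midpoint
        f1 = formant.get_value_at_time(1, t_mid)
        f2 = formant.get_value_at_time(2, t_mid)
        print(f"/{vowel}/ {intensity}: F1={f1:.0f} Hz, F2={f2:.0f} Hz")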
RevDate: 2024-11-26
Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: The objective of the present study is to investigate nasal and oral vowel production in French-speaking children with cochlear implants (CIs) and children with typical hearing (TH). Vowel nasality relies primarily on acoustic cues that may be less effectively transmitted by the implant. The study investigates how children with CIs manage to produce these segments in French, a language with contrastive vowel nasalization.
METHOD: The children performed a task in which they repeated sentences containing a consonant-vowel-consonant-vowel-type pseudoword, the vowel being a nasal or oral vowel from French. Thirteen children with CIs and 25 children with TH completed the task. Among the children with CIs, the level of exposure to Cued Speech (CS) was either occasional (CS-) or intense (CS+). The productions were analyzed through perceptual judgments and acoustic measurements. Different acoustic cues related to nasality were collected: segmental durations, formant values, and predicted values of nasalization. Multiple regression analyses were conducted to examine which acoustic features are associated with perceived nasality in perceptual judgments.
RESULTS: The perceptual judgments of the children's speech productions indicate that children with sustained exposure to CS (CS+) produced the best-identified and most distinct oral/nasal productions. Acoustic measures revealed different production profiles among the groups: children in the CS+ group seem to differentiate between nasal and oral vowels by relying on segmental duration cues and variations in oropharyngeal configurations (associated with formant differences), but less through nasal resonance.
CONCLUSION: The study highlights (a) a benefit of sustained CS practice for CI children for the intelligibility of nasal-oral segments, (b) privileged exploitation of temporal (segmental duration) and salient acoustic cues (oropharyngeal configuration) in the CS+ group, and (c) difficulties among children with CI in distinguishing nasal-oral segments through nasal resonance.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.27744768.
Additional Links: PMID-39589237
@article {pmid39589237,
year = {2024},
author = {Fagniart, S and Delvaux, V and Harmegnies, B and Huberlant, A and Huet, K and Piccaluga, M and Watterman, I and Charlier, B},
title = {Producing Nasal Vowels Without Nasalization? Perceptual Judgments and Acoustic Measurements of Nasal/Oral Vowels Produced by Children With Cochlear Implants and Typically Hearing Peers.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-22},
doi = {10.1044/2024_JSLHR-24-00083},
pmid = {39589237},
issn = {1558-9102},
}
RevDate: 2024-11-16
Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00363-1 [Epub ahead of print].
OBJECTIVES: In this study, we introduce an intervention based on two techniques: twang and medialization. The hypothesis is that a combination of these two techniques will enable trans women to gain feminine-sounding speech without vocal strain or harm.
METHOD: Five trans women took part in the study. A control group of five cisgender women and five cisgender men were included. A list of 14 monosyllabic words was created, where the vowel /ɑ/ was embedded in various consonant contexts. All participants were asked to read the word list three times, each time presented in a different order. The trans women read the word list before and after intervention. Acoustic analyses of fundamental frequency and the first, second, and third formant frequencies were conducted. For the perceptual analysis, 60 voice samples were selected from the entire material. Fifteen listeners were asked whether they perceived the voice samples as feminine, masculine, or uncertain. The listeners were also asked for gender judgments based on sentences read by the trans women after intervention.
RESULTS: The acoustic analyses revealed an increase in fundamental frequencies and first, second, and third formants after intervention for all five trans women, approaching the values of the female controls. The perceptual judgments showed that the majority of the trans women voice samples were perceived as feminine after intervention.
CONCLUSIONS: Based on the acoustic analyses and the perceptual evaluations, the combination of the twang and medialization techniques appears to enable trans women to achieve feminine attribution. Nevertheless, the study is too small for generalization. A take-home message is that, to gain feminine-sounding speech, it is appropriate to focus primarily on resonance in addition to speaking fundamental frequency.
Additional Links: PMID-39550323
@article {pmid39550323,
year = {2024},
author = {Bøyesen, B and Hide, Ø},
title = {Using Twang and Medialization Techniques to Gain Feminine-Sounding Speech in Trans Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.10.020},
pmid = {39550323},
issn = {1873-4588},
}
RevDate: 2024-11-12
CmpDate: 2024-11-08
Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.
Scientific reports, 14(1):27183.
Understanding how speech sounds are decoded into linguistic units has been a central research challenge over the last century. This study follows a reverse-correlation approach to reveal the acoustic cues listeners use to categorize French stop consonants in noise. Compared to previous methods, this approach ensures an unprecedented level of detail with only minimal theoretical assumptions. Thirty-two participants performed a speech-in-noise discrimination task based on natural /aCa/ utterances, with C = /b/, /d/, /g/, /p/, /t/, or /k/. The trial-by-trial analysis of their confusions enabled us to map the spectrotemporal information they relied on for their decisions. In place-of-articulation contrasts, the results confirmed the critical role of formant consonant-vowel transitions, used by all participants, and, to a lesser extent, vowel-consonant transitions and high-frequency release bursts. Similarly, for voicing contrasts, we validated the prominent role of the voicing bar cue, with some participants also using formant transitions and burst cues. This approach revealed that most listeners use a combination of several cues for each task, with significant variability within the participant group. These insights shed new light on decades-old debates regarding the relative importance of cues for phoneme perception and suggest that research on acoustic cues should not overlook individual variability in speech perception.
Additional Links: PMID-39516258
@article {pmid39516258,
year = {2024},
author = {Carranante, G and Cany, C and Farri, P and Giavazzi, M and Varnet, L},
title = {Mapping the spectrotemporal regions influencing perception of French stop consonants in noise.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {27183},
pmid = {39516258},
issn = {2045-2322},
support = {ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; ANR-17-EURE-0017//Agence Nationale de la Recherche/ ; ANR-20-CE28-0004//Agence Nationale de la Recherche/ ; },
mesh = {Humans ; *Speech Perception/physiology ; Female ; Male ; *Noise ; Adult ; *Phonetics ; Young Adult ; Language ; Cues ; Speech Acoustics ; France ; Acoustic Stimulation ; },
}
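The reverse-correlation logic of this study can be illustrated compactly: correlate trial-by-trial noise patterns with a listener's responses to locate the spectrotemporal regions that drive decisions. The sketch below computes a simple classification image on simulated data; all array shapes and names are illustrative stand-ins, not the authors' pipeline.

# Classification-image sketch of reverse correlation on simulated trials.
import numpy as np

n_trials, n_freq, n_time = 5000, 32, 40
rng = np.random.default_rng(0)
noise = rng.normal(size=(n_trials, n_freq, n_time))   # per-trial spectrotemporal noise
responses = rng.integers(0, 2, size=n_trials)         # listener's /b/ vs /d/ answers (stand-in)

# Classification image: mean noise preceding one response minus the other.
# Bins with large |values| mark time-frequency cues driving the decision.
ci = noise[responses == 1].mean(axis=0) - noise[responses == 0].mean(axis=0)
se = noise.std(axis=0) * np.sqrt(1 / (responses == 1).sum() + 1 / (responses == 0).sum())
z = ci / se                                           # rough per-bin z-score
print("most influential (freq, time) bin:", np.unravel_index(np.abs(z).argmax(), z.shape))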
RevDate: 2024-11-08
CmpDate: 2024-11-08
Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.
Journal of medical Internet research, 26:e58466 pii:v26i1e58466.
BACKGROUND: Researchers have developed a variety of indices to assess frailty. Recent research indicates that the human voice reflects frailty status. Frailty phenotypes are seldom discussed in the literature on the aging voice.
OBJECTIVE: This study aims to examine potential phenotypes of frail older adults and determine their correlation with vocal biomarkers.
METHODS: Participants aged ≥60 years who visited the geriatric outpatient clinic of a teaching hospital in central Taiwan between 2020 and 2021 were recruited. We identified 4 frailty phenotypes: energy-based frailty, sarcopenia-based frailty, hybrid-based frailty-energy, and hybrid-based frailty-sarcopenia. Participants were asked to pronounce a sustained vowel "/a/" for approximately 1 second. The speech signals were digitized and analyzed. Four voice parameters-the average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4)-were used for analyzing changes in voice. Logistic regression was used to elucidate the prediction model.
RESULTS: Among 277 older adults, an increase in A1 values was associated with a lower likelihood of energy-based frailty (odds ratio [OR] 0.81, 95% CI 0.68-0.96), whereas an increase in A2 values resulted in a higher likelihood of sarcopenia-based frailty (OR 1.34, 95% CI 1.18-1.52). Respondents with larger A3 and A4 values had a higher likelihood of hybrid-based frailty-sarcopenia (OR 1.03, 95% CI 1.002-1.06) and hybrid-based frailty-energy (OR 1.43, 95% CI 1.02-2.01), respectively.
CONCLUSIONS: Vocal biomarkers might be potentially useful in estimating frailty phenotypes. Clinicians can use 2 crucial acoustic parameters, namely A1 and A2, to diagnose a frailty phenotype that is associated with insufficient energy or reduced muscle function. The assessment of A3 and A4 involves a complex frailty phenotype.
Additional Links: PMID-39515817
@article {pmid39515817,
year = {2024},
author = {Lin, YC and Yan, HT and Lin, CH and Chang, HH},
title = {Identifying and Estimating Frailty Phenotypes by Vocal Biomarkers: Cross-Sectional Study.},
journal = {Journal of medical Internet research},
volume = {26},
number = {},
pages = {e58466},
doi = {10.2196/58466},
pmid = {39515817},
issn = {1438-8871},
mesh = {Humans ; Aged ; Cross-Sectional Studies ; *Frailty/physiopathology ; Male ; Female ; *Phenotype ; *Biomarkers ; Middle Aged ; Voice/physiology ; Aged, 80 and over ; Taiwan ; Frail Elderly/statistics & numerical data ; Sarcopenia/physiopathology/diagnosis ; },
}
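The statistical summary reported here (odds ratios with 95% CIs from logistic regression) is easy to reproduce in outline. Below is a minimal statsmodels sketch on simulated data; the column names A1 and frail are hypothetical stand-ins for the study's variables.

# Sketch: logistic regression relating a voice parameter to a frailty phenotype,
# summarized as an odds ratio with 95% CI. Data are simulated stand-ins.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
df = pd.DataFrame({"A1": rng.normal(10, 2, 277),        # stand-in voice parameter
                   "frail": rng.integers(0, 2, 277)})   # stand-in phenotype label

X = sm.add_constant(df[["A1"]])           # intercept + predictor
fit = sm.Logit(df["frail"], X).fit(disp=0)

or_ = np.exp(fit.params["A1"])            # exponentiated coefficient = odds ratio
lo, hi = np.exp(fit.conf_int().loc["A1"]) # 95% CI on the odds-ratio scale
print(f"OR per unit A1: {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")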
RevDate: 2024-11-12
CmpDate: 2024-11-12
Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.
The Journal of the Acoustical Society of America, 156(5):3118-3139.
In this comparative cross-linguistic study we test whether expressive interjections (words like ouch or yay) share similar vowel signatures across the world's languages, and whether these can be traced back to nonlinguistic vocalizations (like screams and cries) expressing the same emotions of pain, disgust, and joy. We analyze vowels in interjections from dictionaries of 131 languages (over 600 tokens) and compare these with nearly 500 vowels based on formant frequency measures from voice recordings of volitional nonlinguistic vocalizations. We show that across the globe, pain interjections feature a-like vowels and wide falling diphthongs ("ai" as in Ayyy! "aw" as in Ouch!), whereas disgust and joy interjections do not show robust vowel regularities that extend geographically. In nonlinguistic vocalizations, all emotions yield distinct vowel signatures: pain prompts open vowels such as [a], disgust schwa-like central vowels, and joy front vowels such as [i]. Our results show that pain is the only affective experience tested with a clear, robust vowel signature that is preserved between nonlinguistic vocalizations and interjections across languages. These results offer empirical evidence for iconicity in some expressive interjections. We consider potential mechanisms and origins, from evolutionary pressures and sound symbolism to colexification, proposing testable hypotheses for future research.
Additional Links: PMID-39531311
@article {pmid39531311,
year = {2024},
author = {Ponsonnet, M and Coupé, C and Pellegrino, F and Garcia Arasco, A and Pisanski, K},
title = {Vowel signatures in emotional interjections and nonlinguistic vocalizations expressing pain, disgust, and joy across languages.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {5},
pages = {3118-3139},
doi = {10.1121/10.0032454},
pmid = {39531311},
issn = {1520-8524},
mesh = {Humans ; *Emotions ; Phonetics ; Language ; Speech Acoustics ; Pain/psychology ; Voice Quality ; Happiness ; },
}
RevDate: 2024-11-01
Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.
Infancy : the official journal of the International Society on Infant Studies [Epub ahead of print].
Acoustic variability in the speech input has been shown, in certain contexts, to be beneficial during infants' acquisition of sound contrasts. One approach attributes this result to the potential of variability to make the stability of individual cues visible. Another approach suggests that, instead of highlighting individual cues, variability uncovers stable relations between cues that signal a sound contrast. Here, we investigate the relation between Voice Onset Time (VOT) and the onset of F1 formant frequency, two cues that subserve the voicing contrast in German. First, we verified that German-speaking adults' use of VOT to categorize voiced and voiceless stops is dependent on the value of the F1 onset frequency, in the specific form of a so-called trading relation. Next, we tested whether 6-month-old German-learning infants exhibit differential sensitivity to stimulus continua in which the cues varied to an equal extent, but either adhered to the trading relation established in the adult experiment or adhered to a reversed relation. Our results present evidence that infants prefer listening to speech in which phonetic cues conform to certain cue trading relations over cue relations that are reversed.
Additional Links: PMID-39487102
@article {pmid39487102,
year = {2024},
author = {Hullebus, M and Gafos, A and Boll-Avetisyan, N and Langus, A and Fritzsche, T and Höhle, B},
title = {Infant preference for specific phonetic cue relations in the contrast between voiced and voiceless stops.},
journal = {Infancy : the official journal of the International Society on Infant Studies},
volume = {},
number = {},
pages = {},
doi = {10.1111/infa.12630},
pmid = {39487102},
issn = {1532-7078},
support = {317633480 - SFB 1287//Deutsche Forschungsgemeinschaft/ ; },
}
RevDate: 2024-10-30
Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.
Digital biomarkers, 8(1):159-170.
INTRODUCTION: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking affects voice production, we aimed to develop a gender- and language-specific vocal biomarker of smoking status.
METHODS: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender- and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers.
RESULTS: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65]; 64.41% female; 56.68% English speakers; 50% smokers and 50% never-smokers). We observed differences in voice feature distributions: for women, the fundamental frequency F0, the formant frequencies F1, F2, and F3, and the harmonics-to-noise ratio were lower in smokers than in never-smokers (p < 0.05), while for men no significant disparities were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for female participants, and 0.65 and 0.68, respectively, for male participants.
CONCLUSION: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings.
Additional Links: PMID-39473806
@article {pmid39473806,
year = {2024},
author = {Ayadi, H and Elbéji, A and Despotovic, V and Fagherazzi, G},
title = {Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings: Results from the Colive Voice Study.},
journal = {Digital biomarkers},
volume = {8},
number = {1},
pages = {159-170},
pmid = {39473806},
issn = {2504-110X},
}
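The classification step described in this entry can be sketched with scikit-learn: train a classifier on acoustic features and report accuracy and AUC on held-out data. Everything below is simulated stand-in data; the feature count, classifier choice, and split are illustrative, not the authors' pipeline.

# Sketch: predict smoking status from voice features; report accuracy and AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(1332, 20))            # stand-in acoustic features (f0, F1-F3, HNR, ...)
y = rng.integers(0, 2, size=1332)          # 0 = never-smoker, 1 = smoker (stand-in labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))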
RevDate: 2024-10-26
Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.
Neuroscience letters pii:S0304-3940(24)00404-X [Epub ahead of print].
Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production, and its possible relation to PSAM, at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
Additional Links: PMID-39461704
@article {pmid39461704,
year = {2024},
author = {Jingwen Li, J and Daliri, A and Kim, KS and Max, L},
title = {Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.},
journal = {Neuroscience letters},
volume = {},
number = {},
pages = {138025},
doi = {10.1016/j.neulet.2024.138025},
pmid = {39461704},
issn = {1872-7972},
}
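The centering measure at the heart of this study has a simple form: inter-trial formant variability at vowel onset minus variability at vowel midpoint. A minimal numpy sketch on simulated F1/F2 values, with all numbers illustrative:

# Sketch of "centering": variability reduction from vowel onset to midpoint.
import numpy as np

rng = np.random.default_rng(4)
# (trials, 2) arrays of [F1, F2] in Hz at vowel onset and at vowel midpoint
onset = rng.normal([700, 1200], [60, 90], size=(100, 2))
mid = rng.normal([700, 1200], [35, 55], size=(100, 2))

def variability(f):  # mean Euclidean distance from the median production
    return np.linalg.norm(f - np.median(f, axis=0), axis=1).mean()

centering = variability(onset) - variability(mid)
print(f"onset {variability(onset):.1f} Hz, midpoint {variability(mid):.1f} Hz, "
      f"centering {centering:.1f} Hz")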
RevDate: 2024-10-24
The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00309-6 [Epub ahead of print].
PURPOSE: The static and dynamic soft tissue changes resulting in hypopnea and/or apnea in the subjects with obstructive sleep apnea (OSA) occur in the upper airway, which also serves as the voice or speech tract. In this study, we looked for the Voice Handicap Index-10 (VHI-10) and Voice-Related Quality of Life (V-RQOL) scores in addition to perturbation and formant values of the vowels in those with snoring and OSA.
METHODS: Epworth Sleepiness Scale (ESS), STOP-Bang scores, Body-Mass Index (BMI), neck circumference (NC), modified Mallampati Index, tonsil size, Apnea-Hypopnea Index, VHI-10 and V-RQOL scores, perturbation and formant values, and fundamental frequency of the voice samples were taken to evaluate.
RESULTS: The data revealed that the VHI-10 and V-RQOL scores, but not the perturbation and formant values, differed significantly between the control and OSA subjects, and that both scores correlated significantly with ESS and NC. In addition, a few significant correlations of BMI and tonsil size with the formant and perturbation values were found.
CONCLUSIONS: Our data reveal that (i) VHI-10 and V-RQOL were good identifiers of OSA, and (ii) perturbation and formant values were related particularly to tonsil size and, to a lesser extent, to BMI. Hence, when using a voice parameter to screen for OSA, VHI-10 and V-RQOL appear to be better choices than the objective voice measures, which may vary with the subjects' tonsil size and BMI.
Additional Links: PMID-39448279
@article {pmid39448279,
year = {2024},
author = {Pekdemir, A and Kemaloğlu, YK and Gölaç, H and İriz, A and Köktürk, O and Mengü, G},
title = {The Self-Assessment, Perturbation, and Resonance Values of Voice and Speech in Individuals with Snoring and Obstructive Sleep Apnea.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.018},
pmid = {39448279},
issn = {1873-4588},
}
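As one concrete example of the "perturbation values" this entry relies on, local jitter is the mean absolute difference between consecutive glottal periods relative to the mean period. A minimal sketch, with the period array standing in for pitch-tracker output:

# Sketch of local jitter from a sequence of glottal periods (simulated stand-in).
import numpy as np

periods = 0.005 + np.random.default_rng(5).normal(0, 2e-5, 200)  # ~200 Hz voice
jitter_local = np.abs(np.diff(periods)).mean() / periods.mean()
print(f"local jitter: {100 * jitter_local:.2f}%")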
RevDate: 2024-10-24
CmpDate: 2024-10-24
Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.
The Journal of the Acoustical Society of America, 156(4):2860-2876.
Language communicators use acoustic-phonetic cues to convey a variety of social information in spoken language, and learning a second language affects speech production in a social setting. It remains unclear how speaking different dialects could affect the acoustic metrics underlying the intended communicative meanings. Nine Chinese Bayannur-Mandarin bidialectics produced single-digit numbers in statements of both Standard Mandarin and the Bayannur dialect with different levels of intended confidence. Fifteen listeners judged the presence of intention and the confidence level. Prosodically unmarked and marked stimuli exhibited significant differences in perceived intention. A higher intended level was perceived as more confident. The acoustic analysis revealed that segmental (third and fourth formants, center of gravity), suprasegmental (mean fundamental frequency, fundamental frequency range, duration), and source features (harmonics-to-noise ratio, cepstral peak prominence) can distinguish between confident and doubtful expressions. Most features also distinguished between dialect and Mandarin productions. Interactions on the fourth formant and mean fundamental frequency suggested that speakers made greater use of acoustic parameters to encode confidence and doubt in the Bayannur dialect than in Mandarin. In machine learning experiments, the above-chance-level overall classification rates for confidence and doubt and the in-group advantage supported the dialect theory.
Additional Links: PMID-39445770
@article {pmid39445770,
year = {2024},
author = {Feng, S and Jiang, X},
title = {Acoustic encoding of vocally expressed confidence and doubt in Chinese bidialectics.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2860-2876},
doi = {10.1121/10.0032400},
pmid = {39445770},
issn = {1520-8524},
mesh = {Humans ; Male ; Female ; *Speech Acoustics ; Adult ; *Speech Perception ; Young Adult ; Language ; Phonetics ; Intention ; Multilingualism ; East Asian People ; },
}
RevDate: 2024-10-23
The acoustic characteristics of Swedish vowels.
Phonetica [Epub ahead of print].
The Swedish vowel space is relatively densely populated, with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work has generally focused on subsets of the space. The present paper reports static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip-rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.
Additional Links: PMID-39443329
@article {pmid39443329,
year = {2024},
author = {Persson, A},
title = {The acoustic characteristics of Swedish vowels.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {39443329},
issn = {1423-0321},
}
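A common way to summarize vowel-space data like these is the area of the convex hull around per-category mean (F1, F2) points. A minimal scipy sketch, with hypothetical vowel means:

# Sketch: vowel space area as the convex hull of mean (F1, F2) points.
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical mean (F1, F2) values in Hz for a handful of vowel categories
means = np.array([[300, 2200],   # [i]-like
                  [390, 1900],
                  [600, 1700],
                  [750, 1300],   # [a]-like
                  [550, 900],
                  [350, 700]])   # [o]/[u]-like

hull = ConvexHull(means)
print(f"vowel space area: {hull.volume:.0f} Hz^2")  # in 2-D, .volume is the area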
RevDate: 2024-10-22
Analysis of Voice Quality in Children With Smith-Magenis Syndrome.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00319-9 [Epub ahead of print].
UNLABELLED: The production of phonation involves very complex processes, linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, it is possible to find the imprint in the voice signal left by the deterioration of certain cortical areas or part of the neurocognitive mechanisms that are involved in speech. In previous works, the authors determined the relationship between the pathological characteristics of the voice of the speakers with Smith-Magenis syndrome (SMS) and a lower value in the cepstral peak prominence (CPP) with respect to normative speakers. They also described the presence of subharmonics in their voices.
OBJECTIVES: The present study aims to verify whether both characteristics can be used simultaneously to differentiate SMS voices from neurotypical voices. It will also be analyzed if there is variation in the trajectory of the formants coinciding with the subharmonics.
METHODS: To do this, the effect of subharmonics in the voices of 12 SMS individuals was isolated to determine whether they were responsible for the lower CPP values. The CPP was also evaluated in regions of subharmonic presence, measured from the peak reflecting f0 rather than from the most prominent peak, which provided a baseline CPP value in the presence of subharmonics. We then checked whether changes in the formants occurred synchronously with the appearance of those subharmonics; if so, the muscles controlling the position of the jaw and tongue would be affected at the same time as the larynx. The latter was difficult to observe because the samples were very short. Phonatory performance on a sustained /a/ was compared between a normotypical and a non-normotypical group of children, balanced and matched for age and gender. The Spanish Association of Smith-Magenis Syndrome (ASME) accounts for almost 20% of the SMS population in Spain.
RESULTS: The CPP allows differentiating between normative speakers and those with SMS, even when isolating the effect of subharmonics.
CONCLUSIONS: The CPP is a robust index for determining the degree of dysphonia. It makes it possible to differentiate pathological voices from healthy voices even when subharmonics are present. The presence of subharmonics is a characteristic of voices of SMS individuals and is not present in healthy ones. Both indexes can be used simultaneously to differentiate SMS voices from neurotypical voices.
Additional Links: PMID-39438167
@article {pmid39438167,
year = {2024},
author = {Martínez-Olalla, R and Hidalgo-De la Guía, I and Gayarzábal-Heinze, E and Fernández-Ruiz, R and Núñez-Vidal, E and Álvarez-Marquina, A and Palacios-Alonso, D},
title = {Analysis of Voice Quality in Children With Smith-Magenis Syndrome.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.026},
pmid = {39438167},
issn = {1873-4588},
}
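Cepstral peak prominence, the index this study leans on, can be computed in a few lines: take the cepstrum of the dB spectrum, find the peak in the quefrency range where f0 is plausible, and measure its height above a fitted trend line. The sketch below is a simplified illustration, not Praat's exact algorithm:

# Simplified cepstral peak prominence (CPP) sketch on a synthetic signal.
import numpy as np

def cpp(x, fs, f0_range=(60, 330)):
    spec_db = 20 * np.log10(np.abs(np.fft.fft(x * np.hamming(len(x)))) + 1e-12)
    ceps = np.fft.ifft(spec_db).real                  # real cepstrum of the dB spectrum
    quef = np.arange(len(x)) / fs                     # quefrency axis, in seconds
    lo, hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    peak = lo + np.argmax(ceps[lo:hi])                # cepstral peak where f0 is plausible
    slope, intercept = np.polyfit(quef[lo:hi], ceps[lo:hi], 1)
    return ceps[peak] - (slope * quef[peak] + intercept)  # height above the trend line

rng = np.random.default_rng(6)
fs = 16000
t = np.arange(fs) / fs                                # 1 s of a synthetic "voice" at 150 Hz
x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in (1, 2, 3)) + 0.05 * rng.normal(size=fs)
print(f"CPP of a strongly periodic signal: {cpp(x, fs):.1f} dB")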
RevDate: 2024-10-17
Divided Attention Has Limited Effects on Speech Sensorimotor Control.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.
METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.
RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.
CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.
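Measuring compensation only within the first 300 ms of the vowel, as the authors recommend, is straightforward with a formant tracker. The sketch below is a minimal illustration using the parselmouth bindings to Praat; the 5 ms step, the 5500 Hz formant ceiling, and the helper's name are assumptions, not the authors' code.

```python
import numpy as np
import parselmouth  # Python bindings to Praat

def mean_f1_early_window(wav_path, vowel_onset_s, window_s=0.300, step=0.005):
    """Mean F1 (Hz) over the first `window_s` seconds after vowel onset.

    Restricting the analysis window this way avoids the late portion of
    the vowel where attention effects were observed (> 300 ms).
    """
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=step, maximum_formant=5500.0)
    times = np.arange(vowel_onset_s, vowel_onset_s + window_s, step)
    f1 = np.array([formants.get_value_at_time(1, t) for t in times])
    return float(np.nanmean(f1))  # tracker returns NaN in unvoiced spans
```

Online compensation would then be quantified as the difference between such windowed means on perturbed versus unperturbed trials.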
Additional Links: PMID-39418590
@article {pmid39418590,
year = {2024},
author = {Krakauer, J and Naber, C and Niziolek, CA and Parrell, B},
title = {Divided Attention Has Limited Effects on Speech Sensorimotor Control.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-11},
doi = {10.1044/2024_JSLHR-24-00098},
pmid = {39418590},
issn = {1558-9102},
abstract = {PURPOSE: When vowel formants are externally perturbed, speakers change their production to oppose that perturbation both during the ongoing production (compensation) and in future productions (adaptation). To date, attempts to explain the large variability across individuals in these responses have focused on trait-based characteristics such as auditory acuity, but evidence from other motor domains suggests that attention may modulate the motor response to sensory perturbations. Here, we test the extent to which divided attention impacts sensorimotor control for supralaryngeal articulation.
METHOD: Neurobiologically healthy speakers were exposed to random (Experiment 1) or consistent (Experiment 2) real-time auditory perturbation of vowel formants to measure online compensation and trial-to-trial adaptation, respectively. In both experiments, participants completed two conditions: one with a simultaneous visual distractor task to divide attention and one without this secondary task.
RESULTS: Divided visual attention slightly reduced online compensation, but only starting > 300 ms after vowel onset, well beyond the typical duration of vowels in speech. Divided attention had no effect on adaptation.
CONCLUSIONS: The results from both experiments suggest that the use of sensory feedback in typical speech motor control is a largely automatic process unaffected by divided visual attention, suggesting that the source of cross-speaker variability in response to formant perturbations likely lies within the speech production system rather than in higher-level cognitive processes. Methodologically, these results suggest that compensation for formant perturbations should be measured prior to 300 ms after vowel onset to avoid any potential impact of attention or other higher-order cognitive factors.},
}
RevDate: 2024-10-16
The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00334-5 [Epub ahead of print].
The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis in people over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with values for males being lower and smaller than those for females. These gender differences were consistent with /hʌ/ LDDK regularity, as confirmed by the greater regularity found in females. No significant differences were observed for other acoustic parameters, and no significant interactions were revealed. Based on these preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity of differentiating between genders when assessing presbyphagia with acoustic parameters, with particular attention to swallowing function in elderly males in Ningbo.
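The acoustic vowel space mentioned above is commonly quantified as the area of the polygon spanned by the corner vowels in the F1-F2 plane. A minimal sketch using the shoelace formula follows; the formant values in the example are illustrative, not data from the study.

```python
def vowel_space_area(corner_formants):
    """Polygon area (Hz^2) spanned by (F1, F2) corner-vowel means,
    given in order around the polygon, via the shoelace formula."""
    area = 0.0
    n = len(corner_formants)
    for i in range(n):
        f1a, f2a = corner_formants[i]
        f1b, f2b = corner_formants[(i + 1) % n]
        area += f1a * f2b - f1b * f2a
    return abs(area) / 2.0

# Illustrative /a/, /i/, /u/ means (Hz) for one speaker
print(vowel_space_area([(850.0, 1220.0), (310.0, 2790.0), (350.0, 870.0)]))
```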
Additional Links: PMID-39414424
@article {pmid39414424,
year = {2024},
author = {He, Y and Wang, X and Huang, T and Zhao, W and Fu, Z and Zheng, Q and Jin, L and Kim, H and Liu, H},
title = {The Study of Speech Acoustic Characteristics of Elderly Individuals with Presbyphagia in Ningbo, China.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.041},
pmid = {39414424},
issn = {1873-4588},
abstract = {The feasibility of using acoustic parameters to predict presbyphagia has been preliminarily confirmed. Considering that age and gender can influence acoustic parameters, this study aimed to further explore the specific effects of age and gender on acoustic parameter analysis in people over 60 years old with presbyphagia. A total of 45 participants were enrolled and divided into three groups (60-69 years old, 70-79 years old, and 80-89 years old). Acoustic parameters, including maximum phonation time, first to third formant frequencies (F1-F3) of /a/, /i/, and /u/, oral diadochokinesis, the acoustic vowel space, and laryngeal diadochokinesis (LDDK), were extracted and calculated. Two-way analysis of variance was used to analyze the effects of age and gender on the acoustic parameters. The results indicate that the /hʌ/ LDDK rate differed significantly across age groups, with the 80-89 age group significantly slower than the 60-69 age group. F1/a/, F2/a/, F2/i/, F3/i/, and F2i/F2u differed systematically between genders, with values for males being lower and smaller than those for females. These gender differences were consistent with /hʌ/ LDDK regularity, as confirmed by the greater regularity found in females. No significant differences were observed for other acoustic parameters, and no significant interactions were revealed. Based on these preliminary data, we hypothesized that respiratory capacity and control during vocal fold abduction weaken with aging. This highlights the importance of continuously monitoring the respiratory impact on swallowing function in elderly individuals. Additionally, gender influenced several acoustic parameters, indicating the necessity of differentiating between genders when assessing presbyphagia with acoustic parameters, with particular attention to swallowing function in elderly males in Ningbo.},
}
RevDate: 2024-10-16
Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00316-3 [Epub ahead of print].
OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.
METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).
RESULTS: Male singers show a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers, though at a slightly higher frequency than that of Western classical singers. Female singers do not show a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort, and all are higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.
CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined here and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.
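LTAS-derived measures like the α ratio and singing power ratio reduce to band-energy comparisons on a long-term average spectrum. A minimal sketch with SciPy's Welch estimator is below; the band edges follow common conventions in the voice literature and may not match the paper's exact definitions.

```python
import numpy as np
from scipy.signal import welch

def ltas_ratios(x, sr):
    """Alpha ratio and singing power ratio (both in dB) for a recording."""
    freqs, pxx = welch(x, fs=sr, nperseg=4096)  # long-term average spectrum

    def band(lo, hi):
        return pxx[(freqs >= lo) & (freqs < hi)]

    # Alpha ratio: energy above 1 kHz relative to 50 Hz-1 kHz.
    alpha = 10.0 * np.log10(band(1000, 5000).sum() / band(50, 1000).sum())
    # Singing power ratio: strongest 2-4 kHz peak vs. strongest peak below 2 kHz.
    spr = 10.0 * np.log10(band(2000, 4000).max() / band(50, 2000).max())
    return alpha, spr
```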
Additional Links: PMID-39414423
@article {pmid39414423,
year = {2024},
author = {Wang, Y and Zhao, Y},
title = {Acoustic Characteristics of Modern Chinese Folk Singing at Different Vocal Efforts.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.09.022},
pmid = {39414423},
issn = {1873-4588},
abstract = {OBJECTIVES: Modern Chinese folk singing is developed by fusing regionally specific traditional Chinese singing with Western scientific training techniques. The purpose of this research is to contribute to the exploration of the acoustic characteristics of Chinese folk songs and the efficient resonance space for the performance.
METHOD: Seven tenors and seven sopranos were invited to sing three songs and read the lyrics in an anechoic chamber. The vocal outputs were meticulously recorded and subjected to a comprehensive acoustic analysis. Overall equivalent sound level, long-term average spectrum (LTAS), gain factors, and other acoustic parameters were analyzed for different vocal efforts (soft, normal, and loud), genders, and vocal modes (singing and speaking).
RESULTS: Male singers show a singer's formant at 3 kHz in the LTAS, a characteristic not found in other country singers or Chinese opera singers, though at a slightly higher frequency than that of Western classical singers. Female singers do not show a singer's formant, and their LTAS curves are much flatter. The α ratio, spectral balance, and singing power ratio all increased with increasing vocal effort, and all are higher for singing than for speaking. Finally, there is a significant gain factor at 3 kHz, with a maximum value of 1.85 for men and 1.68 for women.
CONCLUSIONS: Male singers in Chinese folk singing have a singer's formant, a phenomenon not consistently observed in female singers. The intricate acoustic characteristics of this singing style have been extensively examined here and can contribute to the existing literature on the spectral properties of diverse vocal genres. Furthermore, this analysis offers foundational data essential for the optimization of room acoustics tailored to vocal performance.},
}
RevDate: 2024-10-14
CmpDate: 2024-10-14
Dynamic acoustic vowel distances within and across dialects.
The Journal of the Acoustical Society of America, 156(4):2497-2507.
Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.
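The first of the paper's measures, dynamic time warping over mel-frequency cepstral coefficients, can be sketched in a few lines with librosa. The sampling rate, coefficient count, and path-length normalization below are assumptions chosen for illustration, not the paper's exact settings.

```python
import librosa

def vowel_dtw_distance(wav_a, wav_b, sr=16000, n_mfcc=13):
    """DTW-based acoustic distance between two vowel recordings."""
    y_a, _ = librosa.load(wav_a, sr=sr)
    y_b, _ = librosa.load(wav_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)
    cost, path = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
    return cost[-1, -1] / len(path)  # normalize by warping-path length
```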
Additional Links: PMID-39400271
@article {pmid39400271,
year = {2024},
author = {Clopper, CG},
title = {Dynamic acoustic vowel distances within and across dialects.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2497-2507},
doi = {10.1121/10.0032385},
pmid = {39400271},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; *Speech Production Measurement/methods ; Voice Quality ; Acoustics ; Female ; Male ; Time Factors ; Language ; Sound Spectrography ; Adult ; },
abstract = {Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.},
}
RevDate: 2024-10-13
Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.
Audiology & neuro-otology pii:000541716 [Epub ahead of print].
INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABI) throughout the first three years of life are difficult to predict. ABI users face challenges resulting from delays in language proficiency and the acquisition of reading comprehension, even though ABI technology offers auditory experiences that enhance spoken language development. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.
METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.
RESULTS: The results showed a relationship between language proficiency and reading comprehension in children with ABI. In the present study, children who had poor language proficiency and were enrolled in the school for the deaf also had low total scores for reading comprehension skills. These children use short, basic sentences, often repeat words and phrases, and have a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not recall events in a logical order.
CONCLUSION: Children with ABI may have compromised reading comprehension abilities owing to a lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication mode, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated here for the first time in the literature, and this study may constitute a starting point for the examination of variables affecting reading comprehension in this area.
Additional Links: PMID-39396508
@article {pmid39396508,
year = {2024},
author = {Ozkan Atak, HB and Aslan, F and Sennaroglu, G and Sennaroglu, L},
title = {Children with Auditory Brainstem Implants: Language Proficiency and Reading Comprehension Process.},
journal = {Audiology & neuro-otology},
volume = {},
number = {},
pages = {1-23},
doi = {10.1159/000541716},
pmid = {39396508},
issn = {1421-9700},
abstract = {INTRODUCTION: Auditory performance and language proficiency in young children who use auditory brainstem implants (ABI) throughout the first three years of life are difficult to predict. ABI users face challenges resulting from delays in language proficiency and the acquisition of reading comprehension, even though ABI technology offers auditory experiences that enhance spoken language development. The aim of this study was to evaluate the impact of language proficiency on reading comprehension skills in children with ABI.
METHOD: In this study, 20 children with ABI were evaluated for their reading comprehension abilities and language proficiency using an Informal Reading Inventory, Test of Early Language Development (TELD-3), Categories of Auditory Performance-II (CAP-II), and Speech Intelligibility Rating (SIR). Three distinct aspects of reading comprehension were assessed and analyzed to provide a composite score for reading comprehension abilities. TELD-3, which measures receptive and expressive language proficiency, was presented through spoken language.
RESULTS: The results showed a relationship between language proficiency and reading comprehension in children with ABI. In the present study, children who had poor language proficiency and were enrolled in the school for the deaf also had low total scores for reading comprehension skills. These children use short, basic sentences, often repeat words and phrases, and have a restricted vocabulary. In addition, they had difficulty reading characters and detailed paragraphs and could not recall events in a logical order.
CONCLUSION: Children with ABI may have compromised reading comprehension abilities owing to a lack of access to all the speech formants needed to develop spoken language. In addition, variables affecting the reading levels of children with ABI include factors such as age at implantation, duration of implant use, presence of additional disability, communication mode, and access to auditory rehabilitation. The reading comprehension skills of ABI users were evaluated here for the first time in the literature, and this study may constitute a starting point for the examination of variables affecting reading comprehension in this area.},
}
RevDate: 2024-10-11
CmpDate: 2024-10-11
Processing group delay spectrograms for study of formant and harmonic contours in speech signals.
The Journal of the Acoustical Society of America, 156(4):2422-2433.
This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband-equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals; harmonic frequency contours can be observed in the corresponding narrowband-equivalent GD spectrogram. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours match the given ground-truth formant contours approximately, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating spurious points without forcing the number of formants.
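The group delay spectrum itself can be computed without phase unwrapping through a standard DFT identity: with Y the transform of n·x[n] and X the transform of x[n], the group delay is Re{Y·X*}/|X|². The sketch below implements only this textbook computation, not the paper's modified SFF analysis.

```python
import numpy as np

def group_delay_spectrum(frame):
    """Group delay (in samples) of one windowed speech frame.

    Peaks of the group delay spectrum tend to align with resonances
    (formants), which is what makes GD displays useful here.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)          # DFT of n * x[n]
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```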
Additional Links: PMID-39392353
@article {pmid39392353,
year = {2024},
author = {Yegnanarayana, B and Pannala, V},
title = {Processing group delay spectrograms for study of formant and harmonic contours in speech signals.},
journal = {The Journal of the Acoustical Society of America},
volume = {156},
number = {4},
pages = {2422-2433},
doi = {10.1121/10.0032364},
pmid = {39392353},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; Sound Spectrography ; Signal Processing, Computer-Assisted ; Speech Production Measurement/methods ; Voice Quality ; Time Factors ; Phonetics ; },
abstract = {This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband-equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals; harmonic frequency contours can be observed in the corresponding narrowband-equivalent GD spectrogram. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours match the given ground-truth formant contours approximately, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating spurious points without forcing the number of formants.},
}
RevDate: 2024-10-02
Sensorimotor adaptation to a non-uniform formant perturbation generalizes to untrained vowels.
Journal of neurophysiology [Epub ahead of print].
When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a non-uniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.
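The "centralization" perturbation has a simple geometric form: each produced vowel is pushed toward the center of the speaker's vowel space, so peripheral vowels are shifted more than central ones. The sketch below illustrates that shape only; the 30% gain and the center coordinates are assumptions, not the authors' experimental values.

```python
import numpy as np

def centralize(f1, f2, center=(500.0, 1500.0), gain=0.3):
    """Shift a produced (F1, F2) pair toward the vowel-space center.

    The perturbation magnitude grows with distance from the center,
    making it non-uniform across the vowel space.
    """
    v = np.array([f1, f2], dtype=float)
    c = np.asarray(center, dtype=float)
    return v + gain * (c - v)

print(centralize(300.0, 2700.0))  # a peripheral /i/-like token moves a lot
print(centralize(550.0, 1400.0))  # a near-central token barely moves
```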
Additional Links: PMID-39356074
@article {pmid39356074,
year = {2024},
author = {Parrell, B and Niziolek, CA and Chen, T},
title = {Sensorimotor adaptation to a non-uniform formant perturbation generalizes to untrained vowels.},
journal = {Journal of neurophysiology},
volume = {},
number = {},
pages = {},
doi = {10.1152/jn.00240.2024},
pmid = {39356074},
issn = {1522-1598},
support = {R01 DC019134/DC/NIDCD NIH HHS/United States ; R01 DC017091/DC/NIDCD NIH HHS/United States ; BCS 2120506//National Science Foundation (NSF)/ ; },
abstract = {When speakers learn to change the way they produce a speech sound, how much does that learning generalize to other speech sounds? Past studies of speech sensorimotor learning have typically tested the generalization of a single transformation learned in a single context. Here, we investigate the ability of the speech motor system to generalize learning when multiple opposing sensorimotor transformations are learned in separate regions of the vowel space. We find that speakers adapt to a non-uniform "centralization" perturbation, learning to produce vowels with greater acoustic contrast, and that this adaptation generalizes to untrained vowels, which pattern like neighboring trained vowels and show increased contrast of a similar magnitude.},
}
RevDate: 2024-09-25
Acoustic Analysis of Mandarin-Speaking Transgender Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00291-1 [Epub ahead of print].
OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).
METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonic ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.
RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than CISW for the vowels /i/, significantly higher than CISW for the vowels /u/, and significantly higher than CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. The F4 formant was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, /u/, during passage reading, and in free speech, but was significantly higher than CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW compared with both CISW and CISM across the vowels /a/, /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/, /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free speech. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.
CONCLUSIONS: TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM without undergoing phonosurgery or voice training. The findings suggest a potential risk for voice fatigue and the development of voice disorders as TW try to modify their vocal characteristics to align with their gender identity.
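Most of the perturbation measures in this study (jitter, shimmer, HNR) are conventionally extracted with Praat. A minimal sketch using the parselmouth bindings' generic call interface follows; the pitch floor/ceiling and the remaining arguments are Praat's standard defaults, chosen here for illustration rather than taken from the paper.

```python
import parselmouth
from parselmouth.praat import call

def perturbation_measures(wav_path, f0_min=75.0, f0_max=500.0):
    """Jitter (local), shimmer (local), and mean HNR for one recording."""
    snd = parselmouth.Sound(wav_path)
    points = call(snd, "To PointProcess (periodic, cc)", f0_min, f0_max)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)
    return jitter, shimmer, hnr
```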
Additional Links: PMID-39322510
@article {pmid39322510,
year = {2024},
author = {Huang, T and Wang, X and Xu, T and Zhao, W and Cao, Y and Kim, H and Yi, B},
title = {Acoustic Analysis of Mandarin-Speaking Transgender Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.037},
pmid = {39322510},
issn = {1873-4588},
abstract = {OBJECTIVES: This study aims to investigate the speech characteristics and assess the potential risk of voice fatigue and voice disorders in Chinese transgender women (TW).
METHODS: A case-control study was conducted involving TW recruited in Shanghai, China. The participants included 15 TW, 20 cisgender men (CISM), and 20 cisgender women (CISW). Acoustic parameters, including formants (F1, F2, F3, F4), cepstral peak prominence (CPP), jitter, shimmer, harmonic-to-noise ratio (HNR), noise-to-harmonic ratio (NHR), fundamental frequency (f0), and intensity, were measured across vowels, passages, and free talking. Additionally, the Voice Handicap Index-10 (VHI-10) and the Voice Fatigue Index were administered to evaluate voice-related concerns.
RESULTS: (1) The F1 of TW was significantly higher than that of CISW for the vowels /i/ and /u/, and significantly higher than that of CISM for the vowels /a/, /i/, and /u/. The F2 of TW was significantly lower than CISW for the vowels /i/, significantly higher than CISW for the vowels /u/, and significantly higher than CISM for the vowels /a/ and /u/. F3 was significantly lower in TW than in CISW for the vowels /a/ and /i/. The F4 formant was significantly lower in TW than in CISW for the vowels /a/ and /i/, but significantly higher than in CISM for the vowel /u/. (2) The f0 of TW was significantly lower than that of CISW for the vowels /a/, /i/, /u/, during passage reading, and in free speech, but was significantly higher than CISM during passage reading and free talking. Additionally, TW exhibited significantly higher intensity compared with CISW for the vowel /a/ and during passage reading. (3) Jitter in TW was significantly higher than in CISW for the vowels /i/ and /u/, and significantly lower than in CISM during passage reading and free talking. Shimmer was significantly higher in TW compared with both CISW and CISM across the vowels /a/, /i/, during passage reading, and in free talking. The HNR in TW was significantly lower than in both CISW and CISM across all vowels, during passage reading, and in free talking. The NHR was significantly higher in TW than in CISW across all vowels, during passage reading, and in free talking, and significantly higher than in CISM for the vowels /a/, /i/, during passage reading, and in free talking. The CPP in TW was significantly lower than in CISW during passage reading and free talking, and significantly lower than in CISM across all vowels, during passage reading, and in free speech. (4) The VHI-10 scores were significantly higher in TW compared with both CISM and CISW.
CONCLUSIONS: TW exhibit certain acoustic parameters, such as f0 and some of the formants, that fall between those of CISW and CISM without undergoing phonosurgery or voice training. The findings suggest a potential risk for voice fatigue and the development of voice disorders as TW try to modify their vocal characteristics to align with their gender identity.},
}
RevDate: 2024-09-17
CmpDate: 2024-09-17
Monaural and binaural masking release with speech-like stimuli.
JASA express letters, 4(9):.
The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.
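The target stimulus here, a tone whose frequency follows an F2 trajectory, is standard frequency-modulation synthesis: instantaneous phase is the running integral of instantaneous frequency. A minimal sketch follows; the sample rate, duration, and linear glide stand in for a real consonant-vowel F2 trajectory and are illustrative only.

```python
import numpy as np

def fm_tone(freq_trajectory_hz, sr, amplitude=0.1):
    """Tone whose instantaneous frequency follows the given trajectory
    (one frequency value per output sample)."""
    phase = 2.0 * np.pi * np.cumsum(np.asarray(freq_trajectory_hz)) / sr
    return amplitude * np.sin(phase)

sr = 44100
f2_like = np.linspace(1200.0, 1800.0, sr)  # 1-s rising glide, illustrative
target = fm_tone(f2_like, sr)  # present diotically, or phase-shift one ear
```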
Additional Links: PMID-39287502
@article {pmid39287502,
year = {2024},
author = {Kim, H and Ratkute, V and Epp, B},
title = {Monaural and binaural masking release with speech-like stimuli.},
journal = {JASA express letters},
volume = {4},
number = {9},
pages = {},
doi = {10.1121/10.0028736},
pmid = {39287502},
issn = {2691-1191},
mesh = {Humans ; *Perceptual Masking/physiology ; *Speech Perception/physiology ; Adult ; Acoustic Stimulation ; Male ; Female ; Young Adult ; },
abstract = {The relevance of comodulation and interaural phase difference for speech perception is still unclear. We used speech-like stimuli to link spectro-temporal properties of formants with masking release. The stimuli comprised a tone and three masker bands centered at formant frequencies F1, F2, and F3 derived from a consonant-vowel. The target was a diotic or dichotic frequency-modulated tone following F2 trajectories. Results showed a small comodulation masking release, while the binaural masking level difference was comparable to previous findings. The data suggest that factors other than comodulation may play a dominant role in grouping frequency components in speech.},
}
RevDate: 2024-09-16
What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.
Phonetica [Epub ahead of print].
Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.
Additional Links: PMID-39279469
@article {pmid39279469,
year = {2024},
author = {Chen, S and Whalen, DH and Mok, PPK},
title = {What R Mandarin Chinese /ɹ/s? - acoustic and articulatory features of Mandarin Chinese rhotics.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {39279469},
issn = {1423-0321},
abstract = {Rhotic sounds are well known for their considerable phonetic variation within and across languages and their complexity in speech production. Although rhotics in many languages have been examined and documented, the phonetic features of Mandarin rhotics remain unclear, and debates about the prevocalic rhotic (the syllable-onset rhotic) persist. This paper extends the investigation of rhotic sounds by examining the articulatory and acoustic features of Mandarin Chinese rhotics in prevocalic, syllabic (the rhotacized vowel [ɚ]), and postvocalic (r-suffix) positions. Eighteen speakers from Northern China were recorded using ultrasound imaging. Results showed that Mandarin syllabic and postvocalic rhotics can be articulated with various tongue shapes, including tongue-tip-up retroflex and tongue-tip-down bunched shapes. Different tongue shapes have no significant acoustic differences in the first three formants, demonstrating a many-to-one articulation-acoustics relationship. The prevocalic rhotics in our data were found to be articulated only with bunched tongue shapes, and were sometimes produced with frication noise at the start. In general, rhotics in all syllable positions are characterized by a close F2 and F3, though the prevocalic rhotic has a higher F2 and F3 than the syllabic and postvocalic rhotics. The effects of syllable position and vowel context are also discussed.},
}
RevDate: 2024-09-11
Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.
METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.
RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.
CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.
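The F2 slope used as a predictor here is typically obtained by regressing the tracked second formant on time over the diphthong. A minimal sketch with parselmouth is below; the 5 ms step and the interval endpoints are assumptions, not the study's settings.

```python
import numpy as np
import parselmouth

def f2_slope(wav_path, t_start, t_end, step=0.005):
    """F2 slope (Hz/s) over [t_start, t_end] via a linear fit."""
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=step)
    times = np.arange(t_start, t_end, step)
    f2 = np.array([formants.get_value_at_time(2, t) for t in times])
    voiced = ~np.isnan(f2)                     # drop untracked frames
    slope, _intercept = np.polyfit(times[voiced], f2[voiced], 1)
    return slope
```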
Additional Links: PMID-39259883
@article {pmid39259883,
year = {2024},
author = {Thompson, A and Kim, Y},
title = {Acoustic and Kinematic Predictors of Intelligibility and Articulatory Precision in Parkinson's Disease.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-24-00153},
pmid = {39259883},
issn = {1558-9102},
abstract = {PURPOSE: This study investigated relationships within and between perceptual, acoustic, and kinematic measures in speakers with and without dysarthria due to Parkinson's disease (PD) across different clarity conditions. Additionally, the study assessed the predictive capabilities of selected acoustic and kinematic measures for intelligibility and articulatory precision ratings.
METHOD: Forty participants, comprising 22 with PD and 18 controls, read three phrases aloud using conversational, less clear, and more clear speaking conditions. Acoustic measures and their theoretical kinematic parallel measures (i.e., acoustic and kinematic distance and vowel space area [VSA]; second formant frequency [F2] slope and kinematic speed) were obtained from the diphthong /aɪ/ and selected vowels in the sentences. A total of 368 listeners from crowdsourcing provided ratings for intelligibility and articulatory precision. The research questions were examined using correlations and linear mixed-effects models.
RESULTS: Intelligibility and articulatory precision ratings were highly correlated across all speakers. Acoustic and kinematic distance, as well as F2 slope and kinematic speed, showed moderately positive correlations. In contrast, acoustic and kinematic VSA exhibited no correlation. Among all measures, acoustic VSA and kinematic distance were robust predictors of both intelligibility and articulatory precision ratings, but they were stronger predictors of articulatory precision.
CONCLUSIONS: The findings highlight the importance of measurement selection when examining cross-domain relationships. Additionally, they support the use of behavioral modifications aimed at eliciting larger articulatory gestures to improve intelligibility in individuals with dysarthria due to PD.},
}
RevDate: 2024-09-05
Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.
Frontiers in human neuroscience, 18:1424920.
Past studies have explored formant centering, a corrective behavior in which production converges, over the duration of an utterance, toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's disease (AD). We found that the pitch centering response in healthy elderly controls was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry, with a larger correction for pitch errors below the target phonation than above it. These findings indicate that pitch centering is a robust compensatory behavior in human speech. Our findings also shed light on how neurodegenerative processes that affect speech in AD may impact pitch centering.
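Centering is usually quantified by comparing the deviation from a target pitch at utterance onset with the deviation at mid-utterance. The sketch below expresses this in cents against a median-pitch target; splitting the track into thirds is one common convention and is an assumption here, not necessarily the authors' exact windowing.

```python
import numpy as np

def pitch_centering_cents(pitch_track_hz, target_hz):
    """Centering for one utterance: shrinkage of the absolute pitch
    deviation (in cents) from the first third to the middle third.
    Positive values mean the pitch moved toward the target."""
    p = np.asarray(pitch_track_hz, dtype=float)
    cents = 1200.0 * np.log2(p / target_hz)
    n = len(cents)
    initial = np.nanmean(cents[: n // 3])
    middle = np.nanmean(cents[n // 3 : 2 * n // 3])
    return abs(initial) - abs(middle)
```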
Additional Links: PMID-39234407
@article {pmid39234407,
year = {2024},
author = {Subrahmanya, A and Ranasinghe, KG and Kothare, H and Raharjo, I and Kim, KS and Houde, JF and Nagarajan, SS},
title = {Pitch corrections occur in natural speech and are abnormal in patients with Alzheimer's disease.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1424920},
pmid = {39234407},
issn = {1662-5161},
abstract = {Past studies have explored formant centering, a corrective behavior of convergence over the duration of an utterance toward the formants of a putative target vowel. In this study, we establish the existence of a similar centering phenomenon for pitch in healthy elderly controls and examine how such corrective behavior is altered in Alzheimer's Disease (AD). We found the pitch centering response in healthy elderly was similar when correcting pitch errors below and above the target (median) pitch. In contrast, patients with AD showed an asymmetry with a larger correction for the pitch errors below the target phonation than above the target phonation. These findings indicate that pitch centering is a robust compensation behavior in human speech. Our findings also explore the potential impacts on pitch centering from neurodegenerative processes impacting speech in AD.},
}
RevDate: 2024-09-01
Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00248-0 [Epub ahead of print].
This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computer tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.
Additional Links: PMID-39218756
@article {pmid39218756,
year = {2024},
author = {Vampola, T and Horáček, J and Laukkanen, AM},
title = {Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.07.035},
pmid = {39218756},
issn = {1873-4588},
abstract = {This study aimed to find the optimal geometrical configuration of the vocal tract (VT) for increasing the total acoustic energy output of the human voice in the 2-3.5 kHz frequency interval, the "singer's formant cluster" (SFC), for the vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computer tomography images of a female speaker. The epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.},
}
RevDate: 2024-08-31
Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00254-6 [Epub ahead of print].
UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to be a promising answer to the difficulty of assessment.
OBJECTIVE: To analyze variations in acoustic parameters in voice and speech based on vowel types according to different levels of suicidal risk among adolescents in a text reading task.
METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, along with determination of suicidal ideation through the Okasha Suicidality Scale and the Beck Depression Inventory. Acoustic analysis of the recordings was conducted with Praat for phonetic research and a Python program, using a Focusrite interface and microphone to register voice and speech acoustic parameters such as fundamental frequency, jitter, and formants. Subsequently, data from adolescents with and without suicidal risk were compared.
RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic measures, especially in females: fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. In men, differences were found in F0 and HNRdB (P < 0.05).
CONCLUSION: This study demonstrated statistically significant variations in various voice acoustic parameters among adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers for suicidal risk.
Additional Links: PMID-39217086
@article {pmid39217086,
year = {2024},
author = {Figueroa, C and Guillén, V and Huenupán, F and Vallejos, C and Henríquez, E and Urrutia, F and Sanhueza, F and Alarcón, E},
title = {Comparison of Acoustic Parameters of Voice and Speech According to Vowel Type and Suicidal Risk in Adolescents.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.08.006},
pmid = {39217086},
issn = {1873-4588},
abstract = {UNLABELLED: Globally, suicide prevention and understanding suicidal behavior represent significant health challenges. The predictive potential of voice, speech, and language appears to be a promising answer to the difficulty of assessment.
OBJECTIVE: To analyze variations in acoustic parameters in voice and speech based on vowel types according to different levels of suicidal risk among adolescents in a text reading task.
METHODOLOGY: Cross-sectional analytical design using nonprobabilistic sampling. Our sample comprised 98 adolescents aged 14 to 19 who underwent voice acoustic assessment, along with determination of suicidal ideation through the Okasha Suicidality Scale and the Beck Depression Inventory. Acoustic analysis of the recordings was conducted with Praat for phonetic research and a Python program, using a Focusrite interface and microphone to register voice and speech acoustic parameters such as fundamental frequency, jitter, and formants. Subsequently, data from adolescents with and without suicidal risk were compared.
RESULTS: Significant differences were observed between suicidal and nonsuicidal adolescents in several acoustic measures, especially in females: fundamental frequency (F0), harmonics-to-noise ratio (HNRdB), and temporal variability measured by jitter and standard deviation. In men, differences were found in F0 and HNRdB (P < 0.05).
CONCLUSION: This study demonstrated statistically significant variations in various voice acoustic parameters among adolescents with and without suicidal risk. These findings underscore the potential relevance of voice and speech as markers for suicidal risk.},
}
RevDate: 2024-08-30
CmpDate: 2024-08-30
The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.
Trends in hearing, 28:23312165241275895.
Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.
Additional Links: PMID-39212078
@article {pmid39212078,
year = {2024},
author = {Zaltz, Y},
title = {The Impact of Trained Conditions on the Generalization of Learning Gains Following Voice Discrimination Training.},
journal = {Trends in hearing},
volume = {28},
number = {},
pages = {23312165241275895},
doi = {10.1177/23312165241275895},
pmid = {39212078},
issn = {2331-2165},
mesh = {Humans ; Male ; Female ; Young Adult ; *Speech Perception/physiology ; *Generalization, Psychological ; *Cues ; *Noise/adverse effects ; *Acoustic Stimulation ; Adult ; Recognition, Psychology ; Perceptual Masking ; Adolescent ; Speech Acoustics ; Voice Quality ; Discrimination Learning/physiology ; Voice/physiology ; },
abstract = {Auditory training can lead to notable enhancements in specific tasks, but whether these improvements generalize to untrained tasks like speech-in-noise (SIN) recognition remains uncertain. This study examined how training conditions affect generalization. Fifty-five young adults were divided into "Trained-in-Quiet" (n = 15), "Trained-in-Noise" (n = 20), and "Control" (n = 20) groups. Participants completed two sessions. The first session involved an assessment of SIN recognition and voice discrimination (VD) with word or sentence stimuli, employing combined fundamental frequency (F0) + formant frequencies voice cues. Subsequently, only the trained groups proceeded to an interleaved training phase, encompassing six VD blocks with sentence stimuli, utilizing either F0-only or formant-only cues. The second session replicated the interleaved training for the trained groups, followed by a second assessment conducted by all three groups, identical to the first session. Results showed significant improvements in the trained task regardless of training conditions. However, VD training with a single cue did not enhance VD with both cues beyond control group improvements, suggesting limited generalization. Notably, the Trained-in-Noise group exhibited the most significant SIN recognition improvements posttraining, implying generalization across tasks that share similar acoustic conditions. Overall, findings suggest training conditions impact generalization by influencing processing levels associated with the trained task. Training in noisy conditions may prompt higher auditory and/or cognitive processing than training in quiet, potentially extending skills to tasks involving challenging listening conditions, such as SIN recognition. These insights hold significant theoretical and clinical implications, potentially advancing the development of effective auditory training protocols.},
}
MeSH Terms:
Humans
Male
Female
Young Adult
*Speech Perception/physiology
*Generalization, Psychological
*Cues
*Noise/adverse effects
*Acoustic Stimulation
Adult
Recognition, Psychology
Perceptual Masking
Adolescent
Speech Acoustics
Voice Quality
Discrimination Learning/physiology
Voice/physiology
RevDate: 2024-08-26
Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.
bioRxiv : the preprint server for biology pii:2024.08.13.607718.
Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.
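The adaptation measure at the heart of such paradigms is a formant value sampled from each produced vowel. A minimal Python sketch of that measurement, assuming hypothetical trial recordings and hand-labeled vowel boundaries, uses parselmouth's Burg formant analysis:

import parselmouth

def f1_at_midpoint(wav_path, vowel_start, vowel_end):
    """Burg formant analysis; F1 sampled at the vowel's temporal midpoint."""
    snd = parselmouth.Sound(wav_path)
    formants = snd.to_formant_burg(time_step=0.01, max_number_of_formants=5)
    midpoint = 0.5 * (vowel_start + vowel_end)
    return formants.get_value_at_time(1, midpoint)  # formant number 1 = F1

# Hypothetical trial pair: perturbation heard on trial t, response on trial t+1
f1_t = f1_at_midpoint("trial_010.wav", 0.12, 0.30)
f1_t1 = f1_at_midpoint("trial_011.wav", 0.11, 0.29)
print(f"Single-trial F1 change: {f1_t1 - f1_t:+.1f} Hz "
      "(opposing an upward F1 perturbation would be negative)")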
Additional Links: PMID-39185222
@article {pmid39185222,
year = {2024},
author = {Parrell, B and Naber, C and Kim, OA and Nizolek, CA and McDougle, SD},
title = {Audiomotor prediction errors drive speech adaptation even in the absence of overt movement.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.1101/2024.08.13.607718},
pmid = {39185222},
issn = {2692-8205},
abstract = {Observed outcomes of our movements sometimes differ from our expectations. These sensory prediction errors recalibrate the brain's internal models for motor control, reflected in alterations to subsequent movements that counteract these errors (motor adaptation). While leading theories suggest that all forms of motor adaptation are driven by learning from sensory prediction errors, dominant models of speech adaptation argue that adaptation results from integrating time-advanced copies of corrective feedback commands into feedforward motor programs. Here, we tested these competing theories of speech adaptation by inducing planned, but not executed, speech. Human speakers (male and female) were prompted to speak a word and, on a subset of trials, were rapidly cued to withhold the prompted speech. On standard trials, speakers were exposed to real-time playback of their own speech with an auditory perturbation of the first formant to induce single-trial speech adaptation. Speakers experienced a similar sensory error on movement cancelation trials, hearing a perturbation applied to a recording of their speech from a previous trial at the time they would have spoken. Speakers adapted to auditory prediction errors in both contexts, altering the spectral content of spoken vowels to counteract formant perturbations even when no actual movement coincided with the perturbed feedback. These results build upon recent findings in reaching, and suggest that prediction errors, rather than corrective motor commands, drive adaptation in speech.},
}
RevDate: 2024-08-25
Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?.
Forensic science international, 363:112199 pii:S0379-0738(24)00280-9 [Epub ahead of print].
A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech-style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) adding long-term phonetic features to an MFCC-based system does not meaningfully improve system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.
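As a simplified illustration (not the paper's calibrated likelihood-ratio framework), the following Python sketch extracts long-term features of the kinds compared above, mean F0 and mean MFCCs, from two hypothetical recordings and computes naive distances between them:

import numpy as np
import librosa

def long_term_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    mean_f0 = np.nanmean(f0)                       # long-term F0
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mean_f0, mfcc.mean(axis=1)              # long-term MFCC profile

# Hypothetical questioned vs. known recordings
f0_a, mfcc_a = long_term_features("questioned.wav")
f0_b, mfcc_b = long_term_features("known.wav")
print("Mean-F0 difference:", abs(f0_a - f0_b))
print("MFCC Euclidean distance:", np.linalg.norm(mfcc_a - mfcc_b))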
Additional Links: PMID-39182457
@article {pmid39182457,
year = {2024},
author = {Chan, RKW and Wang, BX},
title = {Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?.},
journal = {Forensic science international},
volume = {363},
number = {},
pages = {112199},
doi = {10.1016/j.forsciint.2024.112199},
pmid = {39182457},
issn = {1872-6283},
abstract = {A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically-relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCCs-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.},
}
RevDate: 2024-08-23
CmpDate: 2024-08-23
Automatic speech analysis for detecting cognitive decline of older adults.
Frontiers in public health, 12:1417966.
BACKGROUND: Speech analysis is expected to serve as a screening tool for early detection of Alzheimer's disease (AD) and mild cognitive impairment (MCI). Both acoustic and linguistic features are commonly used in speech analysis. However, no studies have yet determined which type of feature provides better screening effectiveness, especially in the large aging population of China.
OBJECTIVE: Firstly, to compare the screening effectiveness of acoustic features, linguistic features, and their combination using the same dataset. Secondly, to develop a Chinese automated diagnosis model using self-collected natural discourse data obtained from native Chinese speakers.
METHODS: A total of 92 participants from communities in Shanghai completed the MoCA-B and a picture description task based on the Cookie Theft picture under the guidance of trained operators, and were divided into three groups, AD, MCI, and healthy control (HC), based on their MoCA-B scores. Acoustic features (pitch, jitter, shimmer, MFCCs, formants) and linguistic features (part-of-speech, type-token ratio, information words, information units) were extracted. The machine learning algorithms used in this study included logistic regression, random forest (RF), support vector machines (SVM), Gaussian Naive Bayes (GNB), and k-nearest neighbors (kNN). The validation accuracies of the same ML model using acoustic features, linguistic features, and their combination were compared.
RESULTS: Accuracy with linguistic features was generally higher than with acoustic features in training. The highest accuracy in differentiating HC and AD was 80.77%, achieved by SVM based on all the features extracted from the speech data, while the highest accuracy in differentiating HC from AD or MCI was 80.43%, achieved by RF based only on linguistic features.
CONCLUSION: Our results support the utility and validity of linguistic features in the automated diagnosis of cognitive impairment and validate the applicability of automated diagnosis for Chinese language data.
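The feature-set comparison can be sketched schematically in Python with scikit-learn; the data below are synthetic placeholders standing in for the study's extracted features, not its dataset:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 92                                   # matches the study's sample size
X_acoustic = rng.normal(size=(n, 20))    # e.g., pitch, jitter, shimmer, MFCCs
X_linguistic = rng.normal(size=(n, 10))  # e.g., POS ratios, information units
y = rng.integers(0, 2, size=n)           # HC vs. impaired (placeholder labels)

for name, X in [("acoustic", X_acoustic),
                ("linguistic", X_linguistic),
                ("combined", np.hstack([X_acoustic, X_linguistic]))]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>10s} features: CV accuracy = {acc:.2f}")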
Additional Links: PMID-39175901
@article {pmid39175901,
year = {2024},
author = {Huang, L and Yang, H and Che, Y and Yang, J},
title = {Automatic speech analysis for detecting cognitive decline of older adults.},
journal = {Frontiers in public health},
volume = {12},
number = {},
pages = {1417966},
doi = {10.3389/fpubh.2024.1417966},
pmid = {39175901},
issn = {2296-2565},
mesh = {Humans ; Aged ; Female ; Male ; *Cognitive Dysfunction/diagnosis ; China ; Alzheimer Disease/diagnosis ; Aged, 80 and over ; Speech ; Middle Aged ; Bayes Theorem ; Support Vector Machine ; Algorithms ; },
abstract = {BACKGROUND: Speech analysis has been expected to help as a screening tool for early detection of Alzheimer's disease (AD) and mild-cognitively impairment (MCI). Acoustic features and linguistic features are usually used in speech analysis. However, no studies have yet determined which type of features provides better screening effectiveness, especially in the large aging population of China.
OBJECTIVE: Firstly, to compare the screening effectiveness of acoustic features, linguistic features, and their combination using the same dataset. Secondly, to develop Chinese automated diagnosis model using self-collected natural discourse data obtained from native Chinese speakers.
METHODS: A total of 92 participants from communities in Shanghai, completed MoCA-B and a picture description task based on the Cookie Theft under the guidance of trained operators, and were divided into three groups including AD, MCI, and heathy control (HC) based on their MoCA-B score. Acoustic features (Pitches, Jitter, Shimmer, MFCCs, Formants) and linguistic features (part-of-speech, type-token ratio, information words, information units) are extracted. The machine algorithms used in this study included logistic regression, random forest (RF), support vector machines (SVM), Gaussian Naive Bayesian (GNB), and k-Nearest neighbor (kNN). The validation accuracies of the same ML model using acoustic features, linguistic features, and their combination were compared.
RESULTS: The accuracy with linguistic features is generally higher than acoustic features in training. The highest accuracy to differentiate HC and AD is 80.77% achieved by SVM, based on all the features extracted from the speech data, while the highest accuracy to differentiate HC and AD or MCI is 80.43% achieved by RF, based only on linguistic features.
CONCLUSION: Our results suggest the utility and validity of linguistic features in the automated diagnosis of cognitive impairment, and validated the applicability of automated diagnosis for Chinese language data.},
}
MeSH Terms:
Humans
Aged
Female
Male
*Cognitive Dysfunction/diagnosis
China
Alzheimer Disease/diagnosis
Aged, 80 and over
Speech
Middle Aged
Bayes Theorem
Support Vector Machine
Algorithms
RevDate: 2024-08-22
The effect of sexual orientation on voice acoustic properties.
Frontiers in psychology, 15:1412372.
INTRODUCTION: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, and many of these studies had methodological limitations, including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses.
METHODS: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout.
RESULTS: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex.
DISCUSSION: Recommendations for future "natural voice" investigations are outlined.
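Two of the measures discussed above, harmonics-to-noise ratio (a breathiness proxy) and local jitter (a roughness proxy), are standard Praat analyses; a minimal Python sketch via parselmouth follows, with a placeholder file name:

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speaker.wav")

harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr_db = call(harmonicity, "Get mean", 0, 0)   # lower HNR = breathier voice

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter_local = call(point_process, "Get jitter (local)",
                    0, 0, 0.0001, 0.02, 1.3)   # higher jitter = rougher voice

print(f"HNR: {hnr_db:.1f} dB, local jitter: {jitter_local:.4f}")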
Additional Links: PMID-39171236
@article {pmid39171236,
year = {2024},
author = {Holmes, L and Rieger, G and Paulmann, S},
title = {The effect of sexual orientation on voice acoustic properties.},
journal = {Frontiers in psychology},
volume = {15},
number = {},
pages = {1412372},
pmid = {39171236},
issn = {1664-1078},
abstract = {INTRODUCTION: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, but many of these studies have methodological limitations including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses.
METHODS: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout.
RESULTS: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex.
DISCUSSION: Recommendations for future "natural voice" investigations are outlined.},
}
RevDate: 2024-08-02
Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.
Annals of the New York Academy of Sciences [Epub ahead of print].
Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures, such as jaw, tongue, lip, and soft palate movements, shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study of an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame by frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first formant (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw opening angle when the seal was lying on his back rather than on his belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, in whom F1 also depends more on jaw position than F2 does. We propose in vivo investigations of seals to further test the role of the tongue in formant modulation in mammalian sound production.
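The core analysis, frame-by-frame jaw opening angles correlated with tracked formant values, can be illustrated with a toy Python sketch; the arrays below are synthetic stand-ins for the video and formant tracks:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
jaw_angle = rng.uniform(5, 40, size=200)                  # degrees, per frame
f1 = 400 + 12 * jaw_angle + rng.normal(0, 40, size=200)   # F1 tracks jaw well
f2 = 1400 + 4 * jaw_angle + rng.normal(0, 120, size=200)  # F2 more weakly

for name, formant in [("F1", f1), ("F2", f2)]:
    r, p = pearsonr(jaw_angle, formant)
    print(f"jaw angle vs {name}: r = {r:.2f}, p = {p:.1e}")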
Additional Links: PMID-39091036
@article {pmid39091036,
year = {2024},
author = {Goncharova, M and Jadoul, Y and Reichmuth, C and Fitch, WT and Ravignani, A},
title = {Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.},
journal = {Annals of the New York Academy of Sciences},
volume = {},
number = {},
pages = {},
doi = {10.1111/nyas.15189},
pmid = {39091036},
issn = {1749-6632},
support = {Advanced Grant SOMACCA/ERC_/European Research Council/International ; (#W1262-B29)//Austrian Science Foundation Grant/ ; DNRF117//Danmarks Grundforskningsfond/ ; N00014-04-1-0284//Office of Naval Research/ ; Independent Max Planck Research Group Leader funding//Max-Planck-Gesellschaft/ ; },
abstract = {Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures-such as jaw, tongue, lip, and soft palate movements-shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study with an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame-by-frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw angle opening when the seal was lying on his back rather than on the belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, where the F1 depends more on the jaw position than F2. We propose further in vivo investigations of seals to further test the role of the tongue on formant modulation in mammalian sound production.},
}
RevDate: 2024-08-01
Close approximations to the sound of a cochlear implant.
Frontiers in human neuroscience, 18:1434786.
Cochlear implant (CI) systems differ in electrode design and signal processing, so patients fit with different implant systems will likely experience different percepts when presented speech via their implant. The sound quality of speech can be evaluated by asking single-sided-deaf (SSD) listeners fit with a CI to modify clean signals presented to their typically hearing ear until they match the sound quality of signals presented to their CI ear. In this paper, we describe very close matches to CI sound quality, i.e., similarity ratings of 9.5 to 10 on a 10-point scale, by ten patients fit with a 28 mm electrode array and MED-EL signal processing. The modifications required to make close approximations to CI sound quality fell into two groups: one consisted of a restricted frequency bandwidth and spectral smearing, while the second was characterized by a wide bandwidth and no spectral smearing. Both sets of modifications differed from those found for patients with shorter electrode arrays, who chose upshifts in voice pitch and formant frequencies to match CI sound quality. These matching-based metrics of CI sound quality document that speech sound quality differs between patients fit with different CIs and among patients fit with the same CI.
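A rough Python sketch of the two signal modifications named above, band-limiting and spectral smearing, follows; here smearing is approximated by Gaussian smoothing of the whole-signal magnitude spectrum, and the cutoff frequencies and smoothing width are assumptions, not the listener-chosen settings from the study:

import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.ndimage import gaussian_filter1d

def restrict_bandwidth(x, sr, lo=100, hi=4000):
    sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)

def smear_spectrum(x, sigma_bins=8):
    spec = np.fft.rfft(x)
    mag = gaussian_filter1d(np.abs(spec), sigma=sigma_bins)  # blur magnitudes
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(x))

sr = 16000
x = np.random.default_rng(2).normal(size=sr)  # stand-in for a speech signal
y = smear_spectrum(restrict_bandwidth(x, sr))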
Additional Links: PMID-39086377
@article {pmid39086377,
year = {2024},
author = {Dorman, MF and Natale, SC and Stohl, JS and Felder, J},
title = {Close approximations to the sound of a cochlear implant.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1434786},
pmid = {39086377},
issn = {1662-5161},
abstract = {Cochlear implant (CI) systems differ in terms of electrode design and signal processing. It is likely that patients fit with different implant systems will experience different percepts when presented speech via their implant. The sound quality of speech can be evaluated by asking single-sided-deaf (SSD) listeners fit with a cochlear implant (CI) to modify clean signals presented to their typically hearing ear to match the sound quality of signals presented to their CI ear. In this paper, we describe very close matches to CI sound quality, i.e., similarity ratings of 9.5 to 10 on a 10-point scale, by ten patients fit with a 28 mm electrode array and MED EL signal processing. The modifications required to make close approximations to CI sound quality fell into two groups: One consisted of a restricted frequency bandwidth and spectral smearing while a second was characterized by a wide bandwidth and no spectral smearing. Both sets of modifications were different from those found for patients with shorter electrode arrays who chose upshifts in voice pitch and formant frequencies to match CI sound quality. The data from matching-based metrics of CI sound quality document that speech sound-quality differs for patients fit with different CIs and among patients fit with the same CI.},
}
RevDate: 2024-07-26
Persistent post-concussion symptoms include neural auditory processing in young children.
Concussion (London, England), 9(1):CNC114.
AIM: Difficulty understanding speech following concussion is likely caused by auditory processing impairments. We hypothesized that concussion disrupts the processing of pitch and phonetic cues in a sound, both of which help listeners understand a talker.
We obtained frequency-following responses to a syllable from 120 concussed children and 120 controls. Encoding of the fundamental frequency (F0), a pitch cue, and the first formant (F1), a phonetic cue, was poorer in concussed children. The F0 reduction was greater in children assessed within 2 weeks of their injuries.
CONCLUSION: Concussions affect auditory processing. These results strengthen the evidence of reduced F0 encoding in children with concussion and call for longitudinal studies monitoring the course of recovery of the auditory system.
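Encoding strength at F0 and in the F1 range can be quantified from an averaged frequency-following response as spectral amplitude in narrow bands around the stimulus frequencies. A bare-bones Python sketch, with an assumed sampling rate and a synthetic waveform in place of real EEG:

import numpy as np

fs = 8000                                    # EEG sampling rate (assumed)
t = np.arange(0, 0.2, 1 / fs)
ffr = (0.8 * np.sin(2 * np.pi * 100 * t)     # F0 component at 100 Hz
       + 0.2 * np.sin(2 * np.pi * 700 * t)   # F1-range component
       + np.random.default_rng(3).normal(0, 0.3, t.size))

spectrum = np.abs(np.fft.rfft(ffr)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def band_amplitude(lo, hi):
    return spectrum[(freqs >= lo) & (freqs <= hi)].mean()

print("F0 encoding (90-110 Hz):  ", band_amplitude(90, 110))
print("F1 encoding (650-750 Hz): ", band_amplitude(650, 750))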
Additional Links: PMID-39056002
@article {pmid39056002,
year = {2024},
author = {Bonacina, S and Krizman, J and Farley, J and Nicol, T and LaBella, CR and Kraus, N},
title = {Persistent post-concussion symptoms include neural auditory processing in young children.},
journal = {Concussion (London, England)},
volume = {9},
number = {1},
pages = {CNC114},
doi = {10.2217/cnc-2023-0013},
pmid = {39056002},
issn = {2056-3299},
abstract = {AIM: Difficulty understanding speech following concussion is likely caused by auditory processing impairments. We hypothesized that concussion disrupts pitch and phonetic processing of a sound, cues in understanding a talker.
We obtained frequency following responses to a syllable from 120 concussed and 120 control. Encoding of the fundamental frequency (F0), a pitch cue and the first formant (F1), a phonetic cue, was poorer in concussed children. The F0 reduction was greater in the children assessed within 2 weeks of their injuries.
CONCLUSION: Concussions affect auditory processing. Results strengthen evidence of reduced F0 encoding in children with concussion and call for longitudinal study aimed at monitoring the recovery course with respect to the auditory system.},
}
RevDate: 2024-07-19
Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.
bioRxiv : the preprint server for biology pii:2024.07.13.603344.
Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production and its possible relation to PSAM at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).
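The "centering" index described above reduces to a simple computation: the drop in inter-trial formant variability from vowel onset to vowel midpoint. A small Python sketch with synthetic F1 values in place of measured trials:

import numpy as np

rng = np.random.default_rng(4)
n_trials = 40
f1_onset = 550 + rng.normal(0, 35, n_trials)   # more variable at onset
f1_mid = 550 + rng.normal(0, 20, n_trials)     # corrected toward the target

sd_onset, sd_mid = f1_onset.std(ddof=1), f1_mid.std(ddof=1)
centering = sd_onset - sd_mid   # positive = within-vowel adjustment occurred
print(f"SD onset: {sd_onset:.1f} Hz, SD midpoint: {sd_mid:.1f} Hz, "
      f"centering: {centering:.1f} Hz")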
Additional Links: PMID-39026879
@article {pmid39026879,
year = {2024},
author = {Li, JJ and Daliri, A and Kim, KS and Max, L},
title = {Does pre-speech auditory modulation reflect processes related to feedback monitoring or speech movement planning?.},
journal = {bioRxiv : the preprint server for biology},
volume = {},
number = {},
pages = {},
doi = {10.1101/2024.07.13.603344},
pmid = {39026879},
issn = {2692-8205},
abstract = {Previous studies have revealed that auditory processing is modulated during the planning phase immediately prior to speech onset. To date, the functional relevance of this pre-speech auditory modulation (PSAM) remains unknown. Here, we investigated whether PSAM reflects neuronal processes that are associated with preparing auditory cortex for optimized feedback monitoring as reflected in online speech corrections. Combining electroencephalographic PSAM data from a previous data set with new acoustic measures of the same participants' speech, we asked whether individual speakers' extent of PSAM is correlated with the implementation of within-vowel articulatory adjustments during /b/-vowel-/d/ word productions. Online articulatory adjustments were quantified as the extent of change in inter-trial formant variability from vowel onset to vowel midpoint (a phenomenon known as centering). This approach allowed us to also consider inter-trial variability in formant production and its possible relation to PSAM at vowel onset and midpoint separately. Results showed that inter-trial formant variability was significantly smaller at vowel midpoint than at vowel onset. PSAM was not significantly correlated with this amount of change in variability as an index of within-vowel adjustments. Surprisingly, PSAM was negatively correlated with inter-trial formant variability not only in the middle but also at the very onset of the vowels. Thus, speakers with more PSAM produced formants that were already less variable at vowel onset. Findings suggest that PSAM may reflect processes that influence speech acoustics as early as vowel onset and, thus, that are directly involved in motor command preparation (feedforward control) rather than output monitoring (feedback control).},
}
RevDate: 2024-07-17
Word and Gender Identification in the Speech of Transgender Individuals.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00178-4 [Epub ahead of print].
Listeners use speech to identify both linguistic information, such as the word being produced, and indexical attributes, such as the gender of the speaker. Previous research has shown that these two aspects of speech perception are interrelated. It is important to understand this relationship in the context of gender-affirming voice training (GAVT), where changes in speech production as part of a speaker's gender-affirming care could potentially influence listeners' recognition of the intended utterance. This study conducted a secondary analysis of data from an experiment in which trans women matched shifted targets for the second formant frequency using visual-acoustic biofeedback. Utterances were synthetically altered to feature a gender-ambiguous fundamental frequency and were presented to blinded listeners for rating on a visual analog scale representing the gender spectrum, as well as word identification in a forced-choice task. We found a statistically significant association between the accuracy of word identification and the gender rating of utterances. However, there was no statistically significant difference in word identification accuracy for the formant-shifted conditions relative to an unshifted condition. Overall, these results support previous research in finding that word identification and speaker gender identification are interrelated processes; however, the findings also suggest that a small magnitude of shift in formant frequencies (of the type that might be pursued in a GAVT context) does not have a significant negative impact on the perceptual recoverability of isolated words.
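The association tested above, trial-level word-identification accuracy related to an utterance's gender rating, can be illustrated with a toy Python sketch; the data are synthetic, the 0-100 scale is an assumption, and the study's actual design used a more complete mixed-model analysis:

import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(10)
gender_rating = rng.uniform(0, 100, 300)       # visual analog scale (assumed)
p_correct = 1 / (1 + np.exp(-(gender_rating - 50) / 25))
word_correct = rng.random(300) < p_correct     # accuracy tied to rating

r, p = pointbiserialr(word_correct.astype(int), gender_rating)
print(f"point-biserial r = {r:.2f}, p = {p:.1e}")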
Additional Links: PMID-39019670
@article {pmid39019670,
year = {2024},
author = {Doyle, KA and Harel, D and Feeny, GT and Novak, VD and McAllister, T},
title = {Word and Gender Identification in the Speech of Transgender Individuals.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.06.007},
pmid = {39019670},
issn = {1873-4588},
abstract = {Listeners use speech to identify both linguistic information, such as the word being produced, and indexical attributes, such as the gender of the speaker. Previous research has shown that these two aspects of speech perception are interrelated. It is important to understand this relationship in the context of gender-affirming voice training (GAVT), where changes in speech production as part of a speaker's gender-affirming care could potentially influence listeners' recognition of the intended utterance. This study conducted a secondary analysis of data from an experiment in which trans women matched shifted targets for the second formant frequency using visual-acoustic biofeedback. Utterances were synthetically altered to feature a gender-ambiguous fundamental frequency and were presented to blinded listeners for rating on a visual analog scale representing the gender spectrum, as well as word identification in a forced-choice task. We found a statistically significant association between the accuracy of word identification and the gender rating of utterances. However, there was no statistically significant difference in word identification accuracy for the formant-shifted conditions relative to an unshifted condition. Overall, these results support previous research in finding that word identification and speaker gender identification are interrelated processes; however, the findings also suggest that a small magnitude of shift in formant frequencies (of the type that might be pursued in a GAVT context) does not have a significant negative impact on the perceptual recoverability of isolated words.},
}
RevDate: 2024-07-10
CmpDate: 2024-07-10
Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.
Dental press journal of orthodontics, 29(3):e2423277.
OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech.
MATERIAL AND METHODS: Eligibility criteria were subjects aged 20-40 years with acceptable occlusion who were native speakers of Portuguese. The volunteers (n=21) were randomized into four groups with a 1:1:1:1 allocation ratio. In random order, each group used the four types of retainers full-time for 21 days each, with a 7-day washout period. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with a mean age of 27.08 years (SD=4.65). Speech was evaluated from recordings of vocal excerpts made before, immediately after, and 21 days after installation of each retainer, with auditory-perceptual analysis and acoustic analysis of the F1 and F2 formant frequencies of the vowels. Repeated-measures ANOVA and Friedman tests, with Tukey tests, were used for statistical comparison.
RESULTS: Speech changes increased immediately after installation of the conventional wraparound and thermoplastic retainers and diminished after 21 days, though not to normal levels. However, the increase was statistically significant only for the wraparound with an anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at the initial time point, and the changes remained for the conventional, U-shaped, and thermoplastic appliances after three weeks.
CONCLUSIONS: The thermoplastic retainer was more harmful to speech than the wraparound appliances. The conventional and U-shaped retainers interfered less with speech. The three-week period was not sufficient for speech adaptation.
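A compact Python sketch of the repeated-measures comparison used above: the same subjects' F2 values at three time points (before, immediately after, and 21 days after installation), tested with a Friedman test. The values are synthetic stand-ins:

import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(5)
n = 18                                   # matches the final sample size
f2_before = 1500 + rng.normal(0, 60, n)
f2_immediate = f2_before + rng.normal(80, 40, n)   # shift after installation
f2_day21 = f2_before + rng.normal(30, 40, n)       # partial adaptation

stat, p = friedmanchisquare(f2_before, f2_immediate, f2_day21)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")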
Additional Links: PMID-38985077
@article {pmid38985077,
year = {2024},
author = {Lorenzoni, DC and Henriques, JFC and Silva, LKD and Rosa, RR and Berretin-Felix, G and Freitas, KMS and Janson, G},
title = {Comparison of speech changes caused by four different orthodontic retainers: a crossover randomized clinical trial.},
journal = {Dental press journal of orthodontics},
volume = {29},
number = {3},
pages = {e2423277},
pmid = {38985077},
issn = {2177-6709},
mesh = {Humans ; *Orthodontic Retainers ; Female ; Male ; Adult ; *Cross-Over Studies ; Orthodontic Appliance Design ; Young Adult ; Speech/physiology ; },
abstract = {OBJECTIVE: This study aimed to compare the influence of four different maxillary removable orthodontic retainers on speech.
MATERIAL AND METHODS: Eligibility criteria for sample selection were: 20-40-year subjects with acceptable occlusion, native speakers of Portuguese. The volunteers (n=21) were divided in four groups randomized with a 1:1:1:1 allocation ratio. The four groups used, in random order, the four types of retainers full-time for 21 days each, with a washout period of 7-days. The removable maxillary retainers were: conventional wraparound, wraparound with an anterior hole, U-shaped wraparound, and thermoplastic retainer. Three volunteers were excluded. The final sample comprised 18 subjects (11 male; 7 female) with mean age of 27.08 years (SD=4.65). The speech evaluation was performed in vocal excerpts recordings made before, immediately after, and 21 days after the installation of each retainer, with auditory-perceptual and acoustic analysis of formant frequencies F1 and F2 of the vowels. Repeated measures ANOVA and Friedman with Tukey tests were used for statistical comparison.
RESULTS: Speech changes increased immediately after conventional wraparound and thermoplastic retainer installation, and reduced after 21 days, but not to normal levels. However, this increase was statistically significant only for the wraparound with anterior hole and the thermoplastic retainer. Formant frequencies of vowels were altered at initial time, and the changes remained in conventional, U-shaped and thermoplastic appliances after three weeks.
CONCLUSIONS: The thermoplastic retainer was more harmful to the speech than wraparound appliances. The conventional and U-shaped retainers interfered less in speech. The three-week period was not sufficient for speech adaptation.},
}
MeSH Terms:
Humans
*Orthodontic Retainers
Female
Male
Adult
*Cross-Over Studies
Orthodontic Appliance Design
Young Adult
Speech/physiology
RevDate: 2024-07-09
Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices.
Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000540255 [Epub ahead of print].
INTRODUCTION: Benign and malignant vocal fold lesions are growths that occur on the vocal folds. However, the treatments for these two types of lesions differ significantly. Therefore, it is imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aims to determine the important acoustic characteristics specific to benign and malignant vocal fold lesions.
METHODS: The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants, including individuals with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose and Throat Hospital between May 2020 and July 2021. The acoustic variables of the participants were analyzed using Principal Component Analysis to present important acoustic characteristics that are specific to normal vocal folds, benign vocal fold lesions, and malignant vocal fold lesions. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia.
RESULTS: Using the Principal Component Analysis method, the components that accounted for the variation in the data were identified, highlighting acoustic characteristics in the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch were found to be significant only in benign voices, while higher formant frequencies and their variability were identified as contributors to the acoustic variance within the malignant group. The presence of formant dispersion as a weighted factor in Principal Component Analysis was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch and coefficients of variation for formant dispersion were observed to contribute to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia.
CONCLUSION: Experimental evidence demonstrates the utility of the Principal Component Analysis method in the identification of vibrational alterations in the acoustic characteristics of voices affected by lesions. Furthermore, the analysis highlighted underlying acoustic differences between conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which would facilitate timely intervention and management of vocal fold conditions.
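An illustrative Python sketch of the PCA workflow described above: standardize the acoustic variables, fit PCA, and inspect which variables load on the components carrying each group's variance. The feature names and data are synthetic stand-ins:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
feature_names = ["pitch_cv", "rms_cv", "F3", "F4", "formant_dispersion",
                 "subharmonic_ratio"]
X = rng.normal(size=(157, len(feature_names)))   # 157 voices, as in the study

pca = PCA(n_components=3)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
for i, component in enumerate(pca.components_):
    top = feature_names[np.argmax(np.abs(component))]
    print(f"PC{i + 1}: strongest loading on {top}")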
Additional Links: PMID-38981448
@article {pmid38981448,
year = {2024},
author = {Liu, B and Lei, J and Wischhoff, OP and Smereka, KA and Jiang, JJ},
title = {Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices.},
journal = {Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP)},
volume = {},
number = {},
pages = {},
doi = {10.1159/000540255},
pmid = {38981448},
issn = {1421-9972},
abstract = {INTRODUCTION: Benign and malignant vocal fold lesions are growths that occur on the vocal folds. However, the treatments for these two types of lesions differ significantly. Therefore, it is imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aims to determine the important acoustic characteristics specific to benign and malignant vocal fold lesions.
METHODS: The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants, including individuals with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose and Throat Hospital between May 2020 and July 2021.The acoustic variables of the participants were analyzed using Principal Component Analysis to present important acoustic characteristics that are specific to normal vocal folds, benign vocal fold lesions, and malignant vocal fold lesions. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia.
RESULTS: Using the Principal Component Analysis method, the components that accounted for the variation in the data were identified, highlighting acoustic characteristics in the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch were found to be significant only in benign voices, while higher formant frequencies and their variability were identified as contributors to the acoustic variance within the malignant group. The presence of formant dispersion as a weighted factor in Principal Component Analysis was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch and coefficients of variation for formant dispersion were observed to contribute to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia.
CONCLUSION: Experimental evidence demonstrates the utility of the Principal Component Analysis method in the identification of vibrational alterations in the acoustic characteristics of voice affected by lesions. Furthermore, the Principal Component Analysis analysis has highlighted underlying acoustic differences between various conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which will facilitate timely intervention and management of vocal fold conditions.},
}
RevDate: 2024-07-01
Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion.
Scientific reports, 14(1):15029.
Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.
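A simplified Python sketch of the tactile-vocoder idea with envelope expansion: band-limit the audio into a few channels, extract each amplitude envelope, expand it (an exponent above 1 sharpens peaks such as formants), and use it to modulate a vibrotactile carrier tone. The band edges, carrier frequencies, and exponent are illustrative assumptions, not the study's parameters:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tactile_vocoder(x, sr, bands=((100, 500), (500, 1500), (1500, 4000)),
                    carriers=(100, 170, 240), expansion=1.5):
    out = np.zeros_like(x)
    t = np.arange(len(x)) / sr
    for (lo, hi), fc in zip(bands, carriers):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))   # amplitude envelope
        env = env ** expansion                       # envelope expansion
        out += env * np.sin(2 * np.pi * fc * t)      # vibrotactile tone
    return out / len(bands)

sr = 16000
speech = np.random.default_rng(7).normal(size=sr)    # stand-in for speech
vibration = tactile_vocoder(speech, sr)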
Additional Links: PMID-38951556
@article {pmid38951556,
year = {2024},
author = {Fletcher, MD and Akis, E and Verschuur, CA and Perry, SW},
title = {Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {15029},
pmid = {38951556},
issn = {2045-2322},
support = {EP/W032422/1//Engineering and Physical Sciences Research Council/ ; EP/T517859/1//Engineering and Physical Sciences Research Council/ ; },
abstract = {Recent advances in haptic technology could allow haptic hearing aids, which convert audio to tactile stimulation, to become viable for supporting people with hearing loss. A tactile vocoder strategy for audio-to-tactile conversion, which exploits these advances, has recently shown significant promise. In this strategy, the amplitude envelope is extracted from several audio frequency bands and used to modulate the amplitude of a set of vibro-tactile tones. The vocoder strategy allows good consonant discrimination, but vowel discrimination is poor and the strategy is susceptible to background noise. In the current study, we assessed whether multi-band amplitude envelope expansion can effectively enhance critical vowel features, such as formants, and improve speech extraction from noise. In 32 participants with normal touch perception, tactile-only phoneme discrimination with and without envelope expansion was assessed both in quiet and in background noise. Envelope expansion improved performance in quiet by 10.3% for vowels and by 5.9% for consonants. In noise, envelope expansion improved overall phoneme discrimination by 9.6%, with no difference in benefit between consonants and vowels. The tactile vocoder with envelope expansion can be deployed in real-time on a compact device and could substantially improve clinical outcomes for a new generation of haptic hearing aids.},
}
RevDate: 2024-06-25
Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.
Cureus, 16(5):e60873.
Background Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration.
Methodology This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. PRAAT version 6.2 was used for voice analysis.
Results The study included 41 patients with normal laryngoscopic findings after thyroid surgery; most were female (85.4%), and the average age was 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, all reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonation duration. These trends persisted at the three-month postoperative period, although values approached preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater changes observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between the hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals.
Conclusions This study of post-thyroidectomy patients with normal vocal cord movement revealed significant postoperative changes in voice parameters, even though most patients reported no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.
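The PRAAT-style measures tracked above can be computed in Python with parselmouth; a minimal sketch for one pre- and one post-operative recording follows (file names are placeholders), reporting percent change in the way the study does:

import parselmouth
from parselmouth.praat import call

def voice_measures(wav_path):
    snd = parselmouth.Sound(wav_path)
    f0 = call(snd.to_pitch(), "Get mean", 0, 0, "Hertz")
    f1 = snd.to_formant_burg().get_value_at_time(1, snd.duration / 2)
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 600)
    shimmer = call([snd, pp], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    return {"F0": f0, "F1": f1, "shimmer": shimmer}

pre = voice_measures("pre_op.wav")
post = voice_measures("post_op_1mo.wav")
for key in pre:
    change = 100 * (post[key] - pre[key]) / pre[key]
    print(f"{key}: {change:+.2f}% change at one month")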
Additional Links: PMID-38916010
@article {pmid38916010,
year = {2024},
author = {Sahoo, AK and Sahoo, PK and Gupta, V and Behera, G and Sidam, S and Mishra, UP and Chavan, A and Binu, R and Gour, S and Velayutham, DK and Pooja, and Chatterjee, T and Pal, D},
title = {Assessment of Changes in the Quality of Voice in Post-thyroidectomy Patients With Intact Recurrent and Superior Laryngeal Nerve Function.},
journal = {Cureus},
volume = {16},
number = {5},
pages = {e60873},
pmid = {38916010},
issn = {2168-8184},
abstract = {Background Thyroidectomy is a routinely performed surgical procedure used to treat benign, malignant, and some hormonal disorders of the thyroid that are not responsive to medical therapy. Voice alterations following thyroid surgery are well-documented and often attributed to recurrent laryngeal nerve dysfunction. However, subtle changes in voice quality can persist despite anatomically intact laryngeal nerves. This study aimed to quantify post-thyroidectomy voice changes in patients with intact laryngeal nerves, focusing on fundamental frequency, first formant frequency, shimmer intensity, and maximum phonation duration. Methodology This cross-sectional study was conducted at a tertiary referral center in central India and focused on post-thyroidectomy patients with normal vocal cord function. Preoperative assessments included laryngeal endoscopy and voice recording using a computer program, with evaluations repeated at one and three months post-surgery. Patients with normal laryngeal endoscopic findings underwent voice analysis and provided feedback on subjective voice changes. The PRAAT version 6.2 software was utilized for voice analysis. Results The study included 41 patients with normal laryngoscopic findings after thyroid surgery, with the majority being female (85.4%) and the average age being 42.4 years. Hemithyroidectomy was performed in 41.4% of patients and total thyroidectomy in 58.6%, with eight patients undergoing central compartment neck dissection. Except for one patient, the majority reported no subjective change in voice following surgery. Objective voice analysis showed statistically significant changes in the one-month postoperative period compared to preoperative values, including a 5.87% decrease in fundamental frequency, a 1.37% decrease in shimmer intensity, and a 6.24% decrease in first formant frequency, along with a 4.35% decrease in maximum phonatory duration. These trends persisted at the three-month postoperative period, although values approached close to preoperative levels. Results revealed statistically significant alterations in voice parameters, particularly fundamental frequency and first formant frequency, with greater values observed in total thyroidectomy patients. Shimmer intensity also exhibited slight changes. Comparison between hemithyroidectomy and total thyroidectomy groups revealed no significant differences in fundamental frequency, first formant frequency, and shimmer. However, maximum phonation duration showed a significantly greater change in the hemithyroidectomy group at both one-month and three-month postoperative intervals. Conclusions This study on post-thyroidectomy patients with normal vocal cord movement revealed significant changes in voice parameters postoperatively, with most patients reporting no subjective voice changes. The findings highlight the importance of objective voice analysis in assessing post-thyroidectomy voice outcomes.},
}
RevDate: 2024-06-18
A Study on Voice Measures in Patients with Parkinson's Disease.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00168-1 [Epub ahead of print].
PURPOSE: This research aims to identify acoustic features that can distinguish patients with Parkinson's disease (PD) from healthy speakers.
METHODS: Thirty PD patients and 30 healthy speakers were recruited, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features such as fundamental frequency (F0), jitter, shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, and voice onset time were analyzed. Two-sample independent t tests and the nonparametric Mann-Whitney U (MWU) test were carried out, as appropriate, to compare the acoustic measures between PD patients and healthy speakers. In addition, after identifying the acoustic features effective for distinguishing PD patients from healthy speakers, we adopted two detection methods: (1) building classifiers based on the effective acoustic features, and (2) training support vector machine classifiers on the effective acoustic features.
RESULTS: Significant differences were found between the male PD group and male healthy controls in vowel /i/ (jitter and shimmer) and /a/ (shimmer and HNR). Among female subjects, significant differences between the two groups were observed in the F0 standard deviation (F0 SD) of /u/. Additionally, significant differences between the PD group and healthy controls were found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. The HNR of vowel /a/ yielded the best classification accuracy among the acoustic features found to distinguish PD patients from healthy speakers.
CONCLUSIONS: PD can cause changes in articulation and phonation, with increases or decreases in some acoustic features. The use of acoustic features to detect PD is therefore expected to offer a low-cost, scalable diagnostic method.
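A schematic Python sketch of the two-step approach described above: test a feature for group differences with a Mann-Whitney U test, then use it in a simple SVM classifier. The values are synthetic stand-ins for the HNR of vowel /a/:

import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
hnr_pd = rng.normal(18, 3, 30)        # 30 PD patients (placeholder values)
hnr_hc = rng.normal(22, 3, 30)        # 30 healthy controls

stat, p = mannwhitneyu(hnr_pd, hnr_hc)
print(f"Mann-Whitney U = {stat:.0f}, p = {p:.4f}")

X = np.concatenate([hnr_pd, hnr_hc]).reshape(-1, 1)
y = np.array([1] * 30 + [0] * 30)     # 1 = PD, 0 = healthy
acc = cross_val_score(SVC(), X, y, cv=5).mean()
print(f"CV accuracy using HNR of /a/ alone: {acc:.2f}")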
Additional Links: PMID-38890016
@article {pmid38890016,
year = {2024},
author = {Xiu, N and Li, W and Liu, L and Liu, Z and Cai, Z and Li, L and Vaxelaire, B and Sock, R and Ling, Z and Chen, J and Wang, Y},
title = {A Study on Voice Measures in Patients with Parkinson's Disease.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.05.018},
pmid = {38890016},
issn = {1873-4588},
abstract = {PURPOSE: This research aims to identify acoustic features that can distinguish patients with Parkinson's disease (PD patients) from healthy speakers.
METHODS: Thirty PD patients and 30 healthy speakers were recruited, and their speech was collected, including three vowels (/i/, /a/, and /u/) and nine consonants (/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, /l/, /m/, and /n/). Acoustic features including fundamental frequency (F0), jitter, shimmer, harmonics-to-noise ratio (HNR), first formant (F1), second formant (F2), third formant (F3), first bandwidth (B1), second bandwidth (B2), third bandwidth (B3), voice onset, and voice onset time were analyzed. A two-sample independent t test or the nonparametric Mann-Whitney U (MWU) test, as appropriate, was used to compare the acoustic measures between the PD patients and healthy speakers. In addition, after identifying the effective acoustic features for distinguishing PD patients from healthy speakers, we adopted two methods to detect PD patients: (1) building classifiers based directly on the effective acoustic features and (2) training support vector machine classifiers on the effective acoustic features.
RESULTS: Significant differences were found between the male PD group and the male healthy control group in vowel /i/ (jitter and shimmer) and /a/ (shimmer and HNR). Among female subjects, significant differences were observed in the F0 standard deviation (F0 SD) of /u/ between the two groups. Additionally, significant differences between the PD group and the healthy control group were also found in the F3 of /i/ and /n/, whereas other acoustic features showed no significant differences between the two groups. Of the effective acoustic features identified above, the HNR of vowel /a/ yielded the best classification accuracy in distinguishing PD patients from healthy speakers.
CONCLUSIONS: PD can cause changes in the articulation and phonation of PD patients, wherein increases or decreases occur in some acoustic features. Therefore, the use of acoustic features to detect PD is expected to be a low-cost and large-scale diagnostic method.},
}
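The second detection method above (training support vector machine classifiers on the effective features) takes only a few lines in scikit-learn. A minimal sketch; the random feature matrix is a placeholder for real jitter/shimmer/HNR measurements, and the kernel choice is an assumption.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Rows: 30 PD patients then 30 healthy speakers; columns: effective
# features (e.g., jitter and shimmer of /i/, HNR of /a/). Random values
# stand in for real measurements.
X = np.random.rand(60, 3)
y = np.array([1] * 30 + [0] * 30)  # 1 = PD, 0 = healthy control

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())  # chance-level on random data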
RevDate: 2024-06-16
Effects of testosterone on speech production and perception: Linking hormone levels in males to vocal cues and female voice attractiveness ratings.
Physiology & behavior pii:S0031-9384(24)00160-4 [Epub ahead of print].
This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.
Additional Links: PMID-38880296
@article {pmid38880296,
year = {2024},
author = {Weirich, M and Simpson, AP and Knutti, N},
title = {Effects of testosterone on speech production and perception: Linking hormone levels in males to vocal cues and female voice attractiveness ratings.},
journal = {Physiology & behavior},
volume = {},
number = {},
pages = {114615},
doi = {10.1016/j.physbeh.2024.114615},
pmid = {38880296},
issn = {1873-507X},
abstract = {This study sets out to investigate the potential effect of males' testosterone level on speech production and speech perception. Regarding speech production, we investigate intra- and inter-individual variation in mean fundamental frequency (fo) and formant frequencies and highlight the potential interacting effect of another hormone, i.e. cortisol. In addition, we investigate the influence of different speech materials on the relationship between testosterone and speech production. Regarding speech perception, we investigate the potential effect of individual differences in males' testosterone level on ratings of attractiveness of female voices. In the production study, data is gathered from 30 healthy adult males ranging from 19 to 27 years (mean age: 22.4, SD: 2.2) who recorded their voices and provided saliva samples at 9 am, 12 noon and 3 pm on a single day. Speech material consists of sustained vowels, counting, read speech and a free description of pictures. Biological measures comprise speakers' height, grip strength, and hormone levels (testosterone and cortisol). In the perception study, participants were asked to rate the attractiveness of female voice stimuli (sentence stimulus, same-speaker pairs) that were manipulated in three steps regarding mean fo and formant frequencies. Regarding speech production, our results show that testosterone affected mean fo (but not formants) both within and between speakers. This relationship was weakened in speakers with high cortisol levels and depended on the speech material. Regarding speech perception, we found female stimuli with higher mean fo and formants to be rated as sounding more attractive than stimuli with lower mean fo and formants. Moreover, listeners with low testosterone showed an increased sensitivity to vocal cues of female attractiveness. While our results of the production study support earlier findings of a relationship between testosterone and mean fo in males (which is mediated by cortisol), they also highlight the relevance of the speech material: The effect of testosterone was strongest in sustained vowels, potentially due to a strengthened effect of hormones on physiologically strongly influenced tasks such as sustained vowels in contrast to more free speech tasks such as a picture description. The perception study is the first to show an effect of males' testosterone level on female attractiveness ratings using voice stimuli.},
}
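Stepwise manipulations of mean fo and formant frequencies, like those applied to the perception stimuli, can be approximated with Praat's "Change gender" resynthesis. A hedged parselmouth sketch; the filename and parameter values are illustrative and not the authors' actual procedure.

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("female_sentence.wav")  # hypothetical stimulus

# Arguments: pitch floor (Hz), pitch ceiling (Hz), formant shift ratio,
# new pitch median (Hz), pitch range factor, duration factor. A ratio
# above 1 raises formants (shorter implied vocal tract).
raised = call(snd, "Change gender", 100, 500, 1.08, 230, 1.0, 1.0)
raised.save("female_sentence_raised.wav", "WAV")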
RevDate: 2024-06-14
Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.
JMIR biomedical engineering, 7(2):e42386 pii:v7i2e42386.
BACKGROUND: In an age when telehealth services are increasingly being used for forward triage, there is a need for accurate suicide risk detection. Vocal characteristics analyzed using artificial intelligence are now proving capable of detecting suicide risk with accuracies superior to traditional survey-based approaches, suggesting an efficient and economical approach to ensuring ongoing patient safety.
OBJECTIVE: This systematic review aimed to identify which vocal characteristics perform best at differentiating between patients with an elevated risk of suicide in comparison with other cohorts and identify the methodological specifications of the systems used to derive each feature and the accuracies of classification that result.
METHODS: A search of MEDLINE via Ovid, Scopus, Computers and Applied Science Complete, CADTH, Web of Science, ProQuest Dissertations and Theses A&I, Australian Policy Online, and Mednar was conducted between 1995 and 2020 and updated in 2021. The inclusion criteria were human participants with no language, age, or setting restrictions applied; randomized controlled studies, observational cohort studies, and theses; studies that used some measure of vocal quality; and individuals assessed as being at high risk of suicide compared with other individuals at lower risk using a validated measure of suicide risk. Risk of bias was assessed using the Risk of Bias in Non-randomized Studies tool. A random-effects model meta-analysis was used wherever mean measures of vocal quality were reported.
RESULTS: The search yielded 1074 unique citations, of which 30 (2.79%) were screened via full text. A total of 21 studies involving 1734 participants met all inclusion criteria. Most studies (15/21, 71%) sourced participants via either the Vanderbilt II database of recordings (8/21, 38%) or the Silverman and Silverman perceptual study recording database (7/21, 33%). Candidate vocal characteristics that performed best at differentiating between high risk of suicide and comparison cohorts included timing patterns of speech (median accuracy 95%), power spectral density sub-bands (median accuracy 90.3%), and mel-frequency cepstral coefficients (median accuracy 80%). A random-effects meta-analysis was used to compare 22 characteristics nested within 14% (3/21) of the studies, which demonstrated significant standardized mean differences for frequencies within the first and second formants (standardized mean difference ranged between -1.07 and -2.56) and jitter values (standardized mean difference=1.47). In 43% (9/21) of the studies, risk of bias was assessed as moderate, whereas in the remaining studies (12/21, 57%), the risk of bias was assessed as high.
CONCLUSIONS: Although several key methodological issues prevailed among the studies reviewed, there is promise in the use of vocal characteristics to detect elevations in suicide risk, particularly in novel settings such as telehealth or conversational agents.
TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42020167413; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020167413.
Additional Links: PMID-38875684
@article {pmid38875684,
year = {2022},
author = {Iyer, R and Meyer, D},
title = {Detection of Suicide Risk Using Vocal Characteristics: Systematic Review.},
journal = {JMIR biomedical engineering},
volume = {7},
number = {2},
pages = {e42386},
doi = {10.2196/42386},
pmid = {38875684},
issn = {2561-3278},
abstract = {BACKGROUND: In an age when telehealth services are increasingly being used for forward triage, there is a need for accurate suicide risk detection. Vocal characteristics analyzed using artificial intelligence are now proving capable of detecting suicide risk with accuracies superior to traditional survey-based approaches, suggesting an efficient and economical approach to ensuring ongoing patient safety.
OBJECTIVE: This systematic review aimed to identify which vocal characteristics perform best at differentiating between patients with an elevated risk of suicide in comparison with other cohorts and identify the methodological specifications of the systems used to derive each feature and the accuracies of classification that result.
METHODS: A search of MEDLINE via Ovid, Scopus, Computers and Applied Science Complete, CADTH, Web of Science, ProQuest Dissertations and Theses A&I, Australian Policy Online, and Mednar was conducted between 1995 and 2020 and updated in 2021. The inclusion criteria were human participants with no language, age, or setting restrictions applied; randomized controlled studies, observational cohort studies, and theses; studies that used some measure of vocal quality; and individuals assessed as being at high risk of suicide compared with other individuals at lower risk using a validated measure of suicide risk. Risk of bias was assessed using the Risk of Bias in Non-randomized Studies tool. A random-effects model meta-analysis was used wherever mean measures of vocal quality were reported.
RESULTS: The search yielded 1074 unique citations, of which 30 (2.79%) were screened via full text. A total of 21 studies involving 1734 participants met all inclusion criteria. Most studies (15/21, 71%) sourced participants via either the Vanderbilt II database of recordings (8/21, 38%) or the Silverman and Silverman perceptual study recording database (7/21, 33%). Candidate vocal characteristics that performed best at differentiating between high risk of suicide and comparison cohorts included timing patterns of speech (median accuracy 95%), power spectral density sub-bands (median accuracy 90.3%), and mel-frequency cepstral coefficients (median accuracy 80%). A random-effects meta-analysis was used to compare 22 characteristics nested within 14% (3/21) of the studies, which demonstrated significant standardized mean differences for frequencies within the first and second formants (standardized mean difference ranged between -1.07 and -2.56) and jitter values (standardized mean difference=1.47). In 43% (9/21) of the studies, risk of bias was assessed as moderate, whereas in the remaining studies (12/21, 57%), the risk of bias was assessed as high.
CONCLUSIONS: Although several key methodological issues prevailed among the studies reviewed, there is promise in the use of vocal characteristics to detect elevations in suicide risk, particularly in novel settings such as telehealth or conversational agents.
TRIAL REGISTRATION: PROSPERO International Prospective Register of Systematic Reviews CRD42020167413; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020167413.},
}
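Mel-frequency cepstral coefficients, one of the best-performing feature families in this review, are straightforward to extract. A minimal librosa sketch; the filename and the mean/std summarization are illustrative choices, not the reviewed studies' pipelines.

import numpy as np
import librosa

y, sr = librosa.load("speech_sample.wav", sr=None)  # hypothetical audio
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Collapse the time axis so each recording yields one feature vector
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)  # (26,)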
RevDate: 2024-06-09
When time does not heal all wounds: three decades' experience of immigrants living in Sweden.
Medicinski glasnik : official publication of the Medical Association of Zenica-Doboj Canton, Bosnia and Herzegovina, 21(2) [Epub ahead of print].
AIM: To investigate how immigrants from the Balkan region experienced their current life situation after living in Sweden for 30 years or more.
MATERIALS: The study was designed as a qualitative study using data from interviews with informants from five Balkan countries. The inclusion criteria were informants who were immigrants to Sweden and had lived in Sweden for more than 30 years. Five groups comprising sixteen informants were invited to participate in the study, and they all agreed.
RESULTS: The analysis of the interviews resulted in three main categories: "from someone to no one", "labour market", and "discrimination". All the informants reported that their education and life experience counted as worthless, that they had to start their lives over and re-educate themselves, that they applied for many jobs but often received no answer, and that when they finally obtained a job for which they were educated, they were humiliated daily, treated differently, and discriminated against.
CONCLUSION: All the informants described the experience as terrible: arriving in Sweden with all their problems, finding that their education and work experience counted for nothing, studying Swedish and repeating their entire education, applying for jobs without receiving answers, and finally getting a job only to be treated differently and discriminated against on a daily basis. Although similar studies already exist in Sweden, further work of this kind can help prospective immigrants and prospective employers in Sweden.
Additional Links: PMID-38852197
@article {pmid38852197,
year = {2024},
author = {Krupić, F and Moravcova, M and Dervišević, E and Čustović, S and Grbić, K and Lindström, P},
title = {When time does not heal all wounds: three decades' experience of immigrants living in Sweden.},
journal = {Medicinski glasnik : official publication of the Medical Association of Zenica-Doboj Canton, Bosnia and Herzegovina},
volume = {21},
number = {2},
pages = {},
doi = {10.17392/1696-21-02},
pmid = {38852197},
issn = {1840-2445},
abstract = {AIM: To investigate how immigrants from the Balkan region experienced their current life situation after living in Sweden for 30 years or more.
MATERIALS: The study was designed as a qualitative study using data from interviews with informants from five Balkan countries. The inclusion criteria were informants who were immigrants to Sweden and had lived in Sweden for more than 30 years. Five groups comprising sixteen informants were invited to participate in the study, and they all agreed.
RESULTS: The analysis of the interviews resulted in three main categories: "from someone to no one", "labour market", and "discrimination". All the informants reported that their education and life experience counted as worthless, that they had to start their lives over and re-educate themselves, that they applied for many jobs but often received no answer, and that when they finally obtained a job for which they were educated, they were humiliated daily, treated differently, and discriminated against.
CONCLUSION: All the informants described the experience as terrible: arriving in Sweden with all their problems, finding that their education and work experience counted for nothing, studying Swedish and repeating their entire education, applying for jobs without receiving answers, and finally getting a job only to be treated differently and discriminated against on a daily basis. Although similar studies already exist in Sweden, further work of this kind can help prospective immigrants and prospective employers in Sweden.},
}
RevDate: 2024-06-07
Classification of phonation types in singing voice using wavelet scattering network-based features.
JASA express letters, 4(6).
The automatic classification of phonation types in singing voice is essential for tasks such as identification of singing style. In this study, it is proposed to use wavelet scattering network (WSN)-based features for classification of phonation types in singing voice. WSN, which has a close similarity with auditory physiological models, generates acoustic features that greatly characterize the information related to pitch, formants, and timbre. Hence, the WSN-based features can effectively capture the discriminative information across phonation types in singing voice. The experimental results show that the proposed WSN-based features improved phonation classification accuracy by at least 9% compared to state-of-the-art features.
Additional Links: PMID-38847582
@article {pmid38847582,
year = {2024},
author = {Mittapalle, KR and Alku, P},
title = {Classification of phonation types in singing voice using wavelet scattering network-based features.},
journal = {JASA express letters},
volume = {4},
number = {6},
pages = {},
doi = {10.1121/10.0026241},
pmid = {38847582},
issn = {2691-1191},
abstract = {The automatic classification of phonation types in singing voice is essential for tasks such as identification of singing style. In this study, it is proposed to use wavelet scattering network (WSN)-based features for classification of phonation types in singing voice. WSN, which has a close similarity with auditory physiological models, generates acoustic features that greatly characterize the information related to pitch, formants, and timbre. Hence, the WSN-based features can effectively capture the discriminative information across phonation types in singing voice. The experimental results show that the proposed WSN-based features improved phonation classification accuracy by at least 9% compared to state-of-the-art features.},
}
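For readers unfamiliar with the transform, the kymatio library implements 1-D wavelet scattering. A minimal sketch; the excerpt length and the J/Q settings are illustrative, not the paper's configuration.

import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 14             # excerpt length in samples
J, Q = 8, 12            # maximum scale and wavelets per octave
scattering = Scattering1D(J=J, shape=T, Q=Q)

x = np.random.randn(T)  # placeholder for a singing-voice excerpt
Sx = scattering(x)      # first- and second-order scattering coefficients
features = Sx.mean(axis=-1)  # time-averaged feature vector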
RevDate: 2024-06-06
Exposure to bilingual or monolingual maternal speech during pregnancy affects the neurophysiological encoding of speech sounds in neonates differently.
Frontiers in human neuroscience, 18:1379660.
INTRODUCTION: Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and demonstrating an enhanced preference for their mother's voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy influence the fetus's neural mechanisms underlying speech sound encoding differently.
METHODS: In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 129 healthy term neonates within 1 to 3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were, respectively, taken as measures of pitch and formant structure neural encoding.
RESULTS: Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies.
DISCUSSION: We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them. Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.
Additional Links: PMID-38841122
@article {pmid38841122,
year = {2024},
author = {Gorina-Careta, N and Arenillas-Alcón, S and Puertollano, M and Mondéjar-Segovia, A and Ijjou-Kadiri, S and Costa-Faidella, J and Gómez-Roig, MD and Escera, C},
title = {Exposure to bilingual or monolingual maternal speech during pregnancy affects the neurophysiological encoding of speech sounds in neonates differently.},
journal = {Frontiers in human neuroscience},
volume = {18},
number = {},
pages = {1379660},
pmid = {38841122},
issn = {1662-5161},
abstract = {INTRODUCTION: Exposure to maternal speech during the prenatal period shapes speech perception and linguistic preferences, allowing neonates to recognize stories heard frequently in utero and demonstrating an enhanced preference for their mother's voice and native language. Yet, with a high prevalence of bilingualism worldwide, it remains an open question whether monolingual and bilingual maternal speech during pregnancy influence the fetus's neural mechanisms underlying speech sound encoding differently.
METHODS: In the present study, the frequency-following response (FFR), an auditory evoked potential that reflects the complex spectrotemporal dynamics of speech sounds, was recorded to a two-vowel /oa/ stimulus in a sample of 129 healthy term neonates within 1 to 3 days after birth. Newborns were divided into two groups according to maternal language usage during the last trimester of gestation (monolingual; bilingual). Spectral amplitudes and spectral signal-to-noise ratios (SNR) at the stimulus fundamental (F0) and first formant (F1) frequencies of each vowel were, respectively, taken as measures of pitch and formant structure neural encoding.
RESULTS: Our results reveal that while spectral amplitudes at F0 did not differ between groups, neonates from bilingual mothers exhibited a lower spectral SNR. Additionally, monolingually exposed neonates exhibited a higher spectral amplitude and SNR at F1 frequencies.
DISCUSSION: We interpret our results under the consideration that bilingual maternal speech, as compared to monolingual, is characterized by a greater complexity in the speech sound signal, rendering newborns from bilingual mothers more sensitive to a wider range of speech frequencies without generating a particularly strong response at any of them. Our results contribute to an expanding body of research indicating the influence of prenatal experiences on language acquisition and underscore the necessity of including prenatal language exposure in developmental studies on language acquisition, a variable often overlooked yet capable of influencing research outcomes.},
}
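The spectral SNR measure used here (amplitude at a target frequency relative to neighbouring frequencies) can be sketched in plain numpy. The flanking band and the example target frequency are assumptions, not the study's exact analysis parameters.

import numpy as np

def spectral_snr(x, fs, f_target, flank=(10.0, 60.0)):
    # Amplitude at the target frequency divided by the mean amplitude
    # of bins lying flank[0]..flank[1] Hz to either side of it
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    signal = spec[np.argmin(np.abs(freqs - f_target))]
    dist = np.abs(freqs - f_target)
    noise = spec[(dist >= flank[0]) & (dist <= flank[1])].mean()
    return signal / noise

# e.g., SNR at a hypothetical 113 Hz F0 of the /o/ portion:
# snr_f0 = spectral_snr(ffr_segment, fs=16000, f_target=113.0)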
RevDate: 2024-05-31
Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender.
METHOD: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models.
RESULTS: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception.
CONCLUSIONS: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.
Additional Links: PMID-38820240
@article {pmid38820240,
year = {2024},
author = {Wu, HY},
title = {Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-23-00288},
pmid = {38820240},
issn = {1558-9102},
abstract = {PURPOSE: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender.
METHOD: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models.
RESULTS: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception.
CONCLUSIONS: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.},
}
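A minimal sketch of the kind of binary logistic regression used in this study, via statsmodels; the three columns and their synthetic values are placeholders for the 34 acoustic/EGG measures actually analyzed.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f0_mean":   rng.normal(200, 20, 100),   # Hz
    "intensity": rng.normal(65, 5, 100),     # dB
    "cq":        rng.normal(0.5, 0.05, 100), # contact quotient
    "deceptive": rng.integers(0, 2, 100),    # 1 = deceptive, 0 = truthful
})

X = sm.add_constant(df[["f0_mean", "intensity", "cq"]])
model = sm.Logit(df["deceptive"], X).fit()
print(model.summary())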
RevDate: 2024-05-24
Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00016-X [Epub ahead of print].
OBJECTIVE: To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender.
METHODS: Sound files each containing the vowels /i, æ, ɑ, u/ interspersed by brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values for fundamental frequency and 7 values for implied vocal tract length (and the associated formant frequencies) ranging from male-typical to female-typical, and 3 values for source spectral tilt approximating the voice qualities of breathy, normal, and pressed. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment.
RESULTS: Increasing fundamental frequency and decreasing implied vocal tract length resulted in increased probability of female judgment. Two interactions were identified: An increase in fundamental frequency and also a decrease in source spectral tilt (more negative) resulted in a greater increase in the probability of female judgment when the vocal tract length was relatively short.
CONCLUSIONS: The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and probability of female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. There was no threshold of fundamental frequency or implied vocal tract length that dramatically shifted the perception between male and female.
Additional Links: PMID-38789366
@article {pmid38789366,
year = {2024},
author = {Neuhaus, TJ and Scherer, RC and Whitfield, JA},
title = {Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.01.014},
pmid = {38789366},
issn = {1873-4588},
abstract = {OBJECTIVE: To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender.
METHODS: Sound files each containing the vowels /i, æ, ɑ, u/ interspersed by brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values for fundamental frequency and 7 values for implied vocal tract length (and the associated formant frequencies) ranging from male-typical to female-typical, and 3 values for source spectral tilt approximating the voice qualities of breathy, normal, and pressed. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment.
RESULTS: Increasing fundamental frequency and decreasing implied vocal tract length resulted in increased probability of female judgment. Two interactions were identified: An increase in fundamental frequency and also a decrease in source spectral tilt (more negative) resulted in a greater increase in the probability of female judgment when the vocal tract length was relatively short.
CONCLUSIONS: The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and probability of female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. There was no threshold of fundamental frequency or implied vocal tract length that dramatically shifted the perception between male and female.},
}
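"Implied vocal tract length" rests on the standard approximation that formant frequencies scale inversely with tract length. A worked example using the uniform closed-open tube model; the two lengths are generic textbook values, not the study's synthesis settings.

import numpy as np

C = 35000.0  # speed of sound in warm, moist air (cm/s)

def tube_formants(vtl_cm, n=4):
    # Odd quarter-wavelength resonances of a tube closed at the glottis
    # and open at the lips: F_k = (2k - 1) * c / (4 * L)
    k = np.arange(1, n + 1)
    return (2 * k - 1) * C / (4 * vtl_cm)

print(tube_formants(17.5))  # longer tract: 500, 1500, 2500, 3500 Hz
print(tube_formants(14.5))  # shorter tract: every formant up by ~21%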
RevDate: 2024-05-23
CmpDate: 2024-05-23
Male proboscis monkey cranionasal size and shape is associated with visual and acoustic signalling.
Scientific reports, 14(1):10715.
The large nose adorned by adult male proboscis monkeys is hypothesised to serve as an audiovisual signal of sexual selection. It serves as a visual signal of male quality and social status, and as an acoustic signal, through the expression of loud, low-formant nasalised calls in dense rainforests, where visibility is poor. However, it is unclear how the male proboscis monkey nasal complex, including the internal structure of the nose, plays a role in visual or acoustic signalling. Here, we use cranionasal data to assess whether large noses found in male proboscis monkeys serve visual and/or acoustic signalling functions. Our findings support a visual signalling function for male nasal enlargement through a relatively high degree of nasal aperture sexual size dimorphism, the craniofacial region to which nasal soft tissue attaches. We additionally find nasal aperture size increases beyond dental maturity among male proboscis monkeys, consistent with the visual signalling hypothesis. We show that the cranionasal region has an acoustic signalling role through pronounced nasal cavity sexual shape dimorphism, wherein male nasal cavity shape allows the expression of loud, low-formant nasalised calls. Our findings provide robust support for the male proboscis monkey nasal complex serving both visual and acoustic functions.
Additional Links: PMID-38782960
@article {pmid38782960,
year = {2024},
author = {Balolia, KL and Fitzgerald, PL},
title = {Male proboscis monkey cranionasal size and shape is associated with visual and acoustic signalling.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {10715},
pmid = {38782960},
issn = {2045-2322},
mesh = {Animals ; Male ; *Sex Characteristics ; Nasal Cavity/anatomy & histology/physiology ; Nose/anatomy & histology ; Animal Communication ; Acoustics ; Skull/anatomy & histology ; Vocalization, Animal/physiology ; Female ; },
abstract = {The large nose adorned by adult male proboscis monkeys is hypothesised to serve as an audiovisual signal of sexual selection. It serves as a visual signal of male quality and social status, and as an acoustic signal, through the expression of loud, low-formant nasalised calls in dense rainforests, where visibility is poor. However, it is unclear how the male proboscis monkey nasal complex, including the internal structure of the nose, plays a role in visual or acoustic signalling. Here, we use cranionasal data to assess whether large noses found in male proboscis monkeys serve visual and/or acoustic signalling functions. Our findings support a visual signalling function for male nasal enlargement through a relatively high degree of nasal aperture sexual size dimorphism, the craniofacial region to which nasal soft tissue attaches. We additionally find nasal aperture size increases beyond dental maturity among male proboscis monkeys, consistent with the visual signalling hypothesis. We show that the cranionasal region has an acoustic signalling role through pronounced nasal cavity sexual shape dimorphism, wherein male nasal cavity shape allows the expression of loud, low-formant nasalised calls. Our findings provide robust support for the male proboscis monkey nasal complex serving both visual and acoustic functions.},
}
RevDate: 2024-05-23
Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task.
Cognitive neuropsychology [Epub ahead of print].
How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.
Additional Links: PMID-38778635
@article {pmid38778635,
year = {2024},
author = {Beach, SD and Niziolek, CA},
title = {Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task.},
journal = {Cognitive neuropsychology},
volume = {},
number = {},
pages = {1-19},
doi = {10.1080/02643294.2024.2315831},
pmid = {38778635},
issn = {1464-0627},
abstract = {How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.},
}
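Formant trajectories of the kind analyzed here can be sampled frame by frame. A minimal parselmouth sketch; the filename, time step, and edge trimming are illustrative assumptions.

import numpy as np
import parselmouth

snd = parselmouth.Sound("trial_red.wav")  # hypothetical single trial
formant = snd.to_formant_burg(time_step=0.005)

times = np.arange(0.025, snd.duration - 0.025, 0.005)
f1 = np.array([formant.get_value_at_time(1, t) for t in times])
f2 = np.array([formant.get_value_at_time(2, t) for t in times])
# Averaging such trajectories within each condition (congruent vs.
# vowel-incongruent) would expose any bias away from the distractor vowel.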
RevDate: 2024-05-16
Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00117-6 [Epub ahead of print].
UNLABELLED: Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.
METHODS: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.
RESULTS: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (<0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (<0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (<0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (<0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy.
CONCLUSION: Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remain an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.
Additional Links: PMID-38755075
@article {pmid38755075,
year = {2024},
author = {Aaen, M and Sadolin, C},
title = {Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.03.039},
pmid = {38755075},
issn = {1873-4588},
abstract = {UNLABELLED: Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions.
METHODS: Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments.
RESULTS: Panels 1 through 4 varied in overall accuracy and agreement. The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (<0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (<0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (<0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (<0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy.
CONCLUSION: Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remain an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.},
}
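The Fleiss' kappa statistic reported for each panel is available in statsmodels. A minimal sketch on synthetic data; the matrix loosely mirrors a 12-rater panel, and the random category codes are placeholders.

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
# One row per stimulus, one column per rater; cells are category codes
# (e.g., 0-4 for five hypothetical timbre dimensions)
ratings = rng.integers(0, 5, size=(20, 12))

table, _ = aggregate_raters(ratings)  # stimuli x category-count table
print(fleiss_kappa(table))            # near 0 for random ratings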
RevDate: 2024-05-16
The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study.
METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /ɛ/ for 3 s while their auditory feedback underwent a single shift in one of three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains.
RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction more often than loudness and pitch did.
CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.
Additional Links: PMID-38754028
@article {pmid38754028,
year = {2024},
author = {Ning, LH and Hui, TC},
title = {The Accompanying Effect in Responses to Auditory Perturbations: Unconscious Vocal Adjustments to Unperturbed Parameters.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-21},
doi = {10.1044/2024_JSLHR-23-00543},
pmid = {38754028},
issn = {1558-9102},
abstract = {PURPOSE: The present study examined whether participants respond to unperturbed parameters while experiencing specific perturbations in auditory feedback. For instance, we aim to determine if speakers adjust voice loudness when only pitch is artificially altered in auditory feedback. This phenomenon is referred to as the "accompanying effect" in the present study.
METHOD: Thirty native Mandarin speakers were asked to sustain the vowel /ɛ/ for 3 s while their auditory feedback underwent a single shift in one of three distinct ways: pitch shift (±100 cents; coded as PT), loudness shift (±6 dB; coded as LD), or first formant (F1) shift (±100 Hz; coded as FM). Participants were instructed to ignore the perturbations in their auditory feedback. Response types were categorized based on pitch, loudness, and F1 for each individual trial, such as Popp_Lopp_Fopp indicating opposing responses in all three domains.
RESULTS: The accompanying effect appeared 93% of the time. Bayesian Poisson regression models indicate that opposing responses in all three domains (Popp_Lopp_Fopp) were the most prevalent response type across the conditions (PT, LD, and FM). The more frequently used response types exhibited opposing responses and significantly larger response curves than the less frequently used response types. Following responses became more prevalent only when the perturbed stimuli were perceived as voices from someone else (external references), particularly in the FM condition. In terms of isotropy, loudness and F1 tended to change in the same direction more often than loudness and pitch did.
CONCLUSION: The presence of the accompanying effect suggests that the motor systems responsible for regulating pitch, loudness, and formants are not entirely independent but rather interconnected to some degree.},
}
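The perturbation magnitudes above reduce to simple multiplicative factors; the conversions below are standard psychoacoustic arithmetic, not code from the study.

def cents_to_ratio(cents):
    # 1200 cents per octave: a shift of c cents multiplies
    # frequency by 2 ** (c / 1200)
    return 2.0 ** (cents / 1200.0)

def db_to_amplitude_gain(db):
    # a level change of d dB multiplies amplitude by 10 ** (d / 20)
    return 10.0 ** (db / 20.0)

print(cents_to_ratio(100))        # ~1.0595, the +100-cent pitch shift
print(db_to_amplitude_gain(6.0))  # ~1.995, the +6 dB loudness shift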
RevDate: 2024-05-14
Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022).
American journal of primatology [Epub ahead of print].
The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers-both purported challenges to Lieberman's theoretical work-and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position-that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.
Additional Links: PMID-38741274
@article {pmid38741274,
year = {2024},
author = {Ekström, AG},
title = {Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934-2022).},
journal = {American journal of primatology},
volume = {},
number = {},
pages = {e23637},
doi = {10.1002/ajp.23637},
pmid = {38741274},
issn = {1098-2345},
abstract = {The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934-2022) is considered at length, and two research papers-both purported challenges to Lieberman's theoretical work-and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would-be "speech-ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position-that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel-like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel-like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species-unique anatomy.},
}
RevDate: 2024-05-13
Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review.
Journal of thoracic disease, 16(4):2654-2667.
BACKGROUND AND OBJECTIVE: Obstructive sleep apnea (OSA) is a common chronic disorder characterized by repeated breathing pauses during sleep caused by upper airway narrowing or collapse. The gold standard for OSA diagnosis is the polysomnography test, which is time consuming, expensive, and invasive. In recent years, more cost-effective approaches to OSA detection based on the predictive value of speech and snoring sounds have emerged. In this paper, we offer a comprehensive summary of current research progress on the applications of speech or snoring sounds for the automatic detection of OSA and discuss the key challenges that need to be overcome for future research into this novel approach.
METHODS: PubMed, IEEE Xplore, and Web of Science databases were searched with related keywords. Literature published between 1989 and 2022 examining the potential of using speech or snoring sounds for automated OSA detection was reviewed.
KEY CONTENT AND FINDINGS: Speech and snoring sounds contain a large amount of information about OSA, and they have been extensively studied in the automatic screening of OSA. By importing features extracted from speech and snoring sounds into artificial intelligence models, clinicians can automatically screen for OSA. Features such as formants, linear prediction cepstral coefficients, and mel-frequency cepstral coefficients, together with artificial intelligence algorithms including support vector machines, Gaussian mixture models, and hidden Markov models, have been extensively studied for the detection of OSA.
CONCLUSIONS: Due to the significant advantages of noninvasive, low-cost, and contactless data collection, an automatic approach based on speech or snoring sounds seems to be a promising tool for the detection of OSA.
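As a rough illustration of the feature-plus-classifier pipeline this review surveys, the following Python sketch extracts mel-frequency cepstral coefficients with librosa and fits a support vector machine. It is not any reviewed system; the file names, labels, and two-file "dataset" are placeholders, and a real study would use many speakers and proper cross-validation.

import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=13):
    # Mean and standard deviation of MFCCs over the clip -> one fixed-length vector.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

paths = ["osa_speaker.wav", "control_speaker.wav"]  # hypothetical recordings
labels = [1, 0]                                     # 1 = OSA, 0 = control
X = np.vstack([mfcc_features(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))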
Additional Links: PMID-38738242
@article {pmid38738242,
year = {2024},
author = {Cao, S and Rosenzweig, I and Bilotta, F and Jiang, H and Xia, M},
title = {Automatic detection of obstructive sleep apnea based on speech or snoring sounds: a narrative review.},
journal = {Journal of thoracic disease},
volume = {16},
number = {4},
pages = {2654-2667},
pmid = {38738242},
issn = {2072-1439},
}
RevDate: 2024-05-08
CmpDate: 2024-05-08
Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.
The Journal of the Acoustical Society of America, 155(5):3071-3089.
This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-ɪ/, /u-ʊ/, /ɑ-ʌ/, and /æ-ε/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-ɪ/, /u-ʊ/, and /ɑ-ʌ/ pairs, although they maintained sufficient spectral differences for /æ-ε/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.
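For readers unfamiliar with the two analysis tools named in this abstract, the sketch below shows, under stated assumptions, how a formant trajectory can be summarized with discrete cosine transform (DCT) coefficients and how such features can feed a discriminant analysis. The F1 track and the feature matrix are synthetic stand-ins, not the study's data.

import numpy as np
from scipy.fft import dct
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

t = np.linspace(0, 1, 20)
f1_track = 400 + 80 * t                    # a rising, synthetic F1 trajectory in Hz
coefs = dct(f1_track, norm="ortho")[:3]    # DCT coefs 0-2 roughly capture mean, slope, curvature
print(coefs)

# With one feature row per vowel token and tense/lax labels, a linear
# discriminant analysis shows which acoustic parameters separate the classes:
X = np.random.default_rng(0).normal(size=(40, 3))   # placeholder feature matrix
y = np.repeat(["tense", "lax"], 20)
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.score(X, y))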
Additional Links: PMID-38717213
@article {pmid38717213,
year = {2024},
author = {Feng, H and Wang, L},
title = {Acoustic analysis of English tense and lax vowels: Comparing the production between Mandarin Chinese learners and native English speakers.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {5},
pages = {3071-3089},
doi = {10.1121/10.0025931},
pmid = {38717213},
issn = {1520-8524},
mesh = {Humans ; *Speech Acoustics ; *Phonetics ; Male ; Female ; *Multilingualism ; Young Adult ; Speech Production Measurement ; Adult ; Language ; Acoustics ; Learning ; Voice Quality ; Sound Spectrography ; East Asian People ; },
}
RevDate: 2024-05-07
CmpDate: 2024-05-07
No evidence that averaging voices influences attractiveness.
Scientific reports, 14(1):10488.
Vocal attractiveness influences important social outcomes. While most research on the acoustic parameters that influence vocal attractiveness has focused on the possible roles of sexually dimorphic characteristics of voices, such as fundamental frequency (i.e., pitch) and formant frequencies (i.e., a correlate of body size), other work has reported that increasing vocal averageness increases attractiveness. Here we investigated the roles these three characteristics play in judgments of the attractiveness of male and female voices. In Study 1, we found that increasing vocal averageness significantly decreased distinctiveness ratings, demonstrating that participants could detect manipulations of vocal averageness in this stimulus set and using this testing paradigm. However, in Study 2, we found no evidence that increasing averageness significantly increased attractiveness ratings of voices. In Study 3, we found that fundamental frequency was negatively correlated with male vocal attractiveness and positively correlated with female vocal attractiveness. By contrast with these results for fundamental frequency, vocal attractiveness and formant frequencies were not significantly correlated. Collectively, our results suggest that averageness may not necessarily significantly increase attractiveness judgments of voices and are consistent with previous work reporting significant associations between attractiveness and voice pitch.
Additional Links: PMID-38714709
@article {pmid38714709,
year = {2024},
author = {Ostrega, J and Shiramizu, V and Lee, AJ and Jones, BC and Feinberg, DR},
title = {No evidence that averaging voices influences attractiveness.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {10488},
pmid = {38714709},
issn = {2045-2322},
support = {EP/T023783/1//Engineering and Physical Sciences Research Council/ ; RGPIN-2023-05146//Natural Sciences and Engineering Research Council of Canada/ ; },
mesh = {Humans ; Male ; Female ; *Voice/physiology ; Adult ; Young Adult ; *Beauty ; Judgment/physiology ; Adolescent ; },
}
RevDate: 2024-05-04
Long-term Acoustic Effects of Gender-Affirming Voice Training in Transgender Women.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00123-1 [Epub ahead of print].
OBJECTIVES: One role of a speech-language pathologist (SLP) is to help transgender clients develop healthy, gender-congruent communication. Transgender women frequently approach SLPs to train their voices to sound more feminine; however, the long-term acoustic effects of such training need to be rigorously examined in effectiveness studies. The aim of this study was to investigate the long-term effects (follow-up 1: 3 months and follow-up 2: 1 year after the last session) of gender-affirming voice training for transgender women, in terms of acoustic parameters.
STUDY DESIGN: This study was a randomized sham-controlled trial with a cross-over design.
METHODS: Twenty-six transgender women were included for follow-up 1 and 18 for follow-up 2. All participants received 14 weeks of gender-affirming voice training (4 weeks of sham training and 10 weeks of voice feminization training: 5 weeks of pitch elevation training and 5 weeks of articulation-resonance training), but in a different order. Speech samples were recorded with Praat at four different time points (pre, post, follow-up 1, follow-up 2). Acoustic analysis included fo of the sustained vowel /a:/, reading, and spontaneous speech. Formant frequencies (F1-F2-F3) of the vowels /a/, /i/, and /u/ were determined and vowel space was calculated. A linear mixed model was used to compare the acoustic voice measurements between measurements (pre - post, pre - follow-up 1, pre - follow-up 2, post - follow-up 1, post - follow-up 2, follow-up 1 - follow-up 2).
RESULTS: Most of the fo measurements and formant frequencies that increased immediately after the intervention were stable at both follow-up measurements. The median fo during the sustained vowel, reading, and spontaneous speech stayed increased at both follow-ups compared to the pre-measurement. However, a decrease of 16 Hz/1.7 ST (reading) and 12 Hz/1.5 ST (spontaneous speech) was detected between the post-measurement (169 Hz for reading, 144 Hz for spontaneous speech) and 1 year after the last session (153 Hz and 132 Hz, respectively). The lower limit of fo did not change during reading and spontaneous speech, either directly after the intervention or at the follow-ups. F1-F2 of the vowel /a/ and the vowel space increased after the intervention and at both follow-ups. Individual analyses showed that more aspects should be controlled after the intervention, such as exercises performed at home or the duration of extra gender-affirming voice training sessions.
CONCLUSIONS: After 10 sessions of voice feminization training and follow-up measurements after 3 months and 1 year, stable increases were found for some, but not all, formant frequencies and fo measurements. More time should be spent on increasing the fifth percentile of fo, as the lower limit of fo also contributes to the perception of a more feminine voice.
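Because several entries in this bibliography rely on Praat-based measures of fo, formant frequencies, and vowel space, here is a minimal sketch using the praat-parselmouth package. The file name "vowel_a.wav" and the (F1, F2) values in the /a i u/ triangle are hypothetical; the shoelace formula gives the triangle's area as one simple vowel-space measure.

import numpy as np
import parselmouth

snd = parselmouth.Sound("vowel_a.wav")        # hypothetical recording
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
print("median fo:", np.median(f0[f0 > 0]), "Hz")  # median over voiced frames

formant = snd.to_formant_burg(maximum_formant=5500.0)
mid = snd.duration / 2
print([formant.get_value_at_time(i, mid) for i in (1, 2, 3)])  # F1-F3 at midpoint

def triangle_area(pts):
    # Shoelace formula for the area of the /a i u/ triangle in the F1 x F2 plane.
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs(x1*(y2-y3) + x2*(y3-y1) + x3*(y1-y2)) / 2

triangle = [(900, 1400), (300, 2500), (350, 800)]  # illustrative (F1, F2) of /a/, /i/, /u/
print("vowel space area:", triangle_area(triangle), "Hz^2")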
Additional Links: PMID-38704279
@article {pmid38704279,
year = {2024},
author = {Leyns, C and Adriaansen, A and Daelman, J and Bostyn, L and Meerschman, I and T'Sjoen, G and D'haeseleer, E},
title = {Long-term Acoustic Effects of Gender-Affirming Voice Training in Transgender Women.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.04.007},
pmid = {38704279},
issn = {1873-4588},
}
RevDate: 2024-05-02
Acoustic and Articulatory Visual Feedback in Classroom L2 Vowel Remediation.
Language and speech [Epub ahead of print].
This paper presents L2 vowel remediation in a classroom setting via two real-time visual feedback methods: articulatory ultrasound tongue imaging, which shows tongue shape and position, and a newly developed acoustic formant analyzer, which visualizes a point correlating with the combined effect of tongue position and lip rounding in a vowel quadrilateral. Ten Czech students of the Swedish language participated in the study. Swedish vowel production is difficult for Czech speakers since the two languages differ significantly in their vowel systems. The students selected the vowel targets on their own and practiced in two classroom groups, with six students receiving two ultrasound training lessons followed by one acoustic lesson, and four students receiving two acoustic lessons followed by one ultrasound lesson. Audio data were collected pre-training, after the two sessions employing the first visual feedback method, and post-training, allowing measurement of Euclidean distances among selected groups of vowels and observation of the direction of change within the vowel quadrilateral as a result of practice. Perception tests performed before and after training revealed that most learners perceived the selected vowels correctly even before the practice. The study showed that both feedback methods can be successfully applied to L2 classroom learning, and that both lead to improvement in the pronunciation of the selected vowels, as well as of the Swedish vowel set as a whole. However, ultrasound tongue imaging seems to have an advantage, as it resulted in a greater number of improved targets.
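The Euclidean distance measure mentioned above is simple to compute; a sketch with made-up Hz values follows. Whether to normalize formants (e.g., to Bark, or per speaker) before computing distances is a design choice the sketch deliberately ignores.

import math

def vowel_distance(v1, v2):
    # Euclidean distance between two (F1, F2) points in Hz.
    return math.dist(v1, v2)

pre    = (420, 1850)   # learner's vowel before training (fabricated)
post   = (380, 2050)   # after training
target = (360, 2100)   # native-like target
print(vowel_distance(pre, target), "->", vowel_distance(post, target))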
Additional Links: PMID-38693788
@article {pmid38693788,
year = {2024},
author = {Kocjančič, T and Bořil, T and Hofmann, S},
title = {Acoustic and Articulatory Visual Feedback in Classroom L2 Vowel Remediation.},
journal = {Language and speech},
volume = {},
number = {},
pages = {238309231223736},
doi = {10.1177/00238309231223736},
pmid = {38693788},
issn = {1756-6053},
}
RevDate: 2024-04-24
Spectral features related to the auditory perception of twang-like voices.
Logopedics, phoniatrics, vocology [Epub ahead of print].
BACKGROUND: To the best of our knowledge, studies on the relationship between spectral energy distribution and the degree of perceived twang are still sparse. Through an auditory-perceptual test, we aimed to explore the spectral features that may relate to the auditory perception of twang-like voices.
METHODS: Ten judges who were blind to the test's tasks and stimuli rated the amount of twang perceived in seventy-six audio samples. The stimuli consisted of twenty voices recorded from eight CCM singers who sustained the vowel [a:] at different pitches, with and without a twang-like voice. Forty filtered and sixteen synthesized-manipulated stimuli were also included.
RESULTS AND CONCLUSIONS: Based on the intra-rater reliability scores, four judges were identified as suitable for inclusion in the analyses. Results showed that the frequencies of F1 and F2 correlated strongly with the auditory perception of twang-like voices (0.90 and 0.74, respectively), whereas F3 showed a moderate negative correlation (-0.52). The frequency difference between F1 and F3 showed a strong negative correlation (-0.82). The mean energy between 1-2 kHz and between 2-3 kHz correlated moderately (0.51 and 0.42, respectively). The frequencies of F4 and F5 and the energy above 3 kHz showed weak correlations. Since spectral changes under 2 kHz have been associated with jaw, lip, and tongue adjustments (i.e., vowel articulation), and since a higher vertical laryngeal position might affect the frequencies of all formants (including F1 and F2), our results suggest that vowel articulation and laryngeal height may be relevant when producing twang-like voices.
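A minimal sketch of the band-energy measure used here, assuming a hypothetical mono recording "twang.wav": mean level in the 1-2 kHz and 2-3 kHz bands, read from a Welch power spectrum.

import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

sr, y = wavfile.read("twang.wav")             # hypothetical sustained [a:]
f, psd = welch(y.astype(float), fs=sr, nperseg=4096)

def band_db(lo, hi):
    # Mean power in [lo, hi) Hz, in dB.
    band = psd[(f >= lo) & (f < hi)]
    return 10 * np.log10(band.mean())

print("1-2 kHz:", band_db(1000, 2000), "dB")
print("2-3 kHz:", band_db(2000, 3000), "dB")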
Additional Links: PMID-38656176
@article {pmid38656176,
year = {2024},
author = {Saldías O'Hrens, M and Castro, C and Espinoza, VM and Stoney, J and Quezada, C and Laukkanen, AM},
title = {Spectral features related to the auditory perception of twang-like voices.},
journal = {Logopedics, phoniatrics, vocology},
volume = {},
number = {},
pages = {1-18},
doi = {10.1080/14015439.2024.2345373},
pmid = {38656176},
issn = {1651-2022},
}
RevDate: 2024-04-21
A Comparison of Countertenor Singing at Various Professional Levels Using Acoustic, Electroglottographic, and Videofluoroscopic Methods.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00111-5 [Epub ahead of print].
INTRODUCTION: The vocal characteristics of countertenors (CTTs) are poorly understood due to a lack of studies in this field. This study aims to explore differences among CTTs at various professional levels, examining both disparities and congruences in singing styles to better understand the CTT voice.
MATERIALS AND METHODS: Four CTTs (one student, one amateur, and two professionals) sang "La giustizia ha già sull'arco" from Handel's Giulio Cesare, with concurrent videofluoroscopic, electroglottographic (EGG), and acoustic data collection. Auditory-perceptual analysis was employed to rate professional level. Acoustic analysis included LH1-LH2, formant cluster prominence, and vibrato analysis. EGG data were analyzed using the FonaDyn software, while anatomical modifications were quantified from videofluoroscopic images.
RESULTS: CTTs exhibited EGG contact quotient values surpassing typical levels for inexperienced falsettos. Their vibrato characteristics aligned with expectations for classical singing, whereas a singer's formant was not observed. Variations in supraglottic adjustments among CTTs underscored the diversity of techniques employed by CTT singers.
CONCLUSIONS: CTTs exhibited vocal techniques that highlighted the influence of individual preferences, professional experience, and stylistic choices in shaping their singing characteristics. The data revealed discernible differences between professional and amateur CTTs, providing insights into the impact of varying levels of experience on vocal expression.
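One of the acoustic measures named above, LH1-LH2, is the level difference between the first two harmonics. The sketch below computes it from an FFT magnitude spectrum of a synthetic two-harmonic tone; the known fo and the +/-20 Hz harmonic search window are illustrative assumptions (a Praat-based workflow would take fo from a pitch track).

import numpy as np

def harmonic_level(spec, freqs, target, tol=20.0):
    # Peak level (dB) within +/-tol Hz of a target harmonic frequency.
    sel = np.abs(freqs - target) <= tol
    return 20 * np.log10(spec[sel].max())

sr = 44100
t = np.arange(sr) / sr
y = np.sin(2*np.pi*220*t) + 0.4*np.sin(2*np.pi*440*t)   # toy signal, fo = 220 Hz
spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
freqs = np.fft.rfftfreq(len(y), 1/sr)
f0 = 220.0
lh1_lh2 = harmonic_level(spec, freqs, f0) - harmonic_level(spec, freqs, 2*f0)
print("LH1-LH2:", lh1_lh2, "dB")   # ~8 dB for this toy signal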
Additional Links: PMID-38644071
@article {pmid38644071,
year = {2024},
author = {Cruz, TLB and Frič, M and Andrade, PA},
title = {A Comparison of Countertenor Singing at Various Professional Levels Using Acoustic, Electroglottographic, and Videofluoroscopic Methods.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.03.033},
pmid = {38644071},
issn = {1873-4588},
}
RevDate: 2024-04-17
Acoustic, phonetic, and phonological features of Drehu vowels.
The Journal of the Acoustical Society of America, 155(4):2612-2626.
This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14-vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two of the seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified.
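A minimal sketch of the multi-point formant analysis described above, assuming praat-parselmouth and a hypothetical pre-segmented token "vowel.wav": F1-F3 are sampled at 20%, 50%, and 80% of the vowel's duration, so vowel inherent spectral change appears as movement across the three sample points.

import parselmouth

snd = parselmouth.Sound("vowel.wav")          # hypothetical segmented vowel
formant = snd.to_formant_burg(maximum_formant=5500.0)
for frac in (0.2, 0.5, 0.8):
    t = frac * snd.duration
    f1, f2, f3 = (formant.get_value_at_time(i, t) for i in (1, 2, 3))
    print(f"{frac:.0%}: F1={f1:.0f} F2={f2:.0f} F3={f3:.0f} Hz")
print("duration:", snd.duration, "s")         # the primary cue for length, per the study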
Additional Links: PMID-38629882
@article {pmid38629882,
year = {2024},
author = {Torres, C and Li, W and Escudero, P},
title = {Acoustic, phonetic, and phonological features of Drehu vowels.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {4},
pages = {2612-2626},
doi = {10.1121/10.0025538},
pmid = {38629882},
issn = {1520-8524},
}
RevDate: 2024-04-02
Perceptual formant discrimination during speech movement planning.
PloS one, 19(4):e0301514 pii:PONE-D-23-34985.
Evoked potential studies have shown that speech planning modulates auditory cortical responses. The phenomenon's functional relevance is unknown. We tested whether, during this time window of cortical auditory modulation, there is an effect on speakers' perceptual sensitivity for vowel formant discrimination. Participants made same/different judgments for pairs of stimuli consisting of a pre-recorded, self-produced vowel and a formant-shifted version of the same production. Stimuli were presented prior to a "go" signal for speaking, prior to passive listening, and during silent reading. The formant discrimination stimulus /uh/ was tested with a congruent productions list (words with /uh/) and an incongruent productions list (words without /uh/). Logistic curves were fitted to participants' responses, and the just-noticeable difference (JND) served as a measure of discrimination sensitivity. We found a statistically significant effect of condition (worst discrimination before speaking) without congruency effect. Post-hoc pairwise comparisons revealed that JND was significantly greater before speaking than during silent reading. Thus, formant discrimination sensitivity was reduced during speech planning regardless of the congruence between discrimination stimulus and predicted acoustic consequences of the planned speech movements. This finding may inform ongoing efforts to determine the functional relevance of the previously reported modulation of auditory processing during speech planning.
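For illustration, the following sketch fits a logistic psychometric function to fabricated same/different data with scipy and reads off a discrimination threshold at the 50% point; the study's exact JND criterion may differ.

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    # Two-parameter logistic: x0 is the 50% point, k the slope.
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

shifts = np.array([0, 10, 20, 30, 40, 60, 80])          # formant shift in Hz
p_diff = np.array([.02, .10, .30, .55, .80, .95, .99])  # fabricated proportion "different"
(x0, k), _ = curve_fit(logistic, shifts, p_diff, p0=[30, 0.1])
print(f"threshold (50% point): {x0:.1f} Hz, slope k = {k:.3f}")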
Additional Links: PMID-38564597
@article {pmid38564597,
year = {2024},
author = {Wang, H and Ali, Y and Max, L},
title = {Perceptual formant discrimination during speech movement planning.},
journal = {PloS one},
volume = {19},
number = {4},
pages = {e0301514},
doi = {10.1371/journal.pone.0301514},
pmid = {38564597},
issn = {1932-6203},
}
RevDate: 2024-04-01
Articulatory and acoustic dynamics of fronted back vowels in American English.
The Journal of the Acoustical Society of America, 155(4):2285-2301.
Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda.
Additional Links: PMID-38557735
@article {pmid38557735,
year = {2024},
author = {Havenhill, J},
title = {Articulatory and acoustic dynamics of fronted back vowels in American English.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {4},
pages = {2285-2301},
doi = {10.1121/10.0025461},
pmid = {38557735},
issn = {1520-8524},
}
RevDate: 2024-03-26
ChildAugment: Data augmentation methods for zero-resource children's speaker verification.
The Journal of the Acoustical Society of America, 155(3):2221-2232.
The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an emphasized channel attention, propagation, and aggregation in time-delay neural network (ECAPA-TDNN) recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, probabilistic linear discriminant analysis (PLDA), and neural PLDA, and we propose a low-complexity weighted cosine score for extremely low-resource children's ASV. Our findings on the CSLU kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep-learning-based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline. For reproducibility, we provide the evaluation protocols and codes here.
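Formant shifting of the general kind described here can be prototyped with Praat's "Change gender" command via praat-parselmouth. The sketch below is not the authors' ChildAugment code; the input file, the 1.25 formant shift ratio, and the remaining argument values are illustrative assumptions.

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("adult.wav")          # hypothetical adult recording
# Praat "Change gender" arguments, in order: pitch floor, pitch ceiling,
# formant shift ratio (>1 raises formants, emulating a shorter vocal tract),
# new pitch median (0 = unchanged), pitch range factor, duration factor.
childlike = call(snd, "Change gender", 75, 600, 1.25, 0, 1.0, 1.0)
childlike.save("adult_shifted.wav", "WAV")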
Additional Links: PMID-38530014
@article {pmid38530014,
year = {2024},
author = {Singh, VP and Sahidullah, M and Kinnunen, T},
title = {ChildAugment: Data augmentation methods for zero-resource children's speaker verification.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {2221-2232},
doi = {10.1121/10.0025178},
pmid = {38530014},
issn = {1520-8524},
}
RevDate: 2024-03-19
Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00023-7 [Epub ahead of print].
OBJECTIVES: To investigate acoustic outcomes of gender-affirming voice training for trans women wanting to develop a female-sounding voice, and to describe what happens acoustically when male-sounding voices become more female-sounding.
STUDY DESIGN: Prospective treatment study with repeated measures.
METHODS: N = 74 trans women completed a voice training program of 8-12 sessions and had their voices audio recorded twice before and twice after training. Reference data were obtained from N = 40 cisgender speakers. Fundamental frequency (fo), formant frequencies (F1-F4), sound pressure level (Leq), and level difference between first and second harmonic (L1-L2) were extracted from a reading passage and spontaneous speech. N = 79 naive listeners provided gender-related ratings of participants' audio recordings. A linear mixed-effects model was used to estimate average training effects. Individual level analyses determined how changes in acoustic data were related to listeners' ratings.
RESULTS: Group data showed substantial training effects on fo (average, minimum, and maximum) and formant frequencies. Individual data demonstrated that many participants also increased Leq and some increased L1-L2. The measures that most strongly predicted listener ratings of a female-sounding voice were fo, average formant frequency, and Leq.
CONCLUSIONS: This is the largest prospective study reporting on acoustic outcomes of gender-affirming voice training for trans women. We confirm findings from previous smaller-scale studies by demonstrating that listener perceptions of male- and female-sounding voices are related to acoustic voice features, and that voice training for trans women wanting to sound female is associated with desirable acoustic changes, indicating training effectiveness. Although acoustic measures can be a valuable indicator of training effectiveness, particularly from the perspective of clinicians and researchers, we contend that a combination of outcome measures, including client perspectives, is needed to provide a comprehensive evaluation of gender-affirming voice training that is relevant for all stakeholders.
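The linear mixed-effects analysis mentioned above can be sketched with statsmodels; the DataFrame below (columns "speaker", "phase", "fo") is a fabricated toy stand-in for the study's repeated measures, with random intercepts per speaker.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "speaker": [1, 1, 2, 2, 3, 3],
    "phase":   ["pre", "post"] * 3,
    "fo":      [138, 172, 121, 165, 146, 181],   # Hz, fabricated
})
# Fixed effect of training phase on fo; speaker as grouping (random intercept).
model = smf.mixedlm("fo ~ phase", df, groups=df["speaker"]).fit()
print(model.summary())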
Additional Links: PMID-38503674
@article {pmid38503674,
year = {2024},
author = {Södersten, M and Oates, J and Sand, A and Granqvist, S and Quinn, S and Dacakis, G and Nygren, U},
title = {Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.02.003},
pmid = {38503674},
issn = {1873-4588},
}
RevDate: 2024-03-19
Clinical Focus: The Development and Description of a Palette of Transmasculine Voices.
American journal of speech-language pathology [Epub ahead of print].
PURPOSE: The study of gender and speech has historically excluded studies of transmasculine individuals. Consequently, generalizations about speech and gender are based on cisgender individuals. This lack of representation hinders clinical training and clinical service delivery, particularly by speech-language pathologists providing gender-affirming communication services. This letter describes a new corpus of the speech of American English-speaking transmasculine men, transmasculine nonbinary people, and cisgender men that is open and available to clinicians and researchers.
METHOD: Twenty masculine-presenting native English speakers from the Upper Midwestern United States (including cisgender men, transmasculine men, and transmasculine nonbinary people) were recorded producing three sets of speech materials: Consensus Auditory-Perceptual Evaluation of Voice sentences, the Rainbow Passage, and a novel set of sentences developed for this project. Acoustic measures were made of the vowels (overall formant frequency scaling, vowel-space dispersion, fundamental frequency, breathiness), consonants (voice onset time of word-initial voiceless stops, spectral moments of word-initial /s/), and entire sentences (rate of speech).
RESULTS: The acoustic measures reveal a wide range for all dependent measures and low correlations among the measures. Results show that many of the voices depart considerably from the norms for men's speech in published studies.
CONCLUSION: This new corpus can be used to illustrate different ways of sounding masculine by speech-language pathologists performing gender-affirming communication services and by higher education teachers as examples of diverse ways of sounding masculine.
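Among the measures listed in this abstract, spectral moments of /s/ are computed by treating the fricative's power spectrum as a probability distribution over frequency. A self-contained sketch, with white noise standing in for a real /s/ token:

import numpy as np

rng = np.random.default_rng(1)
sr = 22050
y = rng.normal(size=sr // 10)                 # 100 ms of noise as a stand-in for /s/
spec = np.abs(np.fft.rfft(y * np.hanning(len(y)))) ** 2
f = np.fft.rfftfreq(len(y), 1 / sr)

p = spec / spec.sum()                         # normalize spectrum to a distribution
centroid = (f * p).sum()                      # first moment (center of gravity)
variance = ((f - centroid) ** 2 * p).sum()    # second moment
skew = ((f - centroid) ** 3 * p).sum() / variance ** 1.5
kurt = ((f - centroid) ** 4 * p).sum() / variance ** 2
print(centroid, variance, skew, kurt)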
Additional Links: PMID-38501906
@article {pmid38501906,
year = {2024},
author = {Dolquist, DV and Munson, B},
title = {Clinical Focus: The Development and Description of a Palette of Transmasculine Voices.},
journal = {American journal of speech-language pathology},
volume = {},
number = {},
pages = {1-14},
doi = {10.1044/2024_AJSLP-23-00398},
pmid = {38501906},
issn = {1558-9110},
}
RevDate: 2024-03-18
Effects of Deep-Brain Stimulation on Speech: Perceptual and Acoustic Data.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: This study examined speech changes induced by deep-brain stimulation (DBS) in speakers with Parkinson's disease (PD) using a set of auditory-perceptual and acoustic measures.
METHOD: Speech recordings from nine speakers with PD and DBS were compared between DBS-On and DBS-Off conditions using auditory-perceptual and acoustic analyses. Auditory-perceptual ratings included voice quality, articulation precision, prosody, speech intelligibility, and listening effort obtained from 44 listeners. Acoustic measures were made for voicing proportion, second formant frequency slope, vowel dispersion, articulation rate, and range of fundamental frequency and intensity.
RESULTS: No significant changes were found between DBS-On and DBS-Off for the five perceptual ratings. Four of six acoustic measures revealed significant differences between the two conditions. While articulation rate and acoustic vowel dispersion increased, voicing proportion and intensity range decreased from the DBS-Off to DBS-On condition. However, a visual examination of the data indicated that the statistical significance was mostly driven by a small number of participants, while the majority did not show a consistent pattern of such changes.
CONCLUSIONS: Our data, in general, indicate that no to minimal changes in speech production ensued from DBS. The findings are discussed with a focus on the large interspeaker variability in PD in terms of speech characteristics and the potential effects of DBS on speech.
Additional Links: PMID-38498664
@article {pmid38498664,
year = {2024},
author = {Kim, Y and Thompson, A and Nip, ISB},
title = {Effects of Deep-Brain Stimulation on Speech: Perceptual and Acoustic Data.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-17},
doi = {10.1044/2024_JSLHR-23-00511},
pmid = {38498664},
issn = {1558-9102},
}
RevDate: 2024-03-18
The acoustics of Contemporary Standard Bulgarian vowels: A corpus study.
The Journal of the Acoustical Society of America, 155(3):2128-2138.
A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables-the first three formant frequencies, duration, mean f0, and mean intensity-of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.
Additional Links: PMID-38498508
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38498508,
year = {2024},
author = {Sabev, M and Andreeva, B},
title = {The acoustics of Contemporary Standard Bulgarian vowels: A corpus study.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {2128-2138},
doi = {10.1121/10.0025293},
pmid = {38498508},
issn = {1520-8524},
abstract = {A comprehensive examination of the acoustics of Contemporary Standard Bulgarian vowels is lacking to date, and this article aims to fill that gap. Six acoustic variables-the first three formant frequencies, duration, mean f0, and mean intensity-of 11 615 vowel tokens from 140 speakers were analysed using linear mixed models, multivariate analysis of variance, and linear discriminant analysis. The vowel system, which comprises six phonemes in stressed position, [ε a ɔ i ɤ u], was examined from four angles. First, vowels in pretonic syllables were compared to other unstressed vowels, and no spectral or durational differences were found, contrary to an oft-repeated claim that pretonic vowels reduce less. Second, comparisons of stressed and unstressed vowels revealed significant differences in all six variables for the non-high vowels [ε a ɔ]. No spectral or durational differences were found in [i ɤ u], which disproves another received view that high vowels are lowered when unstressed. Third, non-high vowels were compared with their high counterparts; the height contrast was completely neutralized in unstressed [a-ɤ] and [ɔ-u] while [ε-i] remained distinct. Last, the acoustic correlates of vowel contrasts were examined, and it was demonstrated that only F1, F2 frequencies and duration were systematically employed in differentiating vowel phonemes.},
}
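The discriminant-analysis step reported above can be sketched with scikit-learn. The six-column feature matrix (F1, F2, F3, duration, mean f0, mean intensity per token) and the vowel labels below are random stand-ins, so the printed accuracy is meaningful only as a smoke test of the pipeline shape.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))     # stand-in acoustic measurements per token
y = rng.integers(0, 6, size=600)  # stand-in labels for the six vowel phonemes

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)  # cross-validated classification
print(f"mean CV accuracy: {scores.mean():.2f}")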
RevDate: 2024-03-18
Changes in Speech Production Following Perceptual Training With Orofacial Somatosensory Inputs.
Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].
PURPOSE: Orofacial somatosensory inputs play an important role in speech motor control and speech learning. Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception, similar perceptual training could also alter speech production. We examined whether the production performance was changed by perceptual training with orofacial somatosensory inputs.
METHOD: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures. Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the first group of participants, somatosensory stimulation was applied as a facial skin stretch in the backward direction. We recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1, F2, and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with a different vowel continuum (/e/-/i/) for perceptual training.
RESULTS: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds. F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following the /e/-/i/ training. F2 change was significantly correlated with the perceptual shift between the first and second half of the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group displayed no effect on F2 and F3, and only a tendency toward an F1 increase.
CONCLUSION: The results suggest that somatosensory inputs associated with speech sound inputs can play a role in speech training and learning in both production and perception.
Additional Links: PMID-38497731
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38497731,
year = {2024},
author = {Ashokumar, M and Schwartz, JL and Ito, T},
title = {Changes in Speech Production Following Perceptual Training With Orofacial Somatosensory Inputs.},
journal = {Journal of speech, language, and hearing research : JSLHR},
volume = {},
number = {},
pages = {1-12},
doi = {10.1044/2023_JSLHR-23-00249},
pmid = {38497731},
issn = {1558-9102},
abstract = {PURPOSE: Orofacial somatosensory inputs play an important role in speech motor control and speech learning. Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception, similar perceptual training could also alter speech production. We examined whether the production performance was changed by perceptual training with orofacial somatosensory inputs.
METHOD: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures. Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the first group of participants, somatosensory stimulation was applied as a facial skin stretch in the backward direction. We recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1, F2, and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with a different vowel continuum (/e/-/i/) for perceptual training.
RESULTS: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds. F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following the /e/-/i/ training. F2 change was significantly correlated with the perceptual shift between the first and second half of the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group displayed no effect on F2 and F3, and only a tendency toward an F1 increase.
CONCLUSION: The results suggest that somatosensory inputs associated with speech sound inputs can play a role in speech training and learning in both production and perception.},
}
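A pre/post formant comparison of the kind reported above reduces, in its simplest form, to a paired test over per-speaker means. A minimal sketch with invented F2 values (Hz), not data from the study:

import numpy as np
from scipy import stats

f2_pre = np.array([1840., 1872., 1795., 1910., 1866.])   # before training
f2_post = np.array([1812., 1830., 1769., 1885., 1841.])  # after training

t, p = stats.ttest_rel(f2_pre, f2_post)  # within-speaker paired t test
print(f"mean F2 change: {np.mean(f2_post - f2_pre):+.1f} Hz (t={t:.2f}, p={p:.3f})")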
RevDate: 2024-03-14
A pilot observation using ultrasonography and vowel articulation to investigate the influence of suspected obstructive sleep apnea on upper airway.
Scientific reports, 14(1):6144.
Failure to take suitable precautions before administering general anesthesia to surgical patients with obstructive sleep apnea (OSA) may lead to postoperative complications. Therefore, it is very important to screen for OSA before performing surgery, which is currently done with subjective questionnaires such as the STOP-Bang and Berlin scores. These questionnaires have 10-36% specificity in detecting sleep apnea and provide no information on the anatomy of the upper airway, which is important for intubation. To address these challenges, we performed a pilot study to understand the utility of ultrasonography and vowel articulation in screening for OSA. Our objective was to investigate the influence of OSA risk factors on vowel articulation through ultrasonography and acoustic feature analysis. To accomplish this, we recruited 18 individuals with no risk of OSA and 13 individuals at high risk of OSA and asked them to utter vowels such as /a/ (as in "Sah") and /i/ (as in "See"). An expert ultrasonographer measured the parasagittal anterior-posterior (PAP) and transverse diameters of the upper airway. From the recorded vowel sounds, we extracted 106 features, including power, pitch, formant, and Mel frequency cepstral coefficients (MFCC). We analyzed the variation of the PAP diameters and vowel features from "See" (/i/) to "Sah" (/a/) between the control and OSA groups by two-way repeated measures ANOVA. We found that the variation in upper airway diameter from "See" to "Sah" was significantly smaller in the OSA group than in the control group (OSA: ∆12.8 ± 5.3 mm vs. control: ∆22.5 ± 3.9 mm, p < 0.01). Moreover, we found that several vowel features showed exactly the same or the opposite trend as the PAP diameter variation, which led us to build a machine learning model to estimate PAP diameter from vowel features. We found a correlation coefficient of 0.75 between the estimated and measured PAP diameters after applying four estimation models and combining their outputs with a random forest model, which showed the feasibility of using acoustic features of vowel sounds to monitor upper airway diameter. Overall, this study provides proof of concept that ultrasonography and vowel sound analysis may be useful as easily accessible tools for assessing the upper airway.
Additional Links: PMID-38480766
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38480766,
year = {2024},
author = {Saha, S and Rattansingh, A and Martino, R and Viswanathan, K and Saha, A and Montazeri Ghahjaverestan, N and Yadollahi, A},
title = {A pilot observation using ultrasonography and vowel articulation to investigate the influence of suspected obstructive sleep apnea on upper airway.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {6144},
pmid = {38480766},
issn = {2045-2322},
abstract = {Failure to take suitable precautions before administering general anesthesia to surgical patients with obstructive sleep apnea (OSA) may lead to postoperative complications. Therefore, it is very important to screen for OSA before performing surgery, which is currently done with subjective questionnaires such as the STOP-Bang and Berlin scores. These questionnaires have 10-36% specificity in detecting sleep apnea and provide no information on the anatomy of the upper airway, which is important for intubation. To address these challenges, we performed a pilot study to understand the utility of ultrasonography and vowel articulation in screening for OSA. Our objective was to investigate the influence of OSA risk factors on vowel articulation through ultrasonography and acoustic feature analysis. To accomplish this, we recruited 18 individuals with no risk of OSA and 13 individuals at high risk of OSA and asked them to utter vowels such as /a/ (as in "Sah") and /i/ (as in "See"). An expert ultrasonographer measured the parasagittal anterior-posterior (PAP) and transverse diameters of the upper airway. From the recorded vowel sounds, we extracted 106 features, including power, pitch, formant, and Mel frequency cepstral coefficients (MFCC). We analyzed the variation of the PAP diameters and vowel features from "See" (/i/) to "Sah" (/a/) between the control and OSA groups by two-way repeated measures ANOVA. We found that the variation in upper airway diameter from "See" to "Sah" was significantly smaller in the OSA group than in the control group (OSA: ∆12.8 ± 5.3 mm vs. control: ∆22.5 ± 3.9 mm, p < 0.01). Moreover, we found that several vowel features showed exactly the same or the opposite trend as the PAP diameter variation, which led us to build a machine learning model to estimate PAP diameter from vowel features. We found a correlation coefficient of 0.75 between the estimated and measured PAP diameters after applying four estimation models and combining their outputs with a random forest model, which showed the feasibility of using acoustic features of vowel sounds to monitor upper airway diameter. Overall, this study provides proof of concept that ultrasonography and vowel sound analysis may be useful as easily accessible tools for assessing the upper airway.},
}
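Combining the outputs of several estimation models with a random forest, as described above, is essentially stacked regression. The sketch below shows that architecture with scikit-learn; the four base estimators, the data shapes, and all values are placeholders rather than the models the authors used.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(31, 106))      # 31 participants x 106 acoustic features
y = rng.normal(18.0, 5.0, size=31)  # stand-in measured PAP diameters (mm)

stack = StackingRegressor(
    estimators=[("ridge", Ridge()), ("svr", SVR()),
                ("knn", KNeighborsRegressor()), ("tree", DecisionTreeRegressor())],
    final_estimator=RandomForestRegressor(n_estimators=200, random_state=0),
)
stack.fit(X, y)  # base models are cross-validated internally
print(stack.predict(X[:3]))  # estimated PAP diameters for three participants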
RevDate: 2024-03-12
Attention-based speech feature transfer between speakers.
Frontiers in artificial intelligence, 7:1259641.
In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.
Additional Links: PMID-38469160
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38469160,
year = {2024},
author = {Lee, H and Cho, M and Kwon, HY},
title = {Attention-based speech feature transfer between speakers.},
journal = {Frontiers in artificial intelligence},
volume = {7},
number = {},
pages = {1259641},
pmid = {38469160},
issn = {2624-8212},
abstract = {In this study, we propose a simple yet effective method for incorporating the source speaker's characteristics in the target speaker's speech. This allows our model to generate the speech of the target speaker with the style of the source speaker. To achieve this, we focus on the attention model within the speech synthesis model, which learns various speaker features such as spectrogram, pitch, intensity, formant, pulse, and voice breaks. The model is trained separately using datasets specific to the source and target speakers. Subsequently, we replace the attention weights learned from the source speaker's dataset with the attention weights from the target speaker's model. Finally, by providing new input texts to the target model, we generate the speech of the target speaker with the styles of the source speaker. We validate the effectiveness of our model through similarity analysis utilizing five evaluation metrics and showcase real-world examples.},
}
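The mechanical core of the method above, replacing one model's attention weights with another's, can be illustrated with PyTorch state dictionaries. The toy module below is hypothetical, and the direction of the swap in the authors' full pipeline is more involved; this shows only the weight-transplant step.

import torch
from torch import nn

class TinySynth(nn.Module):
    # Stand-in for a speech-synthesis network containing an attention block.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(80, 128)
        self.attention = nn.MultiheadAttention(embed_dim=128, num_heads=4)
        self.decoder = nn.Linear(128, 80)

src, tgt = TinySynth(), TinySynth()  # pretend each was trained on one speaker

tgt_state = tgt.state_dict()
for name, tensor in src.state_dict().items():
    if name.startswith("attention"):  # copy only the attention parameters
        tgt_state[name] = tensor.clone()
tgt.load_state_dict(tgt_state)  # target model now carries the other attention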
RevDate: 2024-03-08
Discrimination and sensorimotor adaptation of self-produced vowels in cochlear implant users.
The Journal of the Acoustical Society of America, 155(3):1895-1908.
Humans rely on auditory feedback to monitor and adjust their speech for clarity. Cochlear implants (CIs) have helped over a million people restore access to auditory feedback, which significantly improves speech production. However, there is substantial variability in outcomes. This study investigates the extent to which CI users can use their auditory feedback to detect self-produced sensory errors and make adjustments to their speech, given the coarse spectral resolution provided by their implants. First, we used an auditory discrimination task to assess the sensitivity of CI users to small differences in formant frequencies of their self-produced vowels. Then, CI users produced words with altered auditory feedback in order to assess sensorimotor adaptation to auditory error. Almost half of the CI users tested can detect small, within-channel differences in their self-produced vowels, and they can utilize this auditory feedback towards speech adaptation. An acoustic hearing control group showed better sensitivity to the shifts in vowels, even in CI-simulated speech, and exhibited more robust speech adaptation behavior than the CI users. Nevertheless, this study confirms that CI users can compensate for sensory errors in their speech and supports the idea that sensitivity to these errors may relate to variability in production.
Additional Links: PMID-38456732
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38456732,
year = {2024},
author = {Borjigin, A and Bakst, S and Anderson, K and Litovsky, RY and Niziolek, CA},
title = {Discrimination and sensorimotor adaptation of self-produced vowels in cochlear implant users.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3},
pages = {1895-1908},
doi = {10.1121/10.0025063},
pmid = {38456732},
issn = {1520-8524},
abstract = {Humans rely on auditory feedback to monitor and adjust their speech for clarity. Cochlear implants (CIs) have helped over a million people restore access to auditory feedback, which significantly improves speech production. However, there is substantial variability in outcomes. This study investigates the extent to which CI users can use their auditory feedback to detect self-produced sensory errors and make adjustments to their speech, given the coarse spectral resolution provided by their implants. First, we used an auditory discrimination task to assess the sensitivity of CI users to small differences in formant frequencies of their self-produced vowels. Then, CI users produced words with altered auditory feedback in order to assess sensorimotor adaptation to auditory error. Almost half of the CI users tested can detect small, within-channel differences in their self-produced vowels, and they can utilize this auditory feedback towards speech adaptation. An acoustic hearing control group showed better sensitivity to the shifts in vowels, even in CI-simulated speech, and exhibited more robust speech adaptation behavior than the CI users. Nevertheless, this study confirms that CI users can compensate for sensory errors in their speech and supports the idea that sensitivity to these errors may relate to variability in production.},
}
RevDate: 2024-03-05
Experienced and Inexperienced Listeners' Perception of Vocal Strain.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(24)00024-9 [Epub ahead of print].
OBJECTIVE: The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood.
METHODS/DESIGN: This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at two pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies midway between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale.
RESULTS: Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch.
CONCLUSION: This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.
Additional Links: PMID-38443265
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38443265,
year = {2024},
author = {Stone, TC and Erickson, ML},
title = {Experienced and Inexperienced Listeners' Perception of Vocal Strain.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2024.02.002},
pmid = {38443265},
issn = {1873-4588},
abstract = {OBJECTIVE: The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood.
METHODS/DESIGN: This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at two pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies midway between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale.
RESULTS: Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch.
CONCLUSION: This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.},
}
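The source stimuli above are defined by a constant spectral rolloff: with a slope of s dB/octave, harmonic n lies log2(n) octaves above the fundamental and is therefore attenuated by s·log2(n) dB. A sketch of such a harmonic source in Python, with illustrative settings:

import numpy as np

sr, dur, f0 = 44100, 1.0, 220.0  # sample rate, duration, pitch A3 (F5 ~ 698 Hz)
slope_db_per_oct = -12.0         # the study varied this from -6 to -18
t = np.arange(int(sr * dur)) / sr

signal = np.zeros_like(t)
n = 1
while n * f0 < sr / 2:  # sum harmonics up to the Nyquist frequency
    atten_db = slope_db_per_oct * np.log2(n)  # harmonic n is log2(n) octaves up
    amp = 10.0 ** (atten_db / 20.0)
    signal += amp * np.sin(2 * np.pi * n * f0 * t)
    n += 1
signal /= np.max(np.abs(signal))  # normalize; vocal tract filtering would follow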
RevDate: 2024-03-05
Comparative Study on the Acoustic Analysis of Voice in Auditory Brainstem Implantees, Cochlear Implantees, and Normal Hearing Children.
Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 76(1):645-652.
The aim of the study was to compare the acoustic characteristics of voice between Auditory Brainstem Implantees, Cochlear Implantees, and normal hearing children. Voice parameters such as fundamental frequency, formant frequencies, perturbation measures, and harmonic to noise ratio were measured in a total of 30 children, of which 10 were Auditory Brainstem Implantees, 10 were Cochlear Implantees, and 10 were normal hearing children. Parametric and nonparametric statistics were used to test for significant differences between the three groups. Overall deviations were seen in the implanted groups for all acoustic parameters. However, more pronounced abnormal deviations were seen in individuals with Auditory Brainstem Implants, indicating a deficit in the auditory feedback loop that impacts voice characteristics. This feedback deficit could contribute to the poorer performance in the ABI and CI groups. The CI group performed comparatively better than the ABI group, suggesting a partially preserved feedback loop attributable to the type of implant. However, additional supporting evidence is needed, ideally from a study with a larger sample size and a longitudinal design.
Additional Links: PMID-38440592
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38440592,
year = {2024},
author = {Umashankar, A and Ramamoorthy, S and Selvaraj, JL and Dhandayutham, S},
title = {Comparative Study on the Acoustic Analysis of Voice in Auditory Brainstem Implantees, Cochlear Implantees, and Normal Hearing Children.},
journal = {Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India},
volume = {76},
number = {1},
pages = {645-652},
pmid = {38440592},
issn = {2231-3796},
abstract = {The aim of the study was to compare the acoustic characteristics of voice between Auditory Brainstem Implantees, Cochlear Implantees, and normal hearing children. Voice parameters such as fundamental frequency, formant frequencies, perturbation measures, and harmonic to noise ratio were measured in a total of 30 children, of which 10 were Auditory Brainstem Implantees, 10 were Cochlear Implantees, and 10 were normal hearing children. Parametric and nonparametric statistics were used to test for significant differences between the three groups. Overall deviations were seen in the implanted groups for all acoustic parameters. However, more pronounced abnormal deviations were seen in individuals with Auditory Brainstem Implants, indicating a deficit in the auditory feedback loop that impacts voice characteristics. This feedback deficit could contribute to the poorer performance in the ABI and CI groups. The CI group performed comparatively better than the ABI group, suggesting a partially preserved feedback loop attributable to the type of implant. However, additional supporting evidence is needed, ideally from a study with a larger sample size and a longitudinal design.},
}
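The perturbation and noise measures named above (jitter, shimmer, and harmonic-to-noise ratio) are commonly extracted by scripting Praat, for example through the praat-parselmouth library. A minimal sketch using standard Praat parameter defaults and a hypothetical file name:

import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sustained_vowel.wav")  # hypothetical child phonation

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = snd.to_harmonicity_cc()
hnr = call(harmonicity, "Get mean", 0, 0)  # mean HNR over the whole file, dB
print(f"jitter={jitter:.4f}  shimmer={shimmer:.4f}  HNR={hnr:.1f} dB")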
RevDate: 2024-03-04
DIVA Meets EEG: Model Validation Using Formant-Shift Reflex.
Applied sciences (Basel, Switzerland), 13(13):.
The neurocomputational model 'Directions into Velocities of Articulators' (DIVA) was developed to account for various aspects of normal and disordered speech production and acquisition. The neural substrates of DIVA were established through functional magnetic resonance imaging (fMRI), providing physiological validation of the model. This study introduces DIVA_EEG, an extension of DIVA that utilizes electroencephalography (EEG) to leverage the high temporal resolution and broad availability of EEG over fMRI. For the development of DIVA_EEG, EEG-like signals were derived from original equations describing the activity of the different DIVA maps. Synthetic EEG associated with the utterance of syllables was generated when both unperturbed and perturbed auditory feedback (first formant perturbations) were simulated. The cortical activation maps derived from synthetic EEG closely resembled those of the original DIVA model. To validate DIVA_EEG, the EEG of individuals with typical voices (N = 30) was acquired during an altered auditory feedback paradigm. The resulting empirical brain activity maps significantly overlapped with those predicted by DIVA_EEG. In conjunction with other recent model extensions, DIVA_EEG lays the foundations for constructing a complete neurocomputational framework to tackle vocal and speech disorders, which can guide model-driven personalized interventions.
Additional Links: PMID-38435340
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38435340,
year = {2023},
author = {Cuadros, J and Z-Rivera, L and Castro, C and Whitaker, G and Otero, M and Weinstein, A and Martínez-Montes, E and Prado, P and Zañartu, M},
title = {DIVA Meets EEG: Model Validation Using Formant-Shift Reflex.},
journal = {Applied sciences (Basel, Switzerland)},
volume = {13},
number = {13},
pages = {},
pmid = {38435340},
issn = {2076-3417},
abstract = {The neurocomputational model 'Directions into Velocities of Articulators' (DIVA) was developed to account for various aspects of normal and disordered speech production and acquisition. The neural substrates of DIVA were established through functional magnetic resonance imaging (fMRI), providing physiological validation of the model. This study introduces DIVA_EEG, an extension of DIVA that utilizes electroencephalography (EEG) to leverage the high temporal resolution and broad availability of EEG over fMRI. For the development of DIVA_EEG, EEG-like signals were derived from original equations describing the activity of the different DIVA maps. Synthetic EEG associated with the utterance of syllables was generated when both unperturbed and perturbed auditory feedback (first formant perturbations) were simulated. The cortical activation maps derived from synthetic EEG closely resembled those of the original DIVA model. To validate DIVA_EEG, the EEG of individuals with typical voices (N = 30) was acquired during an altered auditory feedback paradigm. The resulting empirical brain activity maps significantly overlapped with those predicted by DIVA_EEG. In conjunction with other recent model extensions, DIVA_EEG lays the foundations for constructing a complete neurocomputational framework to tackle vocal and speech disorders, which can guide model-driven personalized interventions.},
}
RevDate: 2024-02-28
Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.
Scientific reports, 14(1):4889.
Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.
Additional Links: PMID-38418558
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38418558,
year = {2024},
author = {Fletcher, MD and Akis, E and Verschuur, CA and Perry, SW},
title = {Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing.},
journal = {Scientific reports},
volume = {14},
number = {1},
pages = {4889},
pmid = {38418558},
issn = {2045-2322},
support = {EP/W032422/1//Engineering and Physical Sciences Research Council/ ; EP/T517859/1//Engineering and Physical Sciences Research Council/ ; },
abstract = {Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes for both cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.},
}
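The tactile vocoder described above is, at its core, a filter bank followed by envelope extraction and envelope-modulated low-frequency carriers. The SciPy sketch below shows that signal path; the band edges and carrier frequencies are illustrative and do not reproduce the study's formant-focused spacing.

import numpy as np
from scipy import signal

sr = 16000
speech = np.random.randn(sr)          # stand-in for one second of speech
edges = np.geomspace(100, 7000, 9)    # eight log-spaced analysis bands
tone_freqs = np.linspace(50, 225, 8)  # vibrotactile carrier tones (Hz)
t = np.arange(len(speech)) / sr

out = np.zeros_like(speech)
for lo, hi, fc in zip(edges[:-1], edges[1:], tone_freqs):
    sos = signal.butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
    band = signal.sosfiltfilt(sos, speech)   # isolate one frequency band
    env = np.abs(signal.hilbert(band))       # amplitude envelope of the band
    out += env * np.sin(2 * np.pi * fc * t)  # modulate a low-frequency tone
out /= np.max(np.abs(out))                   # drive signal for the actuator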
RevDate: 2024-02-21
Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.
Primates; journal of primatology [Epub ahead of print].
Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.
Additional Links: PMID-38381271
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38381271,
year = {2024},
author = {Maya Lastra, N and Rangel Negrín, A and Coyohua Fuentes, A and Dias, PAD},
title = {Mantled howler monkey males assess their rivals through formant spacing of long-distance calls.},
journal = {Primates; journal of primatology},
volume = {},
number = {},
pages = {},
pmid = {38381271},
issn = {1610-7365},
support = {726265//Consejo Nacional de Ciencia y Tecnología/ ; 15 1529//Consejo Veracruzano de Ciencia y Tecnología/ ; },
abstract = {Formant frequency spacing of long-distance vocalizations is allometrically related to body size and could represent an honest signal of fighting potential. There is, however, only limited evidence that primates use formant spacing to assess the competitive potential of rivals during interactions with extragroup males, a risky context. We hypothesized that if formant spacing of long-distance calls is inversely related to the fighting potential of male mantled howler monkeys (Alouatta palliata), then males should: (1) be more likely and (2) faster to display vocal responses to calling rivals; (3) be more likely and (4) faster to approach calling rivals; and have higher fecal (5) glucocorticoid and (6) testosterone metabolite concentrations in response to rivals calling at intermediate and high formant spacing than to those with low formant spacing. We studied the behavioral responses of 11 adult males to playback experiments of long-distance calls from unknown individuals with low (i.e., emulating large individuals), intermediate, and high (i.e., small individuals) formant spacing (n = 36 experiments). We assayed fecal glucocorticoid and testosterone metabolite concentrations (n = 174). Playbacks always elicited vocal responses, but males responded quicker to intermediate than to low formant spacing playbacks. Low formant spacing calls were less likely to elicit approaches whereas high formant spacing calls resulted in quicker approaches. Males showed stronger hormonal responses to low than to both intermediate and high formant spacing calls. It is possible that males do not escalate conflicts with rivals with low formant spacing calls if these are perceived as large, and against whom winning probabilities should decrease and confrontation costs increase; but are willing to escalate conflicts with rivals of high formant spacing. Formant spacing may therefore be an important signal for rival assessment in this species.},
}
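The allometry invoked above rests on a standard acoustic relation: a uniform tube closed at one end resonates at odd multiples of c/4L, so adjacent formants are spaced by c/2L, and wider spacing implies a shorter tract. A brief numerical illustration of this generic textbook relation (not the paper's analysis):

C = 35000.0  # approximate speed of sound in the vocal tract, cm/s

def apparent_vtl_cm(formant_spacing_hz: float) -> float:
    # Vocal tract length implied by a given average formant spacing.
    return C / (2.0 * formant_spacing_hz)

print(apparent_vtl_cm(1000.0))  # 17.5 cm, roughly an adult human male tract
print(apparent_vtl_cm(1400.0))  # 12.5 cm: wider spacing, shorter tract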
RevDate: 2024-02-16
Auditory free classification of gender diverse speakers.
The Journal of the Acoustical Society of America, 155(2):1422-1436.
Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.
Additional Links: PMID-38364044
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38364044,
year = {2024},
author = {Merritt, B and Bent, T and Kilgore, R and Eads, C},
title = {Auditory free classification of gender diverse speakers.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {2},
pages = {1422-1436},
doi = {10.1121/10.0024521},
pmid = {38364044},
issn = {1520-8524},
abstract = {Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.},
}
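Multidimensional scaling, as applied above, takes a speaker-by-speaker dissimilarity matrix (derived from how often listeners sorted two speakers into different groups) and embeds it in a low-dimensional space. A scikit-learn sketch with a random symmetric matrix standing in for real classification counts:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_speakers = 60
d = rng.random((n_speakers, n_speakers))
dissim = (d + d.T) / 2.0       # dissimilarities must be symmetric
np.fill_diagonal(dissim, 0.0)  # and zero on the diagonal

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # 2-D perceptual coordinates per speaker
print(coords.shape, f"stress = {mds.stress_:.1f}")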
RevDate: 2024-02-15
Dynamic specification of vowels in Hijazi Arabic.
Phonetica [Epub ahead of print].
Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.
Additional Links: PMID-38358292
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38358292,
year = {2024},
author = {Almurashi, W and Al-Tamimi, J and Khattab, G},
title = {Dynamic specification of vowels in Hijazi Arabic.},
journal = {Phonetica},
volume = {},
number = {},
pages = {},
pmid = {38358292},
issn = {1423-0321},
abstract = {Research on various languages shows that dynamic approaches to vowel acoustics - in particular Vowel-Inherent Spectral Change (VISC) - can play a vital role in characterising and classifying monophthongal vowels compared with a static model. This study's aim was to investigate whether dynamic cues also allow for better description and classification of the Hijazi Arabic (HA) vowel system, a phonological system based on both temporal and spectral distinctions. Along with static and dynamic F1 and F2 patterns, we evaluated the extent to which vowel duration, F0, and F3 contribute to increased/decreased discriminability among vowels. Data were collected from 20 native HA speakers (10 females and 10 males) producing eight HA monophthongal vowels in a word list with varied consonantal contexts. Results showed that dynamic cues provide further insights regarding HA vowels that are not normally gleaned from static measures alone. Using discriminant analysis, the dynamic cues (particularly the seven-point model) had relatively higher classification rates, and vowel duration was found to play a significant role as an additional cue. Our results are in line with dynamic approaches and highlight the importance of looking beyond static cues and beyond the first two formants for further insights into the description and classification of vowel systems.},
}
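The "seven-point model" mentioned above samples each formant trajectory at seven equidistant proportional time points instead of a single midpoint value. A sketch of that reduction, with a placeholder F2 track:

import numpy as np

f2_track = np.linspace(1500, 1900, 120)  # stand-in F2 trajectory across a vowel
props = np.linspace(0.0, 1.0, 7)         # 0%, 16.7%, ..., 100% of duration
idx = np.round(props * (len(f2_track) - 1)).astype(int)
seven_point_f2 = f2_track[idx]           # dynamic feature vector for a classifier
print(seven_point_f2)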
RevDate: 2024-02-13
Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.
Autism research : official journal of the International Society for Autism Research [Epub ahead of print].
Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.
Additional Links: PMID-38348589
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38348589,
year = {2024},
author = {Simeone, PJ and Green, JR and Tager-Flusberg, H and Chenausky, KV},
title = {Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children.},
journal = {Autism research : official journal of the International Society for Autism Research},
volume = {},
number = {},
pages = {},
doi = {10.1002/aur.3102},
pmid = {38348589},
issn = {1939-3806},
support = {/NH/NIH HHS/United States ; K24 DC016312/DC/NIDCD NIH HHS/United States ; R00 DC017490/DC/NIDCD NIH HHS/United States ; P50 DC018006/DC/NIDCD NIH HHS/United States ; },
abstract = {Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) for autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL. Hierarchical regressions were also conducted within a High EL and a Low EL subgroup. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, RL for 38%. For the Low EL group, only vowel distinctiveness was significant, accounting for 38% of variance in EL. Conversely, in the High EL group, only RL was significant and accounted for 26% of variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production being the sole significant predictor for the Low EL group and RL the sole significant predictor for the High EL group. Further work is needed to determine whether vowel distinctiveness longitudinally, as well as concurrently, predicts EL. Findings have important implications for the early identification of language impairment and in developing language interventions for autistic children.},
}
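One plausible operationalization of vowel distinctiveness for a corner-vowel pair is the Euclidean distance between the [i] and [a] means in F1 x F2 space; the study's exact metric may differ. A minimal illustration with invented values:

import math

f1_i, f2_i = 350.0, 2600.0  # hypothetical child means for [i], in Hz
f1_a, f2_a = 900.0, 1500.0  # hypothetical child means for [a], in Hz

distinctiveness = math.hypot(f1_a - f1_i, f2_a - f2_i)
print(f"[i]-[a] separation: {distinctiveness:.0f} Hz")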
RevDate: 2024-02-11
Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.
The Journal of the Acoustical Society of America, 155(2):1253-1263.
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
Additional Links: PMID-38341748
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38341748,
year = {2024},
author = {Shadle, CH and Fulop, SA and Chen, WR and Whalen, DH},
title = {Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {2},
pages = {1253-1263},
doi = {10.1121/10.0024548},
pmid = {38341748},
issn = {1520-8524},
abstract = {The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.},
}
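Reassigned spectrograms are available in librosa, which relocates each spectrogram cell's energy to its instantaneous-frequency and group-delay estimates. A sketch with a hypothetical recording; resonance picking in the study was far more careful than simply taking the strongest bins in one frame.

import numpy as np
import librosa

y, sr = librosa.load("tube_response.wav", sr=None)  # hypothetical file
freqs, times, mags = librosa.reassigned_spectrogram(y=y, sr=sr, n_fft=1024)

frame = min(50, mags.shape[1] - 1)     # an arbitrary analysis frame
top = np.argsort(mags[:, frame])[-5:]  # five strongest reassigned bins
print(np.sort(freqs[top, frame]))      # candidate resonance frequencies (Hz)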
RevDate: 2024-02-06
Exploring the impact of type II diabetes mellitus on voice quality.
European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery [Epub ahead of print].
PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.
METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests.
RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. For CPP, there was also a significant interaction between diabetes status and age.
CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.
Additional Links: PMID-38319369
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38319369,
year = {2024},
author = {Saghiri, MA and Vakhnovetsky, J and Amanabi, M and Karamifar, K and Farhadi, M and Amini, SB and Conte, M},
title = {Exploring the impact of type II diabetes mellitus on voice quality.},
journal = {European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery},
volume = {},
number = {},
pages = {},
pmid = {38319369},
issn = {1434-4726},
abstract = {PURPOSE: This cross-sectional study aimed to investigate the potential of voice analysis as a prescreening tool for type II diabetes mellitus (T2DM) by examining the differences in voice recordings between non-diabetic and T2DM participants.
METHODS: 60 participants diagnosed as non-diabetic (n = 30) or T2DM (n = 30) were recruited on the basis of specific inclusion and exclusion criteria in Iran between February 2020 and September 2023. Participants were matched according to their year of birth and then placed into six age categories. Using the WhatsApp application, participants recorded the translated versions of speech elicitation tasks. Seven acoustic features [fundamental frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), cepstral peak prominence (CPP), voice onset time (VOT), and formant (F1-F2)] were extracted from each recording and analyzed using Praat software. Data were analyzed with Kolmogorov-Smirnov, two-way ANOVA, post hoc Tukey, binary logistic regression, and Student's t tests.
RESULTS: The comparison between groups showed significant differences in fundamental frequency, jitter, shimmer, CPP, and HNR (p < 0.05), while there were no significant differences in formants and VOT (p > 0.05). Binary logistic regression showed that shimmer was the most significant predictor of the disease group. For CPP, there was also a significant interaction between diabetes status and age.
CONCLUSIONS: Participants with type II diabetes exhibited significant vocal variations compared to non-diabetic controls.},
}
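The regression finding above, shimmer as the strongest predictor, can be sketched as a logistic regression over standardized features whose coefficient magnitudes rank the predictors. All data below are random stand-ins, so the printed ranking is illustrative only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["f0", "jitter", "shimmer", "HNR", "CPP", "VOT", "F1-F2"]
rng = np.random.default_rng(3)
X = rng.normal(size=(60, len(features)))  # 60 participants, 7 acoustic features
y = rng.integers(0, 2, size=60)           # 0 = non-diabetic, 1 = T2DM

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in sorted(zip(features, coefs), key=lambda pair: -abs(pair[1])):
    print(f"{name:8s} {c:+.2f}")  # larger |coefficient| = stronger predictor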
RevDate: 2024-02-01
Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.
JASA express letters, 4(2):.
The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
Additional Links: PMID-38299984
Publisher:
PubMed:
Citation:
show bibtex listing
hide bibtex listing
@article {pmid38299984,
year = {2024},
author = {Benway, NR and Preston, JL and Salekin, A and Hitchcock, E and McAllister, T},
title = {Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.},
journal = {JASA express letters},
volume = {4},
number = {2},
pages = {},
doi = {10.1121/10.0024632},
pmid = {38299984},
issn = {2691-1191},
abstract = {The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.},
}
RevDate: 2024-01-23
Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.
Sensors (Basel, Switzerland), 24(2): pii:s24020313.
To improve the classification of pig vocalizations from vocal signals and to improve recognition accuracy, a pig vocalization classification method based on multi-feature fusion is proposed in this study. Taking the typical vocalizations of pigs in large-scale breeding houses as the research object, short-time energy, frequency centroid, formant frequency with its first-order difference, and Mel frequency cepstral coefficients with their first-order differences were extracted as the fusion features. These fusion features were then refined using principal component analysis, and a classification model based on a BP neural network optimized with a genetic algorithm was constructed. Using the refined features to recognize pig grunting, squealing, and coughing, the average recognition accuracy was 93.2%; the recognition precisions were 87.9%, 98.1%, and 92.7%, respectively (average 92.9%); and the recognition recalls were 92.0%, 99.1%, and 87.4%, respectively (average 92.8%). These results indicate that the proposed method offers good precision and recall and could serve as a reference for automatic recognition of pig vocalization information.
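A hedged Python sketch of the feature-fusion step described above: concatenated frame-level features are reduced with principal component analysis and passed to a classifier. The arrays are placeholders, and scikit-learn's MLPClassifier stands in for the paper's genetic-algorithm-optimized BP network:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
fused = rng.normal(size=(300, 40))     # e.g., energy + centroid + formants + MFCCs + deltas
labels = rng.integers(0, 3, size=300)  # 0 = grunt, 1 = squeal, 2 = cough (toy labels)

X = PCA(n_components=10).fit_transform(fused)  # decorrelated, compact fusion features
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X, labels)
print(clf.score(X, labels))  # training accuracy on the toy data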
Additional Links: PMID-38257406
@article {pmid38257406,
year = {2024},
author = {Hou, Y and Li, Q and Wang, Z and Liu, T and He, Y and Li, H and Ren, Z and Guo, X and Yang, G and Liu, Y and Yu, L},
title = {Study on a Pig Vocalization Classification Method Based on Multi-Feature Fusion.},
journal = {Sensors (Basel, Switzerland)},
volume = {24},
number = {2},
pages = {},
doi = {10.3390/s24020313},
pmid = {38257406},
issn = {1424-8220},
support = {2021ZD0113803//Scientific and Technological Innovation 2030 Program of China Ministry of Science and Technology/ ; 20YFZCSN00220//Tianjin Science and Technology Planning Project/ ; JKZX202214//Beijing Academy of Agriculture and Forestry Sciences Outstanding Scientist Training Program/ ; },
}
RevDate: 2024-01-22
Formant dynamics in second language speech: Japanese speakers' production of English liquids.
The Journal of the Acoustical Society of America, 155(1):479-495.
This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2-F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2-F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately.
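To make the dynamic measure concrete, here is a toy Python sketch that tracks F2-F1 at equally spaced proportional time points across a liquid-vowel interval; the formant tracks are fabricated, and a GAMM would then be fit to such trajectories:

import numpy as np

t = np.linspace(0.0, 1.0, 11)            # proportional time through a /ɹV/ interval
f1 = 350.0 + 300.0 * t                   # toy F1 rising into the vowel
f2 = 1100.0 + 200.0 * np.sin(np.pi * t)  # toy F2 trajectory
f2_minus_f1 = f2 - f1
print(np.round(f2_minus_f1, 1))          # the time-varying input to a GAMM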
Additional Links: PMID-38252795
@article {pmid38252795,
year = {2024},
author = {Nagamine, T},
title = {Formant dynamics in second language speech: Japanese speakers' production of English liquids.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {1},
pages = {479-495},
doi = {10.1121/10.0024351},
pmid = {38252795},
issn = {1520-8524},
}
RevDate: 2024-01-16
What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.
World journal of plastic surgery, 12(3):44-56.
BACKGROUND: Regarding the impact of orthognathic surgery on the airway and voice, this study was carried out to investigate the effects of maxillary impaction surgery on patients' voices through acoustic analysis and articulation assessment.
METHODS: This quasi-experimental, before-and-after, double-blind study examined the effects of maxillary impaction surgery on the voice of orthognathic surgery patients. Before the surgery, a speech therapist conducted acoustic analysis, which included fundamental frequency (F0), jitter, shimmer, and the harmonic-to-noise ratio (HNR), as well as the first, second, and third formants (F1, F2, and F3). The patient's age, sex, degree of maxillary deformity, and impaction were documented in a checklist. Voice analysis was repeated during follow-up appointments at one and six months after the surgery in a blinded manner. The data were statistically analyzed using SPSS 23, and the significance level was set at 0.05.
RESULTS: Twenty-two patients (18 females, 4 males) were examined, with ages ranging from 18 to 40 years and an average age of 25.54 years. F2, F3, HNR, and shimmer demonstrated a significant increase over the investigation period compared to the initial phase of the study (P < 0.001 for each). Conversely, jitter exhibited a significant decrease during the follow-up assessments in comparison to the initial phase of the study (P < 0.001).
CONCLUSION: Following maxillary impaction surgery, improvements in voice quality were observed compared to the preoperative condition. However, further studies with larger samples are needed to confirm the relevancy.
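For reference, the perturbation measures reported above are commonly computed as follows: local jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and local shimmer is the analogue for peak amplitudes. A minimal Python sketch with invented period and amplitude sequences:

import numpy as np

def local_perturbation(values):
    # Mean absolute cycle-to-cycle difference, relative to the mean value
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

periods_s = np.array([0.0080, 0.0081, 0.0079, 0.0080, 0.0082])  # toy glottal periods
amps = np.array([0.61, 0.63, 0.60, 0.62, 0.61])                 # toy peak amplitudes
print(f"jitter  = {100 * local_perturbation(periods_s):.2f}%")
print(f"shimmer = {100 * local_perturbation(amps):.2f}%")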
Additional Links: PMID-38226202
@article {pmid38226202,
year = {2023},
author = {Ghaemi, H and Grillo, R and Alizadeh, O and Shirzadeh, A and Ejtehadi, B and Torkzadeh, M and Samieirad, S},
title = {What Is the Effect of Maxillary Impaction Orthognathic Surgery on Voice Characteristics? A Quasi-Experimental Study.},
journal = {World journal of plastic surgery},
volume = {12},
number = {3},
pages = {44-56},
pmid = {38226202},
issn = {2228-7914},
}
RevDate: 2024-01-12
Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.
JASA express letters, 4(1).
Reaction times for correct vowel identification were measured to determine the effects of intertrial interval, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years, participated. Stimuli included three naturally produced syllables (/ba/, /bi/, /bu/) presented whole or segmented to isolate the formant transition or the static formant center. Participants identified the vowel, presented via loudspeaker, by mouse click. Results showed a significant effect of intertrial interval, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, that vowel identification may depend on cue duration, and that vowel bias may stem from focal structure.
Additional Links: PMID-38214609
@article {pmid38214609,
year = {2024},
author = {Hedrick, M and Thornton, K},
title = {Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments.},
journal = {JASA express letters},
volume = {4},
number = {1},
pages = {},
doi = {10.1121/10.0024334},
pmid = {38214609},
issn = {2691-1191},
}
RevDate: 2024-01-04
Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.
The Journal of the Acoustical Society of America, 155(1):68-77.
Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequencies (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occur for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically, with ΔF0 varied, to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination.
Additional Links: PMID-38174963
@article {pmid38174963,
year = {2024},
author = {Sathe, NC and Kain, A and Reiss, LAJ},
title = {Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {1},
pages = {68-77},
doi = {10.1121/10.0024245},
pmid = {38174963},
issn = {1520-8524},
}
RevDate: 2024-01-02
Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.
Applied psychophysiology and biofeedback [Epub ahead of print].
Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state. There are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions, and their use can improve the objectivity of psychiatric assessments. Biofeedback evaluated with subjective symptom scales together with objective speech and physiological features therefore provides a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features that differed between the biofeedback intervention and wait-list groups, an artificial neural network was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the wait-list group (n = 52) and were related to the change in symptoms. Energy parameters and Mel-frequency cepstral coefficients (MFCCs) among the speech features can predict whether the biofeedback intervention effectively improves anxiety and insomnia symptoms, as well as the treatment response. The accuracy of the classification model built using the artificial neural network (ANN) for treatment response versus non-response was approximately 60%. These results provide valuable information about biofeedback for improving the mental health of college students. The study identified speech features such as energy parameters and MFCCs as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial registration: ClinicalTrials.gov ChiCTR2100045542.
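A hedged Python sketch of the kind of pipeline described above: MFCC and energy features extracted from a speech sample feed a small neural network that predicts treatment response. The file name and labels are placeholders, and the toy network below is not the authors' model:

import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

y, sr = librosa.load("speech.wav", sr=16000)  # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # per-recording MFCC means
energy = np.array([np.mean(y ** 2)])                             # a simple energy parameter
features = np.concatenate([mfcc, energy]).reshape(1, -1)

# Toy training set standing in for baseline recordings of many students
X_train = np.random.default_rng(2).normal(size=(40, 14))
y_train = np.tile([0, 1], 20)  # 0 = non-responder, 1 = responder (invented)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000).fit(X_train, y_train)
print("predicted responder" if clf.predict(features)[0] else "predicted non-responder")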
Additional Links: PMID-38165498
@article {pmid38165498,
year = {2024},
author = {Wang, L and Liu, R and Wang, Y and Xu, X and Zhang, R and Wei, Y and Zhu, R and Zhang, X and Wang, F},
title = {Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.},
journal = {Applied psychophysiology and biofeedback},
volume = {},
number = {},
pages = {},
pmid = {38165498},
issn = {1573-3270},
support = {ZD2021026//Key Project supported by Medical Science and Technology Development Foundation, Jiangsu Commission of Health/ ; 62176129//National Natural Science Foundation of China/ ; 81725005//National Science Fund for Distinguished Young Scholars/ ; U20A6005//National Natural Science Foundation Regional Innovation and Development Joint Fund/ ; BE2021617//Jiangsu Provincial Key Research and Development Program/ ; },
}
RevDate: 2023-12-29
A practical guide to calculating vocal tract length and scale-invariant formant patterns.
Behavior research methods [Epub ahead of print].
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
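The single-tube regression idea can be re-expressed compactly: for a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c / (4 VTL), so vocal tract length follows from regressing measured formants on the odd numbers 1, 3, 5, ... The Python sketch below is a simplified re-expression of that model, not the code of soundgen's estimateVTL:

import numpy as np

def estimate_vtl(formants_hz, speed_of_sound=35400.0):  # c in cm/s (about 354 m/s, warm moist air)
    n = np.arange(1, len(formants_hz) + 1)
    odd = 2 * n - 1
    # Least-squares slope through the origin for F = slope * (2n - 1)
    slope = np.dot(odd, formants_hz) / np.dot(odd, odd)
    return speed_of_sound / (4 * slope)

print(f"{estimate_vtl([500.0, 1500.0, 2500.0, 3500.0]):.1f} cm")  # an ideal schwa-like pattern gives about 17.7 cm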
Additional Links: PMID-38158551
@article {pmid38158551,
year = {2023},
author = {Anikin, A and Barreda, S and Reby, D},
title = {A practical guide to calculating vocal tract length and scale-invariant formant patterns.},
journal = {Behavior research methods},
volume = {},
number = {},
pages = {},
pmid = {38158551},
issn = {1554-3528},
}
RevDate: 2023-12-23
On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.
Bioengineering (Basel, Switzerland), 10(12): pii:bioengineering10121369.
Sound generation in human phonation and the underlying fluid-structure-acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, using a silicone vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a purely acoustic coupling, a numerical simulation model is established based on the computation of the coupled mechanical-acoustic eigenvalues. By varying the pipe length, the lowest acoustic resonance frequency was adjusted in the experiments and, likewise, in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (the lowest formant) is far from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.
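A toy calculation of the coupling condition discussed above: the lowest resonance of a straight closed-open pipe is f1 = c/(4L), so varying the pipe length moves the lowest formant toward or away from the vocal folds' vibration frequency. The numbers below are illustrative only, not values from the study:

c = 343.0    # speed of sound, m/s
f_vf = 150.0  # hypothetical vocal fold vibration frequency, Hz
for length_m in (0.15, 0.40, 0.57):
    f1 = c / (4 * length_m)  # quarter-wave resonance of a closed-open pipe
    print(f"L = {100 * length_m:4.0f} cm -> lowest resonance {f1:6.1f} Hz "
          f"({'near' if abs(f1 - f_vf) < 30 else 'far from'} f_vf = {f_vf} Hz)")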
Additional Links: PMID-38135960
@article {pmid38135960,
year = {2023},
author = {Kraxberger, F and Näger, C and Laudato, M and Sundström, E and Becker, S and Mihaescu, M and Kniesburges, S and Schoder, S},
title = {On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations.},
journal = {Bioengineering (Basel, Switzerland)},
volume = {10},
number = {12},
pages = {},
doi = {10.3390/bioengineering10121369},
pmid = {38135960},
issn = {2306-5354},
support = {39480417//Austrian Research Promotion Agency/ ; 446965891//Deutsche Forschungsgemeinschaft/ ; n/a//TU Graz Open Access Publishing Fund/ ; },
}
RevDate: 2023-12-12
The Change of Vocal Tract Length in People with Parkinson's Disease.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference, 2023:1-4.
Hypokinetic dysarthria is one of the early symptoms of Parkinson's disease (PD) and has been proposed for early detection and for monitoring of the progression of the disease. PD reduces the control of vocal tract muscles such as the tongue and lips, and therefore the length of the active vocal tract is altered. However, the change in vocal tract length due to the disease has not been investigated. The aim of this study was to determine the difference in the apparent vocal tract length (AVTL) between people with PD and age-matched healthy controls. The phoneme /a/ from the UCI Parkinson's Disease Classification Dataset and the Italian Parkinson's Voice and Speech Dataset was used, and AVTL was calculated from the first four formants (F1-F4) of the sustained phoneme. The results show a correlation between Parkinson's disease and an increase in vocal tract length. The most sensitive feature was the AVTL calculated from the first formant (F1). The other significant finding reported in this article is that the difference was significant only in the male participants. However, the database is not sufficiently large to identify possible confounding factors such as severity and duration of the disease, medication, age, and comorbidities. Clinical relevance: the outcomes of this research have the potential to improve the identification of early Parkinsonian dysarthria and the monitoring of PD progression.
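For orientation, apparent vocal tract length can be computed from a single formant under the same quarter-wave model used above, VTL_n = (2n - 1)c / (4 F_n). A short Python example with an illustrative F1, not a value from the datasets:

c_cm_s = 35400.0  # assumed speed of sound in the vocal tract, cm/s
f1_hz = 500.0     # illustrative first-formant value
avtl_from_f1 = c_cm_s / (4 * f1_hz)  # n = 1, so (2n - 1) = 1
print(f"AVTL(F1) = {avtl_from_f1:.1f} cm")  # 17.7 cm; PD vs control compares such values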
Additional Links: PMID-38082914
@article {pmid38082914,
year = {2023},
author = {Pah, ND and Motin, MA and Oliveira, GC and Kumar, DK},
title = {The Change of Vocal Tract Length in People with Parkinson's Disease.},
journal = {Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference},
volume = {2023},
number = {},
pages = {1-4},
doi = {10.1109/EMBC40787.2023.10340263},
pmid = {38082914},
issn = {2694-0604},
}
RevDate: 2023-12-07
Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.
Cortex; a journal devoted to the study of the nervous system and behavior, 171:287-307 pii:S0010-9452(23)00281-2 [Epub ahead of print].
The spectral formant structure and periodicity pitch are the major features that determine the identity of vowels and the characteristics of the speaker. However, very little is known about how the processing of these features in the auditory cortex changes during development. To address this question, we independently manipulated the periodicity and formant structure of vowels while measuring auditory cortex responses using magnetoencephalography (MEG) in children aged 7-12 years and adults. We analyzed the sustained negative shift of source current associated with these vowel properties, which was present in the auditory cortex in both age groups despite differences in the transient components of the auditory response. In adults, the sustained activation associated with formant structure was lateralized to the left hemisphere early in the auditory processing stream requiring neither attention nor semantic mapping. This lateralization was not yet established in children, in whom the right hemisphere contribution to formant processing was strong and decreased during or after puberty. In contrast to the formant structure, periodicity was associated with a greater response in the right hemisphere in both children and adults. These findings suggest that left-lateralization for the automatic processing of vowel formant structure emerges relatively late in ontogenesis and pose a serious challenge to current theories of hemispheric specialization for speech processing.
Additional Links: PMID-38061210
@article {pmid38061210,
year = {2023},
author = {Orekhova, EV and Fadeev, KA and Goiaeva, DE and Obukhova, TS and Ovsiannikova, TM and Prokofyev, AO and Stroganova, TA},
title = {Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood.},
journal = {Cortex; a journal devoted to the study of the nervous system and behavior},
volume = {171},
number = {},
pages = {287-307},
doi = {10.1016/j.cortex.2023.10.020},
pmid = {38061210},
issn = {1973-8102},
}
RevDate: 2023-12-07
Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.
iScience, 26(12):108457.
Perception of bistable stimuli is influenced by prior context. In some cases, the interpretation matches with how the preceding stimulus was perceived; in others, it tends to be the opposite of the previous stimulus percept. We measured high-density electroencephalography (EEG) while participants were presented with a sequence of vowels that varied in formant transition, promoting the perception of one or two auditory streams followed by an ambiguous bistable sequence. For the bistable sequence, participants were more likely to report hearing the opposite percept of the one heard immediately before. This auditory contrast effect coincided with changes in alpha power localized in the left angular gyrus and left sensorimotor and right sensorimotor/supramarginal areas. The latter correlated with participants' perception. These results suggest that the contrast effect for a bistable sequence of vowels may be related to neural adaptation in posterior auditory areas, which influences participants' perceptual construal level of ambiguous stimuli.
Additional Links: PMID-38058304
@article {pmid38058304,
year = {2023},
author = {Alain, C and Göke, K and Shen, D and Bidelman, GM and Bernstein, LJ and Snyder, JS},
title = {Neural alpha oscillations index context-driven perception of ambiguous vowel sequences.},
journal = {iScience},
volume = {26},
number = {12},
pages = {108457},
pmid = {38058304},
issn = {2589-0042},
}
RevDate: 2023-12-05
Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.
Amyotrophic lateral sclerosis & frontotemporal degeneration [Epub ahead of print].
OBJECTIVE: To evaluate automated digital speech measures, derived from spontaneous speech (picture descriptions), in assessing bulbar motor impairments in patients with ALS-FTD spectrum disorders (ALS-FTSD).
METHODS: Automated vowel algorithms were employed to extract two vowel acoustic measures: vowel space area (VSA) and mean second formant slope (F2 slope). Vowel measures were compared between ALS patients with clinical bulbar symptoms (ALS+bulbar, n = 49; ALSFRS-R bulbar subscore: mean = 9.8, SD = 1.7), ALS patients without bulbar symptoms (ALS-nonbulbar, n = 23), patients with behavioral variant frontotemporal dementia without a motor syndrome (bvFTD, n = 25), and healthy controls (HC, n = 32). Correlations with bulbar motor clinical scales, perceived listener effort, and MRI cortical thickness of the orobuccal primary motor cortex (oral PMC) were examined. We compared vowel measures to speaking rate, a conventional metric for assessing bulbar dysfunction.
RESULTS: ALS+bulbar patients had significantly lower VSA and F2 slope than the ALS-nonbulbar (|d| = 0.94 and |d| = 1.04, respectively), bvFTD (|d| = 0.89 and |d| = 1.47), and HC (|d| = 0.73 and |d| = 0.99) groups. These reductions correlated with worse bulbar clinical scores (VSA: R = 0.33, p = 0.043; F2 slope: R = 0.38, p = 0.011), greater listener effort (VSA: R = -0.43, p = 0.041; F2 slope: p > 0.05), and cortical thinning in oral PMC (F2 slope: β = 0.0026, p = 0.017). Vowel measures demonstrated greater sensitivity and specificity for bulbar impairment than speaking rate, while remaining independent of cognitive and respiratory impairments.
CONCLUSION: Automatic vowel measures are easily derived from a brief spontaneous speech sample, are sensitive to mild-moderate stage of bulbar disease in ALS-FTSD, and may present better sensitivity to bulbar impairment compared to traditional assessments such as speaking rate.
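A hedged Python sketch of the two vowel measures used above: vowel space area as the area of the convex hull of (F1, F2) points, and F2 slope as a linear fit to an F2 track. All formant values are invented:

import numpy as np
from scipy.spatial import ConvexHull

corner_vowels = np.array([[800, 1200],   # /a/: (F1, F2) in Hz
                          [300, 2300],   # /i/
                          [350, 900]])   # /u/
vsa_hz2 = ConvexHull(corner_vowels).volume  # for 2-D points, .volume is the hull area

t = np.linspace(0.0, 0.2, 21)            # 200 ms of toy F2 track
f2_track = 1200.0 + 1500.0 * t           # rising F2
f2_slope = np.polyfit(t, f2_track, 1)[0]  # Hz per second
print(f"VSA = {vsa_hz2:.0f} Hz^2, F2 slope = {f2_slope:.0f} Hz/s")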
Additional Links: PMID-38050971
@article {pmid38050971,
year = {2023},
author = {Shellikeri, S and Cho, S and Ash, S and Gonzalez-Recober, C and Mcmillan, CT and Elman, L and Quinn, C and Amado, DA and Baer, M and Irwin, DJ and Massimo, L and Olm, CA and Liberman, MY and Grossman, M and Nevler, N},
title = {Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders.},
journal = {Amyotrophic lateral sclerosis & frontotemporal degeneration},
volume = {},
number = {},
pages = {1-9},
doi = {10.1080/21678421.2023.2288106},
pmid = {38050971},
issn = {2167-9223},
}
RevDate: 2023-11-30
Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.
Frontiers in neuroscience, 17:1238941.
INTRODUCTION: Understanding speech in a noisy environment, as opposed to speech in quiet, becomes increasingly more difficult with increasing age. Using the quiet-aged gerbil, we studied the effects of aging on speech-in-noise processing. Specifically, behavioral vowel discrimination and the encoding of these vowels by single auditory-nerve fibers were compared, to elucidate some of the underlying mechanisms of age-related speech-in-noise perception deficits.
METHODS: Young-adult and quiet-aged Mongolian gerbils, of either sex, were trained to discriminate a deviant naturally-spoken vowel in a sequence of vowel standards against a speech-like background noise. In addition, we recorded responses from single auditory-nerve fibers of young-adult and quiet-aged gerbils while presenting the same speech stimuli.
RESULTS: Behavioral vowel discrimination was not significantly affected by aging. For both young-adult and quiet-aged gerbils, the behavioral discrimination between /eː/ and /iː/ was more difficult than between /eː/ and /aː/ or between /iː/ and /aː/, as evidenced by longer response times and lower d' values. In young adults, spike-timing-based vowel discrimination agreed with the behavioral vowel discrimination, while in quiet-aged gerbils it did not. Paradoxically, discrimination between vowels based on temporal responses was enhanced in aged gerbils for all vowel comparisons. Representation schemes based on the spectrum of the inter-spike interval histogram revealed stronger encoding of both the fundamental and the lower formant frequencies in fibers of quiet-aged gerbils, but no qualitative changes in vowel encoding. Elevated thresholds in combination with a fixed stimulus level, i.e., lower sensation levels of the stimuli for old individuals, can explain the enhanced temporal coding of the vowels in noise.
DISCUSSION: These results suggest that the altered auditory-nerve discrimination metrics in old gerbils may mask age-related deterioration in the central (auditory) system to the extent that behavioral vowel discrimination matches that of the young adults.
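The d' values mentioned in the results are the standard signal-detection sensitivity index, the difference between z-transformed hit and false-alarm rates. A short Python illustration with invented rates:

from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    # z-transform via the inverse of the standard normal CDF
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(f"easy pair /e:/ vs /a:/: d' = {d_prime(0.95, 0.05):.2f}")
print(f"hard pair /e:/ vs /i:/: d' = {d_prime(0.70, 0.20):.2f}")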
Additional Links: PMID-38033551
@article {pmid38033551,
year = {2023},
author = {Heeringa, AN and Jüchter, C and Beutelmann, R and Klump, GM and Köppl, C},
title = {Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss.},
journal = {Frontiers in neuroscience},
volume = {17},
number = {},
pages = {1238941},
doi = {10.3389/fnins.2023.1238941},
pmid = {38033551},
issn = {1662-4548},
}
RevDate: 2023-11-29
Selectivity to acoustic features of human speech in the auditory cortex of the mouse.
Hearing research, 441:108920 pii:S0378-5955(23)00232-0 [Epub ahead of print].
A better understanding of the neural mechanisms of speech processing can have a major impact in the development of strategies for language learning and in addressing disorders that affect speech comprehension. Technical limitations in research with human subjects hinder a comprehensive exploration of these processes, making animal models essential for advancing the characterization of how neural circuits make speech perception possible. Here, we investigated the mouse as a model organism for studying speech processing and explored whether distinct regions of the mouse auditory cortex are sensitive to specific acoustic features of speech. We found that mice can learn to categorize frequency-shifted human speech sounds based on differences in formant transitions (FT) and voice onset time (VOT). Moreover, neurons across various auditory cortical regions were selective to these speech features, with a higher proportion of speech-selective neurons in the dorso-posterior region. Last, many of these neurons displayed mixed-selectivity for both features, an attribute that was most common in dorsal regions of the auditory cortex. Our results demonstrate that the mouse serves as a valuable model for studying the detailed mechanisms of speech feature encoding and neural plasticity during speech-sound learning.
Additional Links: PMID-38029503
@article {pmid38029503,
year = {2023},
author = {Mohn, JL and Baese-Berk, MM and Jaramillo, S},
title = {Selectivity to acoustic features of human speech in the auditory cortex of the mouse.},
journal = {Hearing research},
volume = {441},
number = {},
pages = {108920},
doi = {10.1016/j.heares.2023.108920},
pmid = {38029503},
issn = {1878-5891},
}
RevDate: 2023-11-27
The role of loudness in vocal intimidation.
Journal of experimental psychology. General pii:2024-28586-001 [Epub ahead of print].
Across many species, a major function of vocal communication is to convey formidability, with low voice frequencies traditionally considered the main vehicle for projecting large size and aggression. Vocal loudness is often ignored, yet it might explain some puzzling exceptions to this frequency code. Here we demonstrate, through acoustic analyses of over 3,000 human vocalizations and four perceptual experiments, that vocalizers produce low frequencies when attempting to sound large, but loudness is prioritized for displays of strength and aggression. Our results show that, although being loud is effective for signaling strength and aggression, it poses a physiological trade-off with low frequencies because a loud voice is achieved by elevating pitch and opening the mouth wide into a-like vowels. This may explain why aggressive vocalizations are often high-pitched and why open vowels are considered "large" in sound symbolism despite their high first formant. Callers often compensate by adding vocal harshness (nonlinear vocal phenomena) to undesirably high-pitched loud vocalizations, but a combination of low and loud remains an honest predictor of both perceived and actual physical formidability. The proposed notion of a loudness-frequency trade-off thus adds a new dimension to the widely accepted frequency code and requires a fundamental rethinking of the evolutionary forces shaping the form of acoustic signals. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Additional Links: PMID-38010781
@article {pmid38010781,
year = {2023},
author = {Anikin, A and Valente, D and Pisanski, K and Cornec, C and Bryant, GA and Reby, D},
title = {The role of loudness in vocal intimidation.},
journal = {Journal of experimental psychology. General},
volume = {},
number = {},
pages = {},
doi = {10.1037/xge0001508},
pmid = {38010781},
issn = {1939-2222},
support = {//Vetenskapsrådet/ ; //French National Research Agency (ANR)/ ; },
}
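An editorial aside on the entry above: the paper's argument rests on two frame-level acoustic measurements, loudness and fundamental frequency (f0). The minimal Python sketch below (pure numpy, written for this bibliography, not taken from the study's pipeline) shows one conventional way to compute both; the frame length, f0 search range, and synthetic test signal are illustrative assumptions.
import numpy as np
def rms_db(frame):
    # Frame loudness as dB RMS relative to full scale; epsilon avoids log(0).
    return 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
def f0_autocorr(frame, sr, fmin=75, fmax=500):
    # Crude f0 estimate: strongest autocorrelation peak in the voice range.
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag
# Illustrative check: a 100 Hz periodic frame should come back near 100 Hz,
# and doubling the amplitude raises rms_db by about 6 dB.
sr = 16000
t = np.arange(sr // 10) / sr                      # 100 ms frame
frame = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 200 * t)
print(round(f0_autocorr(frame, sr)), round(rms_db(frame), 1))
The trade-off the abstract describes then falls out of physiology rather than code: in live voices, pushing rms_db up tends to drag f0 and the first formant up with it.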
RevDate: 2023-11-24
Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.
Journal of voice : official journal of the Voice Foundation pii:S0892-1997(23)00322-3 [Epub ahead of print].
This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.
Additional Links: PMID-38000960
@article {pmid38000960,
year = {2023},
author = {Barrientos, E and Cataldo, E},
title = {Estimating Formant Frequencies of Vowels Sung by Sopranos Using Weighted Linear Prediction.},
journal = {Journal of voice : official journal of the Voice Foundation},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.jvoice.2023.10.018},
pmid = {38000960},
issn = {1873-4588},
abstract = {This study introduces the weighted linear prediction adapted to high-pitched singing voices (WLP-HPSV) method for accurately estimating formant frequencies of vowels sung by lyric sopranos. The WLP-HPSV method employs a variant of the WLP analysis combined with the zero-frequency filtering (ZFF) technique to address specific challenges in formant estimation from singing signals. Evaluation of the WLP-HPSV method compared to the LPC method demonstrated its superior performance in accurately capturing the spectral characteristics of synthetic /u/ vowels and the /a/ and /u/ natural singing vowels. The QCP parameters used in the WLP-HPSV method varied with pitch, revealing insights into the interplay between the vocal tract and glottal characteristics during vowel production. The comparison between the LPC and WLP-HPSV methods highlighted the robustness of the WLP-HPSV method in accurately estimating formant frequencies across different pitches.},
}
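A note for readers unfamiliar with the baseline: LPC here is linear predictive coding, and the conventional autocorrelation-method LPC that the paper compares against can be sketched in a few lines. The code below is written for this bibliography, not taken from the paper, and does not implement WLP-HPSV itself; the pre-emphasis coefficient, window, model order, and pole-pruning thresholds are illustrative assumptions.
import numpy as np
def lpc_autocorr(frame, order):
    # Levinson-Durbin solution of the LPC normal equations (autocorrelation method).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a
def formants(frame, sr, order=12):
    # Classic pipeline: pre-emphasis, window, LPC, then formants from pole angles.
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_autocorr(frame, order))
    roots = roots[np.imag(roots) > 0]              # keep one pole per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)     # pole angle -> frequency in Hz
    bws = -np.log(np.abs(roots)) * sr / np.pi      # pole radius -> bandwidth in Hz
    keep = (freqs > 90) & (bws < 400)              # discard implausible poles
    return np.sort(freqs[keep])
# Usage: formants(vowel_frame, sr) on a steady spoken /a/ typically returns an
# F1 somewhere in the 600-900 Hz region, depending on the speaker.
This baseline is exactly where the high-pitch problem lives: when f0 approaches or exceeds F1, as in soprano singing, the harmonics become too sparse to define the spectral envelope and LPC poles lock onto individual harmonics rather than vocal-tract resonances, which is the failure mode the paper's weighting scheme targets.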
RJR Experience and Expertise
Researcher
Robbins holds BS, MS, and PhD degrees in the life sciences. He served as a tenured faculty member in the Zoology and Biological Science departments at Michigan State University. He is currently exploring the intersection between genomics, microbial ecology, and biodiversity — an area that promises to transform our understanding of the biosphere.
Educator
Robbins has extensive experience in college-level education: At MSU he taught introductory biology, genetics, and population genetics. At JHU, he was an instructor for a special course on biological database design. At FHCRC, he team-taught a graduate-level course on the history of genetics. At Bellevue College he taught medical informatics.
Administrator
Robbins has been involved in science administration at both the federal and institutional levels. At NSF, he was a program officer for database activities in the life sciences; at DOE, he was a program officer for information infrastructure in the Human Genome Project. At the Fred Hutchinson Cancer Research Center, he served as a vice president for fifteen years.
Technologist
Robbins has been involved with information technology since writing his first Fortran program as a college student. At NSF he was the first program officer for database activities in the life sciences. At JHU he held an appointment in the CS department and served as director of the informatics core for the Genome Data Base. At the FHCRC he was VP for Information Technology.
Publisher
While still at Michigan State, Robbins started his first publishing venture, founding a small company that addressed the short-run publishing needs of instructors in very large undergraduate classes. For more than 20 years, Robbins has been operating The Electronic Scholarly Publishing Project, a web site dedicated to the digital publishing of critical works in science, especially classical genetics.
Speaker
Robbins is well-known for his speaking abilities and is often called upon to provide keynote or plenary addresses at international meetings. For example, in July 2012 he gave a well-received keynote address at the Global Biodiversity Informatics Congress, sponsored by GBIF and held in Copenhagen. The slides from that talk can be seen HERE.
Facilitator
Robbins is a skilled meeting facilitator. He prefers a participatory approach, with part of the meeting involving dynamic breakout groups, created by the participants in real time: (1) individuals propose breakout groups; (2) everyone signs up for one (or more) groups; (3) the groups with the most interested parties then meet, with reports from each group presented and discussed in a subsequent plenary session.
Designer
Robbins has been engaged with photography and design since the 1960s, when he worked for a professional photography laboratory. He now prefers digital photography and tools for their precision and reproducibility. He designed his first web site more than 20 years ago and he personally designed and implemented this web site. He engages in graphic design as a hobby.
RJR Picks from Around the Web (updated 11 MAY 2018)
Old Science
Weird Science
Treating Disease with Fecal Transplantation
Fossils of miniature humans (hobbits) discovered in Indonesia
Paleontology
Dinosaur tail, complete with feathers, found preserved in amber.
Astronomy
Mysterious fast radio burst (FRB) detected in the distant universe.
Big Data & Informatics
Big Data: Buzzword or Big Deal?
Hacking the genome: Identifying anonymized human subjects using publicly available data.