


Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography. Created: 30 Mar 2023 at 01:46

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, the term is also used to mean an acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram or a spectrum analyzer; in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics, and hence no corresponding peak is visible. Because formants are a product of resonance, because resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.
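Because formants appear as spectral envelope peaks, a standard way to estimate them in software is linear predictive coding (LPC): fit an all-pole model to a frame of audio and read candidate formants off the pole angles. The sketch below is an illustration only, not taken from any paper cited here; it synthesizes a two-resonance signal so the recovered frequencies can be sanity-checked.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(signal, order):
    """Linear-prediction coefficients via the autocorrelation (Yule-Walker) method."""
    n = len(signal)
    r = np.correlate(signal, signal, "full")[n - 1 : n + order]
    a = solve_toeplitz(r[:-1], -r[1:])  # solve the symmetric Toeplitz system
    return np.concatenate(([1.0], a))

def formants_from_lpc(a, fs):
    """Formant estimates (Hz) from the angles of the upper-half-plane LPC poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.01]
    return sorted(np.angle(roots) * fs / (2.0 * np.pi))

# Demo: an all-pole "vocal tract" with resonances at 500 Hz and 1500 Hz, fs = 8 kHz.
fs = 8000
poles = []
for f in (500.0, 1500.0):
    z = 0.98 * np.exp(2j * np.pi * f / fs)
    poles += [z, np.conj(z)]
a_true = np.real(np.poly(poles))
impulse = np.zeros(4096)
impulse[0] = 1.0
signal = lfilter([1.0], a_true, impulse)
estimated = formants_from_lpc(lpc(signal, 4), fs)  # close to [500, 1500]
```

Real formant trackers (such as Praat's) add pre-emphasis, windowing, and bandwidth thresholds on top of this core idea.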

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2023-03-23

Lou Q, Wang X, Chen Y, et al (2023)

Subjective and Objective Evaluation of Speech in Adult Patients With Repaired Cleft Palate.

The Journal of craniofacial surgery pii:00001665-990000000-00653 [Epub ahead of print].

OBJECTIVE: To explore the speech outcomes of adult patients with repaired cleft palate through subjective perception evaluation and objective acoustic analysis, and to compare the differences in pronunciation characteristics between speakers with complete velopharyngeal closure (VPC) and velopharyngeal insufficiency (VPI) patients.

PARTICIPANTS AND INTERVENTION: Subjective evaluation indicators included speech intelligibility, nasality, and consonant missing rate. For objective acoustic analysis, we used speech sample normalization; objective acoustic parameters included normalized vowel formants, voice onset time, and the analysis of 3-dimensional spectrogram and spectrum. Analyses were carried out on speech samples produced by four groups of speakers: (a) speakers with velopharyngeal competence after palatorrhaphy (n=38); (b) speakers with velopharyngeal incompetence after palatorrhaphy (n=70); (c) adult patients with cleft palate (n=65); and (d) typical speakers (n=30).

RESULTS: There was a highly negative correlation between VPC grade and speech intelligibility (ρ=-0.933), and a highly positive correlation between VPC and nasality (ρ=0.813). In subjective evaluation, the speech level of VPI patients was significantly lower than that of VPC patients and normal adults. Although the nasality and consonant loss rate of VPC patients were significantly higher than that of normal adults, the speech intelligibility of VPC patients was not significantly different from that of normal adults. In acoustic analysis, patients with VPI still performed poorly compared with patients with VPC.

CONCLUSIONS: The speech function of adult cleft palate patients is affected by abnormal palatal structure and poor pronunciation habits. In subjective evaluation, there was no significant difference in speech level between VPC patients and normal adults, whereas there was a significant difference between VPI patients and normal adults. The acoustic parameters differed between the 2 groups after cleft palate repair. The condition of velopharyngeal closure after cleft palate repair can affect the patient's speech.

RevDate: 2023-03-22

Easwar V, Purcell D, T Wright (2023)

Predicting Hearing aid Benefit Using Speech-Evoked Envelope Following Responses in Children With Hearing Loss.

Trends in hearing, 27:23312165231151468.

Electroencephalography could serve as an objective tool to evaluate hearing aid benefit in infants who are developmentally unable to participate in hearing tests. We investigated whether speech-evoked envelope following responses (EFRs), a type of electroencephalography-based measure, could predict improved audibility with the use of a hearing aid in children with mild-to-severe permanent, mainly sensorineural, hearing loss. In 18 children, EFRs were elicited by six male-spoken band-limited phonemic stimuli--the first formants of /u/ and /i/, the second and higher formants of /u/ and /i/, and the fricatives /s/ and /∫/--presented together as /su∫i/. EFRs were recorded between the vertex and nape, when /su∫i/ was presented at 55, 65, and 75 dB SPL using insert earphones in unaided conditions and individually fit hearing aids in aided conditions. EFR amplitude and detectability improved with the use of a hearing aid, and the degree of improvement in EFR amplitude was dependent on the extent of change in behavioral thresholds between unaided and aided conditions. EFR detectability was primarily influenced by audibility; higher sensation level stimuli had an increased probability of detection. Overall EFR sensitivity in predicting audibility was significantly higher in aided (82.1%) than unaided conditions (66.5%) and did not vary as a function of stimulus or frequency. EFR specificity in ascertaining inaudibility was 90.8%. Aided improvement in EFR detectability was a significant predictor of hearing aid-facilitated change in speech discrimination accuracy. Results suggest that speech-evoked EFRs could be a useful objective tool in predicting hearing aid benefit in children with hearing loss.

RevDate: 2023-03-22

Duan H, Xie Q, Z Zhang (2023)

Characteristics of Alveolo-palatal Affricates Produced by Mandarin-speaking Children with Repaired Cleft Palate.

American journal of health behavior, 47(1):13-20.

Objectives: In this study, we examined the acoustic properties of the affricates /t/ and /t[h]/ in Mandarin Chinese and analyzed the differences in the acoustic characteristics of these affricates as produced by children with repaired cleft palate and by normally developing children. We also explored the relationship between the affricates and the high-front vowel /i/. Methods: We analyzed 16 monosyllabic words with alveolo-palatal affricates as the initial consonants, produced by children with repaired cleft palate (N=13, mean age=5.9 years) and normally developing children (N=6, mean age=5.3 years). We used several acoustic parameters to investigate the characteristics of these affricates, such as the center of gravity, VOT, and the formants of vowels. Results: Compared with normally developing children, children with cleft palate exhibited a lower center of gravity for the 2 affricates /t/ and /t[h]/. Data from the control group showed that the affricate /t[h]/ had a significantly greater center of gravity than /t/. The accuracy of /t, t[h]/ produced by speakers with cleft palate was significantly correlated with that of /i/ (r=0.63). The high-front vowel /i/ is a significant index in diagnosing speech intelligibility, more valuable than /a/ and /u/. There was a significant difference in F2 of the vowel /i/ between children with cleft palate without speech therapy (CS1) and after speech therapy (CS2). After speech intervention, the accuracy of affricates produced by children with cleft palate improved, and the acoustic properties of "stop + noise segments" appeared. Conclusion: Children with cleft palate can be better distinguished from children with normal development by 2 significant acoustic characteristics: center of gravity and VOT. As the alveolo-palatal affricates /t, t[h]/ and the high-front vowel /i/ have a similar place of articulation (front tongue blade), their production accuracy can be improved mutually. The analysis showed that the articulation of Chinese /i/ has a more frontal lingual position and less variability, which is more conducive to articulation training and improves the effect of cleft palate training. These findings suggest a potential relationship between the affricates /t, t[h]/ and the vowel /i/. Children with cleft palate have difficulty pronouncing /t, t[h]/ and /i/. It is better to start with the vowel /i/, resulting in improvement in overall speech intelligibility.

RevDate: 2023-03-20

Alghowinem S, Gedeon T, Goecke R, et al (2023)

Interpretation of Depression Detection Models via Feature Selection Methods.

IEEE transactions on affective computing, 14(1):133-152.

Given the prevalence of depression worldwide and its major impact on society, several studies employed artificial intelligence modelling to automatically detect and assess depression. However, interpretation of these models and cues are rarely discussed in detail in the AI community, but have received increased attention lately. In this study, we aim to analyse the commonly selected features using a proposed framework of several feature selection methods and their effect on the classification results, which will provide an interpretation of the depression detection model. The developed framework aggregates and selects the most promising features for modelling depression detection from 38 feature selection algorithms of different categories. Using three real-world depression datasets, 902 behavioural cues were extracted from speech behaviour, speech prosody, eye movement and head pose. To verify the generalisability of the proposed framework, we applied the entire process to depression datasets individually and when combined. The results from the proposed framework showed that speech behaviour features (e.g. pauses) are the most distinctive features of the depression detection model. From the speech prosody modality, the strongest feature groups were F0, HNR, formants, and MFCC, while for the eye activity modality they were left-right eye movement and gaze direction, and for the head modality it was yaw head movement. Modelling depression detection using the selected features (even though there are only 9 features) outperformed using all features in all the individual and combined datasets. Our feature selection framework not only provided an interpretation of the model, but also produced higher accuracy of depression detection with a small number of features in varied datasets. This could help reduce the processing time needed to extract features and create the model.

RevDate: 2023-03-08

Hauser I (2023)

Differential Cue Weighting in Mandarin Sibilant Production.

Language and speech [Epub ahead of print].

Individual talkers vary in their relative use of different cues to signal phonological contrast. Previous work provides limited and conflicting data on whether such variation is modulated by cue trading or individual differences in speech style. This paper examines differential cue weighting patterns in Mandarin sibilants as a test case for these hypotheses. Standardized Mandarin exhibits a three-way place contrast between retroflex, alveopalatal, and alveolar sibilants with individual differences in relative weighting of spectral center of gravity (COG) and the second formant of the following vowel (F2). In results from a speech production task, cue weights of COG and F2 are inversely correlated across speakers, demonstrating a trade-off relationship in cue use. These findings are consistent with a cue trading account of individual differences in contrast signaling.

RevDate: 2023-03-07

Yang X, Guo C, Zhang M, et al (2023)

Ultrahigh-sensitivity multi-parameter tacrolimus solution detection based on an anchor planar millifluidic microwave biosensor.

Analytical methods : advancing methods and applications [Epub ahead of print].

To detect drug concentration in tacrolimus solution, an anchor planar millifluidic microwave (APMM) biosensor is proposed. The millifluidic system integrated with the sensor enables accurate and efficient detection while eliminating interference caused by the fluidity of the tacrolimus sample. Different concentrations (10-500 ng mL[-1]) of the tacrolimus analyte were introduced into the millifluidic channel, where it completely interacts with the radio frequency patch electromagnetic field, thereby effectively and sensitively modifying the resonant frequency and amplitude of the transmission coefficient. Experimental results indicate that the sensor has an extremely low limit of detection (LoD) of 0.12 pg mL[-1] and a frequency detection resolution (FDR) of 1.59 (MHz (ng mL[-1])). The greater the FDR and the lower the LoD, the more feasible the label-free biosensing method. Regression analysis revealed a strong linear correlation (R[2] = 0.992) between the concentration of tacrolimus and the frequency difference of the two resonant peaks of APMM. In addition, the difference in the reflection coefficient between the two formants was measured and calculated, and a strong linear correlation (R[2] = 0.998) was found between the difference and tacrolimus concentration. Five measurements were performed on each individual sample of tacrolimus to validate the biosensor's high repeatability. Consequently, the proposed biosensor is a potential candidate for the early detection of tacrolimus drug concentration levels in organ transplant recipients. This study presents a simple method for constructing microwave biosensors with high sensitivity and rapid response.

RevDate: 2023-03-01

Liu Z, Y Xu (2023)

Deep learning assessment of syllable affiliation of intervocalic consonants.

The Journal of the Acoustical Society of America, 153(2):848.

In English, a sentence like "He made out our intentions." could be misperceived as "He may doubt our intentions." because the coda /d/ sounds like it has become the onset of the next syllable. The nature and occurrence condition of this resyllabification phenomenon are unclear, however. Previous empirical studies mainly relied on listener judgment, limited acoustic evidence, such as voice onset time, or average formant values to determine the occurrence of resyllabification. This study tested the hypothesis that resyllabification is a coarticulatory reorganisation that realigns the coda consonant with the vowel of the next syllable. Deep learning in conjunction with dynamic time warping (DTW) was used to assess syllable affiliation of intervocalic consonants. The results suggest that convolutional neural network- and recurrent neural network-based models can detect cases of resyllabification using Mel-frequency spectrograms. DTW analysis shows that neural network inferred resyllabified sequences are acoustically more similar to their onset counterparts than their canonical productions. A binary classifier further suggests that, similar to the genuine onsets, the inferred resyllabified coda consonants are coarticulated with the following vowel. These results are interpreted with an account of resyllabification as a speech-rate-dependent coarticulatory reorganisation mechanism in speech.
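Dynamic time warping, used above to compare inferred resyllabified sequences with their onset counterparts, finds the minimum-cost monotonic alignment between two sequences. A minimal textbook version with an absolute-difference local cost (the paper's actual features and cost function are not specified here) can be sketched as:

```python
import numpy as np

def dtw_distance(x, y):
    """Minimum cumulative alignment cost between sequences x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: deletion, insertion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

For Mel-spectrogram frames, `x[i-1]` and `y[j-1]` would be feature vectors and the local cost a Euclidean or cosine distance rather than a scalar difference.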

RevDate: 2023-03-01

Lasota M, Šidlof P, Maurerlehner P, et al (2023)

Anisotropic minimum dissipation subgrid-scale model in hybrid aeroacoustic simulations of human phonation.

The Journal of the Acoustical Society of America, 153(2):1052.

This article deals with large-eddy simulations of three-dimensional incompressible laryngeal flow followed by acoustic simulations of human phonation of five cardinal English vowels, /ɑ, æ, i, o, u/. The flow and aeroacoustic simulations were performed in OpenFOAM and in-house code openCFS, respectively. Given the large variety of scales in the flow and acoustics, the simulation is separated into two steps: (1) computing the flow in the larynx using the finite volume method on a fine moving grid with 2.2 million elements, followed by (2) computing the sound sources separately and wave propagation to the radiation zone around the mouth using the finite element method on a coarse static grid with 33 000 elements. The numerical results showed that the anisotropic minimum dissipation model, which is not well known since it is not available in common CFD software, predicted stronger sound pressure levels at higher harmonics, and especially at the first two formants, than the wall-adapting local eddy-viscosity model. Employing this subgrid-scale model for the turbulent flow in the larynx thus had a positive impact on the quality of the simulated vowels.

RevDate: 2023-03-01

Huang Z, Lobbezoo F, Vanhommerig JW, et al (2023)

Effects of demographic and sleep-related factors on snoring sound parameters.

Sleep medicine, 104:3-10 pii:S1389-9457(23)00059-X [Epub ahead of print].

OBJECTIVE: To investigate the effect of frequently reported between-individual (viz., age, gender, body mass index [BMI], and apnea-hypopnea index [AHI]) and within-individual (viz., sleep stage and sleep position) snoring sound-related factors on snoring sound parameters in temporal, intensity, and frequency domains.

METHODS: This study included 83 adult snorers (mean ± SD age: 42.2 ± 11.3 yrs; male gender: 59%) who underwent an overnight polysomnography (PSG) and simultaneous sound recording, from which a total of 131,745 snoring events were extracted and analyzed. Data on both between-individual and within-individual factors were extracted from the participants' PSG reports.

RESULTS: Gender did not have any significant effect on snoring sound parameters. The fundamental frequency (FF; coefficient = -0.31; P = 0.02) and dominant frequency (DF; coefficient = -12.43; P < 0.01) of snoring sounds decreased with the increase of age, and the second formant increased (coefficient = 22.91; P = 0.02) with the increase of BMI. Severe obstructive sleep apnea (OSA; AHI ≥30 events/hour), non-rapid eye movement sleep stage 3 (N3), and supine position were all associated with more, longer, and louder snoring events (P < 0.05). Supine position was associated with higher FF and DF, and lateral decubitus positions were associated with higher formants.

CONCLUSIONS: Within the limitations of the current patient profile and included factors, AHI was found to have greater effects on snoring sound parameters than the other between-individual factors. The included within-individual factors were found to have greater effects on snoring sound parameters than the between-individual factors under study.

RevDate: 2023-02-27

Wang L, Z Jiang (2023)

Tidal Volume Level Estimation Using Respiratory Sounds.

Journal of healthcare engineering, 2023:4994668.

Respiratory sounds have been used as a noninvasive and convenient method to estimate respiratory flow and tidal volume. However, current methods need calibration, making them difficult to use in a home environment. A respiratory sound analysis method is proposed to qualitatively estimate tidal volume levels during sleep. Respiratory sounds are filtered and segmented into one-minute clips; all clips are then clustered into three categories (normal breathing/snoring/uncertain) with agglomerative hierarchical clustering (AHC). Formant parameters are extracted to classify snoring clips into simple snoring and obstructive snoring with the K-means algorithm. For simple snoring clips, the tidal volume level is calculated based on snoring duration. For obstructive snoring clips, the tidal volume level is calculated from the maximum breathing pause interval. The performance of the proposed method is evaluated on an open dataset, PSG-Audio, in which full-night polysomnography (PSG) and tracheal sound were recorded simultaneously. The calculated tidal volume levels are compared with the corresponding lowest nocturnal oxygen saturation (LoO2) data. Experiments show that the proposed method calculates tidal volume levels with high accuracy and robustness.
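The K-means step that separates simple from obstructive snoring clips can be illustrated with plain Lloyd's iterations on a single formant-like feature (k = 2); the algorithm and the feature values below are a generic sketch, not the paper's implementation:

```python
import numpy as np

def kmeans_1d(x, k=2, iters=50, seed=0):
    """Lloyd's algorithm on a 1-D feature vector: assign points, re-estimate centers."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)  # initialize from the data
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

# Made-up "formant" feature values for six clips: two well-separated groups.
labels, centers = kmeans_1d([1.0, 1.2, 0.9, 10.0, 10.5, 9.8])
```

With well-separated groups like these, the two recovered centers settle on the group means regardless of which points seed the initialization.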

RevDate: 2023-02-23

Aldamen H, M Al-Deaibes (2023)

Arabic emphatic consonants as produced by English speakers: An acoustic study.

Heliyon, 9(2):e13401.

This study examines the production of emphatic consonants as produced by American L2 learners of Arabic. To this end, 19 participants, 5 native speakers and 14 L2 learners, participated in a production experiment in which they produced monosyllabic CVC pairs that were contrasted in terms of whether the initial consonant was plain or emphatic. The acoustic parameters that were investigated are VOT of voiceless stops, COG of fricatives, and the first three formant frequencies of the target vowels. The results of the native speakers showed that VOT is a reliable acoustic correlate of emphasis in MSA. The results also showed that vowels in the emphatic context have higher F1 and F3 and lower F2. The results showed that the L2 learners produced comparable VOT values to those of native Arabic speakers. Further, L2 learners produced a significantly lower F2 of the vowels in the emphatic context than that in the plain context. Proficiency in Arabic played a role in the F2 measure; the intermediate learners tended to be more native-like than the beginning learners. As for F3, the results of the L2 learners unexpectedly showed that the beginning learners produced a higher F3 in the context of fricatives only. This suggests that the relationship between emphasis and proficiency depends on whether the preceding consonant is a stop or fricative.

RevDate: 2023-02-23

Ali IE, Sumita Y, N Wakabayashi (2023)

Comparison of Praat and Computerized Speech Lab for formant analysis of five Japanese vowels in maxillectomy patients.

Frontiers in neuroscience, 17:1098197.

INTRODUCTION: Speech impairment is a common complication after surgical resection of maxillary tumors. Maxillofacial prosthodontists play a critical role in restoring this function so that affected patients can enjoy better lives. For that purpose, several acoustic software packages have been used for speech evaluation, among which Computerized Speech Lab (CSL) and Praat are widely used in clinical and research contexts. Although CSL is a commercial product, Praat is freely available on the internet and can be used by patients and clinicians to practice several therapy goals. Therefore, this study aimed to determine if both software produced comparable results for the first two formant frequencies (F1 and F2) and their respective formant ranges obtained from the same voice samples from Japanese participants with maxillectomy defects.

METHODS: CSL was used as a reference to evaluate the accuracy of Praat with both the default and newly proposed adjusted settings. Thirty-seven participants were enrolled in this study for formant analysis of the five Japanese vowels (a/i/u/e/o) using CSL and Praat. Spearman's rank correlation coefficient was used to judge the correlation between the analysis results of both programs regarding F1 and F2 and their respective formant ranges.

RESULTS: Highly positive correlations between the two programs were found for all acoustic features and all Praat settings.

DISCUSSION: The strong correlations between the results of both CSL and Praat suggest that both programs may have similar decision strategies for atypical speech and for both sexes. This study highlights that the default settings in Praat can be used for formant analysis in maxillectomy patients with predictable accuracy. The proposed adjusted settings in Praat can yield more accurate results for formant analysis of atypical speech in maxillectomy cases when the examiner cannot precisely locate the formant frequencies using the default settings or confirm analysis results obtained using CSL.
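Spearman's rank correlation, used above to compare CSL and Praat measurements, is simply the Pearson correlation computed on ranks. A minimal version follows (ignoring tied ranks, which a real analysis such as this one would need to handle):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman correlation as the Pearson r of the rank-transformed data (no ties)."""
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(1, len(a) + 1)  # rank 1 = smallest value
        return r
    rx, ry = ranks(x), ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

Because only ranks enter the computation, two analyzers that order speakers' formants identically score rho = 1.0 even if their absolute Hz values differ, which suits a program-agreement question like this one.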

RevDate: 2023-02-07

Zhang C, Hou Q, Guo TT, et al (2023)

[The effect of Wendler Glottoplasty to elevate vocal pitch in transgender women].

Zhonghua er bi yan hou tou jing wai ke za zhi = Chinese journal of otorhinolaryngology head and neck surgery, 58(2):139-144.

Objective: To evaluate the effect of Wendler Glottoplasty to elevate vocal pitch in transgender women. Methods: The voice parameters of pre- and 3-month post-surgery of 29 transgender women who underwent Wendler Glottoplasty in the department of otorhinolaryngology head and neck surgery of Beijing Friendship Hospital from January 2017 to October 2020 were retrospectively analyzed. The 29 transgender women ranged in age from 19-47 (27.0±6.3) years old. Subjective evaluation was performed using the Transsexual Voice Questionnaire for Male to Female (TVQ[MtF]). Objective parameters included fundamental frequency (F0), highest pitch, lowest pitch, habitual volume, Jitter, Shimmer, maximal phonation time (MPT), noise to harmonic ratio (NHR) and formant frequencies (F1, F2, F3, F4). SPSS 25.0 software was used for statistical analysis. Results: Three months after surgery, the score of TVQ[MtF] was significantly decreased [(89.9±14.7) vs. (50.4±13.6), t=11.49, P<0.001]. The F0 was significantly elevated [(152.7±23.3) Hz vs. (207.7±45.9) Hz, t=-6.03, P<0.001]. Frequencies of F1, F2 and F3 were significantly elevated. No statistical difference was observed in the frequencies of F4. The highest pitch was not significantly altered while the lowest pitch was significantly elevated [(96.8±17.7) Hz vs. (120.0±28.9) Hz, t=-3.71, P=0.001]. Habitual speech volume was significantly increased [(60.0±5.2) dB vs. (63.6±9.6) dB, t=-2.12, P=0.043]. Jitter, Shimmer, NHR and MPT were not obviously altered (P>0.05). Conclusions: Wendler Glottoplasty could notably elevate the vocal pitch, formant frequencies and degree of vocal femininity in transgender women without affecting phonation ability and voice quality. It can be an effective treatment modality for voice feminization.

RevDate: 2023-02-06

Gunjawate DR, Ravi R, Tauro JP, et al (2022)

Spectral and Temporal Characteristics of Vowels in Konkani.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 74(Suppl 3):4870-4879.

The present study was undertaken to study the acoustic characteristics of vowels using spectrographic analysis in the Mangalorean Catholic Konkani dialect of Konkani spoken in Mangalore, Karnataka, India. Recordings were done using CVC words in 11 males and 19 females between the age range of 18-55 years. The CVC words consisted of combinations of vowels such as (/i, i:, e, ɵ, ə, u, o, ɐ, ӓ, ɔ/) and consonants such as (/m, k, w, s, ʅ, h, l, r, p, ʤ, g, n, Ɵ, ṭ, ḷ, b, dh/). Recordings were done in a sound-treated room using PRAAT software; spectrographic analysis was done, and spectral and temporal characteristics such as fundamental frequency (F0), formants (F1, F2, F3) and vowel duration were measured. The results showed that higher fundamental frequency values were observed for short, high and back vowels. Higher F1 values were noted for open vowels and F2 was higher for front vowels. Long vowels had longer duration compared to short vowels and females had longer vowel duration compared to males. The acoustic information in terms of spectral and temporal cues helps in better understanding the production and perception of languages and dialects.

RevDate: 2023-02-06

Prakash P, Boominathan P, S Mahalingam (2022)

Acoustic Description of Bhramari Pranayama.

Indian journal of otolaryngology and head and neck surgery : official publication of the Association of Otolaryngologists of India, 74(Suppl 3):4738-4747.

UNLABELLED: The study's aim was (1) to describe the acoustic characteristics of Bhramari pranayama, and (2) to compare the acoustic features of nasal consonant /m/ and the sound of Bhramari pranayama produced by yoga trainers. Cross-sectional study design. Thirty-three adult male yoga trainers performed five repeats of nasal consonant /m/ and Bhramari pranayama. These samples were recorded into Computerized Speech Lab, Kay Pentax model 4500b using a microphone (SM48). Formant frequencies (f F1, f F2, f F3, & f F4), formant bandwidths (BF1, BF2, BF3, & BF4), anti-formant, alpha and beta ratio were analyzed. Nasal consonant /m/ had higher f F2 and anti-formant compared to Bhramari pranayama. Statistically significant differences were noted in f F2, BF3, and anti-formants. Bhramari pranayama revealed a low alpha ratio and a higher beta ratio than /m/. However, these differences were not statistically significant. Findings are discussed from acoustic and physiological perspectives. Bhramari pranayama was assumed to be produced with a larger pharyngeal cavity and narrower velar passage when compared to nasal consonant /m/. Verification at the level of the glottis and with aerodynamic parameters may ascertain the above propositions.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12070-021-03054-1.

RevDate: 2023-02-02

Kondaurova MV, Zheng Q, Donaldson CW, et al (2023)

Effect of telepractice on pediatric cochlear implant users and provider vowel space: A preliminary report.

The Journal of the Acoustical Society of America, 153(1):467.

Clear speaking styles are goal-oriented modifications in which talkers adapt acoustic-phonetic characteristics of speech to compensate for communication challenges. Do children with hearing loss and a clinical provider modify speech characteristics during telepractice to adjust for remote communication? The study examined the effect of telepractice (tele-) on vowel production in seven (mean age 4:11 years, SD 1:2 years) children with cochlear implants (CIs) and a provider. The first (F1) and second (F2) formant frequencies of /i/, /ɑ/, and /u/ vowels were measured in child and provider speech during one in-person and one tele-speech-language intervention, order counterbalanced. Child and provider vowel space areas (VSA) were calculated. The results demonstrated an increase in F2 formant frequency for /i/ vowel in child and provider speech and an increase in F1 formant frequency for /ɑ/ vowel in the provider speech during tele- compared to in-person intervention. An expansion of VSA was found in child and provider speech in tele- compared to in-person intervention. In children, the earlier age of CI activation was associated with larger VSA in both tele- and in-person intervention. The results suggest that the children and the provider adjust vowel articulation in response to remote communication during telepractice.
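Vowel space area (VSA) from corner vowels is conventionally computed as the area of the polygon whose vertices are each vowel's (F1, F2) pair; for the three vowels /i/, /ɑ/, /u/ this reduces to the shoelace formula over a triangle. The formant values in the demo below are invented for illustration:

```python
def vowel_space_area(vertices):
    """Shoelace (surveyor's) formula over (F1, F2) vertices given in order."""
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical (F1, F2) values in Hz for /i/, /ɑ/, /u/:
triangle = [(300, 2300), (750, 1300), (350, 900)]
area_hz2 = vowel_space_area(triangle)
```

An expanded VSA, as reported for the tele-intervention above, simply means these vertices move farther apart, so the polygon area in Hz² grows.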

RevDate: 2023-01-31

Kirby J, Pittayaporn P, M Brunelle (2023)

Transphonologization of onset voicing: revisiting Northern and Eastern Kmhmu'.

Phonetica [Epub ahead of print].

Phonation and vowel quality are often thought to play a vital role at the initial stage of tonogenesis. This paper investigates the production of voicing and tones in a tonal Northern Kmhmu' dialect spoken in Nan Province, Thailand, and a non-tonal Eastern Kmhmu' dialect spoken in Vientiane, Laos, from both acoustic and electroglottographic perspectives. Large and consistent VOT differences between voiced and voiceless stops are preserved in Eastern Kmhmu', but are not found in Northern Kmhmu', consistent with previous reports. With respect to pitch, f0 is clearly a secondary property of the voicing contrast in Eastern Kmhmu', but unquestionably the primary contrastive property in Northern Kmhmu'. Crucially, no evidence is found to suggest that either phonation type or formant differences act as significant cues to voicing in Eastern Kmhmu' or tones in Northern Kmhmu'. These results suggest that voicing contrasts can also be transphonologized directly into f0-based contrasts, skipping a registral stage based primarily on phonation and/or vowel quality.

RevDate: 2023-01-30

Viegas F, Camargo Z, Viegas D, et al (2023)

Acoustic Measurements of Speech and Voice in Men with Angle Class II, Division 1, Malocclusion.

International archives of otorhinolaryngology, 27(1):e10-e15.

Introduction: The acoustic analysis of speech (measurements of the fundamental frequency and formant frequencies) of different vowels produced by speakers with the Angle class II, division 1, malocclusion can provide information about the relationship between articulatory and phonatory mechanisms in this type of maxillomandibular disproportion.

Objectives: To investigate acoustic measurements related to the fundamental frequency (F0) and formant frequencies (F1 and F2) of the oral vowels of Brazilian Portuguese (BP) produced by male speakers with Angle class II, division 1, malocclusion (study group), and to compare them with men with Angle class I malocclusion (control group).

Methods: In total, 60 men (20 with class II, 40 with class I) aged between 18 and 40 years were included in the study. Measurements of F0, F1, and F2 of the seven oral vowels of BP were estimated from audio samples containing repetitions of carrier sentences. The statistical analysis was performed using the Student t-test, and effect sizes were calculated.

Results: Significant differences were detected for F0 in five vowels ([e], [i], [ᴐ], [o], and [u]), and for F1 in the vowels [a] and [ᴐ], with higher values for class II, division 1.

Conclusion: Statistical differences were found in the F0 measurements, with higher values in five of the seven vowels analysed in subjects with Angle class II, division 1. The formant frequencies differed only in F1, in two vowels, with higher values in the study group. The data suggest that voice and speech production should be included in assessment protocols for patients with malocclusion.

RevDate: 2023-01-30

Freeman V (2023)

Production and perception of prevelar merger: Two-dimensional comparisons using Pillai scores and confusion matrices.

Journal of phonetics, 97:.

Vowel merger production is quantified with gradient acoustic measures, while phonemic perception methods are often coarser, complicating comparisons within mergers in progress. This study implements a perception experiment in two-dimensional formant space (F1 × F2), allowing unified plotting, quantification, and statistics with production data. Production and perception are compared within 20 speakers for a two-part prevelar merger in progress in Pacific Northwest English, where mid-front /ɛ, e/ approximate or merge before voiced velar /ɡ/ (leg-vague merger), and low-front prevelar /æɡ/ raises toward them (bag-raising). Distributions are visualized with kernel density plots, and overlap is quantified with Pillai scores and confusion matrices from linear discriminant analysis models. Results suggest that the leg-vague merger is perceived as more complete than it is produced (in both the sample and the community), while bag-raising is highly variable in production but rejected in perception. Relationships between production and perception varied by age: raising and merger progressed across two generations in production but not perception, after which younger adults perceived the leg-vague merger but did not produce it, varied in their (minimal) perception of raising, and varied in bag-raising in production. Thus, prevelar raising/merger may be progressing among some social groups but reversing in others.
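
The Pillai score used above to quantify vowel overlap is the Pillai-Bartlett trace from a MANOVA of (F1, F2) on vowel category: values near 0 indicate overlapping (merged) distributions, values near 1 well-separated ones. A minimal sketch under that standard definition, using synthetic formant data rather than the study's:

```python
import numpy as np

def pillai_score(x, labels):
    """Pillai-Bartlett trace for formant data x (n_tokens x 2: F1, F2 in Hz)
    with vowel-category labels. ~0 = merged categories, near 1 = well separated."""
    x = np.asarray(x, dtype=float)
    grand = x.mean(axis=0)
    p = x.shape[1]
    H = np.zeros((p, p))  # between-category (hypothesis) scatter
    E = np.zeros((p, p))  # within-category (error) scatter
    for g in set(labels):
        xg = x[[l == g for l in labels]]
        d = (xg.mean(axis=0) - grand)[:, None]
        H += len(xg) * (d @ d.T)
        E += np.cov(xg.T, bias=True) * len(xg)  # SSCP about the group mean
    return float(np.trace(H @ np.linalg.inv(H + E)))
```

In practice the same statistic is usually read off a fitted MANOVA model; the scatter-matrix arithmetic above is the definition such fits implement.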

RevDate: 2023-01-26

Holmes E, IS Johnsrude (2023)

Intelligibility benefit for familiar voices is not accompanied by better discrimination of fundamental frequency or vocal tract length.

Hearing research, 429:108704 pii:S0378-5955(23)00016-3 [Epub ahead of print].

Speech is more intelligible when it is spoken by familiar than unfamiliar people. If this benefit arises because key voice characteristics like perceptual correlates of fundamental frequency or vocal tract length (VTL) are more accurately represented for familiar voices, listeners may be able to discriminate smaller manipulations to such characteristics for familiar than unfamiliar voices. We measured participants' (N = 17) thresholds for discriminating pitch (correlate of fundamental frequency, or glottal pulse rate) and formant spacing (correlate of VTL; 'VTL-timbre') for voices that were familiar (participants' friends) and unfamiliar (other participants' friends). As expected, familiar voices were more intelligible. However, discrimination thresholds were no smaller for the same familiar voices. The size of the intelligibility benefit for a familiar over an unfamiliar voice did not relate to the difference in discrimination thresholds for the same voices. Also, the familiar-voice intelligibility benefit was just as large following perceptible manipulations to pitch and VTL-timbre. These results are more consistent with cognitive accounts of speech perception than traditional accounts that predict better discrimination.

RevDate: 2023-01-23

Ettore E, Müller P, Hinze J, et al (2023)

Digital Phenotyping for Differential Diagnosis of Major Depressive Episode: Narrative Review.

JMIR mental health, 10:e37225 pii:v10i1e37225.

BACKGROUND: Major depressive episode (MDE) is a common clinical syndrome. It can be found in different pathologies such as major depressive disorder (MDD), bipolar disorder (BD), and posttraumatic stress disorder (PTSD), or can even occur in the context of psychological trauma. However, only 1 syndrome is described in international classifications (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition [DSM-5]/International Classification of Diseases 11th Revision [ICD-11]), which do not take into account the underlying pathology at the origin of the MDE. Clinical interviews are currently the best source of information for obtaining the etiological diagnosis of MDE. Nevertheless, they do not allow early diagnosis, and there are no objective measures of the extracted clinical information. To remedy this, the use of digital tools and their correlation with clinical symptomatology could be useful.

OBJECTIVE: We aimed to review the current application of digital tools for MDE diagnosis while highlighting shortcomings for further research. In addition, our work was focused on digital devices easy to use during clinical interview and mental health issues where depression is common.

METHODS: We conducted a narrative review of the use of digital tools during clinical interviews for MDE by searching papers published in PubMed/MEDLINE, Web of Science, and Google Scholar databases since February 2010. The search was conducted from June to September 2021. Potentially relevant papers were then compared against a checklist for relevance and reviewed independently for inclusion, with focus on 4 allocated topics of (1) automated voice analysis, behavior analysis by (2) video and physiological measures, (3) heart rate variability (HRV), and (4) electrodermal activity (EDA). For this purpose, we were interested in 4 frequently found clinical conditions in which MDE can occur: (1) MDD, (2) BD, (3) PTSD, and (4) psychological trauma.

RESULTS: A total of 74 relevant papers on the subject were qualitatively analyzed and the information was synthesized. A digital phenotype of MDE thus seems to emerge, consisting of modifications in speech features (namely, temporal, prosodic, spectral, source, and formant features) and in speech content, modifications in nonverbal behavior (head, hand, body, and eye movements; facial expressivity; and gaze), and a decrease in physiological measurements (HRV and EDA). We found not only similarities but also differences in how MDE presents in MDD, BD, PTSD, or psychological trauma. However, comparative studies were rare for the BD and PTSD conditions, which did not allow us to identify clear and distinct digital phenotypes.

CONCLUSIONS: Our search identified markers from several modalities that hold promise for helping with a more objective diagnosis of MDE. To validate their potential, further longitudinal and prospective studies are needed.

RevDate: 2023-01-21

Aoyama K, Hong L, Flege JE, et al (2023)

Relationships Between Acoustic Characteristics and Intelligibility Scores: A Reanalysis of Japanese Speakers' Productions of American English Liquids.

Language and speech [Epub ahead of print].

The primary purpose of this research report was to investigate the relationships between acoustic characteristics and perceived intelligibility for native Japanese speakers' productions of American English liquids. This report was based on a reanalysis of intelligibility scores and acoustic analyses that were reported in two previous studies. We examined which acoustic parameters were associated with higher perceived intelligibility scores for their productions of /l/ and /ɹ/ in American English, and whether Japanese speakers' productions of the two liquids were acoustically differentiated from each other. Results demonstrated that the second formant (F2) was strongly correlated with the perceived intelligibility scores for the Japanese adults' productions. Results also demonstrated that the Japanese adults' and children's productions of /l/ and /ɹ/ were indeed differentiated by some acoustic parameters including the third formant (F3). In addition, some changes occurred in the Japanese children's productions over the course of 1 year. Overall, the present report shows that Japanese speakers of American English may be making a distinction between /l/ and /ɹ/ in production, although the distinctions are made in a different way compared with native English speakers' productions. These findings have implications for setting realistic goals for improving intelligibility of English /l/ and /ɹ/ for Japanese speakers, as well as theoretical advancement of second-language speech learning.

RevDate: 2023-01-06

Sahin S, B Sen Yilmaz (2023)

Effects of the Orthognathic Surgery on the Voice Characteristics of Skeletal Class III Patients.

The Journal of craniofacial surgery, 34(1):253-257.

OBJECTIVES: To analyze the effects of the bimaxillary orthognathic surgery on the voice characteristics of skeletal Class III cases, and to evaluate correlations between acoustic and skeletal changes.

METHOD: Skeletal Class III adult patients (7 male, 18 female) were asked to pronounce the sounds "[a], [ɛ], [ɯ], [i], [ɔ], [œ], [u], [y]" for 3 seconds each. Voice recordings and lateral cephalometric x-rays were taken before surgery (T0) and 6 months after (T1). Voice recordings were taken for the control group (n=20) at a 6-month interval. The fundamental frequency (F0), formant frequencies (F1, F2, and F3), Shimmer, Jitter, and Noise-to-Harmonic Ratio (NHR) parameters were measured with Praat version 6.0.43.

RESULTS: In the surgery group, significant differences were observed in F1 of the [e] sound, F2 and Shimmer of [ɯ], F1 and F2 of [œ], and F1 of [y]; the post-surgery values were lower. F3 of the [u] sound was higher. In comparison with the control group, ΔF3 of [ɔ], ΔF3 of [u], and ΔF1 of [y], ΔShimmer of [ɛ], [ɯ], [i], [ɔ], [u], and [y], and ΔNHR of [ɔ] changed significantly. Pearson correlation analysis revealed correlations between ΔF2 and ΔSNA for the [ɯ] and [œ] sounds, and between ΔF1 and ΔHBV for the [y] sound.

CONCLUSION: Bimaxillary orthognathic surgery changed some voice parameters in skeletal Class III patients. Some correlations were found between skeletal and acoustic parameters. We advise clinicians to consider these findings and inform their patients.

RevDate: 2023-01-03

Kim S, Choi J, T Cho (2023)

Data on English coda voicing contrast under different prosodic conditions produced by American English speakers and Korean learners of English.

Data in brief, 46:108816.

This data article provides acoustic data for individual speakers' production of coda voicing contrast between stops in English, based on laboratory speech recorded by twelve native speakers of American English and twenty-four Korean learners of English. There were four pairs of English monosyllabic target words with voicing contrast in the coda position (bet-bed, pet-ped, bat-bad, pat-pad). The words were produced in carrier sentences in which they were placed in two prosodic boundary conditions (Intonational Phrase-initial and Intonational Phrase-medial), two pitch accent conditions (nuclear-pitch accented and unaccented), and three focus conditions (lexical focus, phonological focus, and no focus). The raw acoustic measurement values included in a CSV-formatted file are F0, F1, F2, and the duration of each vowel preceding a coda consonant, and the Voice Onset Time of word-initial stops. This article also provides figures that exemplify individual speaker variation of vowel duration, F0, F1, and F2 as a function of focus conditions. The data can thus be reused to observe individual variation in the phonetic encoding of coda voicing contrast as a function of the aforementioned prosodically conditioned factors (i.e., prosodic boundary, pitch accent, focus) in native vs. non-native English. Some theoretical aspects of the data are discussed in the full-length article entitled "Phonetic encoding of coda voicing contrast under different focus conditions in L1 vs. L2 English" [1].
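
To illustrate the kind of reuse the authors describe, here is a minimal sketch of aggregating preboundary vowel duration by coda voicing; the field names and values are hypothetical stand-ins, not the article's actual CSV schema:

```python
from statistics import mean

# Toy tokens with hypothetical fields (the article's CSV defines the real schema):
# (speaker, target word, coda voicing, vowel duration in ms)
tokens = [
    ("AE01", "bed", "voiced", 182.0),
    ("AE01", "bet", "voiceless", 141.0),
    ("KR01", "bed", "voiced", 165.0),
    ("KR01", "bet", "voiceless", 150.0),
]

def mean_duration(tokens, voicing):
    return mean(d for _, _, v, d in tokens if v == voicing)

# The classic voicing effect: vowels are longer before voiced codas
effect_ms = mean_duration(tokens, "voiced") - mean_duration(tokens, "voiceless")
```

The same grouping, split further by boundary, accent, and focus condition, recovers the per-condition comparisons the data are intended for.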

RevDate: 2022-12-31

Herbst CT, BH Story (2022)

Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing.

The Journal of the Acoustical Society of America, 152(6):3548.

A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.

RevDate: 2022-12-29

Ji Y, Hu Y, X Jiang (2022)

Segmental and suprasegmental encoding of speaker confidence in Wuxi dialect vowels.

Frontiers in psychology, 13:1028106.

INTRODUCTION: Wuxi dialect is a variation of Wu dialect spoken in eastern China and is characterized by a rich tonal system. Compared with standard Mandarin speakers, those of Wuxi dialect as their mother tongue can be more efficient in varying vocal cues to encode communicative meanings in speech communication. While literature has demonstrated that speakers encode high vs. low confidence in global prosodic cues at the sentence level, it is unknown how speakers' intended confidence is encoded at a more local, phonetic level. This study aimed to explore the effects of speakers' intended confidence on both prosodic and formant features of vowels in two lexical tones (the flat tone and the contour tone) of Wuxi dialect.

METHODS: Words of a single vowel were spoken in confident, unconfident, or neutral tone of voice by native Wuxi dialect speakers using a standard elicitation procedure. Linear-mixed effects modeling and parametric bootstrapping testing were performed.

RESULTS: The results showed that (1) the speakers raised both F1 and F2 at the confident level (compared with the neutral-intending expression); additionally, F1 distinguished between the confident and unconfident expressions; (2) compared with the neutral-intending expression, the speakers raised mean f0, showed greater f0 variation, and prolonged pronunciation time at the unconfident level, while they raised mean intensity, showed greater intensity variation, and prolonged pronunciation time at the confident level; and (3) the speakers modulated mean f0 and mean intensity to a larger extent on the flat tone than on the contour tone to differentiate levels of confidence in the voice, while they modulated f0 and intensity range more only on the contour tone.

DISCUSSION: These findings shed new light on the mechanisms of segmental and suprasegmental encoding of speaker confidence and lack of confidence at the vowel level, highlighting the interplay of lexical tone and vocal expression in speech communication.

RevDate: 2022-12-26

Grawunder S, Uomini N, Samuni L, et al (2023)

Expression of concern: 'Chimpanzee vowel-like sounds and voice quality suggest formant space expansion through the hominoid lineage' (2022) by Grawunder et al.

Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 378(1870):20220476.

RevDate: 2022-12-12

Moya-Galé G, Wisler AA, Walsh SJ, et al (2022)

Acoustic Predictors of Ease of Understanding in Spanish Speakers With Dysarthria Associated With Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The purpose of this study was to examine selected baseline acoustic features of hypokinetic dysarthria in Spanish speakers with Parkinson's disease (PD) and to identify potential acoustic predictors of ease of understanding in Spanish.

METHOD: Seventeen Spanish-speaking individuals with mild-to-moderate hypokinetic dysarthria secondary to PD and eight healthy controls were recorded reading a translation of the Rainbow Passage. Acoustic measures of vowel space area, as indicated by the formant centralization ratio (FCR), envelope modulation spectra (EMS), and articulation rate were derived from the speech samples. Additionally, 15 healthy adults rated ease of understanding of the recordings on a visual analogue scale. A multiple linear regression model was implemented to investigate the predictive value of the selected acoustic parameters on ease of understanding.

RESULTS: Listeners' ease of understanding was significantly lower for speakers with dysarthria than for healthy controls. The FCR, EMS from the first 10 s of the reading passage, and the difference in EMS between the end and the beginning sections of the passage differed significantly between the two groups of speakers. Findings indicated that 67.7% of the variability in ease of understanding was explained by the predictive model, suggesting a moderately strong relationship between the acoustic and perceptual domains.

CONCLUSIONS: Measures of envelope modulation spectra were found to be highly significant model predictors of ease of understanding of Spanish-speaking individuals with hypokinetic dysarthria associated with PD. Articulation rate was also found to be important (albeit to a lesser degree) in the predictive model. The formant centralization ratio should be further examined with a larger sample size and more severe dysarthria to determine its efficacy in predicting ease of understanding.
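
For reference, the formant centralization ratio is commonly defined (following Sapir and colleagues) as FCR = (F2/u/ + F2/ɑ/ + F1/i/ + F1/u/) / (F2/i/ + F1/ɑ/), so that centralization of the corner vowels pushes the ratio up. A minimal sketch under that definition, with illustrative formant means rather than data from this study:

```python
def formant_centralization_ratio(f):
    """FCR from mean corner-vowel formants (Hz); rises as vowels centralize.
    f: dict with keys F1i, F2i, F1u, F2u, F1a, F2a for /i/, /u/, /a/."""
    return (f["F2u"] + f["F2a"] + f["F1i"] + f["F1u"]) / (f["F2i"] + f["F1a"])

# Illustrative peripheral (healthy-like) vowel means in Hz -- not study data
healthy = {"F1i": 300, "F2i": 2300, "F1u": 350, "F2u": 800, "F1a": 750, "F2a": 1100}
fcr = formant_centralization_ratio(healthy)  # lower value = more peripheral vowels
```

A more centralized system (formants pulled toward the center of the space) yields a higher ratio, which is what makes the FCR a compact index of hypokinetic vowel articulation.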

RevDate: 2022-12-08

Peng H, Li S, Xing J, et al (2022)

Surface plasmon resonance of Au/Ag metals for the photoluminescence enhancement of lanthanide ion Ln[3+] doped upconversion nanoparticles in bioimaging.

Journal of materials chemistry. B [Epub ahead of print].

Deep tissue penetration, chemical inertness, and biocompatibility give UCNPs a competitive edge over traditional fluorescent materials such as organic dyes or quantum dots. However, the low quantum efficiency of UCNPs remains an obstacle. Among the methods and strategies currently used to address this issue, surface plasmon resonance (SPR) of noble metals is of great use because the SPR peak of the metals can be matched to the absorption band of the UCNPs. A key challenge of this match is that the structures and sizes of the noble metals strongly influence the position of the SPR peaks, so an explicit elucidation of the relationships between the physical properties of noble metals and their SPR peaks is of great importance. This review aims to clarify the mechanism of the SPR effect of noble metals on the optical performance of UCNPs. Furthermore, novel research studies in which Au, Ag, or Au/Ag composites of various structures and sizes are combined with UCNPs through different synthetic methods are summarized. We provide an overview of the improved photoluminescence for bioimaging exhibited by different composite nanoparticles, with UCNPs acting as both cores and shells, taking Au@UCNPs, Ag@UCNPs, and Au/Ag@UCNPs into account. Finally, remaining shortcomings and latent opportunities deserve further research. This review will provide directions for the bioimaging applications of UCNPs through the introduction of the SPR effect of noble metals.

RevDate: 2022-12-02

Wang Y, Hattori M, Liu R, et al (2022)

Digital acoustic analysis of the first three formant frequencies in patients with a prosthesis after maxillectomy.

The Journal of prosthetic dentistry pii:S0022-3913(22)00654-0 [Epub ahead of print].

STATEMENT OF PROBLEM: Prosthetic rehabilitation with an obturator can help to restore or improve the intelligibility of speech in patients after maxillectomy. The frequency of formants 1 and 2 as well as their ranges were initially reported in patients with maxillary defects in 2002, and the evaluation method that was used is now applied in clinical evaluation. However, the details of formant 3 are not known and warrant investigation because, according to speech science, formant 3 is related to the pharyngeal volume. Clarifying the formant frequency values of formant 3 in patients after maxillectomy would enable prosthodontists to refer to these data when planning treatment and when assessing the outcome of an obturator.

PURPOSE: The purpose of this clinical study was to determine the acoustic characteristics of formant 3, together with those of formants 1 and 2, by using a digital acoustic analysis during maxillofacial prosthetic treatment. The utility of determining formant 3 in the evaluation of speech in patients after maxillectomy was also evaluated.

MATERIAL AND METHODS: Twenty-six male participants after a maxillectomy (mean age, 63 years; range, 20 to 93 years) were included, and the 5 Japanese vowels /a/, /e/, /i/, /o/, and /u/ produced with and without a definitive obturator prosthesis were recorded. The frequencies of the 3 formants were determined, and their ranges were calculated by using a speech analysis system (Computerized Speech Lab CSL 4400). The Wilcoxon signed rank test was used to compare the formants between the 2 use conditions (α=0.05).

RESULTS: Significant differences were found in the frequencies and ranges of all 3 formants between the use conditions. The ranges of all 3 formants produced with the prosthesis were significantly greater than those produced without it.

CONCLUSIONS: Based on the findings, both the first 2 formants and the third formant were changed by wearing an obturator prosthesis. Because formant 3 is related to the volume of the pharynx, evaluation of this formant and its range can reflect the effectiveness of the prosthesis to seal the oronasal communication and help reduce hypernasality, suggesting the utility of formant 3 analysis in prosthodontic rehabilitation.

RevDate: 2022-12-05
CmpDate: 2022-12-05

Voeten CC, Heeringa W, H Van de Velde (2022)

Normalization of nonlinearly time-dynamic vowels.

The Journal of the Acoustical Society of America, 152(5):2692.

This study compares 16 vowel-normalization methods for purposes of sociophonetic research. Most previous work in this domain has focused on the performance of normalization methods on steady-state vowels. By contrast, this study explicitly considers dynamic formant trajectories, using generalized additive models to model them nonlinearly. Normalization methods were compared using a hand-corrected dataset from the Flemish-Dutch Teacher Corpus, which contains 160 speakers from 8 geographical regions who spoke regionally accented versions of Netherlandic/Flemish Standard Dutch. Normalization performance was assessed by comparing the methods' abilities to remove anatomical variation, retain vowel distinctions, and explain variation in the normalized F0-F3. In addition, it was established whether normalization competes with by-speaker random effects or supplements them, by comparing how much between-speaker variance remained to be apportioned to random effects after normalization. The results partly reproduce the good performance of the Lobanov, Gerstman, and Nearey 1 methods found earlier and generally favor log-mean and centroid methods. However, newer methods achieve higher effect sizes (i.e., explain more variance) at only marginally worse performance. Random effects were found to be equally useful before and after normalization, showing that they complement it. The findings are interpreted in light of the way the different methods handle formant dynamics.
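
Of the methods compared, Lobanov normalization is the simplest to state: each formant is z-scored within speaker. A minimal sketch (the token values are illustrative); applied to sampled formant trajectories per speaker, the same transform extends to the dynamic case studied here:

```python
import numpy as np

def lobanov(formants):
    """Lobanov normalization: z-score each formant within one speaker's tokens.
    formants: array of shape (n_tokens, n_formants), in Hz."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# Illustrative (F1, F2) tokens for a single speaker, in Hz
tokens = np.array([[300.0, 2300.0], [750.0, 1100.0], [350.0, 800.0], [500.0, 1500.0]])
z = lobanov(tokens)  # anatomy-dependent scale removed; relative distinctions kept
```

Because the transform is affine and increasing per formant, the relative ordering of a speaker's vowels is preserved while between-speaker scale differences are removed, which is exactly the trade-off the normalization comparison evaluates.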

RevDate: 2022-12-01

Leyns C, Daelman J, Adriaansen A, et al (2022)

Short-Term Acoustic Effects of Speech Therapy in Transgender Women: A Randomized Controlled Trial.

American journal of speech-language pathology [Epub ahead of print].

PURPOSE: This study measured and compared the acoustic short-term effects of pitch elevation training (PET) and articulation-resonance training (ART) and the combination of both programs, in transgender women.

METHOD: A randomized controlled study with cross-over design was used. Thirty transgender women were included and received 14 weeks of speech training. All participants started with 4 weeks of sham training, after which they were randomly assigned to one of two groups: one group continued with PET (5 weeks), followed by ART (5 weeks); the second group received both trainings in the opposite order. Participants were recorded 4 times, in between the training blocks: pre, post 1 (after sham), post 2 (after training 1), and post 3 (after training 2). Speech samples included a sustained vowel, continuous speech during reading, and spontaneous speech, and were analyzed using Praat software. Fundamental frequency (fo), intensity, voice range profile, vowel formant frequencies (F1-2-3-4-5 of /a/-/i/-/u/), formant contrasts, vowel space, and vocal quality (Acoustic Voice Quality Index) were determined.

RESULTS AND CONCLUSIONS: Fundamental frequencies increased after both the PET and ART programs, with a higher increase after PET. The combination of both interventions showed a mean increase of the fo of 49 Hz during a sustained vowel, 49 Hz during reading, and 29 Hz during spontaneous speech. However, the lower limit (percentile 5) of the fo during spontaneous speech did not change. Higher values were detected for F1-2 of /a/, F3 of /u/, and vowel space after PET and ART separately. F1-2-3 of /a/, F1-3-4 of /u/, vowel space, and formant contrasts increased after the combination of PET and ART; hence, the combination induced more increases in formant frequencies. Intensity and voice quality measurements did not change. No order effect was detected; that is, starting with PET or ART did not change the outcome.

RevDate: 2022-11-26

Chen S, Han C, Wang S, et al (2022)

Hearing the physical condition: The relationship between sexually dimorphic vocal traits and underlying physiology.

Frontiers in psychology, 13:983688.

A growing amount of research has shown associations between sexually dimorphic vocal traits and physiological conditions related to reproductive advantage. This paper presents a review of the literature on the relationships between sexually dimorphic vocal traits and sex hormones, body size, and physique, physiological conditions that are important in reproductive success and mate selection. Regarding sex hormones, there are associations between sex-specific hormones and sexually dimorphic vocal traits; regarding body size, formant frequencies are more reliable predictors of human body size than pitch/fundamental frequency; regarding physique, there is a possible but still controversial association between the human voice and strength and combat power, while pitch is more often used as a signal of aggressive intent in conflict. Future research should consider demographic, cross-cultural, cognitive-interaction, and emotional-motivation influences in order to more accurately assess the relationship between voice and physiology. Moreover, neurological studies are recommended to gain a deeper understanding of the evolutionary origins and adaptive functions of voice modulation.

RevDate: 2022-11-21
CmpDate: 2022-11-21

Eichner ACO, Donadon C, Skarżyński PH, et al (2022)

A Systematic Review of the Literature Between 2009 and 2019 to Identify and Evaluate Publications on the Effects of Age-Related Hearing Loss on Speech Processing.

Medical science monitor : international medical journal of experimental and clinical research, 28:e938089 pii:938089.

Changes in central auditory processing due to aging in normal-hearing elderly patients, as well as age-related hearing loss, are often associated with difficulties in speech processing, especially in unfavorable acoustic environments. Speech processing depends on the perception of temporal and spectral features and can therefore be assessed by recording phase-locked neural activity synchronized to transient and periodic sound stimuli: frequency-following responses (FFRs). An electronic search of the PubMed and Web of Science databases was carried out in July 2019. Studies that evaluated the effects of age-related hearing loss on components of FFRs were included. Studies that were not in English, studies performed on animals, studies with cochlear implant users, literature reviews, letters to the editor, and case studies were excluded. Our search yielded 6 studies, each of which included 30 to 94 subjects aged between 18 and 80 years. Latency increases and significant amplitude reductions of the onset, offset, and slope V/A components of FFRs were observed. Latency and amplitude impairments of the fundamental frequency, first formant, and higher formants were related to peripheral sensorineural hearing loss in the elderly population. Conclusions: Temporal changes in FFR tracings were related to the aging process. Hearing loss also impacts the envelope fine structure, producing poorer speech comprehension in noisy environments. More research is needed to understand aspects related to hearing loss and cognitive aspects common to the elderly.

RevDate: 2022-11-14

Raveendran R, K Yeshoda (2022)

Effects of Resonant Voice Therapy on Perceptual and Acoustic Source and Tract Parameters - A Preliminary Study on Indian Carnatic Classical Singers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00299-5 [Epub ahead of print].

PURPOSE: The aim of the study was to examine the effects of resonant voice therapy (RVT) on the vocal resonance of trained Carnatic singers. The specific objectives were to evaluate the effects of resonant voice therapy on the auditory perceptual judgments and acoustic source and tract parameters before and after RVT on phonation and sung voice samples.

METHOD: Six vocally healthy trained Carnatic singers, three males and three females aged 18-25 years (M = 23; SD = 2.09), participated in the study. All the participants were assigned to a 21-day Resonant Voice Therapy (RVT) training program. The participants' pre- and post-training phonation and sung samples were subjected to auditory perceptual analysis and acoustic analysis.

RESULTS: The results revealed that the post-training auditory perceptual ratings of the phonation task showed a statistically significant difference from the pre-training scores (Z = 2.35; P = 0.019), while for the singing task, the post-training perceptual ratings were not significantly different from the pre-training ratings (Z = 2.66; P = 0.08). A significant difference was observed between the pre- and post-training values for all the measured acoustic parameters of the phonation task. In the singing task, though the fundamental frequency and the third and fourth formant frequencies showed no significant difference between the pre- and post-training conditions (P > 0.05), the difference between the first formant frequency and the fundamental frequency showed a significant decrease (P = 0.028).

CONCLUSION: Resonant voice production led to high vocal economy, as evidenced by the improved source and filter acoustic parameters. These results suggest formant tuning through vocal tract modifications, probably an enlarged pharyngeal area, resulting in a more resonant voice quality in both the phonation and singing tasks.

RevDate: 2022-11-17
CmpDate: 2022-11-15

Rocchesso D, Andolina S, Ilardo G, et al (2022)

A perceptual sound space for auditory displays based on sung-vowel synthesis.

Scientific reports, 12(1):19370.

When designing displays for the human senses, perceptual spaces are of great importance to give intuitive access to physical attributes. Similar to how perceptual spaces based on hue, saturation, and lightness were constructed for visual color, research has explored perceptual spaces for sounds of a given timbral family based on timbre, brightness, and pitch. To promote an embodied approach to the design of auditory displays, we introduce the Vowel-Type-Pitch (VTP) space, a cylindrical sound space based on human sung vowels, whose timbres can be synthesized by the composition of acoustic formants and can be categorically labeled. Vowels are arranged along the circular dimension, while voice type and pitch of the vowel correspond to the remaining two axes of the cylindrical VTP space. The decoupling and perceptual effectiveness of the three dimensions of the VTP space are tested through a vowel labeling experiment, whose results are visualized as maps on circular slices of the VTP cylinder. We discuss implications for the design of auditory and multi-sensory displays that account for human perceptual capabilities.

RevDate: 2022-11-26

Yoon TJ, S Ha (2022)

Adults' Perception of Children's Vowel Production.

Children (Basel, Switzerland), 9(11):.

The study examined the link between Korean-speaking children's vowel production and its perception by inexperienced adults and also observed whether ongoing vowel changes in mid-back vowels affect adults' perceptions when the vowels are produced by children. This study analyzed vowels in monosyllabic words produced by 20 children, ranging from 2 to 6 years old, with a focus on gender distinction, and used them as perceptual stimuli for word perception by 20 inexperienced adult listeners. Acoustic analyses indicated that F0 was not a reliable cue for distinguishing gender, but the first two formants served as reliable cues for gender distinction. The results confirmed that the spacing of the two low formants is linguistically and para-linguistically important in identifying vowel types and gender. However, a pair of non-low back vowels caused difficulties in correct vowel identification. Proximal distance between the vowels could be interpreted to result in the highest mismatch between children's production and adults' perception of the two non-low back vowels in the Korean language. We attribute the source of the highest mismatch of the two non-low back vowels to the ongoing sound change observed in high and mid-back vowels in adult speech. The ongoing vowel change is also observed in the children's vowel space, which may well be shaped after the caregivers whose non-low back vowels are close to each other.

RevDate: 2022-11-17

Guo S, Wu W, Liu Y, et al (2022)

Effects of Valley Topography on Acoustic Communication in Birds: Why Do Birds Avoid Deep Valleys in Daqinggou Nature Reserve?.

Animals : an open access journal from MDPI, 12(21):.

To investigate the effects of valley topography on the acoustic transmission of avian vocalisations, we carried out playback experiments in Daqinggou valley, Inner Mongolia, China. During the experiments, we recorded the vocalisations of five avian species, the large-billed crow (Corvus macrorhynchos Wagler, 1827), common cuckoo (Cuculus canorus Linnaeus, 1758), Eurasian magpie (Pica pica Linnaeus, 1758), Eurasian tree sparrow (Passer montanus Linnaeus, 1758), and meadow bunting (Emberiza cioides Brandt, 1843), at transmission distances of 30 m and 50 m in the upper and lower parts of the valley and analysed the intensity, the fundamental frequency (F0), and the first three formant frequencies (F1/F2/F3) of the sounds. We also investigated bird species diversity in the upper and lower valley. We found that: (1) at the distance of 30 m, there were significant differences in F0/F1/F2/F3 in Eurasian magpies, significant differences in F1/F2/F3 in the meadow bunting and Eurasian tree sparrow, and partially significant differences in sound frequency between the upper and lower valley in the other two species; (2) at the distance of 50 m, there were significant differences in F0/F1/F2/F3 in two avian species (large-billed crow and common cuckoo) between the upper and lower valley and partially significant differences in sound frequency in the other three species; (3) there were significant differences in the acoustic intensities of crow, cuckoo, magpie, and bunting calls between the upper and lower valley; and (4) species number and richness were significantly higher in the upper valley than in the lower valley. We suggest that the structure of valley habitats may lead to the breakdown of acoustic signals and communication in birds to varying degrees. The effect of valley topography on acoustic communication could be one reason for animal species avoiding deep valleys.

RevDate: 2022-11-09

Kim Y, A Thompson (2022)

An Acoustic-Phonetic Approach to Effects of Face Masks on Speech Intelligibility.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: This study aimed to examine the effects of wearing a face mask on speech acoustics and intelligibility, using an acoustic-phonetic analysis of speech. In addition, the effects of speakers' behavioral modification while wearing a mask were examined.

METHOD: Fourteen female adults were asked to read a set of words and sentences under three conditions: (a) conversational, mask-off; (b) conversational, mask-on; and (c) clear, mask-on. Seventy listeners rated speech intelligibility using two methods: orthographic transcription and visual analog scale (VAS). Acoustic measures for vowels included duration, first (F1) and second (F2) formant frequency, and intensity ratio of F1/F2. For consonants, spectral moment coefficients and consonant-vowel (CV) boundary (intensity ratio between consonant and vowel) were measured.

RESULTS: Face masks had a negative impact on speech intelligibility as measured by both rating methods. However, speech intelligibility was recovered in the clear speech condition for VAS but not for transcription scores. Analysis of orthographic transcription showed that listeners tended to frequently confuse consonants (particularly fricatives, affricates, and stops), rather than vowels, in the word-initial position. Acoustic data indicated a significant effect of condition on CV intensity ratio only.

CONCLUSIONS: Our data demonstrate a negative effect of face masks on speech intelligibility, mainly affecting consonants. However, intelligibility can be enhanced by speaking clearly, likely driven by prosodic alterations.

RevDate: 2022-11-02

Baker CP, Sundberg J, Purdy SC, et al (2022)

Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering.

Logopedics, phoniatrics, vocology [Epub ahead of print].

Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence. Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences. Results: Wide intersubject variation was seen between dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in the ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal. Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.

RevDate: 2022-11-02

Easwar V, Purcell D, Eeckhoutte MV, et al (2022)

The Influence of Male- and Female-Spoken Vowel Acoustics on Envelope-Following Responses.

Seminars in hearing, 43(3):223-239.

The influence of male and female vowel characteristics on the envelope-following responses (EFRs) is not well understood. This study explored the role of vowel characteristics on the EFR at the fundamental frequency (f0) in response to the vowel /ε/ (as in "head"). Vowel tokens were spoken by five males and five females and EFRs were measured in 25 young adults (21 females). An auditory model was used to estimate changes in auditory processing that might account for talker effects on EFR amplitude. There were several differences between male and female vowels in relation to the EFR. For male talkers, EFR amplitudes were correlated with the bandwidth and harmonic count of the first formant, and the amplitude of the trough below the second formant. For female talkers, EFR amplitudes were correlated with the range of f0 frequencies and the amplitude of the trough above the second formant. The model suggested that the f0 EFR reflects a wide distribution of energy in speech, with primary contributions from high-frequency harmonics mediated from cochlear regions basal to the peaks of the first and second formants, not from low-frequency harmonics with energy near f0. Vowels produced by female talkers tend to produce lower-amplitude EFR, likely because they depend on higher-frequency harmonics where speech sound levels tend to be lower. This work advances auditory electrophysiology by showing how the EFR evoked by speech relates to the acoustics of speech, for both male and female voices.

RevDate: 2022-11-21
CmpDate: 2022-10-31

Pah ND, Indrawati V, DK Kumar (2022)

Voice Features of Sustained Phoneme as COVID-19 Biomarker.

IEEE journal of translational engineering in health and medicine, 10:4901309.

BACKGROUND: The COVID-19 pandemic has resulted in enormous costs to our society. Besides finding medicines to treat those infected by the virus, it is important to find effective and efficient strategies to prevent the spreading of the disease. One key factor to prevent transmission is to identify COVID-19 biomarkers that can be used to develop an efficient, accurate, noninvasive, and self-administered screening procedure. Several COVID-19 variants cause significant respiratory symptoms, and thus a voice signal may be a potential biomarker for COVID-19 infection.

AIM: This study investigated the effectiveness of different phonemes and a range of voice features in differentiating people infected by COVID-19 with respiratory tract symptoms.

METHOD: This cross-sectional, longitudinal study recorded six phonemes (i.e., /a/, /e/, /i/, /o/, /u/, and /m/) from 40 COVID-19 patients and 48 healthy subjects for 22 days. The signal features were obtained for the recordings, which were statistically analyzed and classified using Support Vector Machine (SVM).

RESULTS: The statistical analysis and SVM classification show that the voice features related to the vocal tract filtering (e.g., MFCC, VTL, and formants) and the stability of the respiratory muscles and lung volume (Intensity-SD) were the most sensitive to voice change due to COVID-19. The result also shows that the features extracted from the vowel /i/ during the first 3 days after admittance to the hospital were the most effective. The SVM classification accuracy with 18 ranked features extracted from /i/ was 93.5% (with F1 score of 94.3%).

CONCLUSION: A measurable difference exists between the voices of people with COVID-19 and healthy people, and the phoneme /i/ shows the most pronounced difference. This supports the potential for using computerized voice analysis to detect the disease and consider it a biomarker.

RevDate: 2022-10-30
CmpDate: 2022-10-28

Choi MK, Yoo SD, EJ Park (2022)

Destruction of Vowel Space Area in Patients with Dysphagia after Stroke.

International journal of environmental research and public health, 19(20):.

Dysphagia is associated with dysarthria in stroke patients. Vowel space decreases in stroke patients with dysarthria; destruction of the vowel space is often observed. We determined the correlation of destruction of acoustic vowel space with dysphagia in stroke patients. Seventy-four individuals with dysphagia and dysarthria who had experienced stroke were enrolled. For the /a/, /ae/, /i/, and /u/ vowels, we determined formant parameters (which represent vocal tract resonance frequencies as two-dimensional coordinate points), the formant centralization ratio (FCR), and the quadrilateral vowel space area (VSA). Swallowing function was assessed using the videofluoroscopic dysphagia scale (VDS) during videofluoroscopic swallowing studies. Pearson's correlation and linear regression were used to determine the correlation between VSA, FCR, and VDS. Subgroups were created based on VSA; vowel space destruction groups were compared using ANOVA and Scheffe's test. VSA and FCR were negatively and positively correlated with VDS, respectively. Groups were separated based on the mean and standard deviation of VSA. One-way ANOVA revealed significant differences in VDS, FCR, and age between the VSA groups and no significant differences in VDS between the mild and moderate VSA reduction and vowel space destruction groups. VSA and FCR values correlated with swallowing function. Vowel space destruction has characteristics similar to VSA reduction at a moderate-to-severe degree and has utility as an indicator of dysphagia severity.
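The abstract names these measures but does not give their formulas. In the dysarthria literature, the quadrilateral VSA is typically computed as the area of the /i/-/ae/-/a/-/u/ polygon in F1-F2 space (via the shoelace formula), and the FCR is Sapir et al.'s ratio (F2u + F2a + F1i + F1u)/(F2i + F1a); the sketch below assumes those standard definitions, with illustrative (not study-derived) formant values.

```python
def quad_vsa(corners):
    """Shoelace area of the vowel quadrilateral; corners are (F1, F2)
    pairs in Hz, ordered around the perimeter, e.g. /i/, /ae/, /a/, /u/."""
    area = 0.0
    n = len(corners)
    for k in range(n):
        x1, y1 = corners[k]
        x2, y2 = corners[(k + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def fcr(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Formant centralization ratio: rises as vowels centralize,
    which is why it correlates with VDS in the opposite direction to VSA."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# Illustrative adult-male-like formants (Hz): (F1, F2) per corner vowel
i_, ae, a_, u_ = (300, 2300), (650, 1800), (750, 1200), (350, 800)
print(quad_vsa([i_, ae, a_, u_]))                         # area in Hz^2
print(round(fcr(300, 2300, 750, 1200, 350, 800), 3))      # dimensionless
```

Articulatory undershoot shrinks the polygon (smaller VSA) and pulls the corner formants toward the center of the space (larger FCR), matching the opposite-signed correlations reported above.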

RevDate: 2022-10-30
CmpDate: 2022-10-28

Müller M, Wang Z, Caffier F, et al (2022)

New objective timbre parameters for classification of voice type and fach in professional opera singers.

Scientific reports, 12(1):17921.

Voice timbre is defined as sound color independent of pitch and volume, based on a broad frequency band between 2 and 4 kHz. Since there are no specific timbre parameters, previous studies have come to the very general conclusion that the center frequencies of the singer's formants are somewhat higher in the higher voice types than in the lower ones. For specification, a database was created containing 1723 sound examples of various voice types. The energy distribution in the frequency bands of the singer's formants was extracted for quantitative analysis. When the energy distribution function reached 50%, the corresponding absolute frequency in Hz was defined as Frequency of Half Energy (FHE). This new parameter quantifies the timbre of a singing voice as a concrete measure, independent of fundamental frequency, vowel color and volume. The database allows assigning FHE means ± SD as characteristic or comparative values for sopranos (3092 ± 284 Hz), tenors (2705 ± 221 Hz), baritones (2454 ± 206 Hz) and basses (2384 ± 164 Hz). In addition to vibrato, specific timbre parameters provide another valuable feature in vocal pedagogy for classification of voice type and fach according to the lyric or dramatic character of the voice.
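The exact implementation of the FHE parameter is not given in the abstract, but following its definition, one can accumulate spectral energy across the singer's-formant band (2-4 kHz per the description above) and report the frequency at which the running total first reaches 50%. A minimal numpy sketch; the Gaussian test spectrum is purely illustrative:

```python
import numpy as np

def frequency_of_half_energy(freqs, power, lo=2000.0, hi=4000.0):
    """Frequency (Hz) at which cumulative energy within [lo, hi]
    first reaches 50% of that band's total energy."""
    band = (freqs >= lo) & (freqs <= hi)
    f, p = freqs[band], power[band]
    cum = np.cumsum(p)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(f[idx])

# Illustrative spectrum: energy concentrated near 2.7 kHz (tenor-like)
freqs = np.linspace(0, 8000, 8001)
power = np.exp(-0.5 * ((freqs - 2700) / 300) ** 2)
print(frequency_of_half_energy(freqs, power))
```

For this spectrum the function returns roughly 2700 Hz, near the tenor mean reported above; a spectrum with energy skewed higher in the band would yield a soprano-like FHE.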

RevDate: 2022-11-21
CmpDate: 2022-11-21

Hussain RO, Kumar P, NK Singh (2022)

Subcortical and Cortical Electrophysiological Measures in Children With Speech-in-Noise Deficits Associated With Auditory Processing Disorders.

Journal of speech, language, and hearing research : JSLHR, 65(11):4454-4468.

PURPOSE: The aim of this study was to analyze the subcortical and cortical auditory evoked potentials for speech stimuli in children with speech-in-noise (SIN) deficits associated with auditory processing disorder (APD) without any reading or language deficits.

METHOD: The study included 20 children in the age range of 9-13 years. Ten children were recruited to the APD group; they had below-normal scores on the speech-perception-in-noise test and were diagnosed as having APD. The remaining 10 were typically developing (TD) children and were recruited to the TD group. Speech-evoked subcortical (brainstem) and cortical (auditory late latency) responses were recorded and compared across both groups.

RESULTS: The results showed a statistically significant reduction in the amplitudes of the subcortical potentials (both for stimulus in quiet and in noise) and the magnitudes of the spectral components (fundamental frequency and the second formant) in children with SIN deficits in the APD group compared to the TD group. In addition, the APD group displayed enhanced amplitudes of the cortical potentials compared to the TD group.

CONCLUSION: Children with SIN deficits associated with APD exhibited impaired coding/processing of the auditory information at the level of the brainstem and the auditory cortex.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21357735.

RevDate: 2022-11-21
CmpDate: 2022-11-21

Bochner J, Samar V, Prud'hommeaux E, et al (2022)

Phoneme Categorization in Prelingually Deaf Adult Cochlear Implant Users.

Journal of speech, language, and hearing research : JSLHR, 65(11):4429-4453.

PURPOSE: Phoneme categorization (PC) for voice onset time and second formant transition was studied in adult cochlear implant (CI) users with early-onset deafness and hearing controls.

METHOD: Identification and discrimination tasks were administered to 30 participants implanted before 4 years of age, 21 participants implanted after 7 years of age, and 21 hearing individuals.

RESULTS: Distinctive identification and discrimination functions confirmed PC within all groups. Compared to hearing participants, the CI groups generally displayed longer/higher category boundaries, shallower identification function slopes, reduced identification consistency, and reduced discrimination performance. A principal component analysis revealed that identification consistency, discrimination accuracy, and identification function slope, but not boundary location, loaded on a single factor, reflecting general PC performance. Earlier implantation was associated with better PC performance within the early CI group, but not the late CI group. Within the early CI group, earlier implantation age but not PC performance was associated with better speech recognition. Conversely, within the late CI group, better PC performance but not earlier implantation age was associated with better speech recognition.

CONCLUSIONS: Results suggest that implantation timing within the sensitive period before 4 years of age partly determines the level of PC performance. They also suggest that early implantation may promote development of higher level processes that can compensate for relatively poor PC performance, as can occur in challenging listening conditions.

RevDate: 2022-10-23

Skrabal D, Rusz J, Novotny M, et al (2022)

Articulatory undershoot of vowels in isolated REM sleep behavior disorder and early Parkinson's disease.

NPJ Parkinson's disease, 8(1):137.

Imprecise vowels represent a common deficit associated with hypokinetic dysarthria resulting from a reduced articulatory range of motion in Parkinson's disease (PD). It is not yet known whether the vowel articulation impairment is already evident in the prodromal stages of synucleinopathy. We aimed to assess whether vowel articulation abnormalities are present in isolated rapid eye movement sleep behaviour disorder (iRBD) and early-stage PD. A total of 180 male participants, including 60 iRBD, 60 de-novo PD, and 60 age-matched healthy controls, read a standardized passage. The first and second formant frequencies of the corner vowels /a/, /i/, and /u/, extracted from predefined words, were utilized to construct the articulatory-acoustic measures of Vowel Space Area (VSA) and Vowel Articulation Index (VAI). Compared to controls, VSA was smaller in both iRBD (p = 0.01) and PD (p = 0.001), while VAI was lower only in PD (p = 0.002). The iRBD subgroup with abnormal olfactory function had a smaller VSA compared to the iRBD subgroup with preserved olfactory function (p = 0.02). In PD patients, the extent of bradykinesia and rigidity correlated with VSA (r = -0.33, p = 0.01), while no correlation between axial gait symptoms or tremor and vowel articulation was detected. Vowel articulation impairment represents an early prodromal symptom in the disease process of synucleinopathy. Acoustic assessment of vowel articulation may provide a surrogate marker of synucleinopathy in scenarios where a single robust feature to monitor dysarthria progression is needed.
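With three corner vowels, the VSA is the area of the /a/-/i/-/u/ triangle in F1-F2 space, and the VAI commonly used in this literature is (F2i + F1a)/(F1i + F1u + F2u + F2a), which falls below about 1 as articulation undershoots. The abstract does not spell out these formulas, so the sketch below assumes the standard definitions, with illustrative formant values:

```python
def tri_vsa(a, i, u):
    """Triangular vowel space area from (F1, F2) pairs for /a/, /i/, /u/,
    in Hz^2 (standard triangle-area formula)."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

def vai(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Vowel articulation index: decreases as corner vowels centralize."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

# Illustrative healthy-control-like formants (Hz)
a_, i_, u_ = (750, 1200), (300, 2300), (350, 800)
print(tri_vsa(a_, i_, u_))
print(round(vai(750, 1200, 300, 2300, 350, 800), 3))
```

Shrinking the triangle toward its centroid lowers both numbers at once, which is consistent with the parallel VSA and VAI reductions reported for the PD group.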

RevDate: 2022-10-20

Zhang T, He M, Li B, et al (2022)

Acoustic Characteristics of Cantonese Speech Through Protective Facial Coverings.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00269-7 [Epub ahead of print].

OBJECTIVES: Protective facial coverings (PFCs) such as surgical masks attenuate speech transmission and affect speech intelligibility, which is reported in languages such as English and German. The present study intended to verify the detrimental impacts on production of tonal languages such as Cantonese, by examining realization of speech correlates in Cantonese under PFCs including facial masks and shields.

METHODS: We recorded scripted speech in Hong Kong Cantonese produced by three adult speakers who wore various PFCs, including surgical masks, KF94 masks, and face shields (with and without surgical masks). Spectral and temporal parameters were measured, including mean intensity, speaking rate, long-term amplitude spectrum, formant frequencies of vowels, and duration and fundamental frequency (F0) of tone-bearing parts.

RESULTS: Significant changes were observed in all acoustic correlates of Cantonese speech under PFCs. Sound pressure levels were attenuated more intensely at higher frequencies in speech through face masks, whereas sound transmission was affected more at lower frequencies in speech under face shields. Vowel spaces derived from formant frequencies shrank under all PFCs, with the vowel /aa/ demonstrating the largest changes in the first two formants. All tone-bearing parts were shortened and showed increased F0 means in speech through PFCs. The decrease in tone duration was statistically significant in the High-level and Low-level tones, while the increase in F0 means was significant in the High-level tone only.

CONCLUSIONS: A general filtering effect of PFCs is observed in the Cantonese speech data, confirming language-universal patterns of acoustic attenuation by PFCs. The various coverings lower the overall intensity of speech and degrade the speech signal in higher frequency regions. Modification patterns specific to Hong Kong Cantonese are also identified. Vowel space area is reduced and is associated with increased speaking rates. Tones are produced with higher F0s under PFCs, which may be attributed to vocal tension caused by a tightened vocal tract while speaking through facial coverings.

RevDate: 2022-10-10

Urzúa AR, KB Wolf (2022)

Unitary rotation of pixellated polychromatic images.

Journal of the Optical Society of America. A, Optics, image science, and vision, 39(8):1323-1329.

Unitary rotations of polychromatic images on finite two-dimensional pixellated screens provide invertibility, group composition, and thus conservation of information. Rotations have been applied on monochromatic image data sets, where we now examine closer the Gibbs-like oscillations that appear due to discrete "discontinuities" of the input images under unitary transformations. Extended to three-color images, we examine here the display of color at the pixels where, due to oscillations, some pixel color values may fall outside their required common numerical range [0,1], between absence and saturation of the red, green, and blue formant colors we choose to represent the images.

RevDate: 2022-10-04
CmpDate: 2022-10-04

Rothenberg M, S Rothenberg (2022)

Measuring the distortion of speech by a facemask.

JASA express letters, 2(9):095203.

Most prior research focuses on the reduced amplitude of speech caused by facemasks. This paper argues that the interaction between the acoustic properties of a facemask and the acoustic properties of the vocal tract contributes to speech distortion by changing the formants of the voice. Speech distortion of a number of masks was tested by measuring the increase in damping of the first formant. Results suggest that masks dampen the first formant and that increasing the distance between the mask wall and mouth can reduce this distortion. These findings contribute to the research studying the impact of masks on speech.

RevDate: 2022-10-04
CmpDate: 2022-10-04

Tran Ngoc A, Meunier F, J Meyer (2022)

Testing perceptual flexibility in speech through the categorization of whistled Spanish consonants by French speakers.

JASA express letters, 2(9):095201.

Whistled speech is a form of modified speech where, in non-tonal languages, vowels and consonants are augmented and transposed to whistled frequencies, simplifying their timbre. According to previous studies, these transformations maintain some level of vowel recognition for naive listeners. Here, in a behavioral experiment, naive listeners' capacities for the categorization of four whistled consonants (/p/, /k/, /t/, and /s/) were analyzed. Results show patterns of correct responses and confusions that provide new insights into whistled speech perception, highlighting the importance of frequency modulation cues, transposed from phoneme formants, as well as the perceptual flexibility in processing these cues.

RevDate: 2022-10-07
CmpDate: 2022-10-04

Winn MB, RA Wright (2022)

Reconsidering commonly used stimuli in speech perception experiments.

The Journal of the Acoustical Society of America, 152(3):1394.

This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.

RevDate: 2022-11-28
CmpDate: 2022-09-30

Borodkin K, Gassner T, Ershaid H, et al (2022)

tDCS modulates speech perception and production in second language learners.

Scientific reports, 12(1):16212.

Accurate identification and pronunciation of nonnative speech sounds can be particularly challenging for adult language learners. The current study tested the effects of a brief musical training combined with transcranial direct current stimulation (tDCS) on speech perception and production in a second language (L2). The sample comprised 36 native Hebrew speakers, aged 18-38, who studied English as L2 in a formal setting and had little musical training. Training encompassed musical perception tasks with feedback (i.e., timbre, duration, and tonal memory) and concurrent tDCS applied over the left posterior auditory-related cortex (including posterior superior temporal gyrus and planum temporale). Participants were randomly assigned to anodal or sham stimulation. Musical perception, L2 speech perception (measured by a categorical AXB discrimination task) and speech production (measured by a speech imitation task) were tested before and after training. There were no tDCS-dependent effects on musical perception post-training. However, only participants who received active stimulation showed increased accuracy of L2 phoneme discrimination and greater change in the acoustic properties of L2 speech sound production (i.e., second formant frequency in vowels and center of gravity in consonants). The results of this study suggest neuromodulation can facilitate the processing of nonnative speech sounds in adult learners.

RevDate: 2022-09-29
CmpDate: 2022-09-28

Morse RP, Holmes SD, Irving R, et al (2022)

Noise helps cochlear implant listeners to categorize vowels.

JASA express letters, 2(4):042001.

Theoretical studies demonstrate that controlled addition of noise can enhance the amount of information transmitted by a cochlear implant (CI). The present study is a proof-of-principle for whether stochastic facilitation can improve the ability of CI users to categorize speech sounds. Analogue vowels were presented to CI users through a single electrode with independent noise on multiple electrodes. Noise improved vowel categorization, particularly in terms of an increase in information conveyed by the first and second formant. Noise, however, did not significantly improve vowel recognition: the miscategorizations were just more consistent, giving the potential to improve with experience.

RevDate: 2022-11-09
CmpDate: 2022-10-19

Easwar V, Purcell D, Lasarev M, et al (2022)

Speech-Evoked Envelope Following Responses in Children and Adults.

Journal of speech, language, and hearing research : JSLHR, 65(10):4009-4023.

PURPOSE: Envelope following responses (EFRs) could be useful for objectively evaluating audibility of speech in children who are unable to participate in routine clinical tests. However, relative to adults, the characteristics of EFRs elicited by frequency-specific speech and their utility in predicting audibility in children are unknown.

METHOD: EFRs were elicited by the first (F1) and second and higher formants (F2+) of male-spoken vowels /u/ and /i/ and by fricatives /ʃ/ and /s/ in the token /suʃi/ presented at 15, 35, 55, 65, and 75 dB SPL. The F1, F2+, and fricatives were low-, mid-, and high-frequency dominant, respectively. EFRs were recorded between the vertex and the nape from twenty-three 6- to 17-year-old children and 21 young adults with normal hearing. Sensation levels of stimuli were estimated based on behavioral thresholds.

RESULTS: In children, amplitude decreased with age for /ʃ/-elicited EFRs but remained stable for low- and mid-frequency stimuli. As a group, EFR amplitude and phase coherence did not differ from those of adults. EFR sensitivity (proportion of audible stimuli detected) and specificity (proportion of inaudible stimuli not detected) did not vary between children and adults. Consistent with previous work, EFR sensitivity increased with stimulus frequency and level. The type of statistical indicator used for EFR detection did not influence accuracy in children.

CONCLUSIONS: Adultlike EFRs in 6- to 17-year-old typically developing children suggest mature envelope encoding for low- and mid-frequency stimuli. EFR sensitivity and specificity in children, when considering a wide range of stimulus levels and audibility, are ~77% and ~92%, respectively.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21136171.

RevDate: 2022-09-13

Nault DR, Mitsuya T, Purcell DW, et al (2022)

Perturbing the consistency of auditory feedback in speech.

Frontiers in human neuroscience, 16:905365.

Sensory information, including auditory feedback, is used by talkers to maintain fluent speech articulation. Current models of speech motor control posit that speakers continually adjust their motor commands based on discrepancies between the sensory predictions made by a forward model and the sensory consequences of their speech movements. Here, in two within-subject design experiments, we used a real-time formant manipulation system to explore how reliant speech articulation is on the accuracy or predictability of auditory feedback information. This involved introducing random formant perturbations during vowel production that varied systematically in their spatial location in formant space (Experiment 1) and temporal consistency (Experiment 2). Our results indicate that, on average, speakers' responses to auditory feedback manipulations varied based on the relevance and degree of the error that was introduced in the various feedback conditions. In Experiment 1, speakers' average production was not reliably influenced by random perturbations that were introduced every utterance to the first (F1) and second (F2) formants in various locations of formant space that had an overall average of 0 Hz. However, when perturbations were applied that had a mean of +100 Hz in F1 and -125 Hz in F2, speakers demonstrated reliable compensatory responses that reflected the average magnitude of the applied perturbations. In Experiment 2, speakers did not significantly compensate for perturbations of varying magnitudes that were held constant for one and three trials at a time. Speakers' average productions did, however, significantly deviate from a control condition when perturbations were held constant for six trials. Within the context of these conditions, our findings provide evidence that the control of speech movements is, at least in part, dependent upon the reliability and stability of the sensory information that it receives over time.

RevDate: 2022-11-29
CmpDate: 2022-11-29

Frankford SA, Cai S, Nieto-Castañón A, et al (2022)

Auditory feedback control in adults who stutter during metronome-paced speech II. Formant Perturbation.

Journal of fluency disorders, 74:105928.

PURPOSE: Prior work has shown that Adults who stutter (AWS) have reduced and delayed responses to auditory feedback perturbations. This study aimed to determine whether external timing cues, which increase fluency, resolve auditory feedback processing disruptions.

METHODS: Fifteen AWS and sixteen adults who do not stutter (ANS) read aloud a multisyllabic sentence either with natural stress and timing or with each syllable paced at the rate of a metronome. On random trials, an auditory feedback formant perturbation was applied, and formant responses were compared between groups and pacing conditions.

RESULTS: During normally paced speech, ANS showed a significant compensatory response to the perturbation by the end of the perturbed vowel, while AWS did not. In the metronome-paced condition, which significantly reduced the disfluency rate, the opposite was true: AWS showed a significant response by the end of the vowel, while ANS did not.

CONCLUSION: These findings indicate a potential link between the reduction in stuttering found during metronome-paced speech and changes in auditory motor integration in AWS.

RevDate: 2022-09-01

Lee SH, GS Lee (2022)

Long-term Average Spectrum and Nasal Accelerometry in Sentences of Differing Nasality and Forward-Focused Vowel Productions Under Altered Auditory Feedback.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00228-4 [Epub ahead of print].

OBJECTIVES AND BACKGROUND: To investigate whether voice focus adjustments can alter the audio-vocal feedback and consequently modulate speech/voice motor control. Speaking with a forward-focused voice was expected to enhance audio-vocal feedback and thus decrease the variability of vocal fundamental frequency (F0).

MATERIALS AND METHOD: Twenty-two healthy, untrained adults (10 males and 12 females) were requested to sustain vowel /a/ with their natural focus and a forward focus and to naturally read the nasal, oral, and mixed oral-nasal sentences in normal and noise-masked auditory conditions. Meanwhile, a miniature accelerometer was externally attached to the nose to detect the nasal vibrations during vocalization. Audio recordings were made and analyzed using the long-term average spectrum (LTAS) and power spectral analysis of F0.

RESULTS: Compared with naturally-focused vowel production and oral sentences, forward-focused vowel productions and nasal sentences both showed significant increases in nasal accelerometric amplitude and the spectral power within the range of 200∼300 Hz, and significantly decreased the F0 variability below 3 Hz, which has been reported to be associated with enhanced auditory feedback in our previous research. The auditory masking not only significantly increased the low-frequency F0 variability, but also significantly decreased the ratio of the spectral power within 200∼300 Hz to the power within 300∼1000 Hz for the vowel and sentence productions. Gender differences were found in the correlations between the degree of nasal coupling and F0 stability as well as in the LTAS characteristics in response to noise.

CONCLUSIONS: Variations in nasal-oral acoustic coupling not only change the formant features of speech signals, but involuntarily influence the auditory feedback control of vocal fold vibrations. Speakers tend to show improved F0 stability in response to a forward-focused voice adjustment.

RevDate: 2022-10-07
CmpDate: 2022-09-08

Ibrahim O, Yuen I, van Os M, et al (2022)

The combined effects of contextual predictability and noise on the acoustic realisation of German syllables.

The Journal of the Acoustical Society of America, 152(2):911.

Speakers tend to speak clearly in noisy environments, while they tend to conserve effort by shortening word duration in predictable contexts. It is unclear how these two communicative demands are met. The current study investigates the acoustic realizations of syllables in predictable vs unpredictable contexts across different background noise levels. Thirty-eight German native speakers produced 60 CV syllables in two predictability contexts in three noise conditions (reference = quiet, 0 dB and -10 dB signal-to-noise ratio). Duration, intensity (average and range), F0 (median), and vowel formants of the target syllables were analysed. The presence of noise yielded significantly longer duration, higher average intensity, larger intensity range, and higher F0. Noise levels affected intensity (average and range) and F0. Low predictability syllables exhibited longer duration and larger intensity range. However, no interaction was found between noise and predictability. This suggests that noise-related modifications might be independent of predictability-related changes, with implications for including channel-based and message-based formulations in speech production.

RevDate: 2022-10-07
CmpDate: 2022-09-08

Krumbiegel J, Ufer C, H Blank (2022)

Influence of voice properties on vowel perception depends on speaker context.

The Journal of the Acoustical Society of America, 152(2):820.

Different speakers produce the same intended vowel with very different physical properties. Fundamental frequency (F0) and formant frequencies (FF), the two main parameters that discriminate between voices, also influence vowel perception. While it has been shown that listeners comprehend speech more accurately if they are familiar with a talker's voice, it is still unclear how such prior information is used when decoding the speech stream. In three online experiments, we examined the influence of speaker context via F0 and FF shifts on the perception of /o/-/u/ vowel contrasts. Participants perceived vowels from an /o/-/u/ continuum shifted toward /u/ when F0 was lowered or FF increased relative to the original speaker's voice and vice versa. This shift was reduced when the speakers were presented in a block-wise context compared to random order. Conversely, the original base voice was perceived to be shifted toward /u/ when presented in the context of a low F0 or high FF speaker, compared to a shift toward /o/ with high F0 or low FF speaker context. These findings demonstrate that F0 and FF jointly influence vowel perception in speaker context.

RevDate: 2022-10-07
CmpDate: 2022-09-08

Whalen DH, Chen WR, Shadle CH, et al (2022)

Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986).

The Journal of the Acoustical Society of America, 152(2):933.

Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.
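
The "harmonic attraction" bias described above can be illustrated numerically: when a vocal-tract resonance sits between two harmonics of F0, the spectral peak falls on the nearest strong harmonic rather than at the resonance centre. A minimal sketch using an idealized Lorentzian resonance shape; the centre frequency, bandwidth, and F0 values are arbitrary choices, not from the paper.

```python
def strongest_harmonic(f0, centre=520.0, bandwidth=80.0, fmax=5000.0):
    """Return the frequency of the strongest harmonic of f0 under an
    idealized Lorentzian resonance centred at `centre` Hz."""
    harmonics = [f0 * k for k in range(1, int(fmax / f0) + 1)]
    def amp(f):  # idealized resonance magnitude
        return 1.0 / (1.0 + ((f - centre) / bandwidth) ** 2)
    return max(harmonics, key=amp)

# With a 100 Hz voice, the spectral peak sits on the 500 Hz harmonic,
# 20 Hz below the true 520 Hz resonance; a lower F0 samples the
# resonance more densely and lands closer to its centre.
print(strongest_harmonic(100.0))  # 500.0
print(strongest_harmonic(40.0))   # 520.0
```

This is why the error worsens for high-F0 voices, where harmonics are widely spaced relative to resonance bandwidths.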

RevDate: 2022-08-30

Beeck VC, Heilmann G, Kerscher M, et al (2022)

Sound Visualization Demonstrates Velopharyngeal Coupling and Complex Spectral Variability in Asian Elephants.

Animals : an open access journal from MDPI, 12(16):.

Sound production mechanisms set the parameter space available for transmitting biologically relevant information in vocal signals. Low-frequency rumbles play a crucial role in coordinating social interactions in elephants' complex fission-fusion societies. By emitting rumbles through either the oral or the three-times longer nasal vocal tract, African elephants alter their spectral shape significantly. In this study, we used an acoustic camera to visualize the sound emission of rumbles in Asian elephants, which have received far less research attention than African elephants. We recorded nine adult captive females and analyzed the spectral parameters of 203 calls, including vocal tract resonances (formants). We found that the majority of rumbles (64%) were nasally emitted, 21% orally, and 13% simultaneously through the mouth and trunk, demonstrating velopharyngeal coupling. Some of the rumbles were combined with orally emitted roars. The nasal rumbles concentrated most spectral energy in lower frequencies exhibiting two formants, whereas the oral and mixed rumbles contained higher formants, higher spectral energy concentrations and were louder. The roars were the loudest, highest and broadest in frequency. This study is the first to demonstrate velopharyngeal coupling in a non-human animal. Our findings provide a foundation for future research into the adaptive functions of the elephant acoustic variability for information coding, localizability or sound transmission, as well as vocal flexibility across species.

RevDate: 2022-10-19
CmpDate: 2022-09-13

Rong P, Hansen O, L Heidrick (2022)

Relationship between rate-elicited changes in muscular-kinematic control strategies and acoustic performance in individuals with ALS-A multimodal investigation.

Journal of communication disorders, 99:106253.

INTRODUCTION: As a key control variable, duration has been long suspected to mediate the organization of speech motor control strategies, which has management implications for neuromotor speech disorders. This study aimed to experimentally delineate the role of duration in organizing speech motor control in neurologically healthy and impaired speakers using a voluntary speaking rate manipulation paradigm.

METHODS: Thirteen individuals with amyotrophic lateral sclerosis (ALS) and 10 healthy controls performed a sentence reading task three times, first at their habitual rate, then at a slower rate. A multimodal approach combining surface electromyography, kinematic, and acoustic technologies was used to record jaw muscle activities, jaw kinematics, and speech acoustics. Six muscular-kinematic features were extracted and factor-analyzed to characterize the organization of the mandibular control hierarchy. Five acoustic features were extracted, measuring the spectrotemporal properties of the diphthong /ɑɪ/ and the plosives /t/ and /k/.

RESULTS: The muscular-kinematic features converged into two interpretable latent factors, reflecting the level and cohesiveness/flexibility of mandibular control, respectively. Voluntary rate reduction led to a trend toward (1) finer, less cohesive, and more flexible mandibular control, and (2) increased range and decreased transition slope of the diphthong formants, across neurologically healthy and impaired groups. Differential correlations were found between the rate-elicited changes in mandibular control and acoustic performance for neurologically healthy and impaired speakers.

CONCLUSIONS: The results provided empirical evidence for the long-suspected but previously unsubstantiated role of duration in (re)organizing speech motor control strategies. The rate-elicited reorganization of muscular-kinematic control contributed to the acoustic performance of healthy speakers, in ways consistent with theoretical predictions. Such contributions were less consistent in impaired speakers, implying the complex nature of speaking rate reduction in ALS, possibly reflecting an interplay of disease-related constraints and volitional duration control. This information may help to stratify and identify candidates for the rate manipulation therapy.

RevDate: 2022-08-24

Easwar V, Aiken S, Beh K, et al (2022)

Variability in the Estimated Amplitude of Vowel-Evoked Envelope Following Responses Caused by Assumed Neurophysiologic Processing Delays.

Journal of the Association for Research in Otolaryngology : JARO [Epub ahead of print].

Vowel-evoked envelope following responses (EFRs) reflect neural encoding of the fundamental frequency of voice (f0). Accurate analysis of EFRs elicited by natural vowels requires the use of methods like the Fourier analyzer (FA) to consider the production-related f0 changes. The FA's accuracy in estimating EFRs is, however, dependent on the assumed neurophysiological processing delay needed to time-align the f0 time course and the recorded electroencephalogram (EEG). For male-spoken vowels (f0 ~ 100 Hz), a constant 10-ms delay correction is often assumed. Since processing delays vary with stimulus and physiological factors, we quantified (i) the delay-related variability that would occur in EFR estimation, and (ii) the influence of stimulus frequency, non-f0 related neural activity, and the listener's age on such variability. EFRs were elicited by the low-frequency first formant, and mid-frequency second and higher formants of /u/, /a/, and /i/ in young adults and 6- to 17-year-old children. To time-align with the f0 time course, EEG was shifted by delays between 5 and 25 ms to encompass plausible response latencies. The delay-dependent range in EFR amplitude did not vary by stimulus frequency or age and was significantly smaller when interference from low-frequency activity was reduced. On average, the delay-dependent range was < 22% of the maximum variability in EFR amplitude that could be expected by noise. Results suggest that using a constant EEG delay correction in FA analysis does not substantially alter EFR amplitude estimation. In the present study, the lack of substantial variability was likely facilitated by using vowels with small f0 ranges.
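
The Fourier analyzer's core operation can be sketched as demodulating the recording against the known f0 phase track: multiply by exp(-i·phase(t)) and average, so that energy following the f0 trajectory adds coherently. This is a schematic of the general FA technique under simplified assumptions (a noiseless synthetic "response" and an invented f0 contour), not the authors' pipeline.

```python
import cmath
import math

fs = 8000                      # sample rate (Hz), arbitrary
dt = 1.0 / fs
n = fs                         # 1 s of signal

# Invented time-varying f0 contour (Hz) and its running phase
f0 = [100.0 + 10.0 * math.sin(2 * math.pi * 2 * i * dt) for i in range(n)]
phase, running = [], 0.0
for f in f0:
    running += 2 * math.pi * f * dt
    phase.append(running)

# Synthetic "response" that follows the f0 contour with amplitude 1.0
x = [math.sin(p) for p in phase]

# Fourier analyzer: demodulate by the f0 phase track and average
est = abs(sum(xi * cmath.exp(-1j * p) for xi, p in zip(x, phase))) * 2 / n
```

Shifting x relative to the phase track by an assumed neural delay changes this alignment, which is exactly the source of variability the study quantifies.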

RevDate: 2022-08-22

Clarke H, Leav S, Zestic J, et al (2022)

Enhanced Neonatal Pulse Oximetry Sounds for the First Minutes of Life: A Laboratory Trial.

Human factors [Epub ahead of print].

OBJECTIVE: Auditory enhancements to the pulse oximetry tone may help clinicians detect deviations from target ranges for oxygen saturation (SpO2) and heart rate (HR).

BACKGROUND: Clinical guidelines recommend target ranges for SpO2 and HR during neonatal resuscitation in the first 10 minutes after birth. The pulse oximeter currently maps HR to tone rate, and SpO2 to tone pitch. However, deviations from target ranges for SpO2 and HR are not easy to detect.

METHOD: Forty-one participants were presented with 30-second simulated scenarios of an infant's SpO2 and HR levels in the first minutes after birth. Tremolo marked distinct HR ranges and formants marked distinct SpO2 ranges. Participants were randomly allocated to conditions: (a) No Enhancement control, (b) Enhanced HR Only, (c) Enhanced SpO2 Only, and (d) Enhanced Both.

RESULTS: Participants in the Enhanced HR Only and Enhanced SpO2 Only conditions identified HR and SpO2 ranges, respectively, more accurately than participants in the No Enhancement condition, ps < 0.001. In the Enhanced Both condition, the tremolo enhancement of HR did not affect participants' ability to identify SpO2 range, but the formants enhancement of SpO2 may have attenuated participants' ability to identify tremolo-enhanced HR range.

CONCLUSION: Tremolo and formant enhancements improve range identification for HR and SpO2, respectively, and could improve clinicians' ability to identify SpO2 and HR ranges in the first minutes after birth.

APPLICATION: Enhancements to the pulse oximeter tone to indicate clinically important ranges could improve the management of oxygen delivery to the neonate during resuscitation in the first 10 minutes after birth.

RevDate: 2022-08-12

Nascimento GFD, Silva HJD, Oliveira KGSC, et al (2022)

Relationship Between Oropharyngeal Geometry and Acoustic Parameters in Singers: A Preliminary Study.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(22)00214-4 [Epub ahead of print].

OBJECTIVE: To verify possible correlations between formant and cepstral parameters and oropharyngeal geometry in singers, stratified by sex.

METHOD: Voice records and oropharyngeal measures of 31 singers - 13 females and 18 males, mean age of 28 (±5.0) years - were retrieved from a database and analyzed. The oropharyngeal geometry measures were collected with acoustic pharyngometry, and the voice records consisted of sustained vowel /ɛ/ phonation, which were exported to Praat software and edited to obtain the formant and cepstral parameters, stratified by sex. The Pearson linear correlation test was applied to relate voice parameters to oropharyngeal geometry, at the 5% significance level; the linear regression test was used to justify the variable related to the second formant.

RESULTS: Differences between the sexes were identified only in the oral cavity length (greater in males) and pharyngeal cavity length (greater in females). There was a linear correlation between the third formant and the cepstrum in the female group. In the male group, there was a linear correlation between the cepstrum and the third and fourth formants. A positive linear correlation with up to 95% confidence was also identified between the pharyngeal cavity volume and the second formant in the female group, making it possible to estimate a regression model for the second formant (R2 = 0.70).

CONCLUSION: There are sex-related correlations between oropharyngeal geometry and formant and cepstral parameters. The pharyngeal cavity volume showed the strongest correlation with the second formant in the female group.

RevDate: 2022-08-29
CmpDate: 2022-08-15

Nishimura T, Tokuda IT, Miyachi S, et al (2022)

Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech.

Science (New York, N.Y.), 377(6607):760-763.

Human speech production obeys the same acoustic principles as vocal production in other animals but has distinctive features: A stable vocal source is filtered by rapidly changing formant frequencies. To understand speech evolution, we examined a wide range of primates, combining observations of phonation with mathematical modeling. We found that source stability relies upon simplifications in laryngeal anatomy, specifically the loss of air sacs and vocal membranes. We conclude that the evolutionary loss of vocal membranes allows human speech to mostly avoid the spontaneous nonlinear phenomena and acoustic chaos common in other primate vocalizations. This loss allows our larynx to produce stable, harmonic-rich phonation, ideally highlighting formant changes that convey most phonetic information. Paradoxically, the increased complexity of human spoken language thus followed simplification of our laryngeal anatomy.

RevDate: 2022-09-07
CmpDate: 2022-09-07

Suresh CH, A Krishnan (2022)

Frequency-Following Response to Steady-State Vowel in Quiet and Background Noise Among Marching Band Participants With Normal Hearing.

American journal of audiology, 31(3):719-736.

OBJECTIVE: Human studies enrolling individuals at high risk for cochlear synaptopathy (CS) have reported difficulties in speech perception in adverse listening conditions. The aim of this study is to determine if these individuals show a degradation in the neural encoding of speech in quiet and in the presence of background noise as reflected in neural phase-locking to both envelope periodicity and temporal fine structure (TFS). To our knowledge, there are no published reports that have specifically examined the neural encoding of both envelope periodicity and TFS of speech stimuli (in quiet and in adverse listening conditions) among a sample with loud-sound exposure history who are at risk for CS.

METHOD: Using scalp-recorded frequency-following response (FFR), the authors evaluated the neural encoding of envelope periodicity (FFR_ENV) and TFS (FFR_TFS) for a steady-state vowel (English back vowel /u/) in quiet and in the presence of speech-shaped noise presented at +5 and 0 dB SNR. Participants were young individuals with normal hearing who had participated in a marching band for at least 5 years (high-risk group) and a non-marching band group with low-noise exposure history (low-risk group).

RESULTS: The results showed no group differences in the neural encoding of either the FFR_ENV or the first formant (F1) in the FFR_TFS in quiet and in noise. Paradoxically, the high-risk group demonstrated enhanced representation of F2 harmonics across all stimulus conditions.

CONCLUSIONS: These results appear to be in line with a music experience-dependent enhancement of F2 harmonics. However, due to sound overexposure in the high-risk group, the role of homeostatic central compensation cannot be ruled out. A larger scale data set with different noise exposure background, longitudinal measurements with an array of behavioral and electrophysiological tests is needed to disentangle the nature of the complex interaction between the effects of central compensatory gain and experience-dependent enhancement.

RevDate: 2022-09-20
CmpDate: 2022-08-19

McAllister T, Eads A, Kabakoff H, et al (2022)

Baseline Stimulability Predicts Patterns of Response to Traditional and Ultrasound Biofeedback Treatment for Residual Speech Sound Disorder.

Journal of speech, language, and hearing research : JSLHR, 65(8):2860-2880.

PURPOSE: This study aimed to identify predictors of response to treatment for residual speech sound disorder (RSSD) affecting English rhotics. Progress was tracked during an initial phase of traditional motor-based treatment and a longer phase of treatment incorporating ultrasound biofeedback. Based on previous literature, we focused on baseline stimulability and sensory acuity as predictors of interest.

METHOD: Thirty-three individuals aged 9-15 years with residual distortions of /ɹ/ received a course of individual intervention comprising 1 week of intensive traditional treatment and 9 weeks of ultrasound biofeedback treatment. Stimulability for /ɹ/ was probed prior to treatment, after the traditional treatment phase, and after the end of all treatment. Accuracy of /ɹ/ production in each probe was assessed with an acoustic measure: normalized third formant (F3)-second formant (F2) distance. Model-based clustering analysis was applied to these acoustic measures to identify different average trajectories of progress over the course of treatment. The resulting clusters were compared with respect to acuity in auditory and somatosensory domains.

RESULTS: All but four individuals were judged to exhibit a clinically significant response to the combined course of treatment. Two major clusters were identified. The "low stimulability" cluster was characterized by very low accuracy at baseline, minimal response to traditional treatment, and strong response to ultrasound biofeedback. The "high stimulability" group was more accurate at baseline and made significant gains in both traditional and ultrasound biofeedback phases of treatment. The clusters did not differ with respect to sensory acuity.

CONCLUSIONS: This research accords with clinical intuition in finding that individuals who are more stimulable at baseline are more likely to respond to traditional intervention, whereas less stimulable individuals may derive greater relative benefit from biofeedback.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.20422236.
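
The acoustic accuracy measure used above exploits the fact that an accurate /ɹ/ is marked by a low F3 approaching F2, so a smaller F3-F2 distance indicates a better rhotic. A minimal sketch of such a measure; the normalizer here (a hypothetical per-speaker spread value) is an illustrative assumption, not necessarily the study's exact formula.

```python
def f3_f2_distance(f3_hz, f2_hz, speaker_spread_hz=1200.0):
    """Normalized F3-F2 distance: smaller values suggest a more
    accurate (more rhotic) /r/. speaker_spread_hz is a hypothetical
    per-speaker normalizing constant."""
    return (f3_hz - f2_hz) / speaker_spread_hz

# A derhotacized /r/ keeps F3 high (large distance); a correct /r/
# pulls F3 down toward F2 (small distance). Formant values invented.
derhotic = f3_f2_distance(2900.0, 1600.0)
rhotic = f3_f2_distance(1900.0, 1500.0)
```

Tracking this distance across probes is one way progress trajectories like those clustered in the study can be computed.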

RevDate: 2022-08-22
CmpDate: 2022-08-09

Levi SV (2022)

Teaching acoustic phonetics to undergraduates in communication sciences and disorders: Course structure and sample projects.

The Journal of the Acoustical Society of America, 152(1):651.

Virtually all undergraduate communication sciences and disorders programs require a course that covers acoustic phonetics. Students typically have a separate phonetics (transcription) course prior to taking the acoustic phonetics course. This paper describes a way to structure an acoustic phonetics course into two halves: a first half that focuses on the source, including basic acoustics (simple harmonic motion, harmonics), vocal fold vibration, modes of phonation, and intonation, and a second half that focuses on the filter, including resonance and tube models, vowel formants, and consonant acoustics. Thus, basic acoustic properties are interwoven with specific examples of speech-related acoustics. In addition, two projects that illustrate concepts from the two halves of the course (one on fundamental frequency and the other on vowel formants) are presented.
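
The source-filter split that organizes the course can be demonstrated in a few lines: a glottal-like impulse train (the source) passed through a two-pole resonator (one formant of the filter). A minimal sketch with arbitrary sample rate, F0, and formant values; this is a generic teaching illustration, not material from the paper.

```python
import math

fs = 8000          # sample rate (Hz)
f0 = 100           # source fundamental (Hz)
formant = 500.0    # resonator centre frequency (Hz)
bw = 100.0         # resonator bandwidth (Hz)

# Source: impulse train at f0 (flat harmonic spectrum)
n = fs // 2
x = [1.0 if i % (fs // f0) == 0 else 0.0 for i in range(n)]

# Filter: two-pole resonator difference equation
# y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]
r = math.exp(-math.pi * bw / fs)
c = 2 * r * math.cos(2 * math.pi * formant / fs)
y = [0.0, 0.0]
for i in range(2, n):
    y.append(x[i] + c * y[i - 1] - r * r * y[i - 2])

def power_at(freq, sig):
    """DFT power of `sig` at a single frequency (Goertzel-style)."""
    w = 2 * math.pi * freq / fs
    re = sum(s * math.cos(w * i) for i, s in enumerate(sig))
    im = sum(s * math.sin(w * i) for i, s in enumerate(sig))
    return re * re + im * im
```

The filtered output concentrates energy near the 500 Hz formant, while the source spectrum is flat: the harmonics come from the source, the spectral envelope from the filter.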

RevDate: 2022-08-22
CmpDate: 2022-08-09

Mills HE, Shorey AE, Theodore RM, et al (2022)

Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3.

The Journal of the Acoustical Society of America, 152(1):55.

Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443-1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent.

RevDate: 2022-08-05

Feng Y, G Peng (2022)

Development of categorical speech perception in Mandarin-speaking children and adolescents.

Child development [Epub ahead of print].

Although children develop categorical speech perception at a very young age, the maturation process remains unclear. A cross-sectional study in Mandarin-speaking 4-, 6-, and 10-year-old children, 14-year-old adolescents, and adults (n = 104, 56 males, all Asians from mainland China) was conducted to investigate the development of categorical perception of four Mandarin phonemic contrasts: lexical tone contrast Tone 1-2, vowel contrast /u/-/i/, consonant aspiration contrast /p/-/pʰ/, and consonant formant transition contrast /p/-/t/. The results indicated that different types of phonemic contrasts, and even the identification and discrimination of the same phonemic contrast, matured asynchronously. The observation that tone and vowel perception are achieved earlier than consonant perception supports the phonological saliency hypothesis.
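
Categorical perception is typically quantified from an identification function over a stimulus continuum: the category boundary is the 50% crossover, and the steepness of the function around it indexes how categorical perception is. A minimal sketch that locates the boundary by linear interpolation; the identification proportions below are invented, not the study's data.

```python
def category_boundary(proportions):
    """Locate the 50% crossover along a stimulus continuum by linear
    interpolation between adjacent identification proportions.
    `proportions` gives P(category A) at continuum steps 1, 2, ..."""
    for i in range(len(proportions) - 1):
        a, b = proportions[i], proportions[i + 1]
        if a >= 0.5 > b:
            return i + 1 + (a - 0.5) / (a - b)  # fractional step number
    return None

def max_step(proportions):
    """Largest drop between adjacent steps: a rough steepness index."""
    return max(a - b for a, b in zip(proportions, proportions[1:]))

# Invented 7-step continua: a steep (categorical) identification
# function vs. a shallower (more continuous) one.
steep = [1.0, 1.0, 0.95, 0.5, 0.05, 0.0, 0.0]
shallow = [0.9, 0.8, 0.65, 0.5, 0.35, 0.2, 0.1]
```

In categorical listeners, discrimination accuracy also peaks near this boundary, which is why identification and discrimination can mature asynchronously.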

RevDate: 2022-08-02

Song J, Wan Q, Wang Y, et al (2022)

Establishment of a Multi-parameter Evaluation Model for Risk of Aspiration in Dysphagia: A Pilot Study.

Dysphagia [Epub ahead of print].

It is difficult for clinical bedside evaluations to accurately determine the occurrence of aspiration in patients. Although VFSS and FEES are the gold standards for clinical diagnosis of dysphagia, and are mainly used to evaluate people found by bedside screening to be at high risk, their operation is complicated and time-consuming. The aim of this pilot study was to present an objective measure based on a multi-parameter approach to screen for aspiration risk in patients with dysphagia. Objective evaluation techniques based on speech parameters were used to assess the oral motor function, vocal cord function, and voice changes before and after swallowing in 32 patients with dysphagia (16 low-risk aspiration group, 16 high-risk aspiration group). Student's t test combined with stepwise logistic regression was used to determine the optimal index. The best model consists of three parameters, and the equation is: logit(P) = -3.824 - (0.504 × maximum phonation time) + (0.008 × second formant frequency of /u/) - (0.085 × fundamental frequency difference before and after swallowing). An additional eight patients with dysphagia were randomly selected as a validation group. When applied to this group, the model accurately identified the risk of aspiration in 87.5% of patients, with a sensitivity of 100%. It may therefore help clinicians to assess the risk of aspiration in patients with dysphagia, especially silent aspiration.
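
The reported equation converts directly into a screening probability via the logistic function, P = 1 / (1 + exp(-logit(P))). A sketch applying the published coefficients; the example input values are invented for illustration, not patient data.

```python
import math

def aspiration_risk(mpt_s, f2_u_hz, f0_diff_hz):
    """Apply the reported three-parameter model:
    logit(P) = -3.824 - 0.504*MPT + 0.008*F2(/u/) - 0.085*dF0,
    then map logit(P) to a probability with the logistic function."""
    logit = -3.824 - 0.504 * mpt_s + 0.008 * f2_u_hz - 0.085 * f0_diff_hz
    return 1.0 / (1.0 + math.exp(-logit))

# Invented example: a short maximum phonation time and a high F2 of /u/
# push the predicted aspiration risk up.
p = aspiration_risk(mpt_s=5.0, f2_u_hz=900.0, f0_diff_hz=2.0)
```

The signs match the clinical picture in the abstract: longer phonation time lowers the predicted risk, while a higher F2 of /u/ raises it.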

RevDate: 2022-09-14
CmpDate: 2022-08-31

Lee GS, CW Chang (2022)

Comparisons of auditory brainstem response elicited by compound click-sawtooths sound and synthetic consonant-vowel /da/.

Physiology & behavior, 255:113922.

The auditory brainstem response to complex sounds (cABR) can be evoked by speech sounds such as the 40 ms synthetic consonant-vowel syllable /da/ (CV-da), which is commonly used in basic and clinical research. The cABR consists of responses to formant energy as well as to energy at the fundamental frequency; the co-existence of these two energy components makes the cABR a mixed response. We introduced a new stimulus, click-sawtooths (CSW), with similar time-lock patterns but without formant or harmonic energy. Ten young healthy volunteers were recruited, and cABRs to CV-da and CSW were acquired from their 20 ears. Response latencies, amplitudes, and frequency-domain analytic results were compared pairwise between stimuli. Response amplitudes were significantly greater, and latencies significantly shorter, for CSW. The latency-intensity functions were also greater for CSW. For CSW, one energy component can be adjusted without causing biased changes to the other. CSW may be useful in future basic research and clinical applications.

RevDate: 2022-07-28
CmpDate: 2022-07-28

França FP, Almeida AA, LW Lopes (2022)

Immediate effect of different exercises in the vocal space of women with and without vocal nodules.

CoDAS, 34(5):e20210157 pii:S2317-17822022000500310.

PURPOSE: To investigate the immediate effect of voiced tongue vibration (VSL), high-resistance straw in the air (CAR), and overarticulation (OA) on the vocal space of vocally healthy women (MVS) and with vocal nodules (MNV).

METHODS: 12 women participated in the MNV and 12 women in the MVS, allocated to perform the vocal exercises of VSL, CAR, and OA. Each participant performed only one of the three proposed exercises, for 5 minutes, preceded and followed by recording a sequence of vehicle sentences for extracting formants (F1 and F2) from the vowel segments [a, i, u]. The vowel space was analyzed through the differences between the measures of the formants of the vowels.

RESULTS: We observed a reduction of F1 in the intervals [a]-[i] and [i]-[u] and of F2 between the vowels [a]-[u] and [i]-[u] in the MVS, after performing the CAR. In the MNV, we observed a reduction of F2 in the interval [a]-[i] after VSL. In the intergroup analysis, there were higher F1 values between the intervals of the vowels [a]-[i] and [i]-[u] in the MVS before performing the CAR, and after exercise only in the interval [a]-[i]. Higher F1 and F2 values were observed in the interval between the vowels [i]-[u] in the MNV after VSL.

CONCLUSION: The VSL exercise reduced the vowel space in MNV women. CAR reduced the vowel space of women in the MVS. The MNV had a smaller vowel space than the MVS before and after the CAR, and a smaller vowel space than the MVS after the VSL exercise.

RevDate: 2022-08-30

Wang H, L Max (2022)

Inter-Trial Formant Variability in Speech Production Is Actively Controlled but Does Not Affect Subsequent Adaptation to a Predictable Formant Perturbation.

Frontiers in human neuroscience, 16:890065.

Despite ample evidence that speech production is associated with extensive trial-to-trial variability, it remains unclear whether this variability represents merely unwanted system noise or an actively regulated mechanism that is fundamental for maintaining and adapting accurate speech movements. Recent work on upper limb movements suggests that inter-trial variability may not only be actively regulated based on sensory feedback, but may also provide a type of workspace exploration that facilitates sensorimotor learning. We therefore investigated whether experimentally reducing or magnifying inter-trial formant variability in the real-time auditory feedback during speech production (a) leads to adjustments in formant production variability that compensate for the manipulation, (b) changes the temporal structure of formant adjustments across productions, and (c) enhances learning in a subsequent adaptation task in which a predictable formant-shift perturbation is applied to the feedback signal. Results show that subjects gradually increased formant variability in their productions when hearing auditory feedback with reduced variability, but subsequent formant-shift adaptation was not affected by either reducing or magnifying the perceived variability. Thus, the findings provide evidence for speakers' active control of inter-trial formant variability based on auditory feedback from previous trials, but, at least for the current short-term experimental manipulation of feedback variability, not for a role of this variability regulation mechanism in subsequent auditory-motor learning.

RevDate: 2022-07-23

Mailhos A, Egea-Caparrós DA, Guerrero Rodríguez C, et al (2022)

Vocal Cues to Male Physical Formidability.

Frontiers in psychology, 13:879102.

Animal vocalizations convey important information about the emitter, including sex, age, biological quality, and emotional state. Early on, Darwin proposed that sex differences in auditory signals and vocalizations were driven by sexual selection mechanisms. In humans, studies on the association between male voice attributes and physical formidability have thus far reported mixed results. Hence, with a view to furthering our understanding of the role of the human voice in advertising physical formidability, we sought to identify acoustic attributes of male voices associated with proxies of physical formidability. Mean fundamental frequency (F0), formant dispersion (Df), formant position (Pf), and vocal tract length (VTL) data from a sample of 101 male voices were analyzed for potential associations with height, weight, and maximal handgrip strength (HGS). F0 correlated negatively with HGS; Pf showed negative correlations with HGS, height, and weight, whereas VTL correlated positively with HGS, height, and weight. All zero-order correlations remained significant after controlling for the false discovery rate (FDR) with the Benjamini-Hochberg method. After controlling for height and weight, and controlling for FDR, the correlation between F0 and HGS remained significant. In addition, to evaluate the ability of human male voices to advertise physical formidability to potential mates, 151 heterosexual female participants rated the voices of the 10 strongest and the 10 weakest males from the original sample for perceived physical strength and, given that physical strength is a desirable attribute in male partners, for perceived attractiveness. Generalized linear mixed model analyses, which allow inferences to be generalized to other samples of both raters and targets, failed to support a significant association between actual physical strength and strength or attractiveness perceived from voices alone. These results add to the growing body of work on the role of human voices in conveying relevant biological information.
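The formant-derived measures used in this line of research have conventional definitions: formant dispersion (Df) is the average spacing between adjacent formants, and apparent vocal tract length can be approximated by modeling the tract as a uniform tube closed at the glottis, giving VTL ≈ c / (2·Df) for speed of sound c. A sketch under those textbook assumptions (the formant values below are schematic, not the study's data):

```python
def formant_dispersion(formants_hz):
    """Average spacing between adjacent formants (formant dispersion, Df)."""
    spacings = [b - a for a, b in zip(formants_hz, formants_hz[1:])]
    return sum(spacings) / len(spacings)

def estimate_vtl_cm(formants_hz, c_cm_s=35000.0):
    """Apparent vocal tract length, assuming a uniform tube closed at the
    glottis: VTL ~ c / (2 * Df), with c the speed of sound (cm/s)."""
    return c_cm_s / (2.0 * formant_dispersion(formants_hz))
```

For the schematic neutral-vowel formants 500, 1500, 2500, 3500 Hz, Df is 1000 Hz and the estimated VTL is 17.5 cm, the textbook value for an adult male tract.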

RevDate: 2022-08-14
CmpDate: 2022-07-22

Shao J, Bakhtiar M, C Zhang (2022)

Impaired Categorical Perception of Speech Sounds Under the Backward Masking Condition in Adults Who Stutter.

Journal of speech, language, and hearing research : JSLHR, 65(7):2554-2570.

PURPOSE: Evidence increasingly indicates that people with developmental stuttering have auditory perception deficits. Our previous research has indicated similar but slower performance in categorical perception of speech sounds under the quiet condition in children who stutter and adults who stutter (AWS) compared with their typically fluent counterparts. We hypothesized that the quiet condition may not be sufficiently sensitive to reveal subtle perceptual deficiencies in people who stutter. This study examined this hypothesis by testing the categorical perception of speech and nonspeech sounds under a backward masking condition (i.e., a noise was presented immediately after the target stimuli).

METHOD: Fifteen Cantonese-speaking AWS and 15 adults who do not stutter (AWNS) were tested on the categorical perception of four stimulus continua, namely, consonant varying in voice onset time (VOT), vowel, lexical tone, and nonspeech, under the backward masking condition using identification and discrimination tasks.

RESULTS: AWS demonstrated a broader boundary width than AWNS in the identification task. AWS also exhibited a worse performance than AWNS in the discrimination of between-category stimuli but a comparable performance in the discrimination of within-category stimuli, indicating reduced sensitivity to sounds that belonged to different phonemic categories among AWS. Moreover, AWS showed similar patterns of impaired categorical perception across the four stimulus types, although the boundary location on the VOT continuum occurred at an earlier point in AWS than in AWNS.

CONCLUSIONS: The findings provide robust evidence that AWS exhibit impaired categorical perception of speech and nonspeech sounds under the backward masking condition. Temporal processing (i.e., VOT manipulation), frequency/spectral/formant processing (i.e., lexical tone or vowel manipulations), and nonlinguistic pitch processing were all found to be impaired in AWS. Altogether, the findings support the hypothesis that AWS might be less efficient in accessing the phonemic representations when exposed to a demanding listening condition.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.20249718.

RevDate: 2022-07-29
CmpDate: 2022-07-22

Baciadonna L, Solvi C, Del Vecchio F, et al (2022)

Vocal accommodation in penguins (Spheniscus demersus) as a result of social environment.

Proceedings. Biological sciences, 289(1978):20220626.

The ability to vary the characteristics of one's voice is a critical feature of human communication. Understanding whether and how animals change their calls will provide insights into the evolution of language. We asked to what extent the vocalizations of penguins, a species phylogenetically distant from those capable of explicit vocal learning, are flexible and responsive to their social environment. Using a principal components (PCs) analysis, we reduced 14 vocal parameters of penguins' contact calls to four PCs, each comprising highly correlated parameters that can be categorized as fundamental frequency, formant frequency, frequency modulation, and amplitude modulation rate and duration. We compared how these differed between individuals with varying degrees of social interaction: same-colony versus different-colony, same colony over 3 years, and partners versus non-partners. Our analyses indicate that the more penguins experience each other's calls, the more similar their calls become over time; that vocal convergence requires a long time and relative stability in colony membership; and that partners' unique social bond may affect vocal convergence differently than non-partners'. Our results suggest that this implicit form of vocal plasticity is perhaps more widespread across the animal kingdom than previously thought and may be a fundamental capacity of vertebrate vocalization.

RevDate: 2022-10-15
CmpDate: 2022-09-08

Easwar V, L Chung (2022)

The influence of phoneme contexts on adaptation in vowel-evoked envelope following responses.

The European journal of neuroscience, 56(5):4572-4582.

Repeated stimulus presentation leads to neural adaptation and a consequent amplitude reduction in vowel-evoked envelope following responses (EFRs), a response that reflects neural activity phase-locked to envelope periodicity. EFRs are elicited by vowels presented in isolation or in the context of other phonemes such as consonants in syllables. While context phonemes could exert some forward influence on vowel-evoked EFRs, they may reduce the degree of adaptation. Here, we evaluated whether the properties of context phonemes between consecutive vowel stimuli influence adaptation. EFRs were elicited by the low-frequency first formant (resolved harmonics) and the middle-to-high-frequency second and higher formants (unresolved harmonics) of a male-spoken /i/ when the presence, number, and predictability of context phonemes (/s/, /a/, /ʃ/ and /u/) between vowel repetitions varied. Monitored over four iterations of /i/, adaptation was evident only for EFRs elicited by the unresolved harmonics. EFRs elicited by the unresolved harmonics decreased in amplitude by ~16-20 nV (10%-17%) after the first presentation of /i/ and remained stable thereafter. EFR adaptation was reduced by the presence of a context phoneme, but the reduction did not change with their number or predictability. The presence of a context phoneme, however, attenuated EFRs by a degree similar to that caused by adaptation (~21-23 nV). Such a trade-off in the short- and long-term influence of context phonemes suggests that the benefit of interleaving EFR-eliciting vowels with other context phonemes depends on whether the use of consonant-vowel syllables is critical to improve the validity of EFR applications.

RevDate: 2022-09-05

Teferra BG, Borwein S, DeSouza DD, et al (2022)

Acoustic and Linguistic Features of Impromptu Speech and Their Association With Anxiety: Validation Study.

JMIR mental health, 9(7):e36828.

BACKGROUND: The measurement and monitoring of generalized anxiety disorder requires frequent interaction with psychiatrists or psychologists. Access to mental health professionals is often difficult because of high costs or insufficient availability. The ability to assess generalized anxiety disorder passively and at frequent intervals could be a useful complement to conventional treatment and help with relapse monitoring. Prior work suggests that higher anxiety levels are associated with features of human speech. As such, monitoring speech using personal smartphones or other wearable devices may be a means to achieve passive anxiety monitoring.

OBJECTIVE: This study aims to validate the association of previously suggested acoustic and linguistic features of speech with anxiety severity.

METHODS: A large number of participants (n=2000) were recruited and participated in a single web-based study session. Participants completed the Generalized Anxiety Disorder 7-item scale assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. Acoustic and linguistic speech features were a priori selected based on the existing speech and anxiety literature, along with related features. Associations between speech features and anxiety levels were assessed using age and personal income as covariates.

RESULTS: Word count and speaking duration were negatively correlated with anxiety scores (r=-0.12; P<.001), indicating that participants with higher anxiety scores spoke less. Several acoustic features were also significantly (P<.05) associated with anxiety, including the mel-frequency cepstral coefficients, linear prediction cepstral coefficients, shimmer, fundamental frequency, and first formant. In contrast to previous literature, the second and third formants, jitter, and the zero crossing rate of the z score of the power spectral density were not significantly associated with anxiety. Linguistic features, including negative-emotion words, were also associated with anxiety (r=0.10; P<.001). In addition, some linguistic relationships were sex dependent. For example, the count of words related to power was positively associated with anxiety in women (r=0.07; P=.03), whereas it was negatively associated with anxiety in men (r=-0.09; P=.01).

CONCLUSIONS: Both acoustic and linguistic speech measures are associated with anxiety scores. The amount of speech, acoustic quality of speech, and gender-specific linguistic characteristics of speech may be useful as part of a system to screen for anxiety, detect relapse, or monitor treatment.

RevDate: 2022-07-25
CmpDate: 2022-07-06

Lin YC, Yan HT, Lin CH, et al (2022)

Predicting frailty in older adults using vocal biomarkers: a cross-sectional study.

BMC geriatrics, 22(1):549.

BACKGROUND: Frailty is a common issue in the aging population. Given that frailty syndrome is little discussed in the literature on the aging voice, the current study aims to examine the relationship between frailty and vocal biomarkers in older people.

METHODS: Participants aged ≥ 60 years visiting geriatric outpatient clinics were recruited. They underwent frailty assessment (Cardiovascular Health Study [CHS] index; Study of Osteoporotic Fractures [SOF] index; and Fatigue, Resistance, Ambulation, Illness, and Loss of weight [FRAIL] index) and were asked to pronounce a sustained vowel /a/ for approximately 1 s. Four voice parameters were assessed: average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4).

RESULTS: Among 277 older adults, increased A1 was associated with a lower likelihood of frailty as defined by SOF (odds ratio [OR] 0.84, 95% confidence interval [CI] 0.74-0.96). Participants with larger A2 values were more likely to be frail, as defined by FRAIL and CHS (FRAIL: OR 1.41, 95% CI 1.12-1.79; CHS: OR 1.38, 95% CI 1.10-1.75). Sex differences were observed across the three frailty indices. In male participants, an increase in A3 by 10 points increased the odds of frailty by almost 7% (SOF: OR 1.07, 95% CI 1.02-1.12), 6% (FRAIL: OR 1.06, 95% CI 1.02-1.11), or 6% (CHS: OR 1.06, 95% CI 1.01-1.11). In female participants, an increase in A4 by 0.1 conferred a significant 2.8-fold (SOF: OR 2.81, 95% CI 1.71-4.62), 2.3-fold (FRAIL: OR 2.31, 95% CI 1.45-3.68), or 2.8-fold (CHS: OR 2.82, 95% CI 1.76-4.51) increased odds of frailty.

CONCLUSIONS: Vocal biomarkers, especially spectral-domain voice parameters, might have potential as a non-invasive, instantaneous, objective, and cost-effective tool for estimating frailty, and they demonstrate sex differences relevant to individualised treatment of frailty.

RevDate: 2022-07-25
CmpDate: 2022-07-07

Jibson J (2022)

Formant detail needed for identifying, rating, and discriminating vowels in Wisconsin English.

The Journal of the Acoustical Society of America, 151(6):4004.

Neel [(2004). Acoust. Res. Lett. Online 5, 125-131] asked how much time-varying formant detail is needed for vowel identification. In that study, multiple stimuli were synthesized for each vowel: 1-point (monophthongal with midpoint frequencies), 2-point (linear from onset to offset), 3-point, 5-point, and 11-point. Results suggested that a 3-point model was optimal. This conflicted with the dual-target hypothesis of vowel inherent spectral change research, which has found that two targets are sufficient to model vowel identification. The present study replicates and expands upon the work of Neel. Ten English monophthongs were chosen for synthesis. One-, two-, three-, and five-point vowels were created as described above, and another 1-point stimulus was created with onset frequencies rather than midpoint frequencies. Three experiments were administered (n = 18 for each): vowel identification, goodness rating, and discrimination. The results ultimately align with the dual-target hypothesis, consistent with most vowel inherent spectral change studies.
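The k-point stimuli in this paradigm differ only in how many control points specify each formant track, with intermediate samples linearly interpolated (a 1-point track is a steady state; a 2-point track runs linearly from onset to offset). A minimal interpolation sketch, not the study's synthesis code, with illustrative frequencies:

```python
def formant_track(control_points_hz, n_samples):
    """Linearly interpolate a formant track through k control points, as in
    k-point vowel synthesis. Requires n_samples >= 2 for multi-point tracks."""
    k = len(control_points_hz)
    if k == 1:  # 1-point stimulus: steady-state (monophthongal) track
        return [control_points_hz[0]] * n_samples
    track = []
    for i in range(n_samples):
        pos = i * (k - 1) / (n_samples - 1)   # position along the k-1 segments
        seg = min(int(pos), k - 2)            # index of the current segment
        frac = pos - seg                      # fraction within that segment
        a, b = control_points_hz[seg], control_points_hz[seg + 1]
        track.append(a + frac * (b - a))
    return track
```

A 2-point F2 track from 1800 Hz at onset to 2200 Hz at offset, sampled at five points, passes through 2000 Hz at its midpoint; adding a third control point bends the trajectory, which is what distinguishes the 3-point model Neel found optimal from the dual-target (2-point) account.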

RevDate: 2022-11-13
CmpDate: 2022-07-22

Groll MD, Dahl KL, Cádiz MD, et al (2022)

Resynthesis of Transmasculine Voices to Assess Gender Perception as a Function of Testosterone Therapy.

Journal of speech, language, and hearing research : JSLHR, 65(7):2474-2489.

PURPOSE: The goal of this study was to use speech resynthesis to investigate the effects of changes to individual acoustic features on speech-based gender perception of transmasculine voice samples following the onset of hormone replacement therapy (HRT) with exogenous testosterone. We hypothesized that mean fundamental frequency (f0) would have the largest effect on gender perception of any single acoustic feature.

METHOD: Mean f0, f0 contour, and formant frequencies were calculated for three pairs of transmasculine speech samples before and after HRT onset. Sixteen speech samples with unique combinations of these acoustic features from each pair of speech samples were resynthesized. Twenty young adult listeners evaluated each synthesized speech sample for gender perception and synthetic quality. Two analyses of variance were used to investigate the effects of acoustic features on gender perception and synthetic quality.

RESULTS: Of the three acoustic features, mean f0 was the only single feature that had a statistically significant effect on gender perception. Differences between the speech samples before and after HRT onset that were not captured by changes in f0 and formant frequencies also had a statistically significant effect on gender perception.

CONCLUSION: In these transmasculine voice samples, mean f0 was the most important acoustic feature for voice masculinization as a result of HRT; future investigations in a larger number of transmasculine speakers, and of the effects of behavioral therapy-based changes in concert with HRT, are warranted.

RevDate: 2022-07-16

Yan S, Liu P, Chen Z, et al (2022)

High-Property Refractive Index and Bio-Sensing Dual-Purpose Sensor Based on SPPs.

Micromachines, 13(6):.

A high-property plasma resonance-sensor structure consisting of two metal-insulator-metal (MIM) waveguides coupled with a transverse ladder-shaped nano-cavity (TLSNC) is designed based on surface plasmon polaritons. Its transmission characteristics are analyzed using multimode interference coupling mode theory (MICMT), and are simulated using finite element analysis (FEA). Meanwhile, the influence of different structural arguments on the performance of the structure is investigated. This study shows that the system presents four high-quality formants in the transmission spectrum. The highest sensitivity is 3000 nm/RIU with a high FOM* of 9.7 × 10⁵. In addition, the proposed structure could act as a biosensor to detect the concentrations of sodium ions (Na⁺), potassium ions (K⁺), and the glucose solution, with maximum sensitivities of 0.45, 0.625, and 5.5 nm·(mg/dL)⁻¹, respectively. Compared with other structures, the designed system has the advantages of a simple construction, a wide working band range, high reliability, and easy nano-scale integration, providing a high-performance cavity choice for refractive index sensing and biosensing devices based on surface plasmons.

RevDate: 2022-07-19
CmpDate: 2022-06-27

Ham J, Yoo HJ, Kim J, et al (2022)

Vowel speech recognition from rat electroencephalography using long short-term memory neural network.

PloS one, 17(6):e0270405.

Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition. In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats subjected to microelectrode implantation surgery to record EEG signals from the bilateral anterior auditory fields were used. Five different vowel speech stimuli were chosen, /a/, /e/, /i/, /o/, and /u/, which have highly different formant frequencies. EEG recorded under randomly given vowel stimuli was minimally preprocessed and normalized by a z-score transformation to be used as input for the classification of speech recognition. The BiLSTM network showed the best performance among the classifiers by achieving an overall accuracy, f1-score, and Cohen's κ values of 75.18%, 0.75, and 0.68, respectively, using a 10-fold cross-validation approach. These results indicate that LSTM layers can effectively model sequential data, such as EEG; hence, informative features can be derived through BiLSTM trained with end-to-end learning without any additional hand-crafted feature extraction methods.
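The z-score transformation applied to each EEG input before classification is plain standardization to zero mean and unit variance. A minimal single-channel sketch (the study's full preprocessing pipeline is not reproduced here):

```python
import math

def zscore(signal):
    """Standardize a 1-D signal to zero mean and unit variance (z-score),
    as commonly done per trial before feeding EEG to a classifier."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n   # population variance
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in signal]
```

Normalizing each trial this way removes per-trial offset and scale differences so that the network learns the temporal shape of the response rather than absolute amplitudes.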

RevDate: 2022-06-22

Pravitharangul N, Miyamoto JJ, Yoshizawa H, et al (2022)

Vowel sound production and its association with cephalometric characteristics in skeletal Class III subjects.

European journal of orthodontics pii:6613233 [Epub ahead of print].

BACKGROUND: This study aimed to evaluate differences in vowel production using acoustic analysis in skeletal Class III and Class I Japanese participants and to identify the correlation between vowel sounds and cephalometric variables in skeletal Class III subjects.

MATERIALS AND METHODS: Japanese males with skeletal Class III (ANB < 0°) and Class I skeletal anatomy (0.62° < ANB < 5.94°) were recruited (n = 18/group). Acoustic analysis of vowel sounds and cephalometric analysis of lateral cephalograms were performed. For sound analysis, an isolated Japanese vowel (/a/, /i/, /u/, /e/, /o/) pattern was recorded. Praat software was used to extract acoustic parameters such as fundamental frequency (F0) and the first four formants (F1, F2, F3, and F4). The formant graph area was calculated. Cephalometric values were obtained using ImageJ. Correlations between acoustic and cephalometric variables in skeletal Class III subjects were then investigated.
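A formant graph (vowel space) area of this kind is the area of the polygon whose vertices are the vowels' (F1, F2) coordinates, which the shoelace formula computes directly. A sketch of that computation (illustrative; not necessarily the exact procedure used in the study):

```python
def formant_space_area(points):
    """Polygon area in the (F1, F2) plane via the shoelace formula.
    `points` are (F1, F2) pairs in Hz, ordered around the polygon."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]          # wrap back to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

With the five Japanese vowels as vertices ordered around the perimeter, a smaller area indicates a more centralized vowel space.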

RESULTS: Skeletal Class III subjects exhibited significantly higher /o/ F2 and lower /o/ F4 values. Mandibular length, SNB, and overjet of Class III subjects were moderately negatively correlated with acoustic variables.

LIMITATIONS: This study did not take into account vertical skeletal patterns and tissue movements during sound production.

CONCLUSION: Skeletal Class III males produced /o/ (a back, rounded vowel) differently, possibly owing to their anatomical positions or adaptive changes. Vowel production was moderately associated with the cephalometric characteristics of Class III subjects. Thus, changes in speech after orthognathic surgery may be expected. A multidisciplinary team approach that includes the input of a speech pathologist would be useful.

RevDate: 2022-10-19
CmpDate: 2022-09-13

Kabakoff H, Gritsyk O, Harel D, et al (2022)

Characterizing sensorimotor profiles in children with residual speech sound disorder: a pilot study.

Journal of communication disorders, 99:106230.

PURPOSE: Children with speech errors who have reduced motor skill may be more likely to develop residual errors associated with lifelong challenges. Drawing on models of speech production that highlight the role of somatosensory acuity in updating motor plans, this pilot study explored the relationship between motor skill and speech accuracy, and between somatosensory acuity and motor skill in children. Understanding the connections among sensorimotor measures and speech outcomes may offer insight into how somatosensation and motor skill cooperate during speech production, which could inform treatment decisions for this population.

METHOD: Twenty-five children (ages 9-14) produced syllables in an /ɹ/ stimulability task before and after an ultrasound biofeedback treatment program targeting rhotics. We first tested whether motor skill (as measured by two ultrasound-based metrics of tongue shape complexity) predicted acoustically measured accuracy (the normalized difference between the second and third formant frequencies). We then tested whether somatosensory acuity (as measured by an oral stereognosis task) predicted motor skill, while controlling for auditory acuity.
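The acoustic accuracy measure above rests on the F3-F2 distance, which shrinks as F3 approaches F2 in accurate /ɹ/ productions. The abstract does not spell out the normalization, so the sketch below adopts one plausible choice, dividing by the mean of the two formants, purely for illustration; the study's exact formula may differ:

```python
def normalized_f3_f2(f2_hz, f3_hz):
    """Normalized F3-F2 distance as an index of /r/ quality; lower values
    indicate a more /r/-like production (F3 close to F2). Dividing by the
    mean of F2 and F3 is an illustrative assumption, not the study's formula."""
    return (f3_hz - f2_hz) / ((f2_hz + f3_hz) / 2.0)
```

For example, a production with F2 = 1800 Hz and F3 = 2000 Hz scores lower (more /ɹ/-like) than one with F2 = 1200 Hz and F3 = 2800 Hz.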

RESULTS: One measure of tongue shape complexity was a significant predictor of accuracy, such that higher tongue shape complexity was associated with lower accuracy at pre-treatment but higher accuracy at post-treatment. Based on the same measure, children with better somatosensory acuity produced /ɹ/ tongue shapes that were more complex, but this relationship was only present at post-treatment.

CONCLUSION: The predicted relationships among somatosensory acuity, motor skill, and acoustically measured /ɹ/ production accuracy were observed after treatment, but unexpectedly did not hold before treatment. The surprising finding that greater tongue shape complexity was associated with lower accuracy at pre-treatment highlights the importance of evaluating tongue shape patterns (e.g., using ultrasound) prior to treatment, and has the potential to suggest that children with high tongue shape complexity at pre-treatment may be good candidates for ultrasound-based treatment.

RevDate: 2022-09-20
CmpDate: 2022-09-20

González-Alvarez J, R Sos-Peña (2022)

Perceiving Body Height From Connected Speech: Higher Fundamental Frequency Is Associated With the Speaker's Height.

Perceptual and motor skills, 129(5):1349-1361.

To a certain degree, human listeners can perceive a speaker's body size from their voice. The speaker's voice pitch or fundamental frequency (Fo) and the vocal formant frequencies are the voice parameters that have been most intensively studied in past body size perception research (particularly for body height). Artificially lowering the Fo of isolated vowels from male speakers improved listeners' accuracy of binary (i.e., tall vs not tall) body height perceptions. This has been explained by the theory that a denser harmonic spectrum provided by a low pitch improved the perceptual resolution of formants that aid formant-based size assessments. In the present study, we extended this research using connected speech (i.e., words and sentences) pronounced by speakers of both sexes. Unexpectedly, we found that raising Fo, not lowering it, increased the participants' perceptual performance in two binary discrimination tasks of body size. We explain our new finding in the temporal domain by the dynamic and time-varying acoustic properties of connected speech. Increased Fo might increase the sampling density of sound wave acoustic cycles and provide more detailed information, such as higher resolution, on the envelope shape.

RevDate: 2022-07-16

Sugiyama Y (2022)

Identification of Minimal Pairs of Japanese Pitch Accent in Noise-Vocoded Speech.

Frontiers in psychology, 13:887761.

The perception of lexical pitch accent in Japanese was assessed using noise-excited vocoder speech, which contained no fundamental frequency (f0) or its harmonics. While prosodic information such as lexical stress in English and lexical tone in Mandarin Chinese is known to be encoded in multiple acoustic dimensions, such multidimensionality is less understood for lexical pitch accent in Japanese. In the present study, listeners were tested under four different conditions to investigate the contribution of non-f0 properties to the perception of Japanese pitch accent: noise-vocoded speech stimuli consisting of ten 3-ERBN-wide bands and fifteen 2-ERBN-wide bands created from a male and a female speaker. Results showed that listeners were able to identify minimal pairs of final-accented and unaccented words at a rate better than chance in all conditions, indicating the presence of secondary cues to Japanese pitch accent. Subsequent analyses were conducted to investigate whether the listeners' ability to distinguish minimal pairs was correlated with duration, intensity, or formant information. These analyses found no strong or consistent correlation, suggesting that listeners used different cues depending on the information available in the stimuli. Furthermore, comparison of the current results with equivalent studies in English and Mandarin Chinese suggests that, although lexical prosodic information exists in multiple acoustic dimensions in Japanese, the primary cue is more salient than in other languages.

RevDate: 2022-08-03
CmpDate: 2022-07-12

Preisig BC, Riecke L, A Hervais-Adelman (2022)

Speech sound categorization: The contribution of non-auditory and auditory cortical regions.

NeuroImage, 258:119375.

Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.

RevDate: 2022-06-13

Sayyahi F, V Boulenger (2022)

A temporal-based therapy for children with inconsistent phonological disorder: A case-series.

Clinical linguistics & phonetics [Epub ahead of print].

Deficits in temporal auditory processing, and in particular higher gap detection thresholds, have been reported in children with inconsistent phonological disorder (IPD). Here we hypothesized that providing these children with extra time for phoneme identification may in turn enhance their phonological planning abilities for production, and accordingly improve not only the consistency but also the accuracy of their speech. We designed and tested a new temporal-based therapy inspired by Core Vocabulary Therapy, which we call T-CVT, in which we digitally lengthened the formant transitions between phonemes of the words used for therapy. This allowed us to target both temporal auditory processing and word phonological planning. Four Persian-speaking preschool children with IPD received T-CVT for eight weeks. We measured changes in speech consistency (% inconsistency) and accuracy (percentage of consonants correct, PCC) to assess the effects of the intervention. Therapy significantly improved both consistency and accuracy of word production in the four children: % inconsistency decreased from 59% on average before therapy to 2% post-T-CVT, and PCC increased from 61% to 92% on average. Consistency and accuracy were furthermore maintained or even further improved at three-month follow-up (2% inconsistency and 99% PCC). Results in a nonword repetition task showed the generalization of these effects to non-treated material: % inconsistency for nonwords decreased from 67% to 10% post-therapy, and PCC increased from 63% to 90%. These preliminary findings support the efficacy of the T-CVT intervention for children with IPD who show temporal auditory processing deficits as reflected by higher gap detection thresholds.
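Lengthening a transition amounts to time-stretching a short segment of the signal. The abstract does not specify the stretching method used, so the sketch below is only a naive illustration via linear interpolation (real transition lengthening would typically use a pitch-synchronous method such as PSOLA or a phase vocoder):

```python
import numpy as np

def lengthen_segment(x, start, end, factor):
    """Naively time-stretch x[start:end] by `factor` via linear interpolation."""
    seg = x[start:end]
    new_len = int(len(seg) * factor)
    t_old = np.linspace(0.0, 1.0, len(seg))
    t_new = np.linspace(0.0, 1.0, new_len)
    stretched = np.interp(t_new, t_old, seg)   # resampled segment
    return np.concatenate([x[:start], stretched, x[end:]])

x = np.sin(np.linspace(0, 20 * np.pi, 1000))   # stand-in waveform
y = lengthen_segment(x, 200, 400, 1.5)         # stretch samples 200-400 by 50%
```

Note that naive interpolation also lowers the local frequency content; pitch-preserving stretching is what a clinical stimulus would require.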

RevDate: 2022-10-15
CmpDate: 2022-08-03

Di Dona G, Scaltritti M, S Sulpizio (2022)

Formant-invariant voice and pitch representations are pre-attentively formed from constantly varying speech and non-speech stimuli.

The European journal of neuroscience, 56(3):4086-4106.

The present study investigated whether listeners can form abstract voice representations while ignoring constantly changing phonological information, and whether they can use the resulting information to facilitate voice change detection. Further, the study aimed to understand whether such abstraction is restricted to the speech domain or can also be deployed in non-speech contexts. We ran an electroencephalogram (EEG) experiment including one passive and one active oddball task, each featuring a speech and a rotated speech condition. In the speech condition, participants heard constantly changing vowels uttered by a male speaker (standard stimuli) which were infrequently replaced by vowels uttered by a female speaker with higher pitch (deviant stimuli). In the rotated speech condition, participants heard rotated vowels, in which the natural formant structure of speech was disrupted. In the passive task, the mismatch negativity was elicited after the presentation of the deviant voice in both conditions, indicating that listeners could successfully group different stimuli into a formant-invariant voice representation. In the active task, participants showed shorter reaction times (RTs), higher accuracy and a larger P3b in the speech condition compared to the rotated speech condition. Results showed that whereas at a pre-attentive level the cognitive system can track pitch regularities while presumably ignoring constantly changing formant information both in speech and in rotated speech, at an attentive level the use of such information is facilitated for speech. This facilitation was also evidenced by stronger synchronisation in the theta band (4-7 Hz), potentially pointing towards differences in encoding/retrieval processes.

RevDate: 2022-07-16
CmpDate: 2022-06-08

Hampsey E, Meszaros M, Skirrow C, et al (2022)

Protocol for Rhapsody: a longitudinal observational study examining the feasibility of speech phenotyping for remote assessment of neurodegenerative and psychiatric disorders.

BMJ open, 12(6):e061193.

INTRODUCTION: Neurodegenerative and psychiatric disorders (NPDs) confer a huge health burden, which is set to increase as populations age. New, remotely delivered diagnostic assessments that can detect early stage NPDs by profiling speech could enable earlier intervention and fewer missed diagnoses. The feasibility of collecting speech data remotely in those with NPDs should be established.

METHODS AND ANALYSIS: The present study will assess the feasibility of obtaining speech data, collected remotely using a smartphone app, from individuals across three NPD cohorts: neurodegenerative cognitive diseases (n=50), other neurodegenerative diseases (n=50) and affective disorders (n=50), in addition to matched controls (n=75). Participants will complete audio-recorded speech tasks and both general and cohort-specific symptom scales. The battery of speech tasks will serve several purposes, such as measuring various elements of executive control (eg, attention and short-term memory), as well as measures of voice quality. Participants will then remotely self-administer speech tasks and follow-up symptom scales over a 4-week period. The primary objective is to assess the feasibility of remote collection of continuous narrative speech across a wide range of NPDs using self-administered speech tasks. Additionally, the study will evaluate whether acoustic and linguistic patterns can predict diagnostic group, as measured by the sensitivity, specificity, Cohen's kappa and area under the receiver operating characteristic curve of binary classifiers distinguishing each diagnostic group from the others. Acoustic features analysed will include mel-frequency cepstral coefficients, formant frequencies, intensity and loudness; text-based features such as word count, noun and pronoun rate, and idea density will also be used.
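The classifier metrics named in the protocol (sensitivity, specificity, Cohen's kappa) all derive from the binary confusion matrix; a minimal illustration with hypothetical labels:

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and Cohen's kappa from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    p_obs = (tp + tn) / n                   # observed agreement
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)   # agreement corrected for chance
    return sensitivity, specificity, kappa

sens, spec, kappa = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                   [1, 1, 1, 0, 0, 0, 0, 1])
# sens = 0.75, spec = 0.75, kappa = 0.5
```

Kappa is preferred over raw accuracy in diagnostic settings because it discounts agreement expected by chance, which matters when the cohorts are imbalanced.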

ETHICS AND DISSEMINATION: The study received ethical approval from the Health Research Authority and Health and Care Research Wales (REC reference: 21/PR/0070). Results will be disseminated through open access publication in academic journals, relevant conferences and other publicly accessible channels. Results will be made available to participants on request.


RevDate: 2022-07-16

Roessig S, Winter B, D Mücke (2022)

Tracing the Phonetic Space of Prosodic Focus Marking.

Frontiers in artificial intelligence, 5:842546.

Focus is known to be expressed by a wide range of phonetic cues, but only a few studies have explicitly compared different phonetic variables within the same experiment. We therefore present results from an analysis of 19 phonetic variables conducted on a German data set that comprises the opposition of unaccented (background) vs. accented (in focus), as well as different focus types with the nuclear accent on the same syllable (broad, narrow, and contrastive focus). The phonetic variables are measures of the acoustic and articulographic signals of a target syllable. Overall, our results provide the highest number of reliable effects and the largest effect sizes for accentuation (unaccented vs. accented), while the differentiation of focus types with accented target syllables (broad, narrow, and contrastive focus) is more subtle. The most important phonetic variables across all conditions are measures of the fundamental frequency. The articulatory variables and their corresponding acoustic formants reveal lower tongue positions for both vowels /o, a/, and larger lip openings for the vowel /a/, under increased prosodic prominence, with the strongest effects for accentuation. While duration exhibits consistently mid-ranked results for both accentuation and the differentiation of focus types, measures related to intensity are particularly important for accentuation. Furthermore, voice quality and spectral tilt are affected by accentuation but also play a role in the differentiation of focus types. Our results confirm that focus is realized via multiple phonetic cues. Additionally, the present analysis allows a comparison of the relative importance of different measures, toward a better understanding of the phonetic space of focus marking.

RevDate: 2022-07-16

Coughler C, Quinn de Launay KL, Purcell DW, et al (2022)

Pediatric Responses to Fundamental and Formant Frequency Altered Auditory Feedback: A Scoping Review.

Frontiers in human neuroscience, 16:858863.

PURPOSE: The ability to hear ourselves speak has been shown to play an important role in the development and maintenance of fluent and coherent speech. Despite this, little is known about the developing speech motor control system throughout childhood, in particular if and how vocal and articulatory control may differ throughout development. A scoping review was undertaken to identify and describe the full range of studies investigating responses to frequency altered auditory feedback in pediatric populations and their contributions to our understanding of the development of auditory feedback control and sensorimotor learning in childhood and adolescence.

METHOD: Relevant studies were identified through a comprehensive search strategy of six academic databases for studies that included (a) real-time perturbation of frequency in auditory input, (b) an analysis of immediate effects on speech, and (c) participants aged 18 years or younger.

RESULTS: Twenty-three articles met inclusion criteria. Across studies, there was a wide variety of designs, outcomes and measures used. Manipulations included fundamental frequency (9 studies), formant frequency (12), frequency centroid of fricatives (1), and both fundamental and formant frequencies (1). Study designs included contrasts across childhood, between children and adults, and between typical, pediatric clinical and adult populations. Measures primarily explored acoustic properties of speech responses (latency, magnitude, and variability). Some studies additionally examined the association of these acoustic responses with clinical measures (e.g., stuttering severity and reading ability), and neural measures using electrophysiology and magnetic resonance imaging.

CONCLUSION: Findings indicated that children above 4 years of age generally compensated in the direction opposite to the manipulation, although in several cases not as effectively as adults. Overall, results varied greatly due to the broad range of manipulations and designs used, making generalization challenging. Differences found between age groups in the features of the compensatory vocal responses, latency of responses, vocal variability and perceptual abilities suggest that maturational changes may be occurring in the speech motor control system, affecting the extent to which auditory feedback is used to modify internal sensorimotor representations. Varied findings suggest that vocal control develops prior to articulatory control. Future studies with multiple outcome measures, manipulations, and more expansive age ranges are needed to elucidate these findings.

RevDate: 2022-07-16
CmpDate: 2022-06-02

Wang X, T Wang (2022)

Voice Recognition and Evaluation of Vocal Music Based on Neural Network.

Computational intelligence and neuroscience, 2022:3466987.

The artistic voice is central to the professional life of voice users, and voice evaluation plays an important role in selecting and training performing talent; an appropriate evaluation of the artistic voice is therefore crucial. With the development of arts education, scientifically evaluating voice training methods and fairly selecting vocal talent has become an urgent need, yet current evaluation methods are time-consuming, laborious, and highly subjective. In objective evaluation of the artistic voice, the selection of acoustic evaluation parameters is critical. This study uses speech analysis technology to extract the average energy, average frequency error, and average range error of the singing voice as objective evaluation parameters, applies a neural network to evaluate singing quality, and compares the results with the subjective evaluations of senior vocal teachers. Specifically, voice analysis is used to extract the first formant, third formant, fundamental frequency, vocal range, fundamental frequency perturbation, first formant perturbation, third formant perturbation, and average energy of the singing voice. Using a backpropagation (BP) neural network, singing quality was evaluated objectively and compared with the subjective evaluations of senior vocal teachers. The results show that the BP neural network can accurately and objectively evaluate singing-voice quality from these parameters, which can help guide the scientific selection and training of artistic voice talent.
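A "BP" (backpropagation) network of the kind described maps acoustic parameters to a quality judgment. Below is a minimal one-hidden-layer sketch in NumPy; the data are synthetic, and the eight feature columns merely stand in for the formant, fundamental-frequency, range, perturbation, and energy parameters listed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for eight acoustic parameters per singer
X = rng.standard_normal((200, 8))
w_true = rng.standard_normal(8)
y = (X @ w_true > 0).astype(float).reshape(-1, 1)  # toy "quality" label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer, trained by backpropagation on mean squared error
W1 = 0.1 * rng.standard_normal((8, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal((16, 1)); b2 = np.zeros(1)
lr, losses = 1.0, []
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)                           # forward pass
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    d_out = 2 * (out - y) / len(X) * out * (1 - out)   # backward pass
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

The training loss decreases steadily on this separable toy target; a real system would additionally need normalized features, a held-out evaluation set, and teacher ratings as the target.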

RevDate: 2022-05-26
CmpDate: 2022-05-26

Rafi S, Gangloff C, Paulhet E, et al (2022)

Out-of-Hospital Cardiac Arrest Detection by Machine Learning Based on the Phonetic Characteristics of the Caller's Voice.

Studies in health technology and informatics, 294:445-449.

INTRODUCTION: Out-of-hospital cardiac arrest (OHCA) is a major public health issue. The prognosis is closely related to the time from collapse to return of spontaneous circulation. Resuscitation efforts are frequently initiated at the request of emergency call center professionals who are specifically trained to identify critical conditions over the phone. However, 25% of OHCAs are not recognized during the first call. Therefore, it would be interesting to develop automated computer systems to recognize OHCA on the phone. The aim of this study was to build and evaluate machine learning models for OHCA recognition based on the phonetic characteristics of the caller's voice.

METHODS: All patients for whom a call was made to the emergency call center of Rennes, France, between 01/01/2017 and 01/01/2019 were eligible. The predicted variable was OHCA presence. Predictor variables were collected by automated phonetic analysis of the call, based on the following voice parameters: fundamental frequency, formants, intensity, jitter, shimmer, harmonic-to-noise ratio, number of voice breaks, and number of periods. Three models were generated using binary logistic regression, random forest, and neural network. The area under the curve (AUC) was the primary outcome used to evaluate each model's performance.

RESULTS: 820 patients were included in the study. The best model to predict OHCA was random forest (AUC=74.9, 95% CI=67.4-82.4).

CONCLUSION: Machine learning models based on the acoustic characteristics of the caller's voice can recognize OHCA. The integration of the acoustic parameters identified in this study will help to design decision-making support systems to improve OHCA detection over the phone.
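The random-forest pipeline described above can be sketched as follows. The data here are synthetic: the eight columns merely stand in for the voice parameters listed in the methods (F0, formants, intensity, jitter, shimmer, HNR, voice breaks, periods), and real call audio with OHCA labels would obviously be required for the actual task:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 8))                  # synthetic phonetic features
signal = X[:, 0] + 0.5 * X[:, 1]                 # two informative columns
y = (signal + 0.3 * rng.standard_normal(n) > 0).astype(int)  # toy label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # primary outcome
```

Evaluating with AUC on a held-out split, as the study does, avoids rewarding a classifier that simply predicts the majority class when OHCA calls are rare.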

RevDate: 2022-07-16

Tomaschek F, M Ramscar (2022)

Understanding the Phonetic Characteristics of Speech Under Uncertainty-Implications of the Representation of Linguistic Knowledge in Learning and Processing.

Frontiers in psychology, 13:754395.

The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored how choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs, by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word-final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families showing that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs (as our theoretical analysis suggests they ought to), their uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production, and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network, the distinction is maintained between those parts of the network that represent the message and those that represent the signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.

RevDate: 2022-07-16

Haiduk F, WT Fitch (2022)

Understanding Design Features of Music and Language: The Choric/Dialogic Distinction.

Frontiers in psychology, 13:786899.

Music and spoken language share certain characteristics: both consist of sequences of acoustic elements that are combinatorially combined, and these elements partition the same continuous acoustic dimensions (frequency, formant space and duration). However, the resulting categories differ sharply: scale tones and note durations of small integer ratios appear in music, while speech uses phonemes, lexical tone, and non-isochronous durations. Why did music and language diverge into the two systems we have today, differing in these specific features? We propose a framework based on information theory and a reverse-engineering perspective, suggesting that design features of music and language are a response to their differential deployment along three continuous dimensions: the familiar propositional-aesthetic ('goal') and repetitive-novel ('novelty') dimensions, and a dialogic-choric ('interactivity') dimension that is our focus here. Specifically, we hypothesize that music exhibits specializations enhancing coherent production by several individuals concurrently (the 'choric' context), whereas language is specialized for exchange in tightly coordinated turn-taking ('dialogic' contexts). We examine the evidence for our framework, from both humans and non-human animals, and conclude that many proposed design features of music and language follow naturally from their use in distinct dialogic and choric communicative contexts. Furthermore, the hybrid nature of intermediate systems like poetry, chant, or solo lament follows from their deployment in less typical interactive contexts.

