12 Jul 2020 at 01:41

Bibliography on: Formants: Modulators of Communication


Robert J. Robbins is a biologist, an educator, a science administrator, a publisher, an information technologist, and an IT leader and manager who specializes in advancing biomedical knowledge and supporting education through the application of information technology.

RJR: Recommended Bibliography 12 Jul 2020 at 01:41 Created: 

Formants: Modulators of Communication

Wikipedia: A formant, as defined by James Jeans, is a harmonic of a note that is augmented by a resonance. In speech science and phonetics, however, a formant is also sometimes used to mean acoustic resonance of the human vocal tract. Thus, in phonetics, formant can mean either a resonance or the spectral maximum that the resonance produces. Formants are often measured as amplitude peaks in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer and, in the case of the voice, this gives an estimate of the vocal tract resonances. In vowels spoken with a high fundamental frequency, as in a female or child voice, however, the frequency of the resonance may lie between the widely spaced harmonics and hence no corresponding peak is visible. Because formants are a product of resonance and resonance is affected by the shape and material of the resonating structure, and because all animals (humans included) have unique morphologies, formants can add additional generic (sounds big) and specific (that's Towser barking) information to animal vocalizations.

Created with PubMed® Query: formant NOT pmcbook NOT ispreviousversion

Citations: The Papers (from PubMed®)


RevDate: 2020-07-10

Levy-Lambert D, Grigos MI, LeBlanc É, et al (2020)

Communication Efficiency in a Face Transplant Recipient: Determinants and Therapeutic Implications.

The Journal of craniofacial surgery [Epub ahead of print].

We longitudinally assessed speech intelligibility (percent words correct/pwc), communication efficiency (intelligible words per minute/iwpm), temporal control markers (speech and pause coefficients of variation), and formant frequencies associated with lip motion in a 41-year-old face transplant recipient. Pwc and iwpm at 13 months post-transplantation were both higher than preoperative values. Multivariate regression demonstrated that temporal markers and all formant frequencies associated with lip motion were significant predictors (P < 0.05) of communication efficiency, highlighting the interplay of these variables in generating intelligible and effective speech. These findings can guide us in developing personalized rehabilitative approaches in face transplant recipients for optimal speech outcomes.
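
The outcome measures named in this abstract can be illustrated with a minimal sketch. It assumes the conventional definitions (percent words correct as correct words over total words, intelligible words per minute as correct words over speaking time, and coefficient of variation as standard deviation over mean); the abstract does not give the authors' exact formulas, and the function names below are hypothetical.

```python
from statistics import mean, stdev


def percent_words_correct(n_correct, n_total):
    """Speech intelligibility (pwc): share of words understood, in percent."""
    return 100.0 * n_correct / n_total


def intelligible_words_per_minute(n_correct, duration_s):
    """Communication efficiency (iwpm): intelligible words per minute of speech."""
    return n_correct / (duration_s / 60.0)


def coefficient_of_variation(durations_s):
    """Temporal control marker: sample SD of segment durations over their mean.

    Assumption: sample (n - 1) standard deviation; the study may use another form.
    """
    return stdev(durations_s) / mean(durations_s)
```

For example, 45 of 50 words understood in a 30-second sample gives 90 pwc and 90 iwpm under these definitions.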

RevDate: 2020-07-08

Kim KS, Wang H, L Max (2020)

It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Various aspects of speech production related to auditory-motor integration and learning have been examined through auditory feedback perturbation paradigms in which participants' acoustic speech output is experimentally altered and played back via earphones/headphones "in real time." Scientific rigor requires high precision in determining and reporting the involved hardware and software latencies. Many reports in the literature, however, are not consistent with the minimum achievable latency for a given experimental setup. Here, we focus specifically on this methodological issue associated with implementing real-time auditory feedback perturbations, and we offer concrete suggestions for increased reproducibility in this particular line of work.

METHOD: Hardware and software latencies as well as total feedback loop latency were measured for formant perturbation studies with the Audapter software. Measurements were conducted for various audio interfaces, desktop and laptop computers, and audio drivers. An approach for lowering Audapter's software latency through nondefault parameter specification was also tested.

RESULTS: Oft-overlooked hardware-specific latencies were not negligible for some of the tested audio interfaces (adding up to 15 ms). Total feedback loop latencies (including both hardware and software latency) were also generally larger than claimed in the literature. Nondefault parameter values can improve Audapter's own processing latency without negative impact on formant tracking.

CONCLUSIONS: Audio interface selection and software parameter optimization substantially affect total feedback loop latency. Thus, the actual total latency (hardware plus software) needs to be correctly measured and described in all published reports. Future speech research with "real-time" auditory feedback perturbations should increase scientific rigor by minimizing this latency.

RevDate: 2020-07-06

Vurma A (2020)

Amplitude Effects of Vocal Tract Resonance Adjustments When Singing Louder.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30194-6 [Epub ahead of print].

In the literature on vocal pedagogy we may find suggestions to increase the mouth opening when singing louder. It is known that sopranos tend to sing loud high notes with a wider mouth opening which raises the frequency of the first resonance of the vocal tract (fR1) to tune it close to the fundamental. Our experiment with classically trained male singers revealed that they also tended to raise the fR1 with the dynamics at pitches where the formant tuning does not seem relevant. The analysis by synthesis showed that such behaviour may contribute to the strengthening of the singer's formant by several dB-s and to a rise in the centre of spectral gravity. The contribution of the fR1 raising to the overall sound level was less consistent. Changing the extent of the mouth opening with the dynamics may create several simultaneous semantic cues that signal how prominent the produced sound is and how great the physical effort by the singer is. The diminishing of the mouth opening when singing piano may also have an importance as it helps singers to produce a quieter sound by increasing the distance between the fR1 and higher resonances, which lowers the transfer function of the vocal tract at the relevant spectral regions.

RevDate: 2020-07-03

Chaturvedi R, Kraus M, RSE Keefe (2020)

A new measure of authentic auditory emotion recognition: Application to patients with schizophrenia.

Schizophrenia research pii:S0920-9964(19)30550-X [Epub ahead of print].

BACKGROUND: Many social processes such as emotion recognition are severely impaired in patients with schizophrenia. While basic auditory processing seems to play a key role in identifying emotions, research in this field is limited due to the lack of proper assessment batteries. Many of the widely accepted tests utilize actors to portray certain emotions; these batteries are less ecologically and face valid.

METHODS: This study utilized a newly developed auditory emotion recognition test that contained natural stimuli from spontaneous displays of emotions to assess 28 patients with schizophrenia and 16 healthy controls.

RESULTS: The results indicate that the newly developed test, referred to as the INTONATION Test, is more sensitive to the emotion recognition deficits in patients with schizophrenia than previously used measures. The correlations of the INTONATION Test measures with basic auditory processes were similar to established tests of auditory emotion. Particular emotion sub scores from the INTONATION Test, such as happiness, demonstrated the strongest correlations with specific auditory processing skills, such as formant discrimination and sinusoidal amplitude modulation detection (SAM60).

CONCLUSIONS: The results from this study indicate that auditory emotion recognition impairments are more pronounced in patients with schizophrenia when perceiving authentic displays of emotion. Understanding these deficits could help specify the nature of auditory emotion recognition deficits in patients with schizophrenia and those at risk.

RevDate: 2020-07-02

Toutios A, Xu M, Byrd D, et al (2020)

How an aglossic speaker produces an alveolar-like percept without a functional tongue tip.

The Journal of the Acoustical Society of America, 147(6):EL460.

It has been previously observed [McMicken, Salles, Berg, Vento-Wilson, Rogers, Toutios, and Narayanan. (2017). J. Commun. Disorders, Deaf Stud. Hear. Aids 5(2), 1-6] using real-time magnetic resonance imaging that a speaker with severe congenital tongue hypoplasia (aglossia) had developed a compensatory articulatory strategy where she, in the absence of a functional tongue tip, produced a plosive consonant perceptually similar to /d/ using a bilabial constriction. The present paper provides an updated account of this strategy. It is suggested that the previously observed compensatory bilabial closing that occurs during this speaker's /d/ production is consistent with vocal tract shaping resulting from hyoid raising created with mylohyoid action, which may also be involved in typical /d/ production. Simulating this strategy in a dynamic articulatory synthesis experiment leads to the generation of /d/-like formant transitions.

RevDate: 2020-07-02

Harper S, Goldstein L, S Narayanan (2020)

Variability in individual constriction contributions to third formant values in American English /ɹ/.

The Journal of the Acoustical Society of America, 147(6):3905.

Although substantial variability is observed in the articulatory implementation of the constriction gestures involved in /ɹ/ production, studies of articulatory-acoustic relations in /ɹ/ have largely ignored the potential for subtle variation in the implementation of these gestures to affect salient acoustic dimensions. This study examines how variation in the articulation of American English /ɹ/ influences the relative sensitivity of the third formant to variation in palatal, pharyngeal, and labial constriction degree. Simultaneously recorded articulatory and acoustic data from six speakers in the USC-TIMIT corpus was analyzed to determine how variation in the implementation of each constriction across tokens of /ɹ/ relates to variation in third formant values. Results show that third formant values are differentially affected by constriction degree for the different constrictions used to produce /ɹ/. Additionally, interspeaker variation is observed in the relative effect of different constriction gestures on third formant values, most notably in a division between speakers exhibiting relatively equal effects of palatal and pharyngeal constriction degree on F3 and speakers exhibiting a stronger palatal effect. This division among speakers mirrors interspeaker differences in mean constriction length and location, suggesting that individual differences in /ɹ/ production lead to variation in articulatory-acoustic relations.

RevDate: 2020-06-25

Xu M, Tachibana RO, Okanoya K, et al (2020)

Unconscious and Distinctive Control of Vocal Pitch and Timbre During Altered Auditory Feedback.

Frontiers in psychology, 11:1224.

Vocal control plays a critical role in smooth social communication. Speakers constantly monitor auditory feedback (AF) and make adjustments when their voices deviate from their intentions. Previous studies have shown that when certain acoustic features of the AF are artificially altered, speakers compensate for this alteration in the opposite direction. However, little is known about how the vocal control system implements compensations for alterations of different acoustic features, and associates them with subjective consciousness. The present study investigated whether compensations for the fundamental frequency (F0), which corresponds to perceived pitch, and formants, which contribute to perceived timbre, can be performed unconsciously and independently. Forty native Japanese speakers received two types of altered AF during vowel production that involved shifts of either only the formant frequencies (formant modification; Fm) or both the pitch and formant frequencies (pitch + formant modification; PFm). For each type, three levels of shift (slight, medium, and severe) in both directions (increase or decrease) were used. After the experiment, participants were tested for whether they had perceived a change in the F0 and/or formants. The results showed that (i) only formants were compensated for in the Fm condition, while both the F0 and formants were compensated for in the PFm condition; (ii) the F0 compensation exhibited greater precision than the formant compensation in PFm; and (iii) compensation occurred even when participants misperceived or could not explicitly perceive the alteration in AF. These findings indicate that non-experts can compensate for both formant and F0 modifications in the AF during vocal production, even when the modifications are not explicitly or correctly perceived, which provides further evidence for a dissociation between conscious perception and action in vocal control. We propose that such unconscious control of voice production may enhance rapid adaptation to changing speech environments and facilitate mutual communication.

RevDate: 2020-06-19

White-Schwoch T, Magohe AK, Fellows AM, et al (2020)

Auditory neurophysiology reveals central nervous system dysfunction in HIV-infected individuals.

Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology, 131(8):1827-1832 pii:S1388-2457(20)30327-8 [Epub ahead of print].

OBJECTIVE: To test the hypothesis that human immunodeficiency virus (HIV) affects auditory-neurophysiological functions.

METHODS: A convenience sample of 68 HIV+ and 59 HIV- normal-hearing adults was selected from a study set in Dar es Salaam, Tanzania. The speech-evoked frequency-following response (FFR), an objective measure of auditory function, was collected. Outcome measures were FFRs to the fundamental frequency (F0) and to harmonics corresponding to the first formant (F1), two behaviorally relevant cues for understanding speech.

RESULTS: The HIV+ group had weaker responses to the F1 than the HIV- group; this effect generalized across multiple stimuli (d = 0.59). Responses to the F0 were similar between groups.

CONCLUSIONS: Auditory-neurophysiological responses differ between HIV+ and HIV- adults despite normal hearing thresholds.

SIGNIFICANCE: The FFR may reflect HIV-associated central nervous system dysfunction that manifests as disrupted auditory processing of speech harmonics corresponding to the first formant.

RevDate: 2020-06-19

DiNino M, Arenberg JG, Duchen ALR, et al (2020)

Effects of Age and Cochlear Implantation on Spectrally Cued Speech Categorization.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Weighting of acoustic cues for perceiving place-of-articulation speech contrasts was measured to determine the separate and interactive effects of age and use of cochlear implants (CIs). It has been found that adults with normal hearing (NH) show reliance on fine-grained spectral information (e.g., formants), whereas adults with CIs show reliance on broad spectral shape (e.g., spectral tilt). In question was whether children with NH and CIs would demonstrate the same patterns as adults, or show differences based on ongoing maturation of hearing and phonetic skills.

METHOD: Children and adults with NH and with CIs categorized a /b/-/d/ speech contrast based on two orthogonal spectral cues. Among CI users, phonetic cue weights were compared to vowel identification scores and Spectral-Temporally Modulated Ripple Test thresholds.

RESULTS: NH children and adults both relied relatively more on the fine-grained formant cue and less on the broad spectral tilt cue compared to participants with CIs. However, early-implanted children with CIs better utilized the formant cue compared to adult CI users. Formant cue weights correlated with CI participants' vowel recognition and in children, also related to Spectral-Temporally Modulated Ripple Test thresholds. Adults and child CI users with very poor phonetic perception showed additive use of the two cues, whereas those with better and/or more mature cue usage showed a prioritized trading relationship, akin to NH listeners.

CONCLUSIONS: Age group and hearing modality can influence phonetic cue-weighting patterns. Results suggest that simple nonlexical categorization tests correlate with more general speech recognition skills of children and adults with CIs.

RevDate: 2020-06-16

Chiu YF, Neel A, T Loux (2020)

Acoustic characteristics in relation to intelligibility reduction in noise for speakers with Parkinson's disease.

Clinical linguistics & phonetics [Epub ahead of print].

Decreased speech intelligibility in noisy environments is frequently observed in speakers with Parkinson's disease (PD). This study investigated which acoustic characteristics across the speech subsystems contributed to poor intelligibility in noise for speakers with PD. Speech samples were obtained from 13 speakers with PD and five healthy controls reading 56 sentences. Intelligibility analysis was conducted in quiet and noisy listening conditions. Seventy-two young listeners transcribed the recorded sentences in quiet and another 72 listeners transcribed in noise. The acoustic characteristics of the speakers with PD who experienced large intelligibility reduction from quiet to noise were compared to those with smaller intelligibility reduction in noise and healthy controls. The acoustic measures in the study included second formant transitions, cepstral and spectral measures of voice (cepstral peak prominence and low/high spectral ratio), pitch variation, and articulation rate to represent speech components across speech subsystems of articulation, phonation, and prosody. The results show that speakers with PD who had larger intelligibility reduction in noise exhibited decreased second formant transition, limited cepstral and spectral variations, and faster articulation rate. These findings suggest that the adverse effect of noise on speech intelligibility in PD is related to speech changes in the articulatory and phonatory systems.

RevDate: 2020-06-15

Rankinen W, K de Jong (2020)

The Entanglement of Dialectal Variation and Speaker Normalization.

Language and speech [Epub ahead of print].

This paper explores the relationship between speaker normalization and dialectal identity in sociolinguistic data, examining a database of vowel formants collected from 88 monolingual American English speakers in Michigan's Upper Peninsula. Audio recordings of Finnish- and Italian-heritage American English speakers reading a passage and a word list were normalized using two normalization procedures. These algorithms are based on different concepts of normalization: Lobanov, which models normalization as based on experience with individual talkers, and Labov ANAE, which models normalization as based on experience with scale-factors inherent in acoustic resonators of all kinds. The two procedures yielded different results; while the Labov ANAE method reveals a cluster shifting of low and back vowels that correlated with heritage, the Lobanov procedure seems to eliminate this sociolinguistic variation. The difference between the two procedures lies in how they treat relations between formant changes, suggesting that dimensions of variation in the vowel space may be treated differently by different normalization procedures, raising the question of how anatomical variation and dialectal variation interact in the real world. The structure of the sociolinguistic effects found with the Labov ANAE normalized data, but not in the Lobanov normalized data, suggest that the Lobanov normalization does over-normalize formant measures and remove sociolinguistically relevant information.
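
The contrast between the two families of normalization can be sketched in a few lines. The first function is the standard Lobanov z-score, applied to each formant separately within one speaker. The second is a simplified uniform log-mean scaling in the spirit of the Labov ANAE method, with one scale factor per speaker; it is an assumption-laden illustration, and the exact procedure used in the paper (for example, how the grand mean is computed) may differ. Function names and the data layout are hypothetical.

```python
import math
from statistics import mean, stdev


def lobanov(tokens):
    """Z-score each formant separately for ONE speaker.

    tokens: list of (F1, F2) tuples in Hz for a single speaker.
    """
    columns = list(zip(*tokens))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        tuple((f - mu) / sd for f, mu, sd in zip(tok, mus, sds))
        for tok in tokens
    ]


def log_mean_scale(speakers):
    """Rescale every speaker by a single factor derived from log-mean formants.

    speakers: dict mapping speaker id -> list of (F1, F2) tuples in Hz.
    Assumption: the grand mean pools all tokens of all speakers.
    """
    grand = mean(
        math.log(f) for toks in speakers.values() for tok in toks for f in tok
    )
    scaled = {}
    for spk, toks in speakers.items():
        spk_mean = mean(math.log(f) for tok in toks for f in tok)
        factor = math.exp(grand - spk_mean)  # one uniform factor per speaker
        scaled[spk] = [tuple(f * factor for f in tok) for tok in toks]
    return scaled
```

Because the log-mean method multiplies all of a speaker's formants by the same factor, ratios such as F2/F1 are preserved within a speaker. Lobanov rescales each formant dimension independently, which is one way the two procedures can treat relations between formants differently.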

RevDate: 2020-06-09

Ménard L, Prémont A, Trudeau-Fisette P, et al (2020)

Phonetic Implementation of Prosodic Emphasis in Preschool-Aged Children and Adults: Probing the Development of Sensorimotor Speech Goals.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

OBJECTIVE: We aimed to investigate the production of contrastive emphasis in French-speaking 4-year-olds and adults. Based on previous work, we predicted that, due to their immature motor control abilities, preschool-aged children would produce smaller articulatory differences between emphasized and neutral syllables than adults.

METHOD: Ten 4-year-old children and 10 adult French speakers were recorded while repeating /bib/, /bub/, and /bab/ sequences in neutral and contrastive emphasis conditions. Synchronous recordings of tongue movements, lip and jaw positions, and speech signals were made. Lip positions and tongue shapes were analyzed; formant frequencies, amplitude, fundamental frequency, and duration were extracted from the acoustic signals; and between-vowel contrasts were calculated.

RESULTS: Emphasized vowels were higher in pitch, intensity, and duration than their neutral counterparts in all participants. However, the effect of contrastive emphasis on lip position was smaller in children. Prosody did not affect tongue position in children, whereas it did in adults. As a result, children's productions were perceived less accurately than those of adults.

CONCLUSION: These findings suggest that 4-year-old children have not yet learned to produce hypoarticulated forms of phonemic goals to allow them to successfully contrast syllables and enhance prosodic saliency.

RevDate: 2020-05-07

Groll MD, McKenna VS, Hablani S, et al (2020)

Formant-Estimated Vocal Tract Length and Extrinsic Laryngeal Muscle Activation During Modulation of Vocal Effort in Healthy Speakers.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The goal of this study was to explore the relationships among vocal effort, extrinsic laryngeal muscle activity, and vocal tract length (VTL) within healthy speakers. We hypothesized that increased vocal effort would result in increased suprahyoid muscle activation and decreased VTL, as previously observed in individuals with vocal hyperfunction.

METHOD: Twenty-eight healthy speakers of American English produced vowel-consonant-vowel utterances under varying levels of vocal effort. VTL was estimated from the vowel formants. Three surface electromyography sensors measured the activation of the suprahyoid and infrahyoid muscle groups. A general linear model was used to investigate the effects of vocal effort level and surface electromyography on VTL. Two additional general linear models were used to investigate the effects of vocal effort on suprahyoid and infrahyoid muscle activities.

RESULTS: Neither vocal effort nor extrinsic muscle activity showed significant effects on VTL; however, the degree of extrinsic muscle activity of both suprahyoid and infrahyoid muscle groups increased with increases in vocal effort.

CONCLUSION: Increasing vocal effort resulted in increased activation of both suprahyoid and infrahyoid musculature in healthy adults, with no change to VTL.
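
One common way to estimate vocal tract length from formants is the uniform-tube (quarter-wavelength) approximation, in which a tube closed at the glottis and open at the lips resonates at F_n = (2n - 1)c / (4L). The sketch below inverts that relation and averages the per-formant estimates. It is offered only as an illustration: the abstract does not state which estimator the authors used, and the speed-of-sound constant and function names are assumptions.

```python
# Assumed speed of sound in warm, humid air, in cm/s (a typical textbook value).
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_from_formant(freq_hz, n):
    """Tube length in cm implied by the n-th formant (n = 1, 2, 3, ...)."""
    return (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * freq_hz)


def estimate_vtl(formants_hz):
    """Average the per-formant length estimates for formants given as [F1, F2, ...]."""
    estimates = [
        vtl_from_formant(f, n) for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)
```

For a neutral vowel with formants near 500, 1500, and 2500 Hz, each formant implies a length of 17.5 cm under this model. Real vowels deviate from the uniform tube, so higher formants are often preferred because they are less sensitive to vowel quality.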

RevDate: 2020-05-06

Zhou H, Lu J, Zhang C, et al (2020)

Abnormal Acoustic Features Following Pharyngeal Flap Surgery in Patients Aged Six Years and Older.

The Journal of craniofacial surgery [Epub ahead of print].

In our study, older velopharyngeal insufficiency (posterior velopharyngeal insufficiency) patients were defined as those older than 6 years of age. This study aimed to evaluate the abnormal acoustic features of older velopharyngeal insufficiency patients before and after posterior pharyngeal flap surgery. A retrospective medical record review was conducted for patients aged 6 years and older, who underwent posterior pharyngeal flap surgery between November 2011 and March 2015. The audio records of patients were evaluated before and after surgery. Spectral analysis was conducted by the Computer Speech Lab (CSL)-4150B acoustic system with the following input data: the vowel /i/, unaspirated plosive /b/, aspirated plosive /p/, aspirated fricatives /s/ and /x/, unaspirated affricates /j/ and /z/, and aspirated affricates /c/ and /q/. The patients were followed up for 3 months. Speech outcome was evaluated by comparing the postoperative phonetic data with preoperative data. Subjective and objective analyses showed significant differences in the sonogram, formant, and speech articulation before and after the posterior pharyngeal flap surgery. However, the sampled patients could not be considered to have a high speech articulation (<85%) as the normal value was above or equal to 96%. Our results showed that pharyngeal flap surgery could correct the speech function of older patients with posterior velopharyngeal insufficiency to some extent. Owing to the original errors in pronunciation patterns, pathological speech articulation still existed, and speech treatment is required in the future.

RevDate: 2020-05-03

Almurashi W, Al-Tamimi J, G Khattab (2020)

Static and dynamic cues in vowel production in Hijazi Arabic.

The Journal of the Acoustical Society of America, 147(4):2917.

Static cues such as formant measurements obtained at the vowel midpoint are usually taken as the main correlate for vowel identification. However, dynamic cues such as vowel-inherent spectral change have been shown to yield better classification of vowels using discriminant analysis. The aim of this study is to evaluate the role of static versus dynamic cues in Hijazi Arabic (HA) vowel classification, in addition to vowel duration and F3, which are not usually looked at. Data from 12 male HA speakers producing eight HA vowels in /hVd/ syllables were obtained, and classification accuracy was evaluated using discriminant analysis. Dynamic cues, particularly the three-point model, had higher classification rates (average 95.5%) than the remaining models (static model: 93.5%; other dynamic models: between 65.75% and 94.25%). Vowel duration had a significant role in classification accuracy (average +8%). These results are in line with dynamic approaches to vowel classification and highlight the relative importance of cues such as vowel duration across languages, particularly where it is prominent in the phonology.

RevDate: 2020-05-03

Egurtzegi A, C Carignan (2020)

An acoustic description of Mixean Basque.

The Journal of the Acoustical Society of America, 147(4):2791.

This paper presents an acoustic analysis of Mixean Low Navarrese, an endangered variety of Basque. The manuscript includes an overview of previous acoustic studies performed on different Basque varieties in order to synthesize the sparse acoustic descriptions of the language that are available. This synthesis serves as a basis for the acoustic analysis performed in the current study, in which the various acoustic analyses given in previous studies are replicated in a single, cohesive general acoustic description of Mixean Basque. The analyses include formant and duration measurements for the six-vowel system, voice onset time measurements for the three-way stop system, spectral center of gravity for the sibilants, and number of lingual contacts in the alveolar rhotic tap and trill. Important findings include: a centralized realization ([ʉ]) of the high-front rounded vowel usually described as /y/; a data-driven confirmation of the three-way laryngeal opposition in the stop system; evidence in support of an alveolo-palatal to apical sibilant merger; and the discovery of a possible incipient merger of rhotics. These results show how using experimental acoustic methods to study under-represented linguistic varieties can result in revelations of sound patterns otherwise undescribed in more commonly studied varieties of the same language.
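
The spectral center of gravity used for the sibilants is, in its usual definition, a weighted mean of frequency with the spectrum as weights. The sketch below assumes power weighting (magnitude squared), which is a common default; the paper's exact settings are not given in the abstract, and the function name and input layout are hypothetical.

```python
def spectral_center_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Weighted mean frequency of a spectrum.

    freqs_hz:   frequency of each spectral bin, in Hz
    magnitudes: magnitude of each bin (linear, not dB)
    power:      exponent applied to magnitudes (2.0 = power spectrum; assumed default)
    """
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total
```

A spectrum with energy concentrated at higher frequencies yields a higher value, which is why the measure is often used to separate sibilants by place of articulation.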

RevDate: 2020-05-03

Mellesmoen G, M Babel (2020)

Acoustically distinct and perceptually ambiguous: ʔayʔaǰuθəm (Salish) fricatives.

The Journal of the Acoustical Society of America, 147(4):2959.

ʔayʔaǰuθəm (Comox-Sliammon) is a Central Salish language spoken in British Columbia with a large fricative inventory. Previous impressionistic descriptions of ʔayʔaǰuθəm have noted perceptual ambiguity of select anterior fricatives. This paper provides an auditory-acoustic description of the four anterior fricatives /θ s ʃ ɬ/ in the Mainland dialect of ʔayʔaǰuθəm. Peak ERBN trajectories, noise duration, and formant transitions are analysed in the fricative productions of five speakers. These analyses provide quantitative and qualitative descriptions of these fricative contrasts, indicating more robust acoustic differentiation for fricatives in onset versus coda position. In a perception task, English listeners categorized fricatives in CV and VC sequences from the natural productions. The results of the perception experiment are consistent with reported perceptual ambiguity between /s/ and /θ/, with listeners frequently misidentifying /θ/ as /s/. The production and perception data suggest that listener L1 categories play a role in the categorization and discrimination of ʔayʔaǰuθəm fricatives. These findings provide an empirical description of fricatives in an understudied language and have implications for L2 teaching and learning in language revitalization contexts.

RevDate: 2020-05-03

Rosen N, Stewart J, ON Sammons (2020)

How "mixed" is mixed language phonology? An acoustic analysis of the Michif vowel system.

The Journal of the Acoustical Society of America, 147(4):2989.

Michif, a severely endangered language still spoken today by an estimated 100-200 Métis people in Western Canada, is generally classified as a mixed language, meaning it cannot be traced back to a single language family [Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Thomason (2001). Language Contact: An Introduction (Edinburgh University Press and Georgetown University Press, Edinburgh and Washington, DC); Meakins (2013). Contact Languages: A Comprehensive Guide (Mouton De Gruyter, Berlin), pp. 159-228.]. It has been claimed to maintain the phonological grammar of both of its source languages, French and Plains Cree [Rhodes (1977). Actes du Huitieme congrès des Algonqunistes (Carleton University, Ottawa), pp. 6-25; Bakker (1997). A Language of Our Own (Oxford University Press, Oxford); Bakker and Papen (1997). Contact Languages: A Wider Perspective (John Benjamins, Amsterdam), pp. 295-363]. The goal of this paper is twofold: to offer an instrumental analysis of Michif vowels and to investigate this claim of a stratified grammar, based on this careful phonetic analysis. Using source language as a variable in the analysis, the authors argue the Michif vowel system does not appear to rely on historical information, and that historically similar French and Cree vowels pattern together within the Michif system with regards to formant frequencies and duration. The authors show that there are nine Michif oral vowels in this system, which has merged phonetically similar French- and Cree-source vowels.

RevDate: 2020-05-03

van Brenk F, H Terband (2020)

Compensatory and adaptive responses to real-time formant shifts in adults and children.

The Journal of the Acoustical Society of America, 147(4):2261.

Auditory feedback plays an important role in speech motor learning, yet little is known about the strength of motor learning and feedback control in speech development. This study investigated compensatory and adaptive responses to auditory feedback perturbation in children (aged 4-9 years old) and young adults (aged 18-29 years old). Auditory feedback was perturbed by near-real-time shifting of F1 and F2 of the vowel /ɪː/ during the production of consonant-vowel-consonant words. Children were able to compensate and adapt to a similar or greater degree than young adults. Higher token-to-token variability was found in children compared to adults but not disproportionately higher during the perturbation phases compared to the unperturbed baseline. The added challenge to auditory-motor integration did not influence production variability in children, and compensation and adaptation effects were found to be strong and sustainable. Significant group differences were absent in the proportions of speakers displaying a compensatory or adaptive response, an amplifying response, or no consistent response. Within these categories, children produced significantly stronger compensatory, adaptive, or amplifying responses, which could be explained by less-ingrained existing representations. The results are interpreted as indicating that both auditory-motor integration and learning capacities are stronger in young children than in adults.

RevDate: 2020-05-03

Chiu C, JT Sun (2020)

On pharyngealized vowels in Northern Horpa: An acoustic and ultrasound study.

The Journal of the Acoustical Society of America, 147(4):2928.

In the Northern Horpa (NH) language of Sichuan, vowels are divided between plain and pharyngealized sets, with the latter pronounced with auxiliary articulatory gestures involving more constriction in the vocal tract. The current study examines how the NH vocalic contrast is manifested in line with the process of pharyngealization both acoustically and articulatorily, based on freshly gathered data from two varieties of the language (i.e., Rtsangkhog and Yunasche). Along with formant analyses, ultrasound imaging was employed to capture the tongue postures and positions during vowel production. The results show that in contrast with plain vowels, pharyngealized vowels generally feature lower F2 values and higher F1 and F3 values. Mixed results for F2 and F3 suggest that the quality contrasts are vowel-dependent. Ultrasound images, on the other hand, reveal that the vocalic distinction is affected by different types of tongue movements, including retraction, backing, and double bunching, depending on the inherent tongue positions for each vowel. The two NH varieties investigated are found to display differential formant changes and different types of tongue displacements. The formant profiles along with ultrasound images support the view that the production of the NH phonologically marked vowels is characteristic of pharyngealization.

RevDate: 2020-05-03

Horo L, Sarmah P, GDS Anderson (2020)

Acoustic phonetic study of the Sora vowel system.

The Journal of the Acoustical Society of America, 147(4):3000.

This paper is an acoustic phonetic study of vowels in Sora, a Munda language of the Austroasiatic language family. Descriptions here illustrate that the Sora vowel system has six vowels and provide evidence that Sora disyllables have prominence on the second syllable. While the acoustic categorization of vowels is based on formant frequencies, the presence of prominence on the second syllable is shown through temporal features of vowels, including duration, intensity, and fundamental frequency. Additionally, this paper demonstrates that acoustic categorization of vowels in Sora is better in the prominent syllable than in the non-prominent syllable, providing evidence that syllable prominence and vowel quality are correlated in Sora. These acoustic properties of Sora vowels are discussed in relation to the existing debates on vowels and patterns of syllable prominence in Munda languages of India. In this regard, it is noteworthy that Munda languages, in general, lack instrumental studies, and therefore this paper presents significant findings that are undocumented in other Munda languages. These acoustic studies are supported by exploratory statistical modeling and statistical classification methods.

RevDate: 2020-05-03

Sarvasy H, Elvin J, Li W, et al (2020)

An acoustic phonetic description of Nungon vowels.

The Journal of the Acoustical Society of America, 147(4):2891.

This study is a comprehensive acoustic description and analysis of the six vowels /i e a u o ɔ/ in the Towet dialect of the Papuan language Nungon ⟨yuw⟩ of northeastern Papua New Guinea. Vowel tokens were extracted from a corpus of audio speech recordings created for general language documentation and grammatical description. To assess the phonetic correlates of a claimed phonological vowel length distinction, vowel duration was measured. Multi-point acoustic analyses enabled investigation of mean vowel F1, F2, and F3; vowel trajectories; and coarticulation effects. The three Nungon back vowels were of particular interest, as they contribute to an asymmetrical, back vowel-heavy array, and /o/ had previously been described as having an especially low F2. The authors found that duration of phonologically long and short vowels differed significantly. Mean vowel formant measurements confirmed that the six phonological vowels form six distinct acoustic groupings; trajectories show slightly more formant movement in some vowels than was previously known. Adjacent nasal consonants exerted significant effects on vowel formant measurements. The authors show that an uncontrolled, general documentation corpus for an under-described language can be mined for acoustic analysis, but coarticulation effects should be taken into account.

RevDate: 2020-05-03

Nance C, S Kirkham (2020)

The acoustics of three-way lateral and nasal palatalisation contrasts in Scottish Gaelic.

The Journal of the Acoustical Society of America, 147(4):2858.

This paper presents an acoustic description of laterals and nasals in an endangered minority language, Scottish Gaelic (known as "Gaelic"). Gaelic sonorants are reported to take part in a typologically unusual three-way palatalisation contrast. Here, the acoustic evidence for this contrast is considered, comparing lateral and nasal consonants in both word-initial and word-final position. Previous acoustic work has considered lateral consonants, but nasals are much less well-described. An acoustic analysis of twelve Gaelic-dominant speakers resident in a traditionally Gaelic-speaking community is reported. Sonorant quality is quantified via measurements of F2-F1 and F3-F2 and observation of the whole spectrum. Additionally, we quantify extensive devoicing in word-final laterals that has not been previously reported. Mixed-effects regression modelling suggests robust three-way acoustic differences in lateral consonants in all relevant vowel contexts. Nasal consonants, however, display lesser evidence of the three-way contrast in formant values and across the spectrum. Potential reasons for lesser evidence of contrast in the nasal system are discussed, including the nature of nasal acoustics, evidence from historical changes, and comparison to other Goidelic dialects. In doing so, contributions are made to accounts of the acoustics of the Celtic languages, and to typologies of contrastive palatalisation in the world's languages.

RevDate: 2020-05-03

Tabain M, Butcher A, Breen G, et al (2020)

A formant study of the alveolar versus retroflex contrast in three Central Australian languages: Stop, nasal, and lateral manners of articulation.

The Journal of the Acoustical Society of America, 147(4):2745.

This study presents formant transition data from 21 speakers for the apical alveolar∼retroflex contrast in three neighbouring Central Australian languages: Arrernte, Pitjantjatjara, and Warlpiri. The contrast is examined for three manners of articulation, stop, nasal, and lateral (/t ∼ ʈ/, /n ∼ ɳ/, and /l ∼ ɭ/), and three vowel contexts (/a i u/). As expected, results show that a lower F3 and F4 in the preceding vowel signal a retroflex consonant; and that the alveolar∼retroflex contrast is most clearly realized in the context of an /a/ vowel, and least clearly realized in the context of an /i/ vowel. Results also show that the contrast is most clearly realized for the stop manner of articulation. These results provide an acoustic basis for the greater typological rarity of retroflex nasals and laterals as compared to stops. It is suggested that possible nasalization of the preceding vowel accounts for the poorer nasal consonant results, and that articulatory constraints on lateral consonant production account for the poorer lateral consonant results. Importantly, differences are noticed between speakers, and it is suggested that literacy plays a major role in maintenance of this marginal phonemic contrast.

RevDate: 2020-04-27

Liepins R, Kaider A, Honeder C, et al (2020)

Formant frequency discrimination with a fine structure sound coding strategy for cochlear implants.

Hearing research, 392:107970 pii:S0378-5955(19)30207-2 [Epub ahead of print].

Recent sound coding strategies for cochlear implants (CI) have focused on the transmission of temporal fine structure to the CI recipient. To date, the effects of fine structure coding in electrical hearing are poorly characterized. The aim of this study was to examine whether the presence of temporal fine structure coding affects how the CI recipient perceives sound. This was done by comparing two sound coding strategies with different temporal fine structure coverage in a longitudinal cross-over setting. The more recent FS4 coding strategy provides fine structure coding on typically four apical stimulation channels compared to FSP with usually one or two fine structure channels. 34 adult CI patients with a minimum CI experience of one year were included. All subjects were fitted according to clinical routine and used both coding strategies for three months in a randomized sequence. Formant frequency discrimination thresholds (FFDT) were measured to assess the ability to resolve timbre information. Further outcome measures included a monosyllables test in quiet and the speech reception threshold of an adaptive matrix sentence test in noise (Oldenburger sentence test). In addition, the subjective sound quality was assessed using visual analogue scales and a sound quality questionnaire after each three-month period. The extended fine structure range of FS4 yields FFDT similar to FSP for formants occurring in the frequency range only covered by FS4. There is a significant interaction (p = 0.048) between the extent of fine structure coverage in FSP and the improvement in FFDT in favour of FS4 for these stimuli. Speech perception in noise and quiet was similar with both coding strategies. Sound quality was rated heterogeneously, showing that both strategies represent valuable options for CI fitting to allow for best possible individual optimization.

RevDate: 2020-04-22

Toyoda A, Maruhashi T, Malaivijitnond S, et al (2020)

Dominance status and copulatory vocalizations among male stump-tailed macaques in Thailand.

Primates; journal of primatology pii:10.1007/s10329-020-00820-7 [Epub ahead of print].

Male copulation calls sometimes play important roles in sexual strategies, attracting conspecific females or advertising their social status to conspecific males. These calls generally occur in sexually competitive societies such as harem groups and multi-male and multi-female societies. However, the call functions remain unclear because of limited availability of data sets that include a large number of male and female animals in naturalistic environments, particularly in primates. Here, we examined the possible function of male-specific copulation calls in wild stump-tailed macaques (Macaca arctoides) by analyzing the contexts and acoustic features of vocalizations. We observed 395 wild stump-tailed macaques inhabiting the Khao Krapuk Khao Taomor Non-Hunting Area in Thailand and recorded all occurrences of observed copulations. We counted 446 male-specific calls in 383 copulations recorded, and measured their acoustic characteristics. Data were categorized into three groups depending on their social status: dominant (alpha and coalition) males and non-dominant males. When comparing male status, alpha males most frequently produced copulation calls at ejaculation, coalition males produced less frequent calls than alpha males, and other non-dominant males rarely vocalized, maintaining silence even when mounting females. Acoustic analysis indicated no significant influence of status (alpha or coalition) on call number, bout duration, or further formant dispersion parameters. Our results suggest that male copulation calls of this species are social status-dependent signals. Furthermore, dominant males might actively transmit their social status and copulations to other male rivals to impede their challenging attacks, while other non-dominant males maintain silence to prevent the interference of dominants.

RevDate: 2020-04-19

Saldías M, Laukkanen AM, Guzmán M, et al (2020)

The Vocal Tract in Loud Twang-Like Singing While Producing High and Low Pitches.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30057-6 [Epub ahead of print].

Twang-like vocal qualities have been related to a megaphone-like shape of the vocal tract (epilaryngeal tube and pharyngeal narrowing, and a wider mouth opening), low-frequency spectral changes, and tighter and/or increased vocal fold adduction. Previous studies have focused mainly on loud and high-pitched singing, comfortable low-pitched spoken vowels, or are based on modeling and simulation. There is no data available related to twang-like voices in loud, low-pitched singing.

PURPOSE: This study investigates the possible contribution of the lower and upper vocal tract configurations during loud twang-like singing on high and low pitches in a real subject.

METHODS: One male contemporary commercial music singer produced a sustained vowel [a:] in his habitual speaking pitch (B2) and loudness. The same vowel was also produced in a loud twang-like singing voice on high (G4) and low pitches (B2). Computerized tomography, acoustic analysis, inverse filtering, and audio-perceptual assessments were performed.

RESULTS: Both loud twang-like voices showed a megaphone-like shape of the vocal tract, being more notable on the low pitch. Also, low-frequency spectral changes, a peak of sound energy around 3 kHz and increased vocal fold adduction were found. Results agreed with audio-perceptual evaluation.

CONCLUSIONS: Loud twang-like phonation seems to be mainly related to low-frequency spectral changes (under 2 kHz) and a more compact formant structure. Twang-like qualities seem to require different degrees of twang-related vocal tract adjustments while phonating in different pitches. A wider mouth opening, pharyngeal constriction, and epilaryngeal tube narrowing may be helpful strategies for maximum power transfer and improved vocal economy in loud contemporary commercial music singing and potentially in loud speech. Further studies should focus on vocal efficiency and vocal economy measurements using modeling and simulation, based on real-singers' data.

RevDate: 2020-04-17

Yaralı M (2020)

Varying effect of noise on sound onset and acoustic change evoked auditory cortical N1 responses evoked by a vowel-vowel stimulus.

International journal of psychophysiology : official journal of the International Organization of Psychophysiology pii:S0167-8760(20)30077-5 [Epub ahead of print].

INTRODUCTION: According to previous studies noise causes prolonged latencies and decreased amplitudes in acoustic change evoked cortical responses. Particularly for a consonant-vowel stimulus, speech shaped noise leads to more pronounced changes on onset evoked response than acoustic change evoked response. Reasoning that this may be related to the spectral characteristics of the stimuli and the noise, in the current study a vowel-vowel stimulus (/ui/) was presented in white noise during cortical response recordings. The hypothesis is that the effect of noise will be higher on acoustic change N1 compared to onset N1 due to the masking effects on formant transitions.

METHODS: Onset and acoustic change evoked auditory cortical N1-P2 responses were obtained from 21 young adults with normal hearing while presenting 1000 ms /ui/ stimuli in quiet and in white noise at +10 dB and 0 dB signal-to-noise ratio (SNR).

RESULTS: In the quiet and +10 dB SNR conditions, the N1-P2 responses to both onset and change were present. In the +10 dB SNR condition, acoustic change N1-P2 peak-to-peak amplitudes were reduced and N1 latencies were prolonged compared to the quiet condition, whereas there was no significant change in onset N1 latencies or N1-P2 peak-to-peak amplitudes. In the 0 dB SNR condition, change responses were not observed, but onset N1-P2 peak-to-peak amplitudes were significantly lower and onset N1 latencies were significantly higher compared to the quiet and the +10 dB SNR conditions. Onset and change responses were also compared with each other in each condition. N1 latencies and N1-P2 peak-to-peak amplitudes of onset and acoustic change were not significantly different in the quiet condition, whereas at +10 dB SNR, acoustic change N1 latencies were higher and N1-P2 amplitudes were lower than onset latencies and amplitudes. To summarize, presentation of white noise at +10 dB SNR reduced acoustic change evoked N1-P2 peak-to-peak amplitudes and prolonged N1 latencies compared to quiet. The same effect on onsets was only observed at 0 dB SNR, where acoustic change N1 was not observed. In the quiet condition, latencies and amplitudes of onsets and changes were not different, whereas at +10 dB SNR, acoustic change N1 latencies were higher and amplitudes were lower than those of onset N1.

DISCUSSION/CONCLUSIONS: The effect of noise was found to be higher on acoustic change evoked N1 response compared to onset N1. This may be related to the spectral characteristics of the utilized noise and the stimuli, possible differences in acoustic features of sound onsets and acoustic changes, or to the possible differences in the mechanisms for detecting acoustic changes and sound onsets. In order to investigate the possible reasons for more pronounced effect of noise on acoustic changes, future work with different vowel-vowel transitions in different noise types is suggested.

RevDate: 2020-04-04

Tykalova T, Skrabal D, Boril T, et al (2020)

Effect of Ageing on Acoustic Characteristics of Voice Pitch and Formants in Czech Vowels.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(20)30086-2 [Epub ahead of print].

BACKGROUND: The relevance of formant-based measures has been noted across a spectrum of medical, technical, and linguistic applications. Therefore, the primary aim of the study was to evaluate the effect of ageing on vowel articulation, as the previous research revealed contradictory findings. The secondary aim was to provide normative acoustic data for all Czech monophthongs.

METHODS: The database consisted of 100 healthy speakers (50 men and 50 women) aged between 20 and 90. Acoustic characteristics, including vowel duration, vowel space area (VSA), fundamental frequency (fo), and the first to fourth formant frequencies (F1-F4) of 10 Czech vowels were extracted from a reading passage. In addition, the articulation rate was calculated from the entire duration of the reading passage.

RESULTS: Age-related changes in pitch were sex-dependent, while age-related alterations in F2/a/, F2/u/, VSA, and vowel duration seemed to be sex-independent. In particular, we observed a clear lowering of fo with age for women, but no change for men. With regard to formants, we found lowering of F2/a/ and F2/u/ with increased age, but no statistically significant changes in F1, F3, or F4 frequencies with advanced age. Although the alterations in F1 and F2 frequencies were rather small, they appeared to be in a direction against vowel centralization, resulting in a significantly greater VSA in the older population. The greater VSA was found to be related partly to longer vowel duration.

CONCLUSIONS: Alterations in vowel formant frequencies across several decades of adult life appear to be small or in a direction against vowel centralization, thus indicating the good preservation of articulatory precision in older speakers.
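
The abstract above reports vowel space area (VSA) without stating how it was computed. One common approach, shown here only as a minimal sketch and not as the authors' method, is to treat the corner vowels as polygon vertices in the F1-F2 plane and apply the shoelace formula. The function name and the formant values below are hypothetical and are not data from the study.

```python
def vowel_space_area(corners):
    """Area (Hz^2) of the polygon whose vertices are (F1, F2) pairs.

    `corners` must list the corner vowels in order around the polygon.
    Uses the shoelace formula; this is an assumed definition of VSA.
    """
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Illustrative (made-up) corner vowels /i/, /a/, /u/ as (F1, F2) in Hz.
example_corners = [(300.0, 2300.0), (750.0, 1300.0), (350.0, 800.0)]
example_vsa = vowel_space_area(example_corners)
```

Under this definition, formant shifts away from the centre of the vowel space enlarge the polygon, which is the sense in which a greater VSA indicates less centralization.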

RevDate: 2020-04-02

Milenkovic PH, Wagner M, Kent RD, et al (2020)

Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children.

The Journal of the Acoustical Society of America, 147(3):EL221.

The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.
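
As a rough illustration of linear-predictive formant estimation, the sketch below fits an all-pole model with the autocorrelation method (Levinson-Durbin recursion) and reads formant candidates off the peaks of the resulting LPC spectrum. It is a simplified sketch, not the procedure used in the study: it omits pre-emphasis, windowing, anti-alias filtering, and between-frame averaging. The test signal, the pole radius, the model order of 4, and all function names are assumptions made for illustration.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a real-valued frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor polynomial a[0..order] (a[0] = 1) from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction error energy
    return a


def lpc_spectral_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) of local maxima of the all-pole spectrum 1/|A(e^{jw})|."""
    n_points = int((fs / 2) / step_hz) + 1
    freqs = [i * step_hz for i in range(n_points)]
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(denom))
    return [freqs[i] for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


def synth_two_resonances(f1, f2, fs, radius=0.97, n=2000):
    """Impulse response of a hypothetical all-pole filter with two resonances."""
    def pole_pair(f):
        theta = 2.0 * math.pi * f / fs
        return [1.0, -2.0 * radius * math.cos(theta), radius * radius]

    p, q = pole_pair(f1), pole_pair(f2)
    a = [0.0] * 5
    for i, pc in enumerate(p):  # multiply the two second-order sections
        for j, qc in enumerate(q):
            a[i + j] += pc * qc
    y = []
    for t in range(n):
        v = 1.0 if t == 0 else 0.0  # unit impulse input
        v -= sum(a[k] * y[t - k] for k in range(1, 5) if t - k >= 0)
        y.append(v)
    return y


fs = 10000  # Hz; the lowest sampling rate mentioned in the abstract
frame = synth_two_resonances(700.0, 1200.0, fs)  # assumed test resonances
coeffs = levinson_durbin(autocorrelation(frame, 4), 4)
estimated = lpc_spectral_peaks(coeffs, fs)  # formant candidates in Hz
```

With real speech, the number of LPC coefficients has to be chosen to suit the sampling rate and the speaker, which is one of the factors the study highlights.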

RevDate: 2020-03-20

Deloche F (2020)

Fine-grained statistical structure of speech.

PloS one, 15(3):e0230233 pii:PONE-D-19-01235.

In spite of its acoustic diversity, the speech signal presents statistical regularities that can be exploited by biological or artificial systems for efficient coding. Independent Component Analysis (ICA) revealed that on small time scales (∼ 10 ms), the overall structure of speech is well captured by a time-frequency representation whose frequency selectivity follows the same power law in the high frequency range 1-8 kHz as cochlear frequency selectivity in mammals. Variations in the power-law exponent, i.e. different time-frequency trade-offs, have been shown to provide additional adaptation to phonetic categories. Here, we adopt a parametric approach to investigate the variations of the exponent at a finer level of speech. The estimation procedure is based on a measure that reflects the sparsity of decompositions in a set of Gabor dictionaries whose atoms are Gaussian-modulated sinusoids. We examine the variations of the exponent associated with the best decomposition, first at the level of phonemes, then at an intra-phonemic level. We show that this analysis offers a rich interpretation of the fine-grained statistical structure of speech, and that the exponent values can be related to key acoustic properties. Two main results are: i) for plosives, the exponent is lowered by the release bursts, concealing higher values during the opening phases; ii) for vowels, the exponent is bound to formant bandwidths and decreases with the degree of acoustic radiation at the lips. This work further suggests that an efficient coding strategy is to reduce frequency selectivity with sound intensity level, congruent with the nonlinear behavior of cochlear filtering.

RevDate: 2020-03-20

Hardy TLD, Boliek CA, Aalto D, et al (2020)

Contributions of Voice and Nonverbal Communication to Perceived Masculinity-Femininity for Cisgender and Transgender Communicators.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity-femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators.

METHOD: The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity-femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity-femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings.

RESULTS: Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity-femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode.

CONCLUSIONS: Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

RevDate: 2020-03-11

Carl M, Kent RD, Levy ES, et al (2020)

Vowel Acoustics and Speech Intelligibility in Young Adults With Down Syndrome.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

PURPOSE: Speech production deficits and reduced intelligibility are frequently noted in individuals with Down syndrome (DS) and are attributed to a combination of several factors. This study reports acoustic data on vowel production in young adults with DS and relates these findings to perceptual analysis of speech intelligibility.

METHOD: Participants were eight young adults with DS as well as eight age- and gender-matched typically developing (TD) controls. Several different acoustic measures of vowel centralization and variability were applied to tokens of corner vowels (/ɑ/, /æ/, /i/, /u/) produced in common English words. Intelligibility was assessed for single-word productions of speakers with DS, by means of transcriptions from 14 adult listeners.

RESULTS: Group differentiation was found for some, but not all, of the acoustic measures. Low vowels were more acoustically centralized and variable in speakers with DS than TD controls. Acoustic findings were associated with overall intelligibility scores. Vowel formant dispersion was the most sensitive measure in distinguishing DS and TD formant data.

CONCLUSION: Corner vowels are differentially affected in speakers with DS. The acoustic characterization of vowel production and its association with speech intelligibility scores within the DS group support the conclusion of motor control deficits in the overall speech impairment. Implications are discussed for effective treatment planning.

RevDate: 2020-03-09

Zhang T, Shao Y, Wu Y, et al (2020)

Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.

IEEE journal of biomedical and health informatics [Epub ahead of print].

Individuals such as voice professionals, elderly people, and smokers increasingly suffer from voice disorders, which underlines the importance of pathological voice repair. Previous work on pathological voice repair addressed only the sustained vowel /a/; repairing multiple vowels remains challenging because pitch extraction is unstable and formant reconstruction is unsatisfactory. In this paper, a multiple-vowel repair method for voice disorders based on pitch extraction and the Line Spectrum Pair (LSP) feature is proposed. It broadens the scope of voice repair from the single vowel /a/ to the vowels /a/, /i/, and /u/ and repairs all three successfully. A deep neural network is used as a classifier to distinguish normal from pathological voices. The Wavelet Transform and Hilbert-Huang Transform are applied for pitch extraction, and the formant is reconstructed from the LSP feature. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrücken Voice Database (SVD). The improvements in three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure, and Mel cepstral distance measure, are 45.87%, 50.37%, and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis shows a prominent repair effect.

RevDate: 2020-03-01

Allison KM, Salehi S, JR Green (2020)

Effect of prosodic manipulation on articulatory kinematics and second formant trajectories in children.

The Journal of the Acoustical Society of America, 147(2):769.

This study investigated effects of rate reduction and emphatic stress cues on second formant (F2) trajectories and articulatory movements during diphthong production in 11 typically developing school-aged children. F2 extent increased in slow and emphatic stress conditions, and tongue and jaw displacement increased in the emphatic stress condition compared to habitual speech. Tongue displacement significantly predicted F2 extent across speaking conditions. Results suggest that slow rate and emphatic stress cues induce articulatory and acoustic changes in children that may enhance clarity of the acoustic signal. Potential clinical implications for improving speech in children with dysarthria are discussed.

RevDate: 2020-03-01

Summers RJ, B Roberts (2020)

Informational masking of speech by acoustically similar intelligible and unintelligible interferers.

The Journal of the Acoustical Society of America, 147(2):1113.

Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors, particularly interferer intrusions, also make an important contribution to IM.
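
To make the stimulus parameters above concrete, the following sketch converts a semitone offset into a frequency ratio and a level difference in dB into an amplitude ratio. It assumes equal-tempered semitones (12 per octave) and treats the TMR as a sound level difference, so the amplitude ratio is 10^(dB/20). The function names are hypothetical, and this is not code from the study.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for an interval in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def db_to_amplitude_ratio(level_db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 20.0)


# Interferer F0 relative to the target F0 (4 semitones higher).
f0_ratio = semitones_to_ratio(4)

# Target-to-masker amplitude ratios for the three TMR values in the abstract.
tmr_ratios = {db: db_to_amplitude_ratio(db) for db in (0, 6, 12)}
```

Under these assumptions, an interferer 4 semitones above the target has an F0 roughly 26% higher, and each 6 dB step in TMR roughly doubles the target's amplitude relative to the masker.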

RevDate: 2020-03-01

Winn MB (2020)

Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script.

The Journal of the Acoustical Society of America, 147(2):852.

Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation, and comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT, such as fundamental frequency and formant transitions, are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.

RevDate: 2020-02-29

Riggs WJ, Hiss MM, Skidmore J, et al (2020)

Utilizing Electrocochleography as a Microphone for Fully Implantable Cochlear Implants.

Scientific reports, 10(1):3714 pii:10.1038/s41598-020-60694-z.

Current cochlear implants (CIs) are semi-implantable devices with an externally worn sound processor that also hosts the microphone. A fully implantable device, however, would ultimately be desirable as it would be of great benefit to recipients. While some prototypes have been designed and used in a few select cases, one main stumbling block is the sound input. Specifically, subdermal implantable microphone technology has been beset by physiologic issues such as sound distortion and signal attenuation under the skin. Here we propose an alternative method that utilizes a physiologic response composed of an electrical field generated by the sensory cells of the inner ear to serve as a sound source microphone for fully implantable hearing technology such as CIs. Electrophysiological results obtained from 14 participants (adult and pediatric) document the feasibility of capturing speech properties within the electrocochleography (ECochG) response. Degradation of formant properties of the stimuli /da/ and /ba/ is evaluated across various degrees of hearing loss. Preliminary results provide proof of concept that the ECochG response can serve as a microphone that captures vital properties of speech. However, further signal processing refinement is needed, along with use of an intracochlear recording location, which would likely improve signal fidelity.

RevDate: 2020-02-27

Kim HT (2020)

Vocal Feminization for Transgender Women: Current Strategies and Patient Perspectives.

International journal of general medicine, 13:43-52 pii:205102.

Voice feminization for transgender women is a highly complicated, comprehensive transition process. Voice feminization has been thought to be equal to pitch elevation. Thus, many surgical procedures have focused only on pitch raising for voice feminization. However, voice feminization should consider not only voice pitch but also gender differences in the physical, neurophysiological, and acoustical characteristics of voice. That is why voice therapy has been the preferred choice for the feminization of the voice. Considering gender differences in the phonatory system, the method for voice feminization consists of changing the following four critical elements: fundamental frequency, resonance frequency related to vocal tract volume and length, formant tuning, and phonatory pattern. The voice feminizing process can be generally divided into non-surgical feminization and surgical feminization. As a non-surgical procedure, feminization voice therapy consists of increasing fundamental frequency, improving oral and pharyngeal resonance, and behavioral therapy. Surgical feminization can usually be achieved by an external approach or an endoscopic approach. Based on the three factors of the vocal fold that modulate pitch (length, tension and mass), surgical procedures can be classified as one-factor, two-factor and three-factor modification of the vocal folds. Recent systematic reviews and meta-analysis studies have reported positive outcomes for both voice therapy and voice feminization surgery. Voice therapy has the benefits of being highly satisfactory, mostly increasing vocal pitch, and being noninvasive. However, surgical voice feminization by three-factor modification of the vocal folds is also highly competent and provides a maximum absolute increase in vocal pitch. Voice feminization is a long transition journey for physical, neurophysiological, and psychosomatic changes that convert a male phonatory system to a female phonatory system.
Therefore, strategies for voice feminization should be individualized according to the individual's physical condition, the desired change in voice pitch, economic conditions, and social roles.

RevDate: 2020-02-20

Levy ES, Moya-Galé G, Chang YM, et al (2020)

Effects of speech cues in French-speaking children with dysarthria.

International journal of language & communication disorders [Epub ahead of print].

BACKGROUND: Articulatory excursion and vocal intensity are reduced in many children with dysarthria due to cerebral palsy (CP), contributing to the children's intelligibility deficits and negatively affecting their social participation. However, the effects of speech-treatment strategies for improving intelligibility in this population are understudied, especially for children who speak languages other than English. In a cueing study on English-speaking children with dysarthria, acoustic variables and intelligibility improved when the children were provided with cues aimed to increase articulatory excursion and vocal intensity. While French is among the top 20 most spoken languages in the world, dysarthria and its management in French-speaking children are virtually unexplored areas of research. Information gleaned from such research is critical for providing an evidence base on which to provide treatment.

AIMS: To examine acoustic and perceptual changes in the speech of French-speaking children with dysarthria, who are provided with speech cues targeting greater articulatory excursion (French translation of 'speak with your big mouth') and vocal intensity (French translation of 'speak with your strong voice'). This study investigated whether, in response to the cues, the children would make acoustic changes and listeners would perceive the children's speech as more intelligible.

METHODS & PROCEDURES: Eleven children with dysarthria due to CP (six girls, five boys; ages 4;11-17;0 years; eight with spastic CP, three with dyskinetic CP) repeated pre-recorded speech stimuli across three speaking conditions (habitual, 'big mouth' and 'strong voice'). Stimuli were sentences and contrastive words in phrases. Acoustic analyses were conducted. A total of 66 Belgian-French listeners transcribed the children's utterances orthographically and rated their ease of understanding on a visual analogue scale at sentence and word levels.

OUTCOMES & RESULTS: Acoustic analyses revealed significantly longer duration in response to the big mouth cue at sentence level and in response to both the big mouth and strong voice cues at word level. Significantly higher vocal sound-pressure levels were found following both cues at sentence and word levels. Both cues elicited significantly higher first-formant vowel frequencies and listeners' greater ease-of-understanding ratings at word level. Increases in the percentage of words transcribed correctly and in sentence ease-of-understanding ratings, however, did not reach statistical significance. Considerable variability between children was observed.

CONCLUSIONS & IMPLICATIONS: Speech cues targeting greater articulatory excursion and vocal intensity yield significant acoustic changes in French-speaking children with dysarthria. However, the changes may only aid listeners' ease of understanding at word level. The significant findings and great inter-speaker variability are generally consistent with studies on English-speaking children with dysarthria, although changes appear more constrained in these French-speaking children.

What this paper adds

What is already known on the subject: According to the only study comparing effects of speech-cueing strategies on English-speaking children with dysarthria, intelligibility increases when the children are provided with cues aimed to increase articulatory excursion and vocal intensity. Little is known about speech characteristics in French-speaking children with dysarthria and no published research has explored effects of cueing strategies in this population.

What this paper adds to existing knowledge: This paper is the first study to examine the effects of speech cues on the acoustics and intelligibility of French-speaking children with CP. It provides evidence that the children can make use of cues to modify their speech, although the changes may only aid listeners' ease of understanding at word level.

What are the potential or actual clinical implications of this work? For clinicians, the findings suggest that speech cues emphasizing increasing articulatory excursion and vocal intensity show promise for improving the ease of understanding of words produced by francophone children with dysarthria, although improvements may be modest. The variability in the responses also suggests that this population may benefit from a combination of such cues to produce words that are easier to understand.

RevDate: 2020-02-20

Boë LJ, Sawallis TR, Fagot J, et al (2019)

Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science.

Science advances, 5(12):eaaw3916 pii:aaw3916.

Recent articles on primate articulatory abilities are revolutionary regarding speech emergence, a crucial aspect of language evolution, by revealing a human-like system of proto-vowels in nonhuman primates and implicitly throughout our hominid ancestry. This article presents both a schematic history and the state of the art in primate vocalization research and its importance for speech emergence. Recent speech research advances allow more incisive comparison of phylogeny and ontogeny and also an illuminating reinterpretation of vintage primate vocalization data. This review produces three major findings. First, even among primates, laryngeal descent is not uniquely human. Second, laryngeal descent is not required to produce contrasting formant patterns in vocalizations. Third, living nonhuman primates produce vocalizations with contrasting formant patterns. Thus, evidence now overwhelmingly refutes the long-standing laryngeal descent theory, which pushes back "the dawn of speech" beyond ~200 ka ago to over ~20 Ma ago, a difference of two orders of magnitude.

RevDate: 2020-02-10

Kearney E, Nieto-Castañón A, Weerathunge HR, et al (2019)

A Simple 3-Parameter Model for Examining Adaptation in Speech and Voice Production.

Frontiers in psychology, 10:2995.

Sensorimotor adaptation experiments are commonly used to examine motor learning behavior and to uncover information about the underlying control mechanisms of many motor behaviors, including speech production. In the speech and voice domains, aspects of the acoustic signal are shifted/perturbed over time via auditory feedback manipulations. In response, speakers alter their production in the opposite direction of the shift so that their perceived production is closer to what they intended. This process relies on a combination of feedback and feedforward control mechanisms that are difficult to disentangle. The current study describes and tests a simple 3-parameter mathematical model that quantifies the relative contribution of feedback and feedforward control mechanisms to sensorimotor adaptation. The model is a simplified version of the DIVA model, an adaptive neural network model of speech motor control. The three fitting parameters of SimpleDIVA are associated with the three key subsystems involved in speech motor control, namely auditory feedback control, somatosensory feedback control, and feedforward control. The model is tested through computer simulations that identify optimal model fits to six existing sensorimotor adaptation datasets. We show its utility in (1) interpreting the results of adaptation experiments involving the first and second formant frequencies as well as fundamental frequency; (2) assessing the effects of masking noise in adaptation paradigms; (3) fitting more than one perturbation dimension simultaneously; (4) examining sensorimotor adaptation at different timepoints in the production signal; and (5) quantitatively predicting responses in one experiment using parameters derived from another experiment. The model simulations produce excellent fits to real data across different types of perturbations and experimental paradigms (mean correlation between data and model fits across all six studies = 0.95 ± 0.02). 
The model parameters provide a mechanistic explanation for the behavioral responses to the adaptation paradigm that are not readily available from the behavioral responses alone. Overall, SimpleDIVA offers new insights into speech and voice motor control and has the potential to inform future directions of speech rehabilitation research in disordered populations. Simulation software, including an easy-to-use graphical user interface, is publicly available to facilitate the use of the model in future studies.
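The abstract above describes three fitted parameters: an auditory feedback gain, a somatosensory feedback gain, and a feedforward learning rate. The following is a minimal sketch of how a trial-by-trial model with those three parameters could be simulated. The update rule, the function name `simulate_adaptation`, and the parameter names `alpha_a`, `alpha_s`, and `lambda_ff` are assumptions made for illustration; they are not taken from the abstract and should not be read as the published SimpleDIVA equations or software.

```python
def simulate_adaptation(perturbation, target, alpha_a, alpha_s, lambda_ff):
    """Simulate produced values (e.g. F1 in Hz) across perturbed trials.

    perturbation: per-trial shift applied to auditory feedback (Hz)
    target:       intended value of the acoustic parameter (Hz)
    alpha_a:      auditory feedback gain (hypothetical name)
    alpha_s:      somatosensory feedback gain (hypothetical name)
    lambda_ff:    feedforward learning rate (hypothetical name)
    """
    f_ff = target  # assumed: the feedforward command starts at the target
    produced = []
    for pert in perturbation:
        f_af = f_ff + pert  # auditory feedback carries the shift
        f_sf = f_ff         # somatosensory feedback is not shifted
        # Feedback correction: auditory and somatosensory errors, each
        # scaled by its own gain (assumed form).
        delta_fb = alpha_a * (target - f_af) + alpha_s * (target - f_sf)
        produced.append(f_ff + delta_fb)
        # Part of the correction is folded into the next feedforward command.
        f_ff = f_ff + lambda_ff * delta_fb
    return produced
```

Under these assumptions, a sustained upward shift drives production downward, opposite to the perturbation, which matches the qualitative behavior the abstract describes. Calling `simulate_adaptation([100.0] * 20, 700.0, 0.2, 0.1, 0.1)` returns a list of 20 values that all lie below the 700 Hz target.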

RevDate: 2020-02-09

Viegas F, Viegas D, Serra Guimarães G, et al (2020)

Acoustic Analysis of Voice and Speech in Men with Skeletal Class III Malocclusion: A Pilot Study.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000505186 [Epub ahead of print].

OBJECTIVES: To assess the fundamental (f0) and first three formant (F1, F2, F3) frequencies of the 7 oral vowels of Brazilian Portuguese in men with skeletal class III malocclusion and to compare these measures with a control group of individuals with Angle's class I.

METHODS: Sixty men aged 18-40 years, 20 with Angle's class III skeletal malocclusion and 40 with Angle's class I malocclusion, were selected by speech therapists and dentists. The speech signals were obtained from sustained vowels, and the values of f0 and frequencies of F1, F2 and F3 were estimated. The differences were verified through Student's t test, and the effect size calculation was performed.

RESULTS: In the class III group, higher f0 values were observed in all vowels, along with higher values of F1 in the vowels [a] and [ε] and of F2 in the vowels [a], [e] and [i], and lower F1 and F3 values in the vowel [u].

CONCLUSION: Higher f0 values were found in all vowels investigated in the class III group, which showed a higher laryngeal position in the production of these sounds. The frequencies of the first 3 formants showed isolated differences, with higher values of F1 in the vowels [a] and [ε] and of F2 in [a], [e] and [i], and lower values of F1 and F3 in the vowel [u] in the experimental group. Thus, it is concluded that the fundamental frequency of the voice was the main parameter that differentiated the studied group from the control.

RevDate: 2020-02-02

Kelley MC, BV Tucker (2020)

A comparison of four vowel overlap measures.

The Journal of the Acoustical Society of America, 147(1):137.

Multiple measures of vowel overlap have been proposed that use F1, F2, and duration to calculate the degree of overlap between vowel categories. The present study assesses four of these measures: the spectral overlap assessment metric [SOAM; Wassink (2006). J. Acoust. Soc. Am. 119(4), 2334-2350], the a posteriori probability (APP)-based metric [Morrison (2008). J. Acoust. Soc. Am. 123(1), 37-40], the vowel overlap analysis with convex hulls method [VOACH; Haynes and Taylor, (2014). J. Acoust. Soc. Am. 136(2), 883-891], and the Pillai score as first used for vowel overlap by Hay, Warren, and Drager [(2006). J. Phonetics 34(4), 458-484]. Summaries of the measures are presented, and theoretical critiques of them are performed, concluding that the APP-based metric and Pillai score are theoretically preferable to SOAM and VOACH. The measures are empirically assessed using accuracy and precision criteria with Monte Carlo simulations. The Pillai score demonstrates the best overall performance in these tests. The potential applications of vowel overlap measures to research scenarios are discussed, including comparisons of vowel productions between different social groups, as well as acoustic investigations into vowel formant trajectories.
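The Pillai score named above is the Pillai-Bartlett trace from a MANOVA in which vowel category is the grouping factor. The sketch below computes it for two vowel categories from F1 and F2 alone, using NumPy. It leaves out duration and any other predictors, so it illustrates the statistic and does not reproduce the procedure of the study. The function name `pillai_score` and the sample formant values are hypothetical.

```python
import numpy as np


def pillai_score(group_a, group_b):
    """Pillai-Bartlett trace for two groups of (F1, F2) measurements.

    Values near 0 indicate heavily overlapping categories; values near 1
    indicate well separated categories.
    """
    groups = [np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)]
    grand_mean = np.vstack(groups).mean(axis=0)
    dims = grand_mean.shape[0]
    within = np.zeros((dims, dims))   # within-group SSCP matrix (E)
    between = np.zeros((dims, dims))  # between-group SSCP matrix (H)
    for g in groups:
        mean = g.mean(axis=0)
        centered = g - mean
        within += centered.T @ centered
        diff = (mean - grand_mean)[:, None]
        between += len(g) * (diff @ diff.T)
    # Pillai's trace = trace(H (H + E)^-1)
    return float(np.trace(between @ np.linalg.inv(between + within)))


# Hypothetical tokens: columns are F1 and F2 in Hz.
vowel_1 = [[300, 2300], [310, 2250], [290, 2350], [305, 2280]]
vowel_2 = [[700, 1300], [710, 1250], [690, 1350], [705, 1280]]
print(pillai_score(vowel_1, vowel_2))
```

With only two categories the score is bounded between 0 and 1, which makes it straightforward to compare across speakers or social groups.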

RevDate: 2020-02-02

Renwick MEL, JA Stanley (2020)

Modeling dynamic trajectories of front vowels in the American South.

The Journal of the Acoustical Society of America, 147(1):579.

Regional variation in American English speech is often described in terms of shifts, indicating which vowel sounds are converging or diverging. In the U.S. South, the Southern vowel shift (SVS) and African American vowel shift (AAVS) affect not only vowels' relative positions but also their formant dynamics. Static characterizations of shifting, with a single pair of first and second formant values taken near vowels' midpoint, fail to capture this vowel-inherent spectral change, which can indicate dialect-specific diphthongization or monophthongization. Vowel-inherent spectral change is directly modeled to investigate how trajectories of front vowels /i eɪ ɪ ɛ/ differ across social groups in the 64-speaker Digital Archive of Southern Speech. Generalized additive mixed models are used to test the effects of two social factors, sex and ethnicity, on trajectory shape. All vowels studied show significant differences between men, women, African American and European American speakers. Results show strong overlap between the trajectories of /eɪ, ɛ/ particularly among European American women, consistent with the SVS, and greater vowel-inherent raising of /ɪ/ among African American speakers, indicating how that lax vowel is affected by the AAVS. Model predictions of duration additionally indicate that across groups, trajectories become more peripheral as vowel duration increases.

RevDate: 2020-02-02

Chung H (2020)

Vowel acoustic characteristics of Southern American English variation in Louisiana.

The Journal of the Acoustical Society of America, 147(1):541.

This study examined acoustic characteristics of vowels produced by speakers from Louisiana, one of the states in the Southern English dialect region. First, how Louisiana vowels differ from or are similar to the reported patterns of Southern dialect were examined. Then, within-dialect differences across regions in Louisiana were examined. Thirty-four female adult monolingual speakers of American English from Louisiana, ranging in age from 18 to 23, produced English monosyllabic words containing 11 vowels /i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɔ, ɑ/. The first two formant frequencies at the midpoint of the vowel nucleus, direction, and amount of formant changes across three different time points (20, 50, and 80%), and vowel duration were compared to previously reported data on Southern vowels. Overall, Louisiana vowels showed patterns consistent with previously reported characteristics of Southern vowels that reflect ongoing changes in the Southern dialect (no evidence of acoustic reversal of tense-lax pairs, more specifically no peripheralization of front vowels). Some dialect-specific patterns were also observed (a relatively lesser degree of formant changes and slightly shorter vowel duration). These patterns were consistent across different regions within Louisiana.

RevDate: 2020-01-20

Maebayashi H, Takiguchi T, S Takada (2019)

Study on the Language Formation Process of Very-Low-Birth-Weight Infants in Infancy Using a Formant Analysis.

The Kobe journal of medical sciences, 65(2):E59-E70.

Expressive language development depends on anatomical factors, such as motor control of the tongue and oral cavity needed for vocalization, as well as cognitive aspects for comprehension and speech. The purpose of this study was to examine the differences in expressive language development between normal-birth-weight (NBW) infants and very-low-birth-weight (VLBW) infants in infancy using a formant analysis. We also examined the presence of differences between infants with a normal development and those with a high risk of autism spectrum disorder who were expected to exist among VLBW infants. The participants were 10 NBW infants and 10 VLBW infants 12-15 months of age whose speech had been recorded at intervals of approximately once every 3 months. The recorded speech signal was analyzed using a formant analysis, and changes due to age were observed. One NBW and 3 VLBW infants failed to pass the screening tests (CBCL and M-CHAT) at 24 months of age. The formant frequencies (F1 and F2) of the three groups of infants (NBW, VLBW and CBCL·M-CHAT non-passing infants) were scatter-plotted by age. For the NBW and VLBW infants, the area of the plot increased with age, but there was no significant expansion of the plot area for the CBCL·M-CHAT non-passing infants. The results showed no significant differences in expressive language development between NBW infants at 24 months old and VLBW infants at the corrected age. However, different language developmental patterns were observed in CBCL·M-CHAT non-passing infants, regardless of birth weight, suggesting the importance of screening by acoustic analyses.

RevDate: 2020-01-16

Hosbach-Cannon CJ, Lowell SY, Colton RH, et al (2020)

Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. 
Because singers are at a high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.

RevDate: 2020-01-15

Seyfarth RM, Cheney DL, Harcourt AH, et al (1994)

The acoustic features of gorilla double grunts and their relation to behavior.

American journal of primatology, 33(1):31-50.

Mountain gorillas (Gorilla gorilla beringei) give double-grunts to one another in a variety of situations, when feeding, resting, moving, or engaged in other kinds of social behavior. Some double-grunts elicit double-grunts in reply whereas others do not. Double-grunts are individually distinctive, and high-ranking animals give double-grunts at higher rates than others. There was no evidence, however, that the probability of eliciting a reply depended upon either the animals' behavior at the time a call was given or the social relationship between caller and respondent. The probability of eliciting a reply could be predicted from a double-grunt's acoustic features. Gorillas apparently produce at least two acoustically different subtypes of double-grunts, each of which conveys different information. Double-grunts with a low second formant (typically < 1600 Hz) are given by animals after a period of silence and frequently elicit vocal replies. Double-grunts with a high second formant (typically > 1600 Hz) are given by animals within 5 s of a call from another individual and rarely elicit replies. © 1994 Wiley-Liss, Inc.

RevDate: 2020-01-15

Souza P, Gallun F, R Wright (2020)

Contributions to Speech-Cue Weighting in Older Adults With Impaired Hearing.

Journal of speech, language, and hearing research : JSLHR [Epub ahead of print].

Purpose In a previous paper (Souza, Wright, Blackburn, Tatman, & Gallun, 2015), we explored the extent to which individuals with sensorineural hearing loss used different cues for speech identification when multiple cues were available. Specifically, some listeners placed the greatest weight on spectral cues (spectral shape and/or formant transition), whereas others relied on the temporal envelope. In the current study, we aimed to determine whether listeners who relied on temporal envelope did so because they were unable to discriminate the formant information at a level sufficient to use it for identification and the extent to which a brief discrimination test could predict cue weighting patterns. Method Participants were 30 older adults with bilateral sensorineural hearing loss. The first task was to label synthetic speech tokens based on the combined percept of temporal envelope rise time and formant transitions. An individual profile was derived from linear discriminant analysis of the identification responses. The second task was to discriminate differences in either temporal envelope rise time or formant transitions. The third task was to discriminate spectrotemporal modulation in a nonspeech stimulus. Results All listeners were able to discriminate temporal envelope rise time at levels sufficient for the identification task. There was wide variability in the ability to discriminate formant transitions, and that ability predicted approximately one third of the variance in the identification task. There was no relationship between performance in the identification task and either amount of hearing loss or ability to discriminate nonspeech spectrotemporal modulation. Conclusions The data suggest that listeners who rely to a greater extent on temporal cues lack the ability to discriminate fine-grained spectral information. 
The fact that the amount of hearing loss was not associated with the cue profile underscores the need to characterize individual abilities in a more nuanced way than can be captured by the pure-tone audiogram.

RevDate: 2020-01-17

Kamiloğlu RG, Fischer AH, DA Sauter (2020)

Good vibrations: A review of vocal expressions of positive emotions.

Psychonomic bulletin & review pii:10.3758/s13423-019-01701-x [Epub ahead of print].

Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.

RevDate: 2020-01-08

Dubey AK, Prasanna SRM, S Dandapat (2019)

Detection and assessment of hypernasality in repaired cleft palate speech using vocal tract and residual features.

The Journal of the Acoustical Society of America, 146(6):4211.

The presence of hypernasality in repaired cleft palate (CP) speech is a consequence of velopharyngeal insufficiency. The coupling of the nasal tract with the oral tract adds nasal formant and antiformant pairs in the hypernasal speech spectrum. This addition deviates the spectral and linear prediction (LP) residual characteristics of hypernasal speech compared to normal speech. In this work, the vocal tract constriction feature, peak to side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection. The first feature captures the lower-frequencies prominence in speech due to the presence of nasal formants, the second feature captures the undesirable signal components in the residual signal due to the nasal antiformants, and the third feature captures the information about formants and antiformants in the spectrum along with the spectral envelope. The combination of three features gives normal versus hypernasal speech detection accuracies of 87.76%, 91.13%, and 93.70% for /a/, /i/, and /u/ vowels, respectively, and hypernasality severity detection accuracies of 80.13% and 81.25% for /i/ and /u/ vowels, respectively. The speech data are collected from 30 control normal and 30 repaired CP children between the ages of 7 and 12.

RevDate: 2019-12-31

Shiraishi M, Mishima K, H Umeda (2019)

Development of an Acoustic Simulation Method during Phonation of the Japanese Vowel /a/ by the Boundary Element Method.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30445-X [Epub ahead of print].

OBJECTIVES: The purpose of the present study was to establish the method for an acoustic simulation of a vocal tract created from CT data during phonation of the Japanese vowel /a/ and to verify the validity of the simulation.

MATERIAL AND METHODS: The subjects were 15 healthy adults (8 males, 7 females). The vocal tract model was created from CT data acquired during sustained phonation of the Japanese vowel /a/. After conversion to a mesh model for analysis, a wave acoustic analysis was performed with a boundary element method. The wall and the bottom of the vocal tract model were regarded as a rigid wall and a nonrigid wall, respectively. The acoustic medium was set to 37°C, and a point sound source was set in the place corresponding to the vocal cord as a sound source. The first and second formant frequencies (F1 and F2) were calculated. For 1 of the 15 subjects, the range from the upper end of the frontal sinus to the tracheal bifurcation was scanned, and 2 models were created: model 1 included the range from the frontal sinus to the tracheal bifurcation; and model 2 included the range from the frontal sinus to the glottis and added a virtually extended trachea by 12 cm cylindrically. F1 and F2 calculated from models 1 and 2 were compared. To evaluate the validity of the present simulation, F1 and F2 calculated from the simulation were compared with those of the actual voice and the sound generated using a solid model and a whistle-type artificial larynx. To judge the validity, the vowel formant frequency discrimination threshold reported in the past was used as a criterion. Namely, the relative discrimination thresholds (%), dividing ΔF by F, where F was the formant frequency calculated from the simulation, and ΔF was the difference between F and the formant frequency of the actual voice and the sound generated using the solid model and artificial larynx, were obtained.
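The validity criterion described above is a relative difference: the gap between the simulated and the reference formant frequency, divided by the simulated value and expressed as a percentage. A minimal sketch of that arithmetic follows. The function name `relative_threshold_percent` and the sample frequencies are hypothetical, and taking the absolute value of the difference is an assumption, since the abstract only says "difference".

```python
def relative_threshold_percent(simulated_hz, reference_hz):
    """Relative difference (%) between a simulated and a reference formant.

    simulated_hz: formant frequency F calculated from the simulation
    reference_hz: formant frequency of the actual voice (or of the sound
                  generated with the solid model and artificial larynx)
    """
    delta_f = abs(simulated_hz - reference_hz)  # assumed: unsigned difference
    return 100.0 * delta_f / simulated_hz


# Hypothetical example: simulated F1 of 800 Hz against a measured 760 Hz.
print(relative_threshold_percent(800.0, 760.0))
```

The result is then compared against a previously reported formant discrimination threshold to decide whether the simulated and reference values count as perceptually equivalent.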

RESULTS: F1 and F2 calculated from models 1 and 2 were similar. Therefore, to reduce the exposure dose, the remaining 14 subjects were scanned from the upper end of the frontal sinus to the glottis, and model 2 with the trachea extended by 12 cm virtually was used for the simulation. The averages of the relative discrimination thresholds against F1 and F2 calculated from the actual voice were 5.9% and 4.6%, respectively. The averages of the relative discrimination thresholds against F1 and F2 calculated from the sound generated by using the solid model and the artificial larynx were 4.1% and 3.7%, respectively.

CONCLUSIONS: The Japanese vowel /a/ could be simulated with high validity for the vocal tract models created from the CT data during phonation of /a/ using the boundary element method.

RevDate: 2019-12-31

Huang MY, Duan RY, Q Zhao (2019)

The influence of long-term cadmium exposure on the male advertisement call of Xenopus laevis.

Environmental science and pollution research international pii:10.1007/s11356-019-07525-5 [Epub ahead of print].

Cadmium (Cd) is a non-essential environmental endocrine-disrupting compound found in water and a potential threat to aquatic habitats. Cd has been shown to have various short-term effects on aquatic animals; however, evidence for long-term effects of Cd on vocal communications in amphibians is lacking. To better understand the long-term effects of low-dose Cd on acoustic communication in amphibians, male Xenopus laevis individuals were treated with low Cd concentrations (0.1, 1, and 10 μg/L) via aqueous exposure for 24 months. At the end of the exposure, the acoustic spectrum characteristics of male advertisement calls and male movement behaviors in response to female calls were recorded. The gene and protein expressions of the androgen receptor (AR) were determined using Western blot and RT-PCR. The results showed that long-term Cd treatment affected the spectrogram and formant of the advertisement call. Compared with the control group, 10 μg/L Cd significantly decreased the first and second formant frequency, and the fundamental and main frequency, and increased the third formant frequency. Treatments of 1 and 10 μg/L Cd significantly reduced the proportion of individuals responding to female calls and prolonged the time of first movement of the male. Long-term Cd treatment induced a downregulation in the AR protein. Treatments of 0.1, 1, and 10 μg/L Cd significantly decreased the expression of AR mRNA in the brain. These findings indicate that long-term exposure to Cd has negative effects on advertisement calls in male X. laevis.

RevDate: 2020-01-13

Park EJ, Yoo SD, Kim HS, et al (2019)

Correlations between swallowing function and acoustic vowel space in stroke patients with dysarthria.

NeuroRehabilitation, 45(4):463-469.

BACKGROUND: Dysphagia and dysarthria tend to coexist in stroke patients. Dysphagia can reduce patients' quality of life, cause aspiration pneumonia and increased mortality.

OBJECTIVE: To evaluate correlations among swallowing function parameters and acoustic vowel space values in patients with stroke.

METHODS: Data from stroke patients with dysarthria and dysphagia were collected. The formant parameter representing the resonance frequency of the vocal tract as a two-dimensional coordinate point was measured for the /a/, /ae/, /i/, and /u/ vowels, and the quadrilateral vowel space area (VSA) and formant centralization ratio (FCR) were measured. Swallowing function was evaluated by a videofluoroscopic swallowing study (VFSS) using the videofluoroscopic dysphagia scale (VDS) and penetration aspiration scale (PAS). Pearson's correlation and linear regression analyses were used to assess the correlation of VSA and FCR to VDS and PAS scores.

RESULTS: Thirty-one stroke patients with dysphagia and dysarthria were analyzed. VSA showed a negative correlation to VDS and PAS scores, while FCR showed a positive correlation to VDS score, but not to PAS score. VSA and FCR were significant factors for assessing dysphagia severity.

CONCLUSIONS: VSA and FCR values were correlated with swallowing function and may be helpful in predicting dysphagia severity associated with stroke.
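
Editorial illustration (not part of the abstract): the two vowel-space measures named above can be sketched in a few lines. The abstract does not say which formulations the authors used, so this sketch assumes the commonly cited ones: the shoelace formula for the area of the /i/-/ae/-/a/-/u/ quadrilateral in the F1 × F2 plane, and FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a). The function names and the formant values are hypothetical and are not data from the study.

```python
from typing import Dict, Tuple

# Hypothetical container: vowel label -> (F1 in Hz, F2 in Hz).
Formants = Dict[str, Tuple[float, float]]


def quadrilateral_vsa(formants: Formants) -> float:
    """Area (Hz^2) of the /i/-/ae/-/a/-/u/ quadrilateral, by the shoelace formula."""
    # Walk the perimeter of the vowel quadrilateral in a fixed order.
    order = ["i", "ae", "a", "u"]
    points = [formants[v] for v in order]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def formant_centralization_ratio(formants: Formants) -> float:
    """FCR as commonly defined (assumed here): it rises as vowels centralize."""
    f1_i, f2_i = formants["i"]
    f1_a, f2_a = formants["a"]
    f1_u, f2_u = formants["u"]
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)


# Illustrative values only (roughly adult-male textbook figures, assumed).
example: Formants = {
    "i": (270.0, 2290.0),
    "ae": (660.0, 1720.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}

if __name__ == "__main__":
    print(quadrilateral_vsa(example))
    print(formant_centralization_ratio(example))
```

Under these assumed definitions, a more centralized vowel set yields a smaller VSA and a larger FCR, which matches the direction of the correlations reported in the abstract.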

RevDate: 2020-01-08

McCarthy KM, Skoruppa K, P Iverson (2019)

Development of neural perceptual vowel spaces during the first year of life.

Scientific reports, 9(1):19592.

This study measured infants' neural responses for spectral changes between all pairs of a set of English vowels. In contrast to previous methods that only allow for the assessment of a few phonetic contrasts, we present a new method that allows us to assess changes in spectral sensitivity across the entire vowel space and create two-dimensional perceptual maps of the infants' vowel development. Infants aged four to eleven months were played long series of concatenated vowels, and the neural response to each vowel change was assessed using the Acoustic Change Complex (ACC) from EEG recordings. The results demonstrated that the youngest infants' responses more closely reflected the acoustic differences between the vowel pairs and reflected higher weight to first-formant variation. Older infants had less acoustically driven responses that seemed a result of selective increases in sensitivity for phonetically similar vowels. The results suggest that phonetic development may involve a perceptual warping for confusable vowels rather than uniform learning, as well as an overall increasing sensitivity to higher-frequency acoustic information.

RevDate: 2019-12-18

Houle N, SV Levi (2019)

Effect of Phonation on Perception of Femininity/Masculinity in Transgender and Cisgender Speakers.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30302-9 [Epub ahead of print].

Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.

RevDate: 2019-12-18

Xu Y, S Prom-On (2019)

Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics.

Frontiers in psychology, 10:2469.

Economy of effort, a popular notion in contemporary speech research, predicts that dynamic extremes such as the maximum speed of articulatory movement are avoided as much as possible and that approaching the dynamic extremes is necessary only when there is a need to enhance linguistic contrast, as in the case of stress or clear speech. Empirical data, however, do not always support these predictions. In the present study, we considered an alternative principle: maximum rate of information, which assumes that speech dynamics are ultimately driven by the pressure to transmit information as quickly and accurately as possible. For empirical data, we asked speakers of American English to produce repetitive syllable sequences such as wawawawawa as fast as possible by imitating recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. Analysis of formant trajectories shows that dynamic extremes in meaningful speech sometimes even exceeded those in the nonsense syllable sequences but that this happened more often in unstressed syllables than in stressed syllables. We then used a target approximation model based on a mass-spring system of varying orders to simulate the formant kinematics. The results show that the kind of formant kinematics found in the present study and in previous studies can only be generated by a dynamical system operating with maximal muscular force under strong time pressure and that the dynamics of this operation may hold the solution to the long-standing enigma of greater stiffness in unstressed than in stressed syllables. We conclude, therefore, that maximum rate of information can coherently explain both current and previous empirical data and could therefore be a fundamental principle of motor control in speech production.

RevDate: 2020-01-01
CmpDate: 2019-12-12

Root-Gutteridge H, Ratcliffe VF, Korzeniowska AT, et al (2019)

Dogs perceive and spontaneously normalize formant-related speaker and vowel differences in human speech sounds.

Biology letters, 15(12):20190555.

Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation-dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.

RevDate: 2020-01-08

Vorperian HK, Kent RD, Lee Y, et al (2019)

Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies.

The Journal of the Acoustical Society of America, 146(5):3255.

The purpose of this study was to determine the developmental trajectory of the four corner vowels' fundamental frequency (fo) and the first four formant frequencies (F1-F4), and to assess when speaker-sex differences emerge. Five words per vowel, two of which were produced twice, were analyzed for fo and estimates of the first four formant frequencies from 190 (97 female, 93 male) typically developing speakers ages 4-20 years old. Findings revealed developmental trajectories with decreasing values of fo and formant frequencies. Sex differences in fo emerged at age 7. The decrease of fo was larger in males than females with a marked drop during puberty. Sex differences in formant frequencies appeared at the earliest age under study and varied with vowel and formant. Generally, the higher formants (F3-F4) were sensitive to sex differences. Inter- and intra-speaker variability declined with age but had somewhat different patterns, likely reflective of maturing motor control that interacts with the changing anatomy. This study reports a source of developmental normative data on fo and the first four formants in both sexes. The different developmental patterns in the first four formants and vowel-formant interactions in sex differences likely point to anatomic factors, although speech-learning phenomena cannot be discounted.

RevDate: 2020-01-12

Gianakas SP, MB Winn (2019)

Lexical bias in word recognition by cochlear implant listeners.

The Journal of the Acoustical Society of America, 146(5):3373.

When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word, rather than completing a nonsense/fake word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous, and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds, including stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs), and also to listeners with normal hearing with clear spectral quality, or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.

RevDate: 2020-01-08

Perrachione TK, Furbeck KT, EJ Thurston (2019)

Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices.

The Journal of the Acoustical Society of America, 146(5):3384.

The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.

RevDate: 2019-12-02

Lo JJH (2019)

Between Äh(m) and Euh(m): The Distribution and Realization of Filled Pauses in the Speech of German-French Simultaneous Bilinguals.

Language and speech [Epub ahead of print].

Filled pauses are well known for their speaker specificity, yet cross-linguistic research has also shown language-specific trends in their distribution and phonetic quality. To examine the extent to which speakers acquire filled pauses as language- or speaker-specific phenomena, this study investigates the use of filled pauses in the context of adult simultaneous bilinguals. Making use of both distributional and acoustic data, this study analyzed UH, consisting of only a vowel component, and UM, with a vowel followed by [m], in the speech of 15 female speakers who were simultaneously bilingual in French and German. Speakers were found to use UM more frequently in German than in French, but only German-dominant speakers had a preference for UM in German. Formant and durational analyses showed that while speakers maintained distinct vowel qualities in their filled pauses in different languages, filled pauses in their weaker language exhibited a shift towards those in their dominant language. These results suggest that, despite high levels of variability between speakers, there is a significant role for language in the acquisition of filled pauses in simultaneous bilingual speakers, which is further shaped by the linguistic environment they grow up in.

RevDate: 2020-01-08

Frey R, Volodin IA, Volodina EV, et al (2019)

Savannah roars: The vocal anatomy and the impressive rutting calls of male impala (Aepyceros melampus) - highlighting the acoustic correlates of a mobile larynx.

Journal of anatomy [Epub ahead of print].

A retractable larynx and adaptations of the vocal folds in the males of several polygynous ruminants serve for the production of rutting calls that acoustically announce larger than actual body size to both rival males and potential female mates. Here, such features of the vocal tract and of the sound source are documented in another species. We investigated the vocal anatomy and laryngeal mobility including its acoustical effects during the rutting vocal display of free-ranging male impala (Aepyceros melampus melampus) in Namibia. Male impala produced bouts of rutting calls (consisting of oral roars and interspersed explosive nasal snorts) in a low-stretch posture while guarding a rutting territory or harem. For the duration of the roars, male impala retracted the larynx from its high resting position to a low mid-neck position involving an extensible pharynx and a resilient connection between the hyoid apparatus and the larynx. Maximal larynx retraction was 108 mm based on estimates in video single frames. This was in good concordance with 91-mm vocal tract elongation calculated on the basis of differences in formant dispersion between roar portions produced with the larynx still ascended and those produced with maximally retracted larynx. Judged by their morphological traits, the larynx-retracting muscles of male impala are homologous to those of other larynx-retracting ruminants. In contrast, the large and massive vocal keels are evolutionary novelties arising by fusion and linear arrangement of the arytenoid cartilage and the canonical vocal fold. These bulky and histologically complex vocal keels produced a low fundamental frequency of 50 Hz. Impala is another ruminant species in which the males are capable of larynx retraction. In addition, male impala vocal folds are spectacularly specialized compared with domestic bovids, allowing the production of impressive, low-frequency roaring vocalizations as a significant part of their rutting behaviour. 
Our study expands knowledge on the evolutionary variation of vocal fold morphology in mammals, suggesting that the structure of the mammalian sound source is not always human-like and should be considered in acoustic analysis and modelling.
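
Editorial illustration (not part of the abstract): the abstract derives vocal tract elongation from differences in formant dispersion but does not give the calculation. A minimal sketch, assuming the standard uniform-tube approximation (a tube closed at the glottis and open at the mouth, whose resonances are spaced c / 2L apart), is shown below. The function names, the speed-of-sound constant, and the formant values are assumptions for illustration and are not measurements from the study.

```python
from typing import Sequence

# Assumed speed of sound in warm, humid air inside the vocal tract (cm/s).
SPEED_OF_SOUND_CM_S = 35000.0


def formant_dispersion(formants_hz: Sequence[float]) -> float:
    """Mean spacing (Hz) between successive formants."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    ordered = sorted(formants_hz)
    # The mean of successive differences telescopes to (last - first) / (n - 1).
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def vocal_tract_length_cm(formants_hz: Sequence[float],
                          c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Apparent vocal tract length for a uniform tube: L = c / (2 * dispersion)."""
    return c / (2.0 * formant_dispersion(formants_hz))


def elongation_cm(resting_hz: Sequence[float], retracted_hz: Sequence[float]) -> float:
    """Difference in apparent length between two call portions."""
    return vocal_tract_length_cm(retracted_hz) - vocal_tract_length_cm(resting_hz)


if __name__ == "__main__":
    # Idealized 17.5 cm tube: resonances at 500, 1500, 2500, 3500 Hz.
    print(vocal_tract_length_cm([500.0, 1500.0, 2500.0, 3500.0]))  # 17.5
```

Lower formant dispersion implies a longer apparent vocal tract, which is why larynx retraction can exaggerate apparent body size.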

RevDate: 2019-11-23

Hu G, Determan SC, Dong Y, et al (2019)

Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise.

Journal of the Association for Research in Otolaryngology : JARO pii:10.1007/s10162-019-00737-z [Epub ahead of print].

Acoustic features of speech include various spectral and temporal cues. It is known that temporal envelope plays a critical role for speech recognition by human listeners, while automated speech recognition (ASR) heavily relies on spectral analysis. This study compared sentence-recognition scores of humans and an ASR software, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. Temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: a white noise, a time-reversed multi-talker noise, and a fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. With a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than the fake-formant noise. The same observation with 22 channels was made when SNR was lowered to 0 dB. In contrast, ASR was unable to function with four vocoding channels even with a 20-dB SNR. Its performance was least affected by white noise and most affected by the fake-formant noise. Increasing the number of channels, which improved the spectral resolution, generated non-monotonic behaviors for the ASR with white noise but not with colored noise. The ASR also showed highly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR utilize different listening strategies in noise.

RevDate: 2019-12-18

Hu W, Tao S, Li M, et al (2019)

Distinctiveness and Assimilation in Vowel Perception in a Second Language.

Journal of speech, language, and hearing research : JSLHR, 62(12):4534-4543.

Purpose The purpose of this study was to investigate how the distinctive establishment of 2nd language (L2) vowel categories (e.g., how distinctively an L2 vowel is established from nearby L2 vowels and from the native language counterpart in the 1st formant [F1] × 2nd formant [F2] vowel space) affected L2 vowel perception. Method Identification of 12 natural English monophthongs, and categorization and rating of synthetic English vowels /i/ and /ɪ/ in the F1 × F2 space were measured for Chinese-native (CN) and English-native (EN) listeners. CN listeners were also examined with categorization and rating of Chinese vowels in the F1 × F2 space. Results As expected, EN listeners significantly outperformed CN listeners in English vowel identification. Whereas EN listeners showed distinctive establishment of 2 English vowels, CN listeners had multiple patterns of L2 vowel establishment: both, 1, or neither established. Moreover, CN listeners' English vowel perception was significantly related to the perceptual distance between the English vowel and its Chinese counterpart, and the perceptual distance between the adjacent English vowels. Conclusions L2 vowel perception relied on listeners' capacity to distinctively establish L2 vowel categories that were distant from the nearby L2 vowels.

RevDate: 2019-12-18

Mollaei F, Shiller DM, Baum SR, et al (2019)

The Relationship Between Speech Perceptual Discrimination and Speech Production in Parkinson's Disease.

Journal of speech, language, and hearing research : JSLHR, 62(12):4256-4268.

Purpose We recently demonstrated that individuals with Parkinson's disease (PD) respond differentially to specific altered auditory feedback parameters during speech production. Participants with PD respond more robustly to pitch and less robustly to formant manipulations compared to control participants. In this study, we investigated whether differences in perceptual processing may in part underlie these compensatory differences in speech production. Methods Pitch and formant feedback manipulations were presented under 2 conditions: production and listening. In the production condition, 15 participants with PD and 15 age- and gender-matched healthy control participants judged whether their own speech output was manipulated in real time. During the listening task, participants judged whether paired tokens of their previously recorded speech samples were the same or different. Results Under listening, 1st formant manipulation discrimination was significantly reduced for the PD group compared to the control group. There was a trend toward better discrimination of pitch in the PD group, but the group difference was not significant. Under the production condition, the ability of participants with PD to identify pitch manipulations was greater than that of the controls. Conclusion The findings suggest perceptual processing differences associated with acoustic parameters of fundamental frequency and 1st formant perturbations in PD. These findings extend our previous results, indicating that different patterns of compensation to pitch and 1st formant shifts may reflect a combination of sensory and motor mechanisms that are differentially influenced by basal ganglia dysfunction.

RevDate: 2019-11-27

Escudero P, M Kalashnikova (2020)

Infants use phonetic detail in speech perception and word learning when detail is easy to perceive.

Journal of experimental child psychology, 190:104714.

Infants successfully discriminate speech sound contrasts that belong to their native language's phonemic inventory in auditory-only paradigms, but they encounter difficulties in distinguishing the same contrasts in the context of word learning. These difficulties are usually attributed to the fact that infants' attention to the phonetic detail in novel words is attenuated when they must allocate additional cognitive resources demanded by word-learning tasks. The current study investigated 15-month-old infants' ability to distinguish novel words that differ by a single vowel in an auditory discrimination paradigm (Experiment 1) and a word-learning paradigm (Experiment 2). These experiments aimed to tease apart whether infants' performance is dependent solely on the specific acoustic properties of the target vowels or on the context of the task. Experiment 1 showed that infants were able to discriminate only a contrast marked by a large difference along a static dimension (the vowels' second formant), whereas they were not able to discriminate a contrast with a small phonetic distance between its vowels, due to the dynamic nature of the vowels. In Experiment 2, infants did not succeed at learning words containing the same contrast they were able to discriminate in Experiment 1. The current findings demonstrate that both the specific acoustic properties of vowels in infants' native language and the task presented continue to play a significant role in early speech perception well into the second year of life.

RevDate: 2019-12-30

Rosenthal MA (2020)

A systematic review of the voice-tagging hypothesis of speech-in-noise perception.

Neuropsychologia, 136:107256.

The voice-tagging hypothesis claims that individuals who better represent pitch information in a speaker's voice, as measured with the frequency following response (FFR), will be better at speech-in-noise perception. The hypothesis has been provided to explain how music training might improve speech-in-noise perception. This paper reviews studies that are relevant to the voice-tagging hypothesis, including studies on musicians and nonmusicians. Most studies on musicians show greater f0 amplitude compared to controls. Most studies on nonmusicians do not show group differences in f0 amplitude. Across all studies reviewed, f0 amplitude does not consistently predict accuracy in speech-in-noise perception. The evidence suggests that music training does not improve speech-in-noise perception via enhanced subcortical representation of the f0.

RevDate: 2019-11-11

Hakanpää T, Waaramaa T, AM Laukkanen (2019)

Comparing Contemporary Commercial and Classical Styles: Emotion Expression in Singing.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30209-7 [Epub ahead of print].

OBJECTIVE: This study examines the acoustic correlates of the vocal expression of emotions in contemporary commercial music (CCM) and classical styles of singing. This information may be useful in improving the training of interpretation in singing.

STUDY DESIGN: This is an experimental comparative study.

METHODS: Eleven female singers with a minimum of 3 years of professional-level singing training in CCM, classical, or both styles participated. They sang the vowel [ɑ:] at three pitches (A3 220 Hz, E4 330 Hz, and A4 440 Hz) expressing anger, sadness, joy, tenderness, and a neutral voice. Vowel samples were analyzed for fundamental frequency (fo), formant frequencies (F1-F5), sound pressure level (SPL), spectral structure (alpha ratio = SPL 1500-5000 Hz − SPL 50-1500 Hz), harmonics-to-noise ratio (HNR), perturbation (jitter, shimmer), onset and offset duration, sustain time, rate and extent of fo variation in vibrato, and rate and extent of amplitude vibrato.

RESULTS: The parameters that were statistically significantly (RM-ANOVA, P ≤ 0.05) related to emotion expression in both genres were SPL, alpha ratio, F1, and HNR. Additionally, for CCM, significance was found in sustain time, jitter, shimmer, F2, and F4. When fo and SPL were set as covariates in the variance analysis, jitter, HNR, and F4 did not show pure dependence on expression. The alpha ratio, F1, F2, shimmer apq5, amplitude vibrato rate, and sustain time of vocalizations had emotion-related variation also independent of fo and SPL in the CCM style, while these parameters were related to fo and SPL in the classical style.

CONCLUSIONS: The results differed somewhat for the CCM and classical styles. The alpha ratio showed less variation in the classical style, most likely reflecting the demand for a more stable voice source quality. The alpha ratio, F1, F2, shimmer, amplitude vibrato rate, and the sustain time of the vocalizations were related to fo and SPL control in the classical style. The only common independent sound parameter indicating emotional expression for both styles was SPL. The CCM style offers more freedom for expression-related changes in voice quality.
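
Editorial illustration (not part of the abstract): the alpha ratio defined in the methods above is the level in the 1500-5000 Hz band minus the level in the 50-1500 Hz band, in dB. The sketch below assumes the input is a precomputed power spectrum given as parallel lists of bin frequencies and linear power values; the function names and the band-edge convention (1500 Hz counted in the upper band) are assumptions. Because the result is a difference of two levels, the SPL reference pressure cancels out.

```python
import math
from typing import Sequence


def band_level_db(freqs_hz: Sequence[float], powers: Sequence[float],
                  lo_hz: float, hi_hz: float) -> float:
    """Level (dB, arbitrary reference) of the summed power in [lo_hz, hi_hz)."""
    total = sum(p for f, p in zip(freqs_hz, powers) if lo_hz <= f < hi_hz)
    if total <= 0.0:
        raise ValueError("no energy in the requested band")
    return 10.0 * math.log10(total)


def alpha_ratio_db(freqs_hz: Sequence[float], powers: Sequence[float]) -> float:
    """Alpha ratio: level in 1500-5000 Hz minus level in 50-1500 Hz."""
    high = band_level_db(freqs_hz, powers, 1500.0, 5000.0)
    low = band_level_db(freqs_hz, powers, 50.0, 1500.0)
    return high - low


if __name__ == "__main__":
    # Toy spectrum (hypothetical): ten times more power below 1500 Hz than above.
    freqs = [100.0, 1000.0, 2000.0, 4000.0]
    powers = [1.0, 9.0, 0.5, 0.5]
    print(alpha_ratio_db(freqs, powers))
```

A less negative alpha ratio indicates relatively more high-frequency energy, that is, a flatter spectral slope.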

RevDate: 2019-11-22

Weirich M, A Simpson (2019)

Effects of Gender, Parental Role, and Time on Infant- and Adult-Directed Read and Spontaneous Speech.

Journal of speech, language, and hearing research : JSLHR, 62(11):4001-4014.

Purpose The study sets out to investigate inter- and intraspeaker variation in German infant-directed speech (IDS) and considers the potential impact that the factors gender, parental involvement, and speech material (read vs. spontaneous speech) may have. In addition, we analyze data from 3 time points prior to and after the birth of the child to examine potential changes in the features of IDS and, particularly also, of adult-directed speech (ADS). Here, the gender identity of a speaker is considered as an additional factor. Method IDS and ADS data from 34 participants (15 mothers, 19 fathers) is gathered by means of a reading and a picture description task. For IDS, 2 recordings were made when the baby was approximately 6 and 9 months old, respectively. For ADS, an additional recording was made before the baby was born. Phonetic analyses comprise mean fundamental frequency (f0), variation in f0, the 1st 2 formants measured in /i: ɛ a u:/, and the vowel space size. Moreover, social and behavioral data were gathered regarding parental involvement and gender identity. Results German IDS is characterized by an increase in mean f0, a larger variation in f0, vowel- and formant-specific differences, and a larger acoustic vowel space. No effect of gender or parental involvement was found. Also, the phonetic features of IDS were found in both spontaneous and read speech. Regarding ADS, changes in vowel space size in some of the fathers and in mean f0 in mothers were found. Conclusion Phonetic features of German IDS are robust with respect to the factors gender, parental involvement, speech material (read vs. spontaneous speech), and time. Some phonetic features of ADS changed within the child's first year depending on gender and parental involvement/gender identity. Thus, further research on IDS needs to address also potential changes in ADS.

RevDate: 2020-01-08

Howson PJ, MA Redford (2019)

Liquid coarticulation in child and adult speech.

Proceedings of the ... International Congress of Phonetic Sciences. International Congress of Phonetic Sciences, 2019:3100-3104.

Although liquids are mastered late, English-speaking children are said to have fully acquired these segments by age 8. The aim of this study was to test whether liquid coarticulation was also adult-like by this age. 8-year-old productions of /əLa/ and /əLu/ sequences were compared to 5-year-old and adult productions of these sequences. SSANOVA analyses of formant frequency trajectories indicated that, while adults contrasted rhotics and laterals from the onset of the vocalic sequence, F2 trajectories for rhotics and laterals were overlapped at the onset of the /əLa/ sequence in 8-year-old productions and across the entire /əLu/ sequence. The F2 trajectories for rhotics and laterals were even more overlapped in 5-year-olds' productions. Overall, the study suggests that whereas younger children have difficulty coordinating the tongue body/root gesture with the tongue tip gesture, older children still struggle with the intergestural timing associated with liquid production.

RevDate: 2019-10-29

Kim D, S Kim (2019)

Coarticulatory vowel nasalization in American English: Data of individual differences in acoustic realization of vowel nasalization as a function of prosodic prominence and boundary.

Data in brief, 27:104593 pii:104593.

This article provides acoustic measurements data for vowel nasalization which are based on speech recorded from fifteen (8 female and 7 male) native speakers of American English in a laboratory setting. Each individual speaker's production patterns for the vowel nasalization in tautosyllabic CVN and NVC words are documented in terms of three acoustic parameters: the duration of nasal consonant (N-Duration), the duration of vowel (V-Duration) and the difference between the amplitude of the first formant (A1) and the first nasal peak (P0) obtained from the vowel (A1-P0) as an indication of the degree of vowel nasalization. The A1-P0 is measured at three different time points within the vowel, i.e., the near point (25%), midpoint (50%), and distant point (75%), either from the onset (CVN) or the offset (NVC) of the nasal consonant. These measures are taken from the target words in various prosodic prominence and boundary contexts: phonologically focused (PhonFOC) vs. lexically focused (LexFOC) vs. unfocused (NoFOC) conditions; phrase-edge (i.e., phrase-final for CVN and phrase-initial for NVC) vs. phrase-medial conditions. The data also contain a CSV file with each speaker's mean values of the N-Duration, V-Duration, and A1-P0 (z-scored) for each prosodic context along with the information about the speakers' gender. For further discussion of the data, please refer to the full-length article entitled "Prosodically-conditioned fine-tuning of coarticulatory vowel nasalization in English" (Cho et al., 2017).

RevDate: 2019-10-29

Goswami U, Nirmala SR, Vikram CM, et al (2019)

Analysis of Articulation Errors in Dysarthric Speech.

Journal of psycholinguistic research pii:10.1007/s10936-019-09676-5 [Epub ahead of print].

Imprecise articulation is the major issue reported in various types of dysarthria. Detection of articulation errors can help in diagnosis. The cues derived from both the burst and the formant transitions contribute to the discrimination of place of articulation of stops. It is believed that any acoustic deviations in stops due to articulation error can be analyzed by deriving features around the burst and the voicing onsets. The derived features can be used to discriminate between normal and dysarthric speech. In this work, a method is proposed to differentiate the voiceless stops produced by normal speakers from those produced by dysarthric speakers by deriving the spectral moments, the two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel frequency cepstral coefficient features. These features and a cosine distance based classifier are used for the classification of normal and dysarthric speech.
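Two of the ingredients named in the abstract above, spectral moments and a cosine distance, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy spectrum, and the choice to treat raw magnitudes as the probability weights are all assumptions made for illustration.

```python
import math

def spectral_moments(freqs_hz, magnitudes):
    """Treat a magnitude spectrum as a probability distribution over
    frequency and return (centroid, sd, skewness, excess kurtosis)."""
    total = sum(magnitudes)
    p = [m / total for m in magnitudes]  # normalise so the weights sum to 1
    centroid = sum(f * w for f, w in zip(freqs_hz, p))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(variance)
    skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    kurtosis = sum((f - centroid) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3
    return centroid, sd, skewness, kurtosis

def cosine_distance(u, v):
    """1 minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical, symmetric toy spectrum (not data from the study).
toy_freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
toy_mags = [1.0, 2.0, 4.0, 2.0, 1.0]
centroid, sd, skewness, kurtosis = spectral_moments(toy_freqs, toy_mags)
```

A nearest-template classifier in this spirit would assign a token to whichever class template has the smallest `cosine_distance` to the token's feature vector; how the study built its templates is not stated in the abstract.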

RevDate: 2020-01-02

Cartei V, Banerjee R, Garnham A, et al (2019)

Physiological and perceptual correlates of masculinity in children's voices.

Hormones and behavior, 117:104616 pii:S0018-506X(19)30277-6 [Epub ahead of print].

Low frequency components (i.e. a low pitch (F0) and low formant spacing (ΔF)) signal high salivary testosterone and height in adult male voices and are associated with high masculinity attributions by unfamiliar listeners (in both men and women). However, the relation between the physiological, acoustic and perceptual dimensions of speakers' masculinity prior to puberty remains unknown. In this study, 110 pre-pubertal children (58 girls), aged 3 to 10, were recorded as they described a cartoon picture. 315 adults (182 women) rated children's perceived masculinity from the voice only after listening to the speakers' audio recordings. On the basis of their voices alone, boys who had higher salivary testosterone levels were rated as more masculine and the relation between testosterone and perceived masculinity was partially mediated by F0. The voices of taller boys were also rated as more masculine, but the relation between height and perceived masculinity was not mediated by the considered acoustic parameters, indicating that acoustic cues other than F0 and ΔF may signal stature. Both boys and girls who had lower F0 were also rated as more masculine, while ΔF did not affect ratings. These findings highlight the interdependence of physiological, acoustic and perceptual dimensions, and suggest that inter-individual variation in male voices, particularly F0, may advertise hormonal masculinity from a very early age.
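The abstract does not say how ΔF was computed. One common approach, assumed here, models the vocal tract as a uniform tube closed at one end, so that the i-th formant is F_i = ΔF (2i - 1) / 2, and fits ΔF by least squares through the origin. The sketch below uses hypothetical function names and an idealised set of formant values; the speed of sound (350 m/s) is likewise an assumed constant.

```python
def formant_spacing(formants_hz):
    """Least-squares slope through the origin for F_i = dF * (2i - 1) / 2."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

def apparent_vocal_tract_length_m(delta_f_hz, speed_of_sound_m_s=350.0):
    """Apparent vocal tract length implied by dF under the uniform-tube model."""
    return speed_of_sound_m_s / (2 * delta_f_hz)

# Idealised formants of a 17.5 cm uniform tube (illustrative, not study data).
example_formants = [500.0, 1500.0, 2500.0, 3500.0]
delta_f = formant_spacing(example_formants)
vtl = apparent_vocal_tract_length_m(delta_f)
```

Under this model a lower ΔF corresponds to a longer apparent vocal tract, which is why ΔF is treated as a size-related cue.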

RevDate: 2019-10-17

Scheerer NE, Jacobson DS, JA Jones (2019)

Sensorimotor control of vocal production in early childhood.

Journal of experimental psychology. General pii:2019-62257-001 [Epub ahead of print].

Children maintain fluent speech despite dramatic changes to their articulators during development. Auditory feedback aids in the acquisition and maintenance of the sensorimotor mechanisms that underlie vocal motor control. MacDonald, Johnson, Forsythe, Plante, and Munhall (2012) reported that toddlers' speech motor control systems may "suppress" the influence of auditory feedback, since exposure to altered auditory feedback regarding their formant frequencies did not lead to modifications of their speech. This finding is not parsimonious with most theories of motor control. Here, we exposed toddlers to perturbations to the pitch of their auditory feedback as they vocalized. Toddlers compensated for the manipulations, producing significantly different responses to upward and downward perturbations. These data represent the first empirical demonstration that toddlers use auditory feedback for vocal motor control. Furthermore, our findings suggest toddlers are more sensitive to changes to the postural properties of their auditory feedback, such as fundamental frequency, relative to the phonemic properties, such as formant frequencies. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

RevDate: 2019-10-08

Conklin JT, O Dmitrieva (2019)

Vowel-to-Vowel Coarticulation in Spanish Nonwords.

Phonetica pii:000502890 [Epub ahead of print].

The present study examined vowel-to-vowel (VV) coarticulation in backness affecting mid vowels /e/ and /o/ in 36 Spanish nonwords produced by 20 native speakers of Spanish, aged 19-50 years (mean = 30.7; SD = 8.2). Examination of second formant frequency showed substantial carryover coarticulation throughout the data set, while anticipatory coarticulation was minimal and of shorter duration. Furthermore, the effect of stress on vowel-to-vowel coarticulation was investigated and found to vary by direction. In the anticipatory direction, small coarticulatory changes were relatively stable regardless of stress, particularly for target /e/, while in the carryover direction, a hierarchy of stress emerged wherein the greatest coarticulation occurred between stressed triggers and unstressed targets, less coarticulation was observed between unstressed triggers and unstressed targets, and the least coarticulation occurred between unstressed triggers with stressed targets. The results of the study augment and refine previously available knowledge about vowel-to-vowel coarticulation in Spanish and expand cross-linguistic understanding of the effect of stress on the magnitude and direction of vowel-to-vowel coarticulation.

RevDate: 2019-12-20

Lee Y, Keating P, J Kreiman (2019)

Acoustic voice variation within and between speakers.

The Journal of the Acoustical Society of America, 146(3):1568.

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.

RevDate: 2020-01-08

Balaguer M, Pommée T, Farinas J, et al (2020)

Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review.

Head & neck, 42(1):111-130.

BACKGROUND: The development of automatic tools based on acoustic analysis makes it possible to overcome the limitations of perceptual assessment for patients with head and neck cancer. The aim of this study is to provide a systematic review of literature describing the effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis.

METHODS: Two databases (PubMed and Embase) were surveyed. The selection process, according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement, led to a final set of 22 articles.

RESULTS: Nasalance is studied mainly in oropharyngeal patients. The vowels are mostly studied using formant analysis and vowel space area, the consonants by means of spectral moments with specific parameters according to their phonetic characteristic. Machine learning methods allow classifying "intelligible" or "unintelligible" speech for T3 or T4 tumors.

CONCLUSIONS: The development of comprehensive models combining different acoustic measures would allow a better consideration of the functional impact of the speech disorder.

RevDate: 2019-09-23

Zeng Q, Jiao Y, Huang X, et al (2019)

Effects of Angle of Epiglottis on Aerodynamic and Acoustic Parameters in Excised Canine Larynges.

Journal of voice : official journal of the Voice Foundation, 33(5):627-633.

OBJECTIVES: The aim of this study is to explore the effects of the angle of epiglottis (Aepi) on phonation and resonance in excised canine larynges.

METHODS: The anatomic Aepi was measured for 14 excised canine larynges as a control. Then, the Aepis were manually adjusted to 60° and 90° in each larynx. Aerodynamic and acoustic parameters, including mean flow rate, sound pressure level, jitter, shimmer, fundamental frequency (F0), and formants (F1'-F4'), were measured with a subglottal pressure of 1.5 kPa. Simple linear regression analysis between acoustic and aerodynamic parameters and the Aepi of the control was performed, and an analysis of variance comparing the acoustic and aerodynamic parameters of the three treatments was carried out.

RESULTS: The results of the study are as follows: (1) the larynges with larger anatomic Aepi had significantly lower jitter, shimmer, formant 1, and formant 2; (2) phonation threshold flow was significantly different for the three treatments; and (3) mean flow rate and sound pressure level were significantly different between the 60° and the 90° treatments of the 14 larynges.

CONCLUSIONS: The Aepi was proposed for the first time in this study. The Aepi plays an important role in phonation and resonance of excised canine larynges.

RevDate: 2019-09-18

Dmitrieva O, I Dutta (2019)

Acoustic Correlates of the Four-Way Laryngeal Contrast in Marathi.

Phonetica pii:000501673 [Epub ahead of print].

The study examines acoustic correlates of the four-way laryngeal contrast in Marathi, focusing on temporal parameters, voice quality, and onset f0. Acoustic correlates of the laryngeal contrast were investigated in the speech of 33 native speakers of Marathi, recorded in Mumbai, India, producing a word list containing six sets of words minimally contrastive in terms of laryngeal specification of word-initial velar stops. Measurements were made for the duration of prevoicing, release, and voicing during release. Fundamental frequency was measured at the onset of voicing following the stop and at 10 additional time points. As measures of voice quality, amplitude differences between the first and second harmonic (H1-H2) and between the first harmonic and the third formant (H1-A3) were calculated. The results demonstrated that laryngeal categories in Marathi are differentiated based on temporal measures, voice quality, and onset f0, although differences in each dimension were unequal in magnitude across different pairs of stop categories. We conclude that a single acoustic correlate, such as voice onset time, is insufficient to differentiate among all the laryngeal categories in languages such as Marathi, characterized by complex four-way laryngeal contrasts. Instead, a joint contribution of several acoustic correlates creates a robust multidimensional contrast.
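H1-H2, one of the voice quality measures named above, is a level difference in decibels between the first two harmonics. The sketch below is an assumption-laden illustration rather than the study's procedure: it takes f0 as already known, reads each harmonic's amplitude from a single-frequency DFT projection, and applies no formant correction. All names and the synthetic signal are hypothetical.

```python
import cmath
import math

def component_amplitude(samples, fs_hz, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz, obtained by
    projecting the signal onto a complex exponential at that frequency."""
    n = len(samples)
    acc = sum(samples[t] * cmath.exp(-2j * math.pi * freq_hz * t / fs_hz)
              for t in range(n))
    return 2 * abs(acc) / n

def h1_minus_h2_db(samples, fs_hz, f0_hz):
    """Uncorrected H1-H2: level of the first harmonic minus the second, in dB."""
    h1 = component_amplitude(samples, fs_hz, f0_hz)
    h2 = component_amplitude(samples, fs_hz, 2 * f0_hz)
    return 20 * math.log10(h1 / h2)

# Synthetic two-harmonic signal covering a whole number of periods.
fs = 8000
f0 = 200.0
voiced = [1.0 * math.sin(2 * math.pi * f0 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs)
          for t in range(400)]
h1_h2 = h1_minus_h2_db(voiced, fs, f0)
```

H1-A3 follows the same pattern, except that the second term is the level of the harmonic nearest the third formant, which requires a formant estimate that this sketch does not attempt.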

RevDate: 2019-09-20

Guan J, C Liu (2019)

Speech Perception in Noise With Formant Enhancement for Older Listeners.

Journal of speech, language, and hearing research : JSLHR, 62(9):3290-3301.

PURPOSE: Degraded speech intelligibility in background noise is a common complaint of listeners with hearing loss. The purpose of the current study is to explore whether 2nd formant (F2) enhancement improves speech perception in noise for older listeners with hearing impairment (HI) and normal hearing (NH).

METHOD: Target words (e.g., color and digit) were selected and presented based on the paradigm of the coordinate response measure corpus. Speech recognition thresholds with original and F2-enhanced speech in 2- and 6-talker babble were examined for older listeners with NH and HI.

RESULTS: The thresholds for both the NH and HI groups improved for enhanced speech signals primarily in 2-talker babble, but not in 6-talker babble. The F2 enhancement benefits did not correlate significantly with listeners' age and their average hearing thresholds in most listening conditions. However, speech intelligibility index values increased significantly with F2 enhancement in babble for listeners with HI, but not for NH listeners.

CONCLUSIONS: Speech sounds with F2 enhancement may improve listeners' speech perception in 2-talker babble, possibly due to a greater amount of speech information available in temporally modulated noise or a better capacity to separate speech signals from background babble.

RevDate: 2019-09-01

Klein E, Brunner J, P Hoole (2019)

The influence of coarticulatory and phonemic relations on individual compensatory formant production.

The Journal of the Acoustical Society of America, 146(2):1265.

Previous auditory perturbation studies have shown that speakers are able to simultaneously use multiple compensatory strategies to produce a certain acoustic target. In the case of formant perturbation, these findings were obtained examining the compensatory production for low vowels /ɛ/ and /æ/. This raises some controversy as more recent research suggests that the contribution of the somatosensory feedback to the production of vowels might differ across phonemes. In particular, the compensatory magnitude to auditory perturbations is expected to be weaker for high vowels compared to low vowels since the former are characterized by larger linguopalatal contact. To investigate this hypothesis, this paper conducted a bidirectional auditory perturbation study in which F2 of the high central vowel /ɨ/ was perturbed in opposing directions depending on the preceding consonant (alveolar vs velar). The consonants were chosen such that speakers' usual coarticulatory patterns were either compatible or incompatible with the required compensatory strategy. The results demonstrate that speakers were able to compensate for applied perturbations even if speakers' compensatory movements resulted in unusual coarticulatory configurations. However, the results also suggest that individual compensatory patterns were influenced by additional perceptual factors attributable to the phonemic space surrounding the target vowel /ɨ/.

RevDate: 2019-09-01

Migimatsu K, IT Tokuda (2019)

Experimental study on nonlinear source-filter interaction using synthetic vocal fold models.

The Journal of the Acoustical Society of America, 146(2):983.

Under certain conditions, e.g., singing voice, the fundamental frequency of the vocal folds can go up and interfere with the formant frequencies. Acoustic feedback from the vocal tract filter to the vocal fold source then becomes strong and non-negligible. An experimental study was presented on such source-filter interaction using three types of synthetic vocal fold models. Asymmetry was also created between the left and right vocal folds. The experiment reproduced various nonlinear phenomena, such as frequency jump and quenching, as reported in humans. Increase in phonation threshold pressure was also observed when resonant frequency of the vocal tract and fundamental frequency of the vocal folds crossed each other. As a combined effect, the phonation threshold pressure was further increased by the left-right asymmetry. Simulation of the asymmetric two-mass model reproduced the experiments to some extent. One of the intriguing findings of this study is the variable strength of the source-filter interaction over different model types. Among the three models, two models were strongly influenced by the vocal tract, while no clear effect of the vocal tract was observed in the other model. This implies that the level of source-filter interaction may vary considerably from one subject to another in humans.

RevDate: 2019-11-19

Max L, A Daliri (2019)

Limited Pre-Speech Auditory Modulation in Individuals Who Stutter: Data and Hypotheses.

Journal of speech, language, and hearing research : JSLHR, 62(8S):3071-3084.

PURPOSE: We review and interpret our recent series of studies investigating motor-to-auditory influences during speech movement planning in fluent speakers and speakers who stutter. In those studies, we recorded auditory evoked potentials in response to probe tones presented immediately prior to speaking or at the equivalent time in no-speaking control conditions. As a measure of pre-speech auditory modulation (PSAM), we calculated changes in auditory evoked potential amplitude in the speaking conditions relative to the no-speaking conditions. Whereas adults who do not stutter consistently showed PSAM, this phenomenon was greatly reduced or absent in adults who stutter. The same between-group difference was observed in conditions where participants expected to hear their prerecorded speech played back without actively producing it, suggesting that the speakers who stutter use inefficient forward modeling processes rather than inefficient motor command generation processes. Compared with fluent participants, adults who stutter showed both less PSAM and less auditory-motor adaptation when producing speech while exposed to formant-shifted auditory feedback. Across individual participants, however, PSAM and auditory-motor adaptation did not correlate in the typically fluent group, and they were negatively correlated in the stuttering group. Interestingly, speaking with a consistent 100-ms delay added to the auditory feedback signal normalized PSAM in speakers who stutter, and there no longer was a between-group difference in this condition.

CONCLUSIONS: Combining our own data with human and animal neurophysiological evidence from other laboratories, we interpret the overall findings as suggesting that (a) speech movement planning modulates auditory processing in a manner that may optimize its tuning characteristics for monitoring feedback during speech production and, (b) in conditions with typical auditory feedback, adults who stutter do not appropriately modulate the auditory system prior to speech onset. Lack of modulation of speakers who stutter may lead to maladaptive feedback-driven movement corrections that manifest themselves as repetitive movements or postural fixations.

RevDate: 2019-11-01

Plummer AR, PF Reidy (2018)

Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses.

Journal of phonetics, 71:355-375.

Low-dimensional representations of speech data, such as formant values extracted by linear predictive coding analysis or spectral moments computed from whole spectra viewed as probability distributions, have been instrumental in both phonetic and phonological analyses over the last few decades. In this paper, we present a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low-dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information as well as culture-specific socio-indexical information that is expressed by talkers of a given speech community. We demonstrate the basic mechanics of the framework by carrying out an analysis of children's productions of sibilant fricatives relative to those of adults in their speech community using the phoneigen package - a publicly available implementation of the framework. We focus the demonstration on enumerating the steps for constructing manifolds from data and then using them to map the data to a low-dimensional space, explicating how manifold structure affects the learned low-dimensional representations, and comparing the use of these representations against standard acoustic features in a phonetic analysis. We conclude with a discussion of the framework's underlying assumptions, its broader modeling potential, and its position relative to recent advances in the field of representation learning.

RevDate: 2019-09-20

Jain S, NP Nataraja (2019)

The Relationship between Temporal Integration and Temporal Envelope Perception in Noise by Males with Mild Sensorineural Hearing Loss.

The journal of international advanced otology, 15(2):257-262.

OBJECTIVES: A surge of literature indicated that temporal integration and temporal envelope perception contribute largely to the perception of speech. A review of literature showed that the perception of speech with temporal integration and temporal envelope perception in noise might be affected due to sensorineural hearing loss but to a varying degree. Because the temporal integration and temporal envelope share similar physiological processing at the cochlear level, the present study was aimed to identify the relationship between temporal integration and temporal envelope perception in noise by individuals with mild sensorineural hearing loss.

MATERIALS AND METHODS: Thirty adult males with mild sensorineural hearing loss and thirty age- and gender-matched normal-hearing individuals volunteered for being the participants of the study. The temporal integration was measured using synthetic consonant-vowel-consonant syllables, varied for onset, offset, and onset-offset of second and third formant frequencies of the vowel following and preceding consonants in six equal steps, thus forming a six-step onset, offset, and onset-offset continuum, each. The duration of the transition was kept short (40 ms) in one set of continua and long (80 ms) in another. Temporal integration scores were calculated as the differences in the identification of the categorical boundary between short- and long-transition continua. Temporal envelope perception was measured using sentences processed in quiet, 0 dB, and -5 dB signal-to-noise ratios at 4, 8, 16, and 32 contemporary frequency channels, and the temporal envelope was extracted for each sentence using the Hilbert transformation.

RESULTS: A significant effect of hearing loss was observed on temporal integration, but not on temporal envelope perception. However, when the temporal integration abilities were controlled, the variable effect of hearing loss on temporal envelope perception was noted.

CONCLUSION: It was important to measure the temporal integration to accurately account for the envelope perception by individuals with normal hearing and those with hearing loss.
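The envelope extraction step named in the methods above (the Hilbert transformation) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a slow, direct DFT so that it runs on the standard library alone, the function names are hypothetical, and it omits the band-splitting into 4 to 32 frequency channels that the study describes.

```python
import cmath
import math

def dft(values, inverse=False):
    """Direct O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def hilbert_envelope(samples):
    """Temporal envelope as the magnitude of the analytic signal."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep DC and, for even n, the Nyquist bin
        if k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # discard the negative frequencies
    analytic = dft(spectrum, inverse=True)
    return [abs(z) for z in analytic]

# Amplitude-modulated tone: 16-cycle carrier with a 2-cycle modulator.
n_samples = 128
expected = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n_samples)
            for t in range(n_samples)]
am_signal = [expected[t] * math.cos(2 * math.pi * 16 * t / n_samples)
             for t in range(n_samples)]
envelope = hilbert_envelope(am_signal)
```

For this synthetic signal the recovered envelope matches the modulator because the modulation frequency lies well below the carrier. Real sentence material would normally be band-pass filtered per channel first and processed with an FFT-based routine.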

RevDate: 2019-08-18

Cartei V, Garnham A, Oakhill J, et al (2019)

Children can control the expression of masculinity and femininity through the voice.

Royal Society open science, 6(7):190656 pii:rsos190656.

Pre-pubertal boys and girls speak with acoustically different voices despite the absence of a clear anatomical dimorphism in the vocal apparatus, suggesting that a strong component of the expression of gender through the voice is behavioural. Initial evidence for this hypothesis was found in a previous study showing that children can alter their voice to sound like a boy or like a girl. However, whether they can spontaneously modulate these voice components within their own gender in order to vary the expression of their masculinity and femininity remained to be investigated. Here, seventy-two English-speaking children aged 6-10 were asked to give voice to child characters varying in masculine and feminine stereotypicality to investigate whether primary school children spontaneously adjust their sex-related cues in the voice, namely fundamental frequency (F0) and formant spacing (ΔF), along gender stereotypical lines. Boys and girls masculinized their voice, by lowering F0 and ΔF, when impersonating stereotypically masculine child characters of the same sex. Girls and older boys also feminized their voice, by raising their F0 and ΔF, when impersonating stereotypically feminine same-sex child characters. These findings reveal that children have some knowledge of the sexually dimorphic acoustic cues underlying the expression of gender, and are capable of controlling them to modulate gender-related attributes, paving the way for the use of the voice as an implicit, objective measure of the development of gender stereotypes and behaviour.

RevDate: 2019-11-06

Dorman MF, Natale SC, Zeitler DM, et al (2019)

Looking for Mickey Mouse™ But Finding a Munchkin: The Perceptual Effects of Frequency Upshifts for Single-Sided Deaf, Cochlear Implant Patients.

Journal of speech, language, and hearing research : JSLHR, 62(9):3493-3499.

PURPOSE: Our aim was to make audible for normal-hearing listeners the Mickey Mouse™ sound quality of cochlear implants (CIs) often found following device activation.

METHOD: The listeners were 3 single-sided deaf patients fit with a CI and who had 6 months or less of CI experience. Computed tomography imaging established the location of each electrode contact in the cochlea and allowed an estimate of the place frequency of the tissue nearest each electrode. For the most apical electrodes, this estimate ranged from 650 to 780 Hz. To determine CI sound quality, a clean signal (a sentence) was presented to the CI ear via a direct connect cable, and candidate CI-like signals were presented to the ear with normal hearing via an insert receiver. The listeners rated the similarity of the candidate signals to the sound of the CI on a 1- to 10-point scale, with 10 being a complete match.

RESULTS: To make the match to CI sound quality, all 3 patients needed an upshift in formant frequencies (300-800 Hz) and a metallic sound quality. Two of the 3 patients also needed an upshift in voice pitch (10-80 Hz) and a muffling of sound quality. Similarity scores ranged from 8 to 9.7.

CONCLUSION: The formant frequency upshifts, fundamental frequency upshifts, and metallic sound quality experienced by the listeners can be linked to the relatively basal locations of the electrode contacts and short duration experience with their devices. The perceptual consequence was not the voice quality of Mickey Mouse™ but rather that of Munchkins in The Wizard of Oz for whom both formant frequencies and voice pitch were upshifted.

SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.9341651.

RevDate: 2019-08-10

Knight EJ, SF Austin (2019)

The Effect of Head Flexion/Extension on Acoustic Measures of Singing Voice Quality.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30117-1 [Epub ahead of print].

A study was undertaken to identify the effect of head flexion/extension on singing voice quality. The amplitude of the fundamental frequency (F0) and the singing power ratio (SPR), an indirect measure of Singer's Formant activity, were measured. F0 and SPR scores at four experimental head positions were compared with the subjects' scores at their habitual positions. Three vowels and three pitch levels were tested. F0 amplitudes and low-frequency partials in general were greater with neck extension, while SPR increased with neck flexion. No effect of pitch or vowel was found. Gains in SPR appear to be the result of damping low-frequency partials rather than amplifying those in the Singer's Formant region. Raising the amplitude of F0 is an important resonance tool for female voices in the high range, and may be of benefit to other voice types in resonance, loudness, and laryngeal function.

RevDate: 2019-08-08

Alho K, Żarnowiec K, Gorina-Careta N, et al (2019)

Phonological Task Enhances the Frequency-Following Response to Deviant Task-Irrelevant Speech Sounds.

Frontiers in human neuroscience, 13:245.

In electroencephalography (EEG) measurements, processing of periodic sounds in the ascending auditory pathway generates the frequency-following response (FFR) phase-locked to the fundamental frequency (F0) and its harmonics of a sound. We measured FFRs to the steady-state (vowel) part of syllables /ba/ and /aw/ occurring in binaural rapid streams of speech sounds as frequently repeating standard syllables or as infrequent (p = 0.2) deviant syllables among standard /wa/ syllables. Our aim was to study whether concurrent active phonological processing affects early processing of irrelevant speech sounds reflected by FFRs to these sounds. To this end, during syllable delivery, our healthy adult participants performed tasks involving written letters delivered on a computer screen in a rapid stream. The stream consisted of vowel letters written in red, infrequently occurring consonant letters written in the same color, and infrequently occurring vowel letters written in blue. In the phonological task, the participants were instructed to press a response key to the consonant letters differing phonologically but not in color from the frequently occurring red vowels, whereas in the non-phonological task, they were instructed to respond to the vowel letters written in blue differing only in color from the frequently occurring red vowels. We observed that the phonological task enhanced responses to deviant /ba/ syllables but not responses to deviant /aw/ syllables. This suggests that active phonological task performance may enhance processing of such small changes in irrelevant speech sounds as the 30-ms difference in the initial formant-transition time between the otherwise identical syllables /ba/ and /wa/ used in the present study.

RevDate: 2019-08-02

Birkholz P, Gabriel F, Kürbis S, et al (2019)

How the peak glottal area affects linear predictive coding-based formant estimates of vowels.

The Journal of the Acoustical Society of America, 146(1):223.

The estimation of formant frequencies from acoustic speech signals is mostly based on Linear Predictive Coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may substantially differ from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas has been systematically examined both using physical vocal tract models excited with a self-oscillating rubber model of the vocal folds, and by computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels have been analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels, and more moderate for mid to open vowels.
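To make the LPC idea concrete, here is a deliberately reduced sketch: a second-order autocorrelation-method LPC fit to the impulse response of a single two-pole resonator, with the resonance frequency and bandwidth read from the angle and radius of the fitted pole. It is an illustration under stated assumptions, not the procedure of the study: real formant trackers use higher orders, windowing, and pre-emphasis, and the source here is an ideal impulse, which is exactly the closed-glottis idealisation the abstract questions. All function names are hypothetical.

```python
import cmath
import math

def resonator_impulse_response(freq_hz, bandwidth_hz, fs_hz, n_samples):
    """Impulse response of a two-pole resonator (one idealised formant)."""
    radius = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2 * math.pi * freq_hz / fs_hz
    a1, a2 = 2 * radius * math.cos(theta), -radius * radius
    h = [1.0, a1]
    for _ in range(2, n_samples):
        h.append(a1 * h[-1] + a2 * h[-2])  # h[n] = a1*h[n-1] + a2*h[n-2]
    return h

def autocorrelation(samples, lag):
    return sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))

def lpc2_formant(samples, fs_hz):
    """Order-2 LPC (autocorrelation method); returns (frequency, bandwidth) in Hz."""
    r0, r1, r2 = (autocorrelation(samples, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det          # solve the 2x2 normal equations
    a2 = (r0 * r2 - r1 * r1) / det
    pole = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2  # root of z^2 - a1*z - a2
    freq_hz = abs(cmath.phase(pole)) * fs_hz / (2 * math.pi)
    bandwidth_hz = -math.log(abs(pole)) * fs_hz / math.pi
    return freq_hz, bandwidth_hz

fs = 10000
response = resonator_impulse_response(500.0, 80.0, fs, 2000)
est_freq, est_bw = lpc2_formant(response, fs)
```

With an impulse source the fit recovers the resonator almost exactly. The deviations reported in the abstract arise when the excitation and the glottal termination depart from that ideal, which a sketch this simple cannot reproduce.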

RevDate: 2019-08-02

Patel RR, Lulich SM, A Verdi (2019)

Vocal tract shape and acoustic adjustments of children during phonation into narrow flow-resistant tubes.

The Journal of the Acoustical Society of America, 146(1):352.

The goal of the study is to quantify the salient vocal tract acoustic, subglottal acoustic, and vocal tract physiological characteristics during phonation into a narrow flow-resistant tube with 2.53 mm inner diameter and 124 mm length in typically developing vocally healthy children using simultaneous microphone, accelerometer, and 3D/4D ultrasound recordings. Acoustic measurements included fundamental frequency (fo), first formant frequency (F1), second formant frequency (F2), first subglottal resonance (FSg1), and peak-to-peak amplitude ratio (Pvt:Psg). Physiological measurements included posterior tongue height (D1), tongue dorsum height (D2), tongue tip height (D3), tongue length (D4), oral cavity width (D5), hyoid elevation (D6), pharynx width (D7). All measurements were made on eight boys and ten girls (6-9 years) during sustained /o:/ production at typical pitch and loudness, with and without flow-resistant tube. Phonation with the flow-resistant tube resulted in a significant decrease in F1, F2, and Pvt:Psg and a significant increase in D2, D3, and FSg1. A statistically significant gender effect was observed for D1, with D1 higher in boys. These findings agree well with reported findings from adults, suggesting common acoustic and articulatory mechanisms for narrow flow-resistant tube phonation. Theoretical implications of the findings are discussed.

RevDate: 2019-08-05

Wadamori N (2019)

Evaluation of a photoacoustic bone-conduction vibration system.

The Review of scientific instruments, 90(7):074905.

This article proposes a bone conduction vibrator that is based on a phenomenon by which audible sound can be perceived when vibrations are produced using a laser beam that is synchronized to the sound and these vibrations are then transmitted to an auricular cartilage. To study this phenomenon, we measured the vibrations using a rubber sheet with similar properties to those of soft tissue in combination with an acceleration sensor. We also calculated the force level of the sound based on the mechanical impedance and the acceleration in the proposed system. We estimated the formant frequencies of specific vibrations that were synchronized to five Japanese vowels using this phenomenon. We found that the vibrations produced in the rubber sheet caused audible sound generation when the photoacoustic bone conduction vibration system was used. It is expected that a force level that is equal to the reference equivalent threshold force level can be achieved at light intensities that lie below the safety limit for human skin exposure by selecting an irradiation wavelength at which a high degree of optical absorption occurs. It is demonstrated that clear sounds can be transmitted to the cochlea using the proposed system, while the effects of acoustic and electric noise in the environment are excluded. Improvements in the vibratory force levels realized using this system will enable the development of a novel hearing aid that will provide an alternative to conventional bone conduction hearing aids.

RevDate: 2019-07-26

Kaneko M, Sugiyama Y, Mukudai S, et al (2019)

Effect of Voice Therapy Using Semioccluded Vocal Tract Exercises in Singers and Nonsingers With Dysphonia.

Journal of voice : official journal of the Voice Foundation pii:S0892-1997(19)30210-3 [Epub ahead of print].

OBJECTIVES: Voice therapy with semioccluded vocal tract exercises (SOVTE) has a long history of use in singers and nonsingers with dysphonia. SOVTE with increased vocal tract impedance leads to increased vocal efficiency and economy. Although there is a growing body of research on the physiological impact of SOVTE, and growing clinical sentiment about its therapeutic benefits, empirical data describing its potential efficacy in singers and nonsingers are lacking. The objective of the current study is to evaluate vocal tract function and voice quality in singers and nonsingers with dysphonia after undergoing SOVTE.

METHODS: Patients who were diagnosed with functional dysphonia, vocal fold nodules and age-related atrophy were assessed (n = 8 singers, n = 8 nonsingers). Stroboscopic examination, aerodynamic assessment, acoustic analysis, formant frequency, and self-assessments were evaluated before and after performing SOVTE.

RESULTS: In the singer group, expiratory lung pressure, jitter, shimmer, and self-assessment significantly improved after SOVTE. In addition, formant frequency (first, second, third, and fourth), and the standard deviation (SD) of the first, second, and third formant frequency significantly improved. In the nonsinger group, expiratory lung pressure, jitter, shimmer, and Voice Handicap Index-10 significantly improved after SOVTE. However, no significant changes were observed in formant frequency.

CONCLUSIONS: These results suggest that SOVTE may improve voice quality in singers and nonsingers with dysphonia, and SOVTE may be more effective at adjusting the vocal tract function in singers with dysphonia compared to nonsingers.

RevDate: 2019-07-23

Myers S (2019)

An Acoustic Study of Sandhi Vowel Hiatus in Luganda.

Language and speech [Epub ahead of print].

In Luganda (Bantu, Uganda), a sequence of vowels in successive syllables (V.V) is not allowed. If the first vowel is high, the two vowels are joined together in a diphthong (e.g., i + a → i͜a). If the first vowel is non-high, it is deleted with compensatory lengthening of the second vowel in the sequence (e.g., e + a → aː). This paper presents an acoustic investigation of inter-word V#V sequences in Luganda. It was found that the vowel interval in V#V sequences is longer than that in V#C sequences. When the first vowel in V#V is non-high, the formant frequency of the outcome is determined by the second vowel in the sequence. When the first vowel is high, on the other hand, the sequence is realized as a diphthong, with the transition between the two formant patterns taking up most of the duration. The durational patterns within these diphthongs provide evidence against the transcription-based claim that these sequences are reorganized so that the length lies in the second vowel (/i#V/ → [jVː]). The findings bring into question a canonical case of compensatory lengthening conditioned by glide formation.

RevDate: 2019-07-15

Longo L, Di Stadio A, Ralli M, et al (2019)

Voice Parameter Changes in Professional Musician-Singers Singing with and without an Instrument: The Effect of Body Posture.

Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics (IALP) pii:000501202 [Epub ahead of print].

BACKGROUND AND AIM: The impact of body posture on vocal emission is well known. Postural changes may increase muscular resistance in tracts of the phono-articulatory apparatus and lead to voice disorders. This work aimed to assess whether and to what extent body posture during singing and playing a musical instrument impacts voice performance in professional musicians.

SUBJECTS AND METHODS: Voice signals were recorded from 17 professional musicians (pianists and guitarists) while they were singing and while they were singing and playing a musical instrument simultaneously. Metrics were extracted from their voice spectrogram using the Multi-Dimensional Voice Program (MDVP) and included jitter, shift in fundamental voice frequency (sF0), shimmer, change in peak amplitude, noise to harmonic ratio, Voice Turbulence Index, Soft Phonation Index (SPI), Frequency Tremor Intensity Index, Amplitude Tremor Intensity Index, and maximum phonatory time (MPT). Statistical analysis was performed using two-tailed t tests, one-way ANOVA, and χ² tests. Subjects' body posture was visually assessed following the recommendations of the Italian Society of Audiology and Phoniatrics. Thirty-seven voice signals were collected, 17 during singing and 20 during singing and playing a musical instrument.

RESULTS: Data showed that playing an instrument while singing led to an impairment of the "singer formant" and to a decrease in jitter, sF0, shimmer, SPI, and MPT. However, statistical analysis showed that none of the MDVP metrics changed significantly when subjects played an instrument compared to when they did not. Shoulder and back position affected voice features as measured by the MDVP metrics, while head and neck position did not. In particular, playing the guitar decreased the amplitude of the "singer formant" and increased noise, causing a typical "raucous rock voice."

CONCLUSIONS: Voice features may be affected by the use of the instrument the musicians play while they sing. Body posture selected by the musician while playing the instrument may affect expiration and phonation.

RevDate: 2020-01-03

Whitfield JA, DD Mehta (2019)

Examination of Clear Speech in Parkinson Disease Using Measures of Working Vowel Space.

Journal of speech, language, and hearing research : JSLHR, 62(7):2082-2098.

PURPOSE: The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories.

METHOD: The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory-acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution.

RESULTS: Both speaker groups exhibited significant increases in the articulatory-acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space.

CONCLUSIONS: Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.
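
One of the metrics named in this abstract, vowel space hull area, can be sketched as the area of the convex hull of sampled (F1, F2) points. The code below is a minimal illustration under that assumption; the function names and the sample values are hypothetical, and the density-based VSD measures are not attempted here.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def vowel_space_hull_area(formant_samples):
    """Hull area (Hz^2) of a list of (F1, F2) samples in Hz."""
    return polygon_area(convex_hull(formant_samples))

# Hypothetical (F1, F2) samples in Hz: four peripheral points, two central ones.
samples = [(300, 800), (300, 2300), (800, 800), (800, 2300),
           (500, 1500), (550, 1600)]
print(vowel_space_hull_area(samples))
```

Because only the outermost points define the hull, adding more samples in the central region leaves this area unchanged, which is consistent with the abstract's observation that the hull metric was not sensitive to how much time speakers spend in central regions of formant space.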

RevDate: 2019-12-18

Chiu YF, Forrest K, T Loux (2019)

Relationship Between F2 Slope and Intelligibility in Parkinson's Disease: Lexical Effects and Listening Environment.

American journal of speech-language pathology, 28(2S):887-894.

PURPOSE: There is a complex relationship between speech production and intelligibility of speech. The current study sought to evaluate the interaction of the factors of lexical characteristics, listening environment, and the 2nd formant transition (F2 slope) on intelligibility of speakers with Parkinson's disease (PD).

METHOD: Twelve speakers with PD and 12 healthy controls read sentences that included words with the diphthongs /aɪ/, /ɔɪ/, and /aʊ/. The F2 slope of the diphthong transition was measured and averaged across the 3 diphthongs for each speaker. Young adult listeners transcribed the sentences to assess intelligibility of words with high and low word frequency and high and low neighborhood density in quiet and noisy listening conditions. The average F2 slope and intelligibility scores were entered into regression models to examine their relationship.

RESULTS: F2 slope was positively related to intelligibility in speakers with PD in both listening conditions with a stronger relationship in noise than in quiet. There was no significant relationship between F2 slope and intelligibility of healthy speakers. In the quiet condition, F2 slope was only correlated with intelligibility in less-frequent words produced by the PD group. In the noise condition, F2 slope was related to intelligibility in high- and low-frequency words and high-density words in PD.

CONCLUSIONS: The relationship between F2 slope and intelligibility in PD was affected by lexical factors and listening conditions. F2 slope was more strongly related to intelligibility in noise than in quiet for speakers with PD. This relationship was absent in highly frequent words presented in quiet and those with fewer lexical neighbors.
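
The abstract does not say how F2 slope was computed. One common reading, assumed here purely for illustration, is the least-squares slope of F2 against time over the diphthong transition, averaged across diphthongs for each speaker. The function names and the numbers below are hypothetical.

```python
def f2_slope(times_ms, f2_hz):
    """Least-squares slope of F2 over time, in Hz per ms."""
    n = len(times_ms)
    if n < 2 or n != len(f2_hz):
        raise ValueError("need two or more paired samples")
    mean_t = sum(times_ms) / n
    mean_f = sum(f2_hz) / n
    sxx = sum((t - mean_t) ** 2 for t in times_ms)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_ms, f2_hz))
    return sxy / sxx

def mean_f2_slope(tracks):
    """Average slope across several (times_ms, f2_hz) transition tracks."""
    slopes = [f2_slope(t, f) for t, f in tracks]
    return sum(slopes) / len(slopes)

# Hypothetical F2 tracks for three rising diphthong transitions.
tracks = [
    ([0, 10, 20, 30], [1000, 1200, 1400, 1600]),
    ([0, 10, 20, 30], [900, 1000, 1100, 1200]),
    ([0, 10, 20, 30], [1100, 1250, 1400, 1550]),
]
print(mean_f2_slope(tracks))
```

A per-speaker value of this kind could then be entered into a regression against intelligibility scores, as the abstract describes.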

RevDate: 2020-01-03

Bauerly KR, Jones RM, C Miller (2019)

Effects of Social Stress on Autonomic, Behavioral, and Acoustic Parameters in Adults Who Stutter.

Journal of speech, language, and hearing research : JSLHR, 62(7):2185-2202.

PURPOSE: The purpose of this study was to assess changes in autonomic, behavioral, and acoustic measures in response to social stress in adults who stutter (AWS) compared to adults who do not stutter (ANS).

METHOD: Participants completed the State-Trait Anxiety Inventory (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983). In order to provoke social stress, participants were required to complete a modified version of the Trier Social Stress Test (TSST-M, Kirschbaum, Pirke, & Hellhammer, 1993), which included completing a nonword reading task and then preparing and delivering a speech to what was perceived as a group of professionals trained in public speaking. Autonomic nervous system changes were assessed by measuring skin conductance levels, heart rate, and respiratory sinus arrhythmia (RSA). Behavioral changes during speech production were measured in errors, percentage of syllables stuttered, percentage of other disfluencies, and speaking rate. Acoustic changes were measured using 2nd formant frequency fluctuations. In order to make comparisons of speech with and without social-cognitive stress, measurements were collected while participants completed a speaking task before and during TSST-M conditions.

RESULTS: AWS showed significantly higher levels of self-reported state and trait anxiety compared to ANS. Autonomic nervous system changes revealed similar skin conductance level and heart rate across pre-TSST-M and TSST-M conditions; however, RSA levels were significantly higher in AWS compared to ANS across conditions. There were no differences found between groups for speaking rate, fundamental frequency, and percentage of other disfluencies when speaking with or without social stress. However, acoustic analysis revealed higher levels of 2nd formant frequency fluctuations in the AWS compared to the controls under pre-TSST-M conditions, followed by a decline to a level that resembled controls when speaking under the TSST-M condition.

DISCUSSION: Results suggest that AWS, compared to ANS, engage higher levels of parasympathetic control (i.e., RSA) during speaking, regardless of stress level. Higher levels of self-reported state and trait anxiety support this viewpoint and suggest that anxiety may have an indirect role in articulatory variability in AWS.


